Recursively delete directories unless a specific file is present

There are several ways to do this, but my Google-fu may be weak because it took me much too long to figure this out.

I want to recursively delete directories with a specific name (or names) within a directory structure, UNLESS the matched directory contains a sentinel file.

In my case I want to make a C# directory structure “cleaner-than-clean” by removing all ‘bin’ and ‘obj’ directories, leaving just the user-generated files behind.  Finding the offending directories is pretty easy:

#!/usr/bin/env bash

dir=/path/to/project

find "$dir" -type d \
    \( -name 'bin' -o -name 'obj' \) \
    -print

This says “find things under $dir that are directories (-type d) and are named either ‘bin’ (-name 'bin') or (-o) named ‘obj’ (-name 'obj')”.  The parentheses force the two -name tests to be treated as a single condition, so the group is true if either name matches. If the whole expression is true, print the path (-print).

Notice that I’ve escaped (\) the parentheses because I’m using bash. Most UNIX shells require these to be escaped, but yours may not. I’ve also ended each line with a backslash, which escapes the newline; a single command can be spread over several lines this way, making it easier to read.
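
For reference, the same command collapsed onto one line looks like this; the backslashes in front of the parentheses are still required, while the line-continuation ones are not:

find "$dir" -type d \( -name 'bin' -o -name 'obj' \) -print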

‘bin’ is also the conventional name for a directory of non-build executables, like helper scripts.  I do have some, including this cleaner-than-clean cleaning script that I’m working out, and don’t want to delete those by accident. The above command would find them, if they were in the directory tree.

find allows you to prune (-prune) the search tree, skipping selected directories according to certain criteria, but -prune can’t peek inside a directory to check for a particular file before deciding. Bummer.
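
As an aside, here is what -prune ordinarily looks like: skip everything inside any .git directory while searching for C# sources (the .git and *.cs names are purely illustrative):

find "$dir" -type d -name '.git' -prune -o -name '*.cs' -print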

You may, however, execute arbitrary commands (-exec) and use their exit status just like any other test in find’s expression, alongside -prune, -print and friends. We can exec the test command, which tells us whether our sentinel file exists.

#!/usr/bin/env bash

dir=/path/to/project
sentinel=.keep

find "$basedir" \
    -type d \
    \( -name bin -o -name obj \) \
    ! -exec test -e "{}/$sentinel" ';' \
    -print

The new line executes test to see if the current path ({}) contains a file called $sentinel (I’ve defined $sentinel to be .keep but any filename will do), which returns true if it exists. The line is negated (!) so if the sentinel is found further actions are skipped.
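
Protecting a directory is then just a matter of dropping an empty sentinel file into it, for example (the path here is hypothetical):

touch /path/to/project/scripts/bin/.keep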

The final step is to actually delete the directory. We call rm -Rf (-R = recursive, -f = force) because we just want the whole thing gone, no questions asked. The trailing plus (+) tells find that rm can accept multiple paths in a single call, rather than calling rm once for each path.

#!/usr/bin/env bash

dir=/path/to/project
sentinel=.keep

find "$basedir" \
    -type d \
    \( -name bin -o -name obj \) \
    ! -exec test -e "{}/$sentinel" ';' \
    -print \
    -exec rm -Rf '{}' \+
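
If you’d like a dry run before pulling the trigger, prefix rm with echo in the final -exec; find will then print the rm commands it would have run instead of executing them. Written out as a one-liner, with the path and sentinel spelled out literally, that looks something like this:

find /path/to/project -type d \( -name bin -o -name obj \) ! -exec test -e '{}/.keep' ';' -exec echo rm -Rf '{}' \+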

Transferring Large Files

Linux has an impressive tool set, if you know how to use it.  The philosophy of simple tools that each do one job (and do it well), combined with the ability to chain them together using pipes, makes for a powerful system.

Everyone has to transfer large files across the network on occasion.  scp is an easy choice most of the time, but if you’re working with small or old machines the CPU will be a bottleneck due to encryption.

There are several alternatives to scp, if you don’t need encryption.  These aren’t safe on the open internet but should be acceptable on private networks.  TFTP and rsync come to mind, but they have their limitations.

  • tftp is generally limited to 4 gig files
  • rsync either requires setting up an rsync daemon, or piping through ssh, which brings back the encryption overhead (see the sketch after this list)
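
For completeness, the rsync-over-ssh form looks roughly like this (the hostname and destination path are placeholders); it works well, but you pay the same encryption cost as scp:

rsync -avP really.big.file user@file.server.net:/destination/path/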

My new personal favorite is netcat-as-a-server.  It’s a little more complicated to set up than scp or ftp but wins for overall simplicity and speed of transfer.

netcat doesn’t provide much output, so we’ll put it together with pv (pipeviewer) to tattle on bytes read and written.

First, on the sending machine (the machine with the file), we’ll set up netcat to listen on port 4200, and pv will give us progress updates:
pv -pet really.big.file | nc -q 1 -l -p 4200

  • pv -p prints a progress bar, -e displays the ETA, -t displays the elapsed time
  • nc -q 1 quits 1 second after EOF, -l -p 4200 listens on port 4200

Without the -q switch, the sender will have to be killed with control-c or similar.
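
One caveat: the flags above are what the traditional netcat expects. If your system ships the OpenBSD variant instead, the port follows -l directly and -N (shut the socket down after EOF on stdin) stands in for -q, so the sender would look something like this:

pv -pet really.big.file | nc -N -l 4200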

On the receiver (the machine that wants the file) netcat will read all bytes until the sender disconnects:
nc file.server.net 4200 | pv -b > really.big.file

  • nc will stream all bytes from file.server.net, port 4200
  • -b turns on the byte counter

Once the file is done transferring, both sides will shut down.
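
Since neither nc nor pv verifies the data, a cheap sanity check is to hash the file on both ends and compare (any checksum tool will do):

md5sum really.big.file    # run on both machines; the hashes should match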