Perl’s Open3, Re-Explained

I recently went spelunking into a core Perl module that I previously knew nothing about, IPC::Open3.  After fifteen years of developing in Perl I finally had a reason to use it.

If you’re reading this, it’s probably because you went looking for information on how to use open3 because the module’s documentation is bad.  I mean it’s really, really terrible.

Not only will you not know how to use open3 after reading the docs, you may become convinced that maybe open3 isn’t the module that you need, or maybe it would work but you’d just be better off looking for something else because this is too damn hard to use.

Fear not, intrepid reader, because if I can figure it out so can you.  But I will try to save you some of the leg work I went through. There’s precious little information scattered online, because this isn’t a popular package.  My loss is your gain, hopefully this helps you.

Why IPC::Open3?

When Would I Use IPC::Open3?

open3 is used when you need to open three pipes to another process.  That might be obvious from the name as well as the package’s synopsis:

$pid = open3( \*CHLD_IN,
              \*CHLD_OUT,
              \*CHLD_ERR,
              'some cmd and args',
              'optarg', ...
            );

Why would you do that?  The most obvious situation is when you want to control STDIN, STDOUT, and STDERR simultaneously.  The example I provide below, which is not contrived by the way but adapted from real production code, does exactly that.

There Are Lots Of Modules To Make This Easier, Why Should I Use IPC::Open3?

IPC::Open3 is part of the Perl core.  There’s a lot to be said for using a library that’s already installed and doesn’t have external dependencies vs. pulling in someone’s write-once-read-never Summer of Code academic project.

In addition, the modules that I found only served to hide the complexity of Open3, but they did it badly and didn’t really remove much code compared to what I came up with.

What Else Do I Need?

One of the things that’s not obvious from the Open3 docs are that you’re not going to use IPC::Open3 by itself.  You need a couple of other packages (also part of core) in order to use it effectively.

How I Used IPC::Open3

In our example, we’re going to fork a separate process (using open3) to encrypt a file stream using gpg.  gpg will accept a stream of data, encrypt it, and output to a stream.  We also want to capture errors sent to STDERR.

In a terminal, using bash, this would be really easy: gpg --decrypt < some_file > some_file.pgp 2>err.out

We could do all of this in Perl by writing temporary files, passing special file handle references into gpg as arguments, and capturing STDERR the old fashioned way, all using a normal open().  But where’s the fun in that?

First, lets use the packages we’ll need:

use IO::Handle;
use IO::Select;
use IPC::Open3;

IO::Handle allows us to operate on handles using object methods.  I don’t typically use it, but this code really appreciates it.  IO::Select does the same for select, but it helps even more than IO::Handle here.

use constant INPUT_BUF_SZ  => 2**12;
use constant OUTPUT_BUF_SZ => 2**20;

You might want to experiment to find the best buffer sizes.  The input buffer should not be larger than the pipe buffer on your particular system, else you’ll block trying to put two pounds of bytes into a one pound buffer.

Now, using IO::Handle we’ll create file handles for the stdin, stdout, and stderr that our forked process will read and write to:

my ( $in,
     $out,
     $err,
   ) = ( IO::Handle->new,
         IO::Handle->new,
         IO::Handle->new
       );

Call open3, which (like fork) gives us the PID of our new process.

Note: If we don’t call waitpid later on we’ll create a zombie after we’re done.

my $pid = open3( $in, $out, $err, '/usr/bin/gpg', @gpg_options );

if ( !$pid ) {
    die "failed to open pipe to gpg";
}

One of the features of IO::Select is that it allows us to find out when a handle is blocked. This is important when the output stream is dependent on the input stream, and each stream depends on a pipe of limited size.

We’re going to repeatedly loop over the handles, looking for a stream that is active, and read/write a little bit before continuing to loop.  We do this until both our input and output is exhausted.  It’s pretty likely that they’ll be exhausted at different times, i.e. we’ll be done with the input sometime before we’re done with the output.

As we exhaust each handle we remove it from the selection of possible handles, so that the main loop terminates naturally.

The value passed to can_write and can_read is the number of seconds to wait for the handle to be ready.  Non-zero timeouts cause a noticeable delay, while not setting it at all will cause us to block until the handle is ready, so for now we’ll leave it at zero.

# $unencrypted_fh and $encrypted_fh should be defined as
# handles to real files

my $sel = IO::Select->new;

$sel->add( $in, $out, $err );

# loop until we don't have any handles left

while ( my @handles = ( $self->handles) ) {
    # read until there's nothing left
    #
    # write in small chunks so we don't overfill the buffer
    # and accidentally cause the pipe to block, which will
    # block us
    while ( my @ready = ( $sel->can_write(0) ) ) {
        for my $fh ( @ready ) {
            if ( $fh == $in ) {
                # read a small chunk from your source data
                my $read = read( $unencrypted_fh,
                                 my $bytes,
                                 INPUT_BUF_SZ,
                               );

                # and write it to our forked process
                #
                # if we're out of bytes to read, close the
                # handle
                if ( !$read ) {
                    $sel->remove( $fh );
                    $fh->close;
                }
                else {
                    syswrite( $fh, $bytes );
                }
            }
            else {
                die "unexpected filehandle for input";
            }
        }
    }

    while ( my @ready = ( $sel->can_read(0) ) ) {
        # fetch the contents of STDOUT and send it to the
        # destination
        for my $fh ( @ready ) {
            # this buffer can be much larger, though in the
            # case of gpg it will generally be much smaller
            # than the input was. The process will block if
            # the output pipe is full, so you want to pull as
            # much out as you can.

            my $read = sysread( $fh, my $bytes, OUTPUT_BUF_SZ );

            if ( !$read ) {
                $sel->remove( $fh );
                $fh->close;
            }
            elsif ( $fh == $out ) {
                # $encrypted_fh is whatever we're throwing output
                # into

                syswrite( $encrypted_fh, $bytes ) if $read;
            }
            elsif ( $fh == $err ) {
                print STDERR $bytes;
            }
            else {
                die "unexpected filehandle for output";
            }
        }
    }

    # IO::Handle won't complain if we close a handle that's
    # already closed
    $sel->remove( $in ); $in->close;
    $sel->remove( $out ); $out->close;
    $sel->remove( $err ); $err->close;

    waitpid( $pid, 0 );
}

That’s actually about it.

I keep my buffer for input small, as pipe buffers tend to be small.  If you overload your pipe your program will hang indefinitely (or until an alarm goes off, if you set one).  4096 bytes seems to be the limit, though your own limit may be different.  When in doubt, be conservative and go smaller.

The output buffer can afford to be bigger, up to the limit of available memory (but don’t do that).  In our example of encryption gpg will consume much more than it produces, so a larger buffer doesn’t really buy you anything but if we were decrypting it would be the reverse and a larger buffer would help immensely.

Komodo IDE Headaches

I’m slowly coming around to the idea that an IDE might be useful for PHP/Symfony projects (still not convinced about other languages and frameworks) and I’m currently trying out ActiveState’s Komodo IDE 10 on Linux.

It looks great but it’s… buggy.  One day in and I’m already getting frustrated with it.

  • The preference file doesn’t appear to be saved until the application is closed, if the application crashes it’s not clear that your changes will be saved.  This might be a safety feature, but probably not, because…
  • At least some preferences don’t take effect until the application is closed.  Not the ones that you’re warned about like checking remote files for changes, but other ones like ‘Allow file contents to override Tab and Indentation settings’ (which itself is unreliable since at least 2011).
  • When changing preferences, there is more than one place to change: Edit / Preferences, Edit / Current File Preferences, and Project / Project Preferences (the last is not under the Edit menu).
  • The cursor blinks by default (which is super annoying when you’re moving the cursor around the screen) and there isn’t an explicit option to disable it.  You have to create a custom JavaScript script that executes at every file open.
  • The toolbar icons are heavily styled, making their use opaque and the tooltips mandatory reading.
  • It has already crashed while closing — which, per the above, I’m doing a lot.

It’s not all bad, there are some really nice features:

  • Vi keybindings, so things like ‘A’ to start appending to the current line, or ‘/’ to search the current file, are really nice to someone who uses vim every day.
  • I do appreciate the ability to script things
  • The syntax highlighting and coloring seems more reliable (i.e. harder to confuse) than average.
  • The installation to a local directory was painless, and an icon properly shows up in the applications menu (I use Mate).  The default installation dir is to your home directory instead of /usr/local (which is the right thing to do for trial software, imho).

I want to like this editor, I really do, but it’s just going downhill as I work with it more.  At $250 per license it’s hard to justify the expense to my boss unless I really like it.

Beautiful Code

The mark of a master programmer is someone who writes code that a novice can debug.
— attribution unknown

I read this quote, or something very similar to it, a long time time ago when I was just starting out.

I take the idea behind that quote to mean that master programmers have the experience to find the simplest solutions, which are easier to understand, but they also make their code easier to read so errors stand out.

It came back to me while reading a novice’s request for help in debugging something.  The example was a mess, with lots of extra activity, but it was also dense and poorly formatted.  The very simple bug was hard to see because of the sheer amount of code and the inconsistent formatting.

I strive to find simple solutions to the code I write, but I also strive to make my code neatly formatted and well-spaced.  I generally limit my lines to ~78 characters; I vertically align related operators; I leave space around operators.  This goes hand-in-hand with simple code: short functions that only do one thing; effective naming of things; do the least possible.  Together these generally make code that is both robust and easy to maintain.

I think of formatting to be like engineering a bridge.  Dense code is like big thick columns, steel plates, and stone architecture — it gets the job done, but it looks so heavy.  The best bridges are light and airy, full of empty space, yet they are stronger and more resilient.

PS: if you know this quote, and know who said it first, please drop me a line so that I can attribute it properly!

Rate Yourself From 1-10

This… is genius.

In technical programming interviews a common (terrible) question that interviewers may ask is, “rate yourself from 1-10 on x”, where x=one or more programming languages.  I’ve been asked that myself, but I’ve never seen what 1-10 would actually correspond to until now.  It’s a very fuzzy measure and most everybody (from junior to senior) seems to rate themselves about a 7.

Without further ado:

  • 10 – Wrote the book on it (there must be a book)
  • 9 – Could have written the book, but didn’t.
  • 8 – Deep understanding of corner cases and esoteric features.
  • 7 – Understanding and (appropriate) usage of most lesser known features.
  • 6 – Can develop large programs and deploy new systems from scratch.
  • 5 – Can develop/deploy larger programs/systems using all basic (w/o book) and more esoteric features (some w/ book, some without)
  • 4 – Can develop/deploy medium programs/systems using all basic (w/o book) and a few esoteric features (w/ book). Understands enough about internals to do nontrivial troubleshooting.
  • 3 – Can utilize basic features without much help, manage a small installation competently.
  • 2 – can write hello world without looking at a book, kind of figure out how a system works, if necessary.
  • 1 – Can read programs, make small changes to existing programs, or make adjustments to already installed systems, w/book handy.
  • 0 – No experience.

Credit goes to /u/icydocking for providing the list on reddit.

GCC Tuning

File this under “things that should be obvious but I just found out about”.  GCC will tell give you optimal flags for your processor.  To wit:

echo "" | gcc -march=native -v -E - 2>&1 | grep cc1

Stick the results into your make file or command-line call to GCC and your executable should be as optimized for your processor as GCC can make it.

You could, of course, always use --march=native  and forget all that but that doesn’t work so well if you’re cross-compiling.

Found Some Code…

# check for missing fields
if (!$var1) {
        return 'missing_var1';
} elsif (!$var2) {
        return 'missing_var2';
} elsif (! $hash->{val1}) {
        return 'missing_hash_val1';
} elsif (! $hash->{val2}) {
        return 'missing_hash_val2';
} elsif (! $hash->{val3}) {
        return 'missing_val3';
} elsif (! $hash->{val4}) {
        return 'missing_hash_val4';
} elsif (! $hash->{val5}) {
        return 'missing_hash_val5';
} elsif (! $hash->{val6}) {
        return 'missing_hash_val6';
} elsif (! $hash->{val7}) {
        return 'missing_hash_val7';
} elsif (! $hash->{val7}) {
        return 'missing_hash_val8';
} elsif (! $hash->{val9}) {
        return 'missing_hash_val9';
} elsif (! $hash->{val10}) {
        return 'missing_hash_val10';
} elsif (!$var3) {
        return 'missing_var3';
} elsif (!$var4) {
        return 'missing_val4';
}

[identifiers changed to protect the innocent, but otherwise this is verbatim]

Gee, I wonder if this could be simplified?  I really don’t see the pattern…

Sadly, I see code like this much to often.  Most of the time it’s old and the developer(s) moved on long ago.  I stopped reading TheDailyWTF regularly because it hits too close to home nowadays.