Perl’s Open3, Re-Explained

I recently went spelunking into a core Perl module that I previously knew nothing about, IPC::Open3.  After fifteen years of developing in Perl I finally had a reason to use it.

If you’re reading this, it’s probably because you went looking for information on how to use open3 and discovered that the module’s documentation is bad.  I mean it’s really, really terrible.

Not only will you not know how to use open3 after reading the docs, you may come away convinced that open3 isn’t the module you need, or that it would probably work but you’d be better off looking for something else because it’s too damn hard to use.

Fear not, intrepid reader, because if I can figure it out so can you.  But I will try to save you some of the legwork I went through.  There’s precious little information scattered online, because this isn’t a popular package.  My loss is your gain; hopefully this helps you.

Why IPC::Open3?

When Would I Use IPC::Open3?

open3 is used when you need to open three pipes to another process.  That might be obvious from the name as well as the package’s synopsis:

$pid = open3( \*CHLD_IN,
              \*CHLD_OUT,
              \*CHLD_ERR,
              'some cmd and args',
              'optarg', ...
            );

Why would you do that?  The most obvious situation is when you want to control STDIN, STDOUT, and STDERR simultaneously.  The example I provide below, which is not contrived by the way but adapted from real production code, does exactly that.

There Are Lots Of Modules To Make This Easier, Why Should I Use IPC::Open3?

IPC::Open3 is part of the Perl core.  There’s a lot to be said for using a library that’s already installed and doesn’t have external dependencies vs. pulling in someone’s write-once-read-never Summer of Code academic project.

In addition, the modules that I found only served to hide the complexity of Open3, but they did it badly and didn’t really remove much code compared to what I came up with.

What Else Do I Need?

One of the things that’s not obvious from the Open3 docs is that you’re not going to use IPC::Open3 by itself.  You need a couple of other packages (also part of core) in order to use it effectively.

How I Used IPC::Open3

In our example, we’re going to fork a separate process (using open3) to encrypt a file stream with gpg.  gpg will accept a stream of data, encrypt it, and write the result to a stream.  We also want to capture errors sent to STDERR.

In a terminal, using bash, this would be really easy (recipient options omitted): gpg --encrypt < some_file > some_file.pgp 2> err.out

We could do all of this in Perl by writing temporary files, passing special file handle references into gpg as arguments, and capturing STDERR the old-fashioned way, all using a normal open().  But where’s the fun in that?
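
For contrast, that plainer approach might look roughly like the sketch below (a sketch only; the file names are the same illustrative ones from the bash one-liner, and @gpg_options stands in for whatever options you’d pass to gpg):

# The boring route: let the shell do the plumbing, then read the
# error output back from a temporary file afterwards.
system( "gpg @gpg_options < some_file > some_file.pgp 2> err.out" ) == 0
    or die "gpg failed: $?";

open( my $gpg_err, '<', 'err.out' ) or die "cannot read err.out: $!";
print STDERR <$gpg_err>;
close $gpg_err;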

First, let’s use the packages we’ll need:

use IO::Handle;
use IO::Select;
use IPC::Open3;

IO::Handle allows us to operate on handles using object methods.  I don’t typically use it, but this code really appreciates it.  IO::Select does the same for select, but it helps even more than IO::Handle here.

use constant INPUT_BUF_SZ  => 2**12;
use constant OUTPUT_BUF_SZ => 2**20;

You might want to experiment to find the best buffer sizes.  The input buffer should not be larger than the pipe buffer on your particular system, else you’ll block trying to put two pounds of bytes into a one pound buffer.
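
If you’re curious what a conservative ceiling looks like on your system, POSIX can report PIPE_BUF, the largest write that’s guaranteed to be atomic on a pipe (a quick sanity check on my part, not something the example itself needs):

use POSIX ();

# PIPE_BUF is the atomic-write limit for pipes; on Linux it is
# typically 4096 bytes, which matches the 2**12 INPUT_BUF_SZ above.
my $pipe_buf = POSIX::pathconf( '.', &POSIX::_PC_PIPE_BUF );
print "PIPE_BUF here is $pipe_buf bytes\n";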

Now, using IO::Handle we’ll create file handles for the stdin, stdout, and stderr that our forked process will read and write to:

my ( $in,
     $out,
     $err,
   ) = ( IO::Handle->new,
         IO::Handle->new,
         IO::Handle->new
       );

Call open3, which (like fork) gives us the PID of our new process.

Note: If we don’t call waitpid later on we’ll create a zombie after we’re done.

my $pid = open3( $in, $out, $err, '/usr/bin/gpg', @gpg_options );

if ( !$pid ) {
    die "failed to open pipe to gpg";
}
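
One wrinkle: per the IPC::Open3 documentation, open3 doesn’t actually return false on failure, it raises an exception matching /^open3:/, so the check above will rarely fire on its own.  If you’d rather handle that failure gracefully, you could wrap the call in an eval, something like:

# open3 croaks on failure rather than returning undef, so trap the
# exception and turn it into a friendlier error.
my $pid = eval {
    open3( $in, $out, $err, '/usr/bin/gpg', @gpg_options );
};

if ( !$pid ) {
    die "failed to open pipe to gpg: $@";
}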

One of the features of IO::Select is that it lets us find out when a handle is ready, that is, when we can read from or write to it without blocking.  This is important when the output stream is dependent on the input stream, and each stream depends on a pipe of limited size.

We’re going to repeatedly loop over the handles, looking for a stream that is active, and read/write a little bit before continuing to loop.  We do this until both our input and output are exhausted.  It’s pretty likely that they’ll be exhausted at different times, i.e. we’ll be done with the input sometime before we’re done with the output.

As we exhaust each handle we remove it from the selection of possible handles, so that the main loop terminates naturally.

The value passed to can_write and can_read is the number of seconds to wait for a handle to become ready.  A non-zero timeout causes a noticeable delay, while omitting it entirely makes the call block until a handle is ready, so for now we’ll leave it at zero.
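
To make the timeout argument concrete, here are the three variants side by side (a small aside, not part of the main example):

my @ready;
@ready = $sel->can_read;        # no argument: block until a handle is ready
@ready = $sel->can_read(0);     # zero: poll and return immediately
@ready = $sel->can_read(0.25);  # fractional seconds work too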

# $unencrypted_fh and $encrypted_fh should be defined as
# handles to real files

my $sel = IO::Select->new;

$sel->add( $in, $out, $err );

# loop until we don't have any handles left

while ( my @handles = $sel->handles ) {
    # read until there's nothing left
    #
    # write in small chunks so we don't overfill the buffer
    # and accidentally cause the pipe to block, which will
    # block us
    while ( my @ready = ( $sel->can_write(0) ) ) {
        for my $fh ( @ready ) {
            if ( $fh == $in ) {
                # read a small chunk from your source data
                my $read = read( $unencrypted_fh,
                                 my $bytes,
                                 INPUT_BUF_SZ,
                               );

                # and write it to our forked process
                #
                # if we're out of bytes to read, close the
                # handle
                if ( !$read ) {
                    $sel->remove( $fh );
                    $fh->close;
                }
                else {
                    syswrite( $fh, $bytes );
                }
            }
            else {
                die "unexpected filehandle for input";
            }
        }
    }

    while ( my @ready = ( $sel->can_read(0) ) ) {
        # fetch the contents of STDOUT and send it to the
        # destination
        for my $fh ( @ready ) {
            # this buffer can be much larger, though in the
            # case of gpg it will generally be much smaller
            # than the input was. The process will block if
            # the output pipe is full, so you want to pull as
            # much out as you can.

            my $read = sysread( $fh, my $bytes, OUTPUT_BUF_SZ );

            if ( !$read ) {
                $sel->remove( $fh );
                $fh->close;
            }
            elsif ( $fh == $out ) {
                # $encrypted_fh is whatever we're throwing output
                # into

                syswrite( $encrypted_fh, $bytes ) if $read;
            }
            elsif ( $fh == $err ) {
                print STDERR $bytes;
            }
            else {
                die "unexpected filehandle for output";
            }
        }
    }
}

# IO::Handle won't complain if we close a handle that's
# already closed
$sel->remove( $in ); $in->close;
$sel->remove( $out ); $out->close;
$sel->remove( $err ); $err->close;

waitpid( $pid, 0 );
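
If you care whether gpg actually succeeded (you probably do), check the child’s exit status once waitpid returns.  This check is my addition, not part of the original example:

# waitpid populates $?; the exit code lives in the high byte.
my $exit_code = $? >> 8;
die "gpg exited with code $exit_code" if $exit_code;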

That’s actually about it.

I keep my buffer for input small, as pipe buffers tend to be small.  If you overload your pipe your program will hang indefinitely (or until an alarm goes off, if you set one).  4096 bytes (the PIPE_BUF atomic-write guarantee on most systems) seems to be a safe ceiling, though your own limit may be different.  When in doubt, be conservative and go smaller.

The output buffer can afford to be bigger, up to the limit of available memory (but don’t do that).  In our encryption example gpg will consume much more than it produces, since it compresses the data before encrypting it, so a larger buffer doesn’t really buy you anything.  If we were decrypting it would be the reverse, and a larger buffer would help immensely.