{"id":1711,"date":"2017-10-11T22:57:32","date_gmt":"2017-10-12T02:57:32","guid":{"rendered":"https:\/\/blog.jonesling.us\/?p=1711"},"modified":"2024-02-19T16:15:00","modified_gmt":"2024-02-19T21:15:00","slug":"perls-open3-re-explained","status":"publish","type":"post","link":"https:\/\/blog.jonesling.us\/?p=1711","title":{"rendered":"Perl&#8217;s Open3, Re-Explained"},"content":{"rendered":"<p>I recently went spelunking into a core Perl module that I previously knew nothing about, <a href=\"http:\/\/perldoc.perl.org\/IPC\/Open3.html\">IPC::Open3<\/a>.\u00a0 After fifteen years of developing in Perl I finally had a reason to use it.<\/p>\n<p>If you&#8217;re reading this, it&#8217;s probably because you went looking for information on how to use open3 because the module&#8217;s documentation is bad.\u00a0 I mean it&#8217;s really, really terrible.<\/p>\n<p>Not only will you not know how to use open3 after reading the docs, you may become convinced that maybe open3 isn&#8217;t the module that you need, or maybe it would work but you&#8217;d just be better off looking for something else because this is too damn hard to use.<\/p>\n<p>Fear not, intrepid reader, because if I can figure it out so can you.\u00a0 But I will try to save you some of the leg work I went through. 
There&#8217;s precious little information scattered online, because this isn&#8217;t a popular package.\u00a0 My loss is your gain; hopefully this helps you.<\/p>\n<h1>Why IPC::Open3?<\/h1>\n<h2>When Would I Use IPC::Open3?<\/h2>\n<p><code>open3<\/code> is used when you need to open three pipes to another process.\u00a0 That might be obvious from the name as well as the package&#8217;s synopsis:<\/p>\n<pre class=\"prettyprint linenums language-perl prettyprinted\">$pid = open3( \\*CHLD_IN,\n              \\*CHLD_OUT,\n              \\*CHLD_ERR,\n              'some cmd and args',\n              'optarg', ...\n            );<\/pre>\n<p>Why would you do that?\u00a0 The most obvious situation is when you want to control STDIN, STDOUT, and STDERR simultaneously.\u00a0 The example I provide below, which is not contrived by the way but adapted from real production code, does exactly that.<\/p>\n<h2>There Are Lots Of Modules To Make This Easier, Why Should I Use IPC::Open3?<\/h2>\n<p>IPC::Open3 is part of the Perl core.\u00a0 There&#8217;s a lot to be said for using a library that&#8217;s already installed and doesn&#8217;t have external dependencies vs. 
pulling in someone&#8217;s write-once-read-never Summer of Code academic project.<\/p>\n<p>In addition, the modules that I found only served to hide the complexity of Open3, but they did it badly and didn&#8217;t really remove much code compared to what I came up with.<\/p>\n<h2>What Else Do I Need?<\/h2>\n<p>One of the things that&#8217;s not obvious from the Open3 docs is that you&#8217;re not going to use IPC::Open3 by itself.\u00a0 You need a couple of other packages (also part of core) in order to use it effectively.<\/p>\n<h1>How I Used IPC::Open3<\/h1>\n<p>In our example, we&#8217;re going to fork a separate process (using open3) to encrypt a file stream using <a href=\"https:\/\/gnupg.org\/\">gpg<\/a>.\u00a0 gpg will accept a stream of data, encrypt it, and output to a stream.\u00a0 We also want to capture errors sent to STDERR.<\/p>\n<p>In a terminal, using bash, this would be really easy: <code>gpg --encrypt &lt; some_file &gt; some_file.pgp 2&gt;err.out<\/code><\/p>\n<p>We could do all of this in Perl by writing temporary files, passing special file handle references into gpg as arguments, and capturing STDERR the old-fashioned way, all using a normal open().\u00a0 But where&#8217;s the fun in that?<\/p>\n<p>First, let&#8217;s <code>use<\/code> the packages we&#8217;ll need:<\/p>\n<pre class=\"prettyprint linenums language-perl prettyprinted\">use IO::Handle;\nuse IO::Select;\nuse IPC::Open3;<\/pre>\n<p><a href=\"http:\/\/perldoc.perl.org\/IO\/Handle.html\">IO::Handle<\/a> allows us to operate on handles using object methods.\u00a0 I don&#8217;t typically use it, but this code really appreciates it.\u00a0 <a href=\"http:\/\/perldoc.perl.org\/IO\/Select.html\">IO::Select<\/a> does the same for <a href=\"http:\/\/perldoc.perl.org\/functions\/select.html\">select<\/a>, but it helps even more than IO::Handle here.<\/p>\n<pre class=\"prettyprint linenums language-perl prettyprinted\">use constant INPUT_BUF_SZ  =&gt; 2**12;\nuse constant OUTPUT_BUF_SZ =&gt; 
2**20;<\/pre>\n<p>You might want to experiment to find the best buffer sizes.\u00a0 The input buffer should not be larger than the pipe buffer on your particular system, else you&#8217;ll block trying to put two pounds of bytes into a one-pound buffer.<\/p>\n<p>Now, using IO::Handle we&#8217;ll create file handles for the stdin, stdout, and stderr that our forked process will read from and write to:<\/p>\n<pre>my ( $in,\n     $out,\n     $err,\n   ) = ( IO::Handle-&gt;new,\n         IO::Handle-&gt;new,\n         IO::Handle-&gt;new\n       );<\/pre>\n<p>Call <code>open3<\/code>, which (like <a href=\"http:\/\/perldoc.perl.org\/functions\/fork.html\">fork<\/a>) gives us the PID of our new process.<\/p>\n<p>Note: If we don&#8217;t call <code><a href=\"http:\/\/perldoc.perl.org\/functions\/waitpid.html\">waitpid<\/a><\/code> later on, we&#8217;ll create a zombie after we&#8217;re done.<\/p>\n<pre>my $pid = open3( $in, $out, $err, '\/usr\/bin\/gpg', @gpg_options );\n\n# open3 raises an exception on failure rather than returning\n# false, so this check is only a safeguard\nif ( !$pid ) {\n    die \"failed to open pipe to gpg\";\n}<\/pre>\n<p>One of the features of IO::Select is that it allows us to find out when a handle is blocked. This is important when the output stream is dependent on the input stream, and each stream depends on a pipe of limited size.<\/p>\n<p>We&#8217;re going to repeatedly loop over the handles, looking for a stream that is active, and read\/write a little bit before continuing to loop.\u00a0 We do this until both our input and output are exhausted.\u00a0 It&#8217;s pretty likely that they&#8217;ll be exhausted at different times, i.e. 
we&#8217;ll be done with the input sometime before we&#8217;re done with the output.<\/p>\n<p>As we exhaust each handle we remove it from the selection of possible handles, so that the main loop terminates naturally.<\/p>\n<p>The value passed to <code>can_write<\/code> and <code>can_read<\/code> is the number of seconds to wait for the handle to be ready.\u00a0 Non-zero timeouts cause a noticeable delay, while not setting it at all will cause us to block until the handle is ready, so for now we&#8217;ll leave it at zero.<\/p>\n<pre># $unencrypted_fh and $encrypted_fh should be defined as\n# handles to real files\n\nmy $sel = IO::Select-&gt;new;\n\n$sel-&gt;add( $in, $out, $err );\n\n# loop until we don't have any handles left\n\nwhile ( my @handles = ( $sel-&gt;handles) ) {\n    # read until there's nothing left\n    #\n    # write in small chunks so we don't overfill the buffer\n    # and accidentally cause the pipe to block, which will\n    # block us\n    while ( my @ready = ( $sel-&gt;can_write(0) ) ) {\n        for my $fh ( @ready ) {\n            if ( $fh == $in ) {\n                # read a small chunk from your source data\n                my $read = read( $unencrypted_fh,\n                                 my $bytes,\n                                 INPUT_BUF_SZ,\n                               );\n\n                # and write it to our forked process\n                #\n                # if we're out of bytes to read, close the\n                # handle\n                if ( !$read ) {\n                    $sel-&gt;remove( $fh );\n                    $fh-&gt;close;\n                }\n                else {\n                    syswrite( $fh, $bytes );\n                }\n            }\n            else {\n                die \"unexpected filehandle for input\";\n            }\n        }\n    }\n\n    while ( my @ready = ( $sel-&gt;can_read(0) ) ) {\n        # fetch the contents of STDOUT and send it to the\n        # destination\n        for my $fh 
( @ready ) {\n            # this buffer can be much larger, though in the\n            # case of gpg it will generally be much smaller\n            # than the input was. The process will block if\n            # the output pipe is full, so you want to pull as\n            # much out as you can.\n\n            my $read = sysread( $fh, my $bytes, OUTPUT_BUF_SZ );\n\n            if ( !$read ) {\n                $sel-&gt;remove( $fh );\n                $fh-&gt;close;\n            }\n            elsif ( $fh == $out ) {\n                # $encrypted_fh is whatever we're throwing output\n                # into\n\n                syswrite( $encrypted_fh, $bytes ) if $read;\n            }\n            elsif ( $fh == $err ) {\n                print STDERR $bytes;\n            }\n            else {\n                die \"unexpected filehandle for output\";\n            }\n        }\n    }\n}\n\n# the main loop is done, so clean up after it\n#\n# IO::Handle won't complain if we close a handle that's\n# already closed\n$sel-&gt;remove( $in ); $in-&gt;close;\n$sel-&gt;remove( $out ); $out-&gt;close;\n$sel-&gt;remove( $err ); $err-&gt;close;\n\n# reap the child so we don't leave a zombie behind\nwaitpid( $pid, 0 );<\/pre>\n<p>That&#8217;s actually about it.<\/p>\n<p>I keep my buffer for input small, as pipe buffers tend to be small.\u00a0 If you overload your pipe your program will hang indefinitely (or until an alarm goes off, if you set one).\u00a0 4096 bytes seems to be the limit, though your own limit may be different.\u00a0 When in doubt, be conservative and go smaller.<\/p>\n<p>The output buffer can afford to be bigger, up to the limit of available memory (but don&#8217;t do that).\u00a0 In our encryption example, gpg will consume much more than it produces, so a larger buffer doesn&#8217;t really buy you anything, but if we were decrypting it would be the reverse and a larger buffer would help immensely.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>I recently went spelunking into a core Perl module that I previously knew nothing about, 
IPC::Open3.\u00a0 After fifteen years of developing in Perl I finally had a reason to use it. If you&#8217;re reading this, it&#8217;s probably because you went looking for information on how to use open3 because the module&#8217;s documentation is bad.\u00a0 I &hellip; <a href=\"https:\/\/blog.jonesling.us\/?p=1711\" class=\"more-link\">Continue reading<span class=\"screen-reader-text\"> &#8220;Perl&#8217;s Open3, Re-Explained&#8221;<\/span><\/a><\/p>\n","protected":false},"author":3,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_crdt_document":"","wprm-recipe-roundup-name":"","wprm-recipe-roundup-description":"","advanced_seo_description":"","jetpack_seo_html_title":"","jetpack_seo_noindex":false,"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[32],"tags":[151,391,394,392,393,390,389],"class_list":["post-1711","post","type-post","status-publish","format-standard","hentry","category-programming-2","tag-dad-needs-to-stop-bringing-work-home","tag-fixing-bad-documentation","tag-gpg","tag-iohandle","tag-ioselect","tag-ipcopen3","tag-perl"],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/p4o3FW-rB","jetpack-related-posts":[],"jetpack_likes_enabled":true,"_links":{"self":[{"href":"https:\/\/blog.jonesling.us\/index.php?rest_route=\/wp\/v2\/posts\/1711","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/blog.jonesling.us\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.jonesling.us\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.jonesling.us\/index.php?rest_route=\/wp\/v2\/users\/3"}],"replies":[{"embeddable":true,"href":"http
s:\/\/blog.jonesling.us\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=1711"}],"version-history":[{"count":13,"href":"https:\/\/blog.jonesling.us\/index.php?rest_route=\/wp\/v2\/posts\/1711\/revisions"}],"predecessor-version":[{"id":3379,"href":"https:\/\/blog.jonesling.us\/index.php?rest_route=\/wp\/v2\/posts\/1711\/revisions\/3379"}],"wp:attachment":[{"href":"https:\/\/blog.jonesling.us\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=1711"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.jonesling.us\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=1711"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.jonesling.us\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=1711"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}