NiFi HTTP Service

I’m attempting to set up an HTTP server in NiFi to accept uploads and process them on-demand.  This gets tricky because I want to submit the files using an existing web application that will not be served from NiFi, which leads to trouble with XSS (Cross-Site Scripting) and setting up CORS (Cross Origin Resource Sharing [1]).

The trouble starts with just trying to PUT or POST a simple file.  The error in Firefox reads:

Cross-Origin Request Blocked: The Same Origin Policy disallows reading the remote resource (Reason: CORS header 'Access-Control-Allow-Origin' missing).

You can serve up the Javascript that actually performs the upload from NiFi and side-step XSS, but you may still run into trouble with CORS.  You’ll have trouble even if NiFi and your other web server live on the same host (using different ports, of course), as they’re considered different hosts for the purposes of XSS prevention.

handlehttpresponse screen shot
HandleHttpResponse processor config

To make this work, you’ll need to enable specific headers in the HandleHttpResponse processor.  Neither the need to set some headers, nor the headers that need to be set, are documented by NiFi at this time (so far as I can tell).

  1. Open the configuration of the HandleHttpResponse processor
  2. Add the following headers and values as properties and values, but see below for notes regarding the values
    Access-Control-Allow-Origin: *
    
    Access-Control-Allow-Methods: PUT, POST, GET, OPTIONS
    
    Access-Control-Allow-Headers: Accept, Accept-Encoding, Accept-Language, Connection, Content-Length, Content-Type, DNT, Host, Referer, User-Agent, Origin, X-Forwarded-For

You may want to review the value for Access-Control-Allow-Origin, as the wildcard may allow access to unexpected hosts.  If your server is public-facing (why would you do that with NiFi?) then you certainly don’t want a wildcard here.  The wildcard makes configuration much simpler if NiFi is strictly interior-facing, though.

The specific values to set for Access-Control-Allow-Methods depend on what you’re doing.  You’ll probably need OPTIONS for most cases.  I’m serving up static files so I need GET, and I’m receiving uploads that may or may not be chunked, so I need POST and PUT.

The actual headers needed for Access-Control-Allow-Headers is a bit variable.  A wildcard is not an acceptable value here, so you’ll have to list every header you need separately — and there are a bunch of possible headers.  See [3] for an explanation and a fairly comprehensive list of possible headers.  Our list contains a small subset that covers our basic test cases; your mileage may vary.

You may also want to set up a RouteOnAttribute processor to ignore OPTIONS requests (${http.method:equals('OPTIONS')}), otherwise you might see a bunch of zero-byte files in your flow.

References:

[1] https://developer.mozilla.org/en-US/docs/Web/HTTP/Access_control_CORS

[2] http://stackoverflow.com/questions/24371734/firefox-cors-request-giving-cross-origin-request-blocked-despite-headers

[3] http://stackoverflow.com/questions/13146892/cors-access-control-allow-headers-wildcard-being-ignored

“ERROR: … failed to process… ” in NiFi

I was greeted by a few cryptic things in NiFi this morning during my morning check-in.

  1. A PutSQL processor was reporting an error:
    "ERROR: PutSQL[id=$UUID>]failed to process due to java.lang.IndexOutOfBoundsException: Index: 1, Size: 1; rolling back session: java.lang.IndexOutOfBoundsException: Index: 1, Size: 1"
  2. There were no recent errors counted in the LogAttribute counter we set up to record errors;
  3. The Tasks/Time count in the PutSQL processor was though the roof, despite the errors and lack of successes.

Needless to say, the processor was all bound up and a number of tasks were queued.  Not a good start to my day.

I checked the data provenance and didn’t see anything remarkable about the backed-up data.  The error message suggests (to me) that the first statement parameter is at fault, and that parameter happened to be a date (which has been problematic for me in NiFi with a MySQL backend).  Neither that value, nor the rest of the values, were remarkable or illegal for the fields they’re going into.

It wasn’t until I spent some time looking over the source data that I saw the problem: there is a duplicate key in the data.  This error is NiFi’s way of complaining about it.

In our case the underlying table doesn’t have good keys, or a good structure in general, and I’m planning to replace it soon anyway, but updating the primary keys to allow the duplicate data (because it IS valid data, despite the table design) has solved the issue.

NiFi Build Error

I’m testing NiFi out on my local Gentoo installation to prepare for an implementation at work, and after a rather lengthy build/test process (“ten minutes” my fanny) ran into this error:

$ mvn clean install
[INFO] Scanning for projects...
...
'Script Engine' validated against 'ECMAScript' is invalid because Given value not found in allowed set 'Groovy, lua, python, ruby'

This error left me scratching my head.  Nothing related to JavaScript/ECMAScript dependencies were mentioned anywhere.  How would you get it, anyway?  Webkit, I suppose…

Sudden epiphany: this is a new Gentoo installation, and this program, including the build script, is running Java.  Gentoo doesn’t install Sun Oracle’s Java by default, but instead comes with IcedTea out of the box.  It’s acceptable for some simple uses, but is buggy for any complex. (Minecraft is a great example where it just doesn’t work.)  I haven’t used Java for anything yet, so I hadn’t installed the JDK yet.  The build instructions specify JDK 1.7 or higher, but I didn’t think anything of it because I’m used to just having it installed.

echo "dev-java/oracle-jdk-bin Oracle-BCLA-JavaSE" \
  >> /etc/portage/package.license/file
emerge -av dev-java/oracle-jdk-bin
...
$ mvn clean install
[INFO] Scanning for projects...
...
[INFO] BUILD SUCCESS

Finally!