The framework ‘Microsoft.NETCore.App’, version ‘6.0.0’ was not found.

New Linux box, old home directory.

Attempting to execute dotnet ef database update repeatedly failed with the error, “The framework ‘Microsoft.NETCore.App’, version ‘6.0.0’ was not found.”  Individual dotnet commands (dotnet --version, dotnet build, etc.) were working, which made it all the more confusing.

Google was not my friend today: the error as a search term produced lots of noise, GitHub issues for long-fixed bugs, and red herrings.

I finally stumbled across the problem: the value of the environment variable DOTNET_ROOT was wrong.  The value was /opt/dotnet-sdk-bin-5.0, but the installed version was 6.0.

Version 5.0 had been installed initially, but I upgraded it during the same login session.  While /etc/env.d/90dotnet-sdk-bin-6.0 was installed properly and contained the correct value, it would not take effect until I logged out and back in (or rebooted).
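
If you hit this yourself, the check and the stop-gap fix are quick (the paths are from my box; yours will differ):

$ echo $DOTNET_ROOT
/opt/dotnet-sdk-bin-5.0

# point the current session at the installed SDK; a fresh login will
# pick up /etc/env.d/90dotnet-sdk-bin-6.0 on its own
$ export DOTNET_ROOT=/opt/dotnet-sdk-bin-6.0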

Shame on Microsoft for their terrible, uninformative errors.

Recursively delete directories unless a specific file is present

There are several ways to do this, but my Google-fu may be weak because it took me much too long to figure this out.

I want to recursively delete directories with a specific name (or names) within a directory structure, UNLESS the matched directory contains a sentinel file.

In my case I want to make a C# directory structure “cleaner-than-clean” by removing all ‘bin’ and ‘obj’ directories, leaving just the user-generated files behind.  This is pretty easy to achieve:

#!/usr/bin/env bash

dir=/path/to/project

find "$dir" -type d \
    \( -name 'bin' -o -name 'obj' \) \
    -print

This says “find things under $dir that are directories (-type d) and are named either ‘bin’ (-name 'bin') or (-o) named ‘obj’ (-name 'obj')”.  The parentheses force the two -name statements to be considered as a single condition, so the effect is to return true if either name matches.  If the final result is true then print the path.

Notice that I’ve escaped (\) the parentheses because I’m using bash.  Most UNIX shells require these to be escaped, but yours may not.  I’ve also escaped the newline at the end of each line; a single command may be spread over several lines this way, making it easier to read.

‘bin’ is also the conventional name for a directory of non-build executables, like helper scripts.  I do have some, including this cleaner-than-clean cleaning script that I’m working out, and don’t want to delete those by accident. The above command would find them, if they were in the directory tree.

find allows you to prune (-prune) the search tree, ignoring selected directories according to certain criteria, but it doesn’t support the concept of peeking into sub-directories.  Bummer.

You may, however, execute independent commands (-exec) and use the results of those commands to affect find’s evaluation, including -prune.  We can exec the test command, which can tell us if our sentinel file exists.

#!/usr/bin/env bash

dir=/path/to/project
sentinel=.keep

find "$basedir" \
    -type d \
    \( -name bin -o -name obj \) \
    ! -exec test -e "{}/$sentinel" ';' \
    -print

The new line executes test to see if the current path ({}) contains a file called $sentinel (I’ve defined $sentinel to be .keep, but any filename will do); test returns true if the file exists.  The expression is negated (!), so if the sentinel is found the remaining actions are skipped.

The final step is to actually delete the directory. We call rm -Rf (-R = recursive, -f = force) because we just want the whole thing gone, no questions asked. The trailing plus (+) tells find that rm can accept multiple paths in a single call, rather than calling rm once for each path.

#!/usr/bin/env bash

dir=/path/to/project
sentinel=.keep

find "$basedir" \
    -type d \
    \( -name bin -o -name obj \) \
    ! -exec test -e "{}/$sentinel" ';' \
    -print \
    -exec rm -Rf '{}' \+
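
To shield a particular ‘bin’ directory from deletion – say, a hypothetical scripts directory – just drop the sentinel into it:

$ touch /path/to/project/tools/bin/.keep

Now test returns true for that directory, the ! inverts it, and the rm never runs.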

Linux, Solaris, Windows

Linux: Because rebooting is for adding hardware

Solaris: Because you don’t need to reboot to add hardware

Windows: Because rebooting is for adding hardware, adding software, regularly scheduled downtime, and should also be done on a daily basis to keep the machine running.

[attribution unknown]

It’s the Little Things

Small things make me happy.

I run a local Active Directory domain on my home network with a Samba back-end.¹ Over the past few weeks I’ve been building out a second domain controller, but I didn’t have 100% replication – it replicated AD and DNS, but not DHCP.²

After a short outage yesterday (due to an update) I decided that this had to change.  So I:

  • followed the instructions,
  • realized that the instructions were out of date,
  • figured out the correct procedure,
  • completed my setup, and
  • submitted a revision to the wiki.

It’s a small step, but I’m such a nerd that I’m riding high – one, because I’ve scratched an itch and have redundancy in my domain; and two, that I’ve visibly contributed something useful to open source (small as it may be).


¹ For a long time it was powered by a single Raspberry Pi, but keeping that up to date became a struggle because it’s a little too low-powered.  But that’s all another story.

² This isn’t a completely useless situation.  It’s much easier to recover from a domain-controller crash if you still have a standing domain controller.  (A solo-domain-controller recovery is much more complicated.)

CNAMEs in Samba

I’m documenting something that wasn’t easy to uncover.

TL;DR – if you want to create a CNAME in Samba to replace an existing DNS record, you must delete the A record first.
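
In command form, using the names from my network (full details below):

dc1 # samba-tool dns delete 192.168.1.2 ad.jonesling.us files A 192.168.1.153 -U administrator
dc1 # samba-tool dns add 192.168.1.2 ad.jonesling.us files CNAME concord.ad.jonesling.us -U administrator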

Background

I have an Active Directory domain running on Samba.  I’ve had an underpowered file server, simply called ‘files’, for a while.  I finally had a chance to upgrade it to some newer hardware with a rather large SSD.

Since this, like all my home projects, is a side-project that takes several days to complete, I chose to build the new server (‘concord’) and get it running while leaving ‘files’ in place.

I like to have servers named after their roles, because it makes things easy, but we have a lot more computers than formal roles in the house.  We’ve finally settled on a naming convention: Windows names are places in Washington, Apple products are from California, and Linux products are from Massachusetts.  (I am aware that Unix was birthed in New Jersey but… Ew.  At least X came from MIT, that’s good enough for me.)

I also have a number of dependencies on the name ‘files’ including, most crucially, my own brain.  Muscle memory is hard to overcome (“ls /net/files/… damn ^H/net/concord/…”) and I don’t want to relearn a server name.

That left me with three problems to solve: follow the naming standard, use a “taken” name for the server, and build said server while the needed name is still available on the network.

The obvious answer is to use CNAMEs.  I planned to set up ‘files’ as an alias to ‘concord’.  Similar practice would carry us forward through an indefinite number of role-swaps in the future.

After copying all of our data from ‘files’ to ‘concord’ I confidently shut ‘files’ down and added my CNAME.  This is where things went wrong.

The Problem

After shutting ‘files’ down, I started by creating the CNAME:

dc1 # samba-tool dns add 192.168.1.2 ad.jonesling.us files CNAME concord.ad.jonesling.us -U administrator
Password for [AD\administrator]: ******
Record added successfully

That’s all well and good.  Let’s test it out from another computer:

natick $ nslookup
> files
Server:     dc1
Address:    2001:470:1f07:583:44a:52ff:fe4a:8cee#53

Name:   files.ad.jonesling.us
Address: 192.168.1.153
files.ad.jonesling.us   canonical name = concord.ad.jonesling.us.

Crap.  That’s the correct canonical name, but the wrong IP address – it’s the old address of ‘files’.

Some googling uncovered someone with a similar issue back in 2012, but they “solved” it by creating static A records instead.  That’s not a great solution, certainly not what I want.

I thought about it for a few minutes.  I got a success message, but was the record actually created?  How can I tell?  What happens if I insert it again?

dc1 # samba-tool dns add 192.168.1.2 ad.jonesling.us files CNAME concord.ad.jonesling.us -U administrator
Password for [AD\administrator]: ******

ERROR(runtime): uncaught exception - (9711, 'WERR_DNS_ERROR_RECORD_ALREADY_EXISTS')
  File "/usr/lib/python3.7/site-packages/samba/netcmd/__init__.py", line 186, in _run
    return self.run(*args, **kwargs)
  File "/usr/lib/python3.7/site-packages/samba/netcmd/dns.py", line 945, in run
    raise e
  File "/usr/lib/python3.7/site-packages/samba/netcmd/dns.py", line 941, in run
    0, server, zone, name, add_rec_buf, None)

Well, it was inserted somewhere, that much is clear.

What happens if I dig it?  nslookup gave us a canonical name, but I want to see the actual DNS records.  Maybe they contain a clue.

First, let’s dig the CNAME:

dc1 # dig @dc1 files.ad.jonesling.us IN CNAME

; <<>> DiG 9.14.8 <<>> @dc1 files.ad.jonesling.us IN CNAME
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 10370
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
; COOKIE: 7a0aa65a623d5d3bdbdc39075f2eff9d5b81dbd9ed05c9d0 (good)
;; QUESTION SECTION:
;files.ad.jonesling.us. IN CNAME

;; ANSWER SECTION:
files.ad.jonesling.us. 900 IN CNAME concord.ad.jonesling.us.

;; Query time: 8 msec
;; SERVER: 192.168.1.2#53(192.168.1.2)
;; WHEN: Sat Aug 08 15:40:13 EDT 2020
;; MSG SIZE rcvd: 100

The line in the ANSWER SECTION shows the alias.  That looks right.

But what about ‘files’?

dc1 # dig @dc1 files.ad.jonesling.us

; <<>> DiG 9.14.8 <<>> @dc1 files.ad.jonesling.us
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 42296
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
; COOKIE: 0352365b5c07ecdace1ebf3c5f2effa6da5d32bfe9002b32 (good)
;; QUESTION SECTION:
;files.ad.jonesling.us. IN A

;; ANSWER SECTION:
files.ad.jonesling.us. 3600 IN A 192.168.1.153

;; Query time: 8 msec
;; SERVER: 192.168.1.2#53(192.168.1.2)
;; WHEN: Sat Aug 08 15:40:22 EDT 2020
;; MSG SIZE rcvd: 94

Ah.  That looks like a conflict.  Both records exist, and one has primacy over the other.

‘files’ was assigned an address via DHCP; I never gave it a static address, so I didn’t expect that I would need to delete anything.  But when I thought about it, I realized that Samba doesn’t know that ‘files’ isn’t coming back.  (That makes me wonder what kind of graveyard DNS becomes, with friends’ phones and laptops popping in from time to time.)

So, can we delete the old A record, and what happens if we do?

The Solution

We delete the address.  It looks like it’s working:

dc1 # samba-tool dns delete 192.168.1.2 ad.jonesling.us files A 192.168.1.153 -U administrator
Password for [AD\administrator]:
Record deleted successfully

Was that the problem all along?

dc1 # dig @dc1 files.ad.jonesling.us

; <<>> DiG 9.14.8 <<>> @dc1 files.ad.jonesling.us
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 38286
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
; COOKIE: 1610fb8ec07db8e3a43976ed5f2effdffeb142b30ca93848 (good)
;; QUESTION SECTION:
;files.ad.jonesling.us. IN A

;; ANSWER SECTION:
files.ad.jonesling.us. 900 IN CNAME concord.ad.jonesling.us.
concord.ad.jonesling.us. 3600 IN A 192.168.1.82

;; Query time: 15 msec
;; SERVER: 192.168.1.2#53(192.168.1.2)
;; WHEN: Sat Aug 08 15:41:20 EDT 2020
;; MSG SIZE rcvd: 116

That looks pretty good!

Securing WordPress: The Basics

This is the first in an occasional series of documents on WordPress.


WordPress is ubiquitous but fragile.  There are few alternatives that provide the easy posting, wealth of plugins, and integration of themes, while also being (basically) free to use.

It’s also a nerve-wracking exercise in keeping bots and bad actors out.  Some of the historical security holes are legendary.  It doesn’t take long to find someone who experienced a site where the comments section was bombed by a spammer, or even outright defacement.  (I will reluctantly raise my own hand, having experienced both in years past.)

Most people who use WordPress nowadays rely on 3rd parties to host it.  This document isn’t for them; hosted security is mostly outside of your control.  That’s generally a good thing: professionals are keeping you up to date and covered by best practices.

The rest of us muddle through security and updates in piece-meal fashion, occasionally stumbling over documents like this one.

Things To Look Out For

As a rule, good server hygiene demands that you keep an eye on your logs.  Tools like goaccess help you analyze usage, but nothing beats a peek at the raw logs for noticing issues cropping up.
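
Even something as crude as this – assuming Apache’s usual log location, which varies by distribution – will surface the bots discussed below as they arrive:

$ tail -f /var/log/apache2/access.log | grep -E 'wp-login|xmlrpc'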

The Good Bots

Sleepy websites like mine show a high proportion of “good” bots like Googlebot, compared to human traffic.  They’re doing good things like crawling (indexing) your site.

In my case they are the primary visitor base to my site, generating hundreds or even thousands of individual requests per day.  Hopefully your own WordPress site has a better visitor-to-bot ratio than mine.

We don’t want to block these guys from their work; they’re actually helpful.

The Bad Bots

You’ll also see bad bots, possibly lots of them.  Most are attempting to guess user credentials so they can post things on your WordPress site.

Some are fairly up-front about it:

...
132.232.47.138 [07:51:14] "POST /xmlrpc.php HTTP/1.1"
132.232.47.138 [07:51:14] "POST /xmlrpc.php HTTP/1.1"
132.232.47.138 [07:51:15] "POST /xmlrpc.php HTTP/1.1"
132.232.47.138 [07:51:16] "POST /xmlrpc.php HTTP/1.1"
132.232.47.138 [07:51:16] "POST /xmlrpc.php HTTP/1.1"
132.232.47.138 [07:51:18] "POST /xmlrpc.php HTTP/1.1"
...

They’ll hammer your server like that for hours.

Blocking their individual IP addresses at the firewall is devastatingly effective… for about five minutes.  Another bot from another IP will pop up soon.  Blocking individual IPs is a game of whack-a-mole.

Some are part of a “slow” botnet, hitting the same page from a unique IP address each time.  These are part of the large botnets you read about.

83.149.124.238 [05:01:06] "GET /wp-login.php HTTP/1.1" 200
83.149.124.238 [05:01:06] "POST /wp-login.php HTTP/1.1" 200
188.163.45.140 [05:03:38] "GET /wp-login.php HTTP/1.1" 200
188.163.45.140 [05:03:39] "POST /wp-login.php HTTP/1.1" 200
90.150.96.222 [05:04:30] "GET /wp-login.php HTTP/1.1" 200
90.150.96.222 [05:04:32] "POST /wp-login.php HTTP/1.1" 200
178.89.251.56 [05:04:42] "GET /wp-login.php HTTP/1.1" 200
178.89.251.56 [05:04:43] "POST /wp-login.php HTTP/1.1" 200

These are more insidious: patient and hard to spot on a heavily-trafficked blog.

Keeping WordPress Secure

You (hopefully) installed WordPress to a location outside of your “htdocs” document tree.  If not, you should fix that right away!  (Consider this “security tip #0” because without this you’re basically screwed.)

Security tip #1 is to make sure auto-updates are enabled.  The slight risk of a botched release being automatically applied is much lower than the risk of a critical security patch being applied too late.
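
If you want to be explicit about it, core auto-updates can be pinned on in wp-config.php (shown bare, like the other config snippets in this post):

define( 'WP_AUTO_UPDATE_CORE', true );  // true applies minor and major core updates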

Running old software is like keeping medieval locks on your front door: there is little security advantage.

Once an exploit is patched, the prior releases become easy targets, as people deconstruct the patch and reverse-engineer the exploit(s) – assuming an exploit wasn’t already published before the patch was released.

Locking WordPress Down

Your Apache configuration probably contains a section similar to this:

<Directory "/path/to/wordpress">
    ...
    Require all granted
    ...
</Directory>

We’re going to add some items between <Directory></Directory> tags to restrict access to the most vulnerable pieces.

You Can’t Attack Things You Can’t Reach

We’ll start by invoking the Principle of Least Privilege: people should only be able to do the things they must do, and nothing more.

xmlrpc.php is an API for applications to talk to WordPress.  Unfortunately it doesn’t carry extra security, so if you’re a bot it’s great to hammer with your password guesses – you won’t be blocked, and no one will be alerted.

Most people don’t need it.  Unless you know you need it, you should disable it completely.

<Directory "/path/to/wordpress">
    ...
    <Files xmlrpc.php>
        <RequireAll>
            Require all denied
        </RequireAll>
    </Files>
</Directory>

There are WordPress plugins that purport to “disable” xmlrpc.php, but they deny access from within WordPress.  That means that you’ve still paid a computational price for executing xmlrpc.php, which can be steeper than you expect, and you’re still at risk of exploitable bugs within it.  Denying access to it at the server level is much safer.

You Can’t Log In If You Can’t Reach the Login Page

This next change will block anyone from outside your LAN from logging in.  That means that if you’re away from home you won’t be able to log in, either, without tunneling back home.

<Directory "/path/to/wordpress">
    ...
    <Files wp-login.php>
        <RequireAll>
            Require all granted
            # remember that X-Forwarded-For may contain multiple
            # addresses, don't just search for ^192...
            Require expr %{HTTP:X-Forwarded-For} =~ /\b192\.168\.1\./
        </RequireAll>
    </Files>
</Directory>

If you’re not using a public-facing proxy, and don’t need to look at X-Forwarded-For, you can simplify this a little:

<Directory "/path/to/wordpress">
    ...
    <Files wp-login.php>
        <RequireAll>
            Require all granted
            Require ip 192.168.1
        </RequireAll>
    </Files>
</Directory>

Note that this will also prevent 3rd parties from signing up on your blog and submitting comments.  That may or may not be important to you.

Restart Apache

After inserting these blocks, you should execute Apache’s ‘configtest’ followed by reload:

$ sudo apache2ctl configtest
apache2      | * Checking apache2 configuration ...     [ ok ]
$ sudo apache2ctl reload
apache2      | * Gracefully restarting apache2 ...      [ ok ]

Now test your changes from outside your network:

[screenshot: the browser shows a “403 Forbidden” error for xmlrpc.php]
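
curl can perform the same check from a shell (substitute your own hostname):

$ curl -s -o /dev/null -w '%{http_code}\n' https://blog.example.com/xmlrpc.php
403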

Apache’s access log should show a ‘403’ (Forbidden) status:

... "GET /xmlrpc.php HTTP/1.1" 403 ...

And just like that, you’ve made your WordPress blog a lot more secure.

Interestingly, by making just these changes on my own site the attacks immediately dropped off by 90%.  I guess that the better-written bots realized that I’m not a good target anymore and stopped wasting their time, preferring lower-hanging fruit.

Failed to retrieve directory listing

[screenshot: FileZilla connection log with the “Failed to retrieve directory listing” error]

I occasionally run a local vsftpd daemon on my development machine for testing.  I don’t connect to it directly – it exists to back unit tests that need an FTP connection.  No person connects to it, least of all me, and the scripts that do connect are looking at small, single-use directories.

I needed to test a new feature: FTPS, aka FTP with SSL (not to be confused with SFTP, a very different beast).  Several of our vendors will be requiring it soon; frankly, I’m surprised they haven’t required it sooner.  But I digress.

To start this phase of the project I needed to make sure that my local vsftpd daemon supports FTPS so that I can run tests against it.  So I edited /etc/vsftpd/vsftpd.conf to add some lines to my config, and restarted:

rsa_cert_file=/etc/ssl/private/vsftpd.pem
rsa_private_key_file=/etc/ssl/private/vsftpd.pem
ssl_enable=YES
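
The restart itself is nothing special; I’m showing systemd here, so adjust for your init system:

$ sudo systemctl restart vsftpd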

But Filezilla bombs with an opaque error message:

Status: Resolving address of localhost
Status: Connecting to 127.0.0.1:21...
Status: Connection established, waiting for welcome message...
Status: Initializing TLS...
Status: Verifying certificate...
Status: TLS connection established.
Status: Logged in
Status: Retrieving directory listing...
Command: PWD
Response: 257 "/home/dad" is the current directory
Command: TYPE I
Response: 200 Switching to Binary mode.
Command: PASV
Response: 227 Entering Passive Mode (127,0,0,1,249,239).
Command: LIST
Response: 150 Here comes the directory listing.
Error: GnuTLS error -15: An unexpected TLS packet was received.
Error: Disconnected from server: ECONNABORTED - Connection aborted
Error: Failed to retrieve directory listing

I clue in pretty quickly that “GnuTLS error -15: An unexpected TLS packet was received” is actually a red herring, so I drop the SSL from the connection and get a different error:

Response: 150 Here comes the directory listing.
Error: Connection closed by server
Error: Failed to retrieve directory listing

Huh, that’s not particularly helpful either; shame on you, Filezilla.  I drop down further to a command-line FTP client to get the real error:

$ ftp localhost
Connected to localhost.
220 (vsFTPd 3.0.3)
Name (localhost:dad): 
530 Please login with USER and PASS.
530 Please login with USER and PASS.
SSL not available
331 Please specify the password.
Password:
230 Login successful.
Remote system type is UNIX.
Using binary mode to transfer files.
ftp> ls
200 PORT command successful. Consider using PASV.
150 Here comes the directory listing.
421 Service not available, remote server has closed connection
ftp> quit

Ah.  Now we’re getting somewhere.

A quick perusal turned up a stackexchange answer with the assertion that “the directory causing this behaviour had too many files in it (2,666).”  My own directory is much smaller, about a hundred files.  According to this bug report, however, the real maximum may be as few as 32 files.  It’s not clear to me whether this is a kernel bug, a vsftpd bug, or just a bad interaction between recent kernels and vsftpd.

Happily, there is a work-around: add “seccomp_sandbox=NO” to vsftpd.conf.

Since vsftpd’s documentation is spare, and actual examples are hard to come by, here’s my working config:

listen=YES
local_enable=YES
write_enable=YES
chroot_local_user=YES
allow_writeable_chroot=YES
seccomp_sandbox=NO
ssl_enable=YES
rsa_cert_file=/etc/ssl/private/vsftpd.pem
rsa_private_key_file=/etc/ssl/private/vsftpd.pem

vim, screen, and bracketed paste mode

A little while back an update was introduced, somewhere, that has been driving me nuts.  I didn’t record exactly when it happened or what changed.  I suppose it doesn’t matter now.

The behavior wasn’t easy to pin down at first since it was the confluence of several things: 1) pasting 2) into vim while 3) using a non-xterm terminal like mate-terminal and 4) inside a screen session.

The behavior exhibits in several ways:

  • Pastes appear to be incomplete, or (more correctly) some number of characters at the beginning of the paste go “missing” and actually become commands to vim
  • Pastes are complete but they’re bracketed with \e[200~content\e[201~
    • some people report 0~content1~ instead, but it appears to be the same phenomenon

What’s going on?  It’s a feature called “bracketed paste mode”.  You can google it and read up on it; it has some utility.  As far as I can tell it’s related to readline.  But more importantly, there is a fix.
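
If you’re curious, you can toggle the feature by hand and watch your terminal’s behavior change – these are the same escape sequences that appear in the fix below:

$ printf '\e[?2004h'   # ask the terminal to bracket pastes
$ # paste something now: it arrives wrapped in \e[200~ ... \e[201~
$ printf '\e[?2004l'   # turn it back off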

Add this to your ~/.vimrc:

" fix bracketed paste mode
if &term =~ "screen"
  let &t_BE = "\e[?2004h"
  let &t_BD = "\e[?2004l"
  exec "set t_PS=\e[200~"
  exec "set t_PE=\e[201~"
endif

source: https://vimhelp.appspot.com/term.txt.html#xterm-bracketed-paste

WordPress Error: cURL error 6: Couldn’t resolve host ‘dashboard.wordpress.com’

Background:

I maintain a WordPress blog that uses Jetpack’s Stats package.

Issue:

We started getting this error message when opening the ‘Stats’ page:

We were unable to get your stats just now. Please reload this page to try again. If this error persists, please contact support. In your report please include the information below.

User Agent: 'Mozilla/5.0 (X11; Linux x86_64; rv:54.0) Gecko/20100101 Firefox/54.0'
Page URL: 'https://blog.server.tld/wp-admin/admin.php?page=stats&noheader'
API URL: 'https://dashboard.wordpress.com/wp-admin/index.php?noheader=true&proxy&page=stats&blog=XXX&charset=UTF-8&color=fresh&ssl=1&j=1:5.0&main_chart_only'
http_request_failed: 'cURL error 6: Couldn't resolve host 'dashboard.wordpress.com''

The entire Stats block in the Dashboard was empty, and the little graph that shows up in the Admin bar on the site was empty as well.

Other errors noticed:

RSS Error: WP HTTP Error: cURL error 6: Couldn't resolve host 'wordpress.org'
RSS Error: WP HTTP Error: cURL error 6: Couldn't resolve host 'planet.wordpress.org'

These errors were in the WordPress Events and News section, which was also otherwise empty.

This whole thing was ridiculous on its face, as the hosts could all be pinged successfully from said server.

I checked with Jetpack’s support, per the instructions above, and got a non-response of “check with your host.”  Well, this isn’t being run on a hosting service, so you’re telling me to ask myself.  Thanks for the help anyway.

Resolution:

The machine in question had just upgraded PHP, but Apache had not been restarted yet. The curl errors don’t make much sense, but since when does anything in PHP make sense?

It was kind of a “duh!” moment when I realized that could be the problem.  Restarting Apache seems to have solved it.
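
For the record, the fix was nothing exotic (use whatever control command your system provides):

$ sudo apache2ctl restart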

NiFi HTTP Service

I’m attempting to set up an HTTP server in NiFi to accept uploads and process them on-demand.  This gets tricky because I want to submit the files using an existing web application that will not be served from NiFi, which leads to trouble with XSS (Cross-Site Scripting) protections and with setting up CORS (Cross-Origin Resource Sharing [1]).

The trouble starts with just trying to PUT or POST a simple file.  The error in Firefox reads:

Cross-Origin Request Blocked: The Same Origin Policy disallows reading the remote resource (Reason: CORS header 'Access-Control-Allow-Origin' missing).

You can serve up the Javascript that actually performs the upload from NiFi and side-step XSS, but you may still run into trouble with CORS.  You’ll have trouble even if NiFi and your other web server live on the same host (using different ports, of course), as they’re considered different origins for the purposes of XSS prevention.

[screenshot: HandleHttpResponse processor configuration]

To make this work, you’ll need to enable specific headers in the HandleHttpResponse processor.  Neither the need to set some headers, nor the headers that need to be set, are documented by NiFi at this time (so far as I can tell).

  1. Open the configuration of the HandleHttpResponse processor
  2. Add the following headers and values as properties, but see below for notes regarding the values:
    Access-Control-Allow-Origin: *
    
    Access-Control-Allow-Methods: PUT, POST, GET, OPTIONS
    
    Access-Control-Allow-Headers: Accept, Accept-Encoding, Accept-Language, Connection, Content-Length, Content-Type, DNT, Host, Referer, User-Agent, Origin, X-Forwarded-For

You may want to review the value for Access-Control-Allow-Origin, as the wildcard may allow access to unexpected hosts.  If your server is public-facing (why would you do that with NiFi?) then you certainly don’t want a wildcard here.  The wildcard makes configuration much simpler if NiFi is strictly interior-facing, though.

The specific values to set for Access-Control-Allow-Methods depend on what you’re doing.  You’ll probably need OPTIONS for most cases.  I’m serving up static files so I need GET, and I’m receiving uploads that may or may not be chunked, so I need POST and PUT.

The actual set of headers needed for Access-Control-Allow-Headers is a bit variable.  A wildcard is not an acceptable value here, so you’ll have to list every header you need separately – and there are a bunch of possible headers.  See [3] for an explanation and a fairly comprehensive list.  Our list contains a small subset that covers our basic test cases; your mileage may vary.

You may also want to set up a RouteOnAttribute processor to ignore OPTIONS requests (${http.method:equals('OPTIONS')}), otherwise you might see a bunch of zero-byte files in your flow.
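
A quick way to verify the whole CORS setup is to send the same preflight request a browser would; the host, port, and path here are hypothetical stand-ins for your HandleHttpRequest endpoint:

$ curl -s -i -X OPTIONS http://nifi.example.com:8081/upload \
      -H 'Origin: http://app.example.com' \
      -H 'Access-Control-Request-Method: POST' \
    | grep -i 'access-control'

You should see the three Access-Control-* headers you configured echoed back.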

References:

[1] https://developer.mozilla.org/en-US/docs/Web/HTTP/Access_control_CORS

[2] http://stackoverflow.com/questions/24371734/firefox-cors-request-giving-cross-origin-request-blocked-despite-headers

[3] http://stackoverflow.com/questions/13146892/cors-access-control-allow-headers-wildcard-being-ignored