I was greeted by a few cryptic things in NiFi during my morning check-in.
- A PutSQL processor was reporting an error:
"ERROR: PutSQL[id=$UUID>]failed to process due to java.lang.IndexOutOfBoundsException: Index: 1, Size: 1; rolling back session: java.lang.IndexOutOfBoundsException: Index: 1, Size: 1"
- There were no recent errors counted in the LogAttribute counter we set up to record errors.
- The Tasks/Time count in the PutSQL processor was through the roof, despite the errors and lack of successes.
Needless to say, the processor was all bound up and a number of tasks were queued. Not a good start to my day.
I checked the data provenance and didn’t see anything remarkable about the backed-up data. The error message suggests (to me) that the first statement parameter is at fault, and that parameter happened to be a date (which has been problematic for me in NiFi with a MySQL backend). Neither that value nor the rest of the values were remarkable or illegal for the fields they’re going into.
It wasn’t until I spent some time looking over the source data that I saw the problem: there was a duplicate key in the data. This error is NiFi’s way of complaining about it.
In our case the underlying table doesn’t have good keys (or a good structure in general), and I’m planning to replace it soon anyway. In the meantime, updating the primary key to allow the duplicate data (because it IS valid data, despite the table design) has solved the issue.
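For anyone curious what that kind of key change looks like, here's a rough sketch (Python with pymysql just to show the statement in context; the table and column names are invented, and the right key columns depend entirely on your schema):

```python
# Hypothetical example: widen the primary key so rows that previously
# collided become legitimately distinct. "readings", "device_id",
# "reading_date", and "reading_type" are made-up names.
import pymysql

conn = pymysql.connect(host="localhost", user="nifi",
                       password="secret", database="metrics")
try:
    with conn.cursor() as cur:
        cur.execute(
            "ALTER TABLE readings "
            "DROP PRIMARY KEY, "
            "ADD PRIMARY KEY (device_id, reading_date, reading_type)"
        )
finally:
    conn.close()
```

Widening the key only makes sense when the "duplicates" really are distinct, valid rows, as they were here; otherwise deduplicating upstream is the better fix.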
Hello there,
What did you eventually do with the above error? I am having the same error and have been scratching my head to figure out what's wrong. It is also random at times.
Any insights?
Many thanks!
In my case it was a duplicate key in the database, but it's possible that other database problems would produce the same error. I had the opportunity to change the database keys rather than changing the data itself, but my situation was special.
My next step, if I hadn’t seen the problem yet, would have been writing a test harness to loop over the original raw data and try inserting it into a copy of the production database to see if any errors crop up.
The ConvertJSONToSQL and PutSQL processors aren't very helpful with their error messages or the logging they provide. PutSQL in particular will show you the parameterized statement it ran, but not the values for the parameters, which makes it difficult to figure out which value(s) might be causing a problem on long runs.
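For the record, the test harness I had in mind is nothing fancy: replay the original raw records one at a time against a copy of the table and let the database tell you exactly which rows it rejects. A rough sketch, assuming the raw data is one JSON record per line and using pymysql (the file, table, and column names are all placeholders):

```python
import json
import pymysql

# Connection details and schema are placeholders; point this at a *copy*
# of the production table, never the live one.
conn = pymysql.connect(host="localhost", user="nifi",
                       password="secret", database="metrics_copy")

insert_sql = (
    "INSERT INTO readings (device_id, reading_date, reading_type, value) "
    "VALUES (%s, %s, %s, %s)"
)

failures = []
try:
    with conn.cursor() as cur, open("raw_records.jsonl") as raw:
        for line_no, line in enumerate(raw, start=1):
            record = json.loads(line)
            try:
                cur.execute(insert_sql, (
                    record["device_id"],
                    record["reading_date"],
                    record["reading_type"],
                    record["value"],
                ))
                conn.commit()
            except pymysql.err.IntegrityError as exc:
                # Duplicate keys land here (MySQL error 1062), and unlike
                # PutSQL's log you can see exactly which values caused it.
                conn.rollback()
                failures.append((line_no, exc))
finally:
    conn.close()

for line_no, exc in failures:
    print(f"line {line_no}: {exc}")
```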
I was facing the same issue here. I am pulling data from RSS feeds, and I suspect EvaluateXQuery is producing duplicate flowfiles and MergeContent is merging them. Basically, the attributes are being duplicated: I am getting multiple titles, descriptions, and pubDates for the same item. I tried having the title as my primary key in the MySQL db to avoid taking in the duplicate data, but the error you mentioned above popped up. If I change the primary key, I am able to get the duplicate data into the table.
I wish there were INSERT IGNORE or REPLACE support in the PutSQL processor; that would solve this problem. If I store the merged file on disk using PutFile, I have the option of REPLACE, which does the trick. Do you know if there is anything I can do to avoid creating this duplicate data inside the dataflow itself, or should I do something different in the DB?
It sounds like your database structure isn't a good match for the incoming data.
If you can't restructure the database or redefine the data, you could create a custom processor (or, more likely, a script run by the ExecuteStreamCommand processor) to check each record against the database and reroute duplicates with a routing processor such as RouteOnAttribute.
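Something along these lines is what I mean by the script option: it reads the flowfile content from standard input, checks the key against the database, and signals a duplicate through its exit code. This is only a sketch (Python with pymysql again, and the table, columns, and key check are all made up for illustration):

```python
#!/usr/bin/env python
# Hypothetical duplicate check to sit behind ExecuteStreamCommand.
# The flowfile content arrives on stdin as JSON; the script exits 0 for a
# new record and 1 for one whose key already exists, and echoes the content
# back out so the flowfile passes through unchanged.
import json
import sys

import pymysql

record = json.load(sys.stdin)

conn = pymysql.connect(host="localhost", user="nifi",
                       password="secret", database="metrics")
try:
    with conn.cursor() as cur:
        cur.execute(
            "SELECT 1 FROM readings "
            "WHERE device_id = %s AND reading_date = %s LIMIT 1",
            (record["device_id"], record["reading_date"]),
        )
        already_there = cur.fetchone() is not None
finally:
    conn.close()

# Pass the content through so downstream processors still have it.
sys.stdout.write(json.dumps(record))
sys.exit(1 if already_there else 0)
```

If I remember correctly, ExecuteStreamCommand records the command's exit code in the execution.status attribute, so a RouteOnAttribute processor downstream can send flowfiles with a non-zero status somewhere other than PutSQL.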