3

Is it possible to handle null (column does not exist in the CSV) using convertRecord or ant other processor?

I have several CSv files coming in daily. But the problem the fields are not same in each CSv, Is there a way I can send 0 value into the database for all those fields which doesn't exist in the CSV. If yes please let me no how? Thank you.

Ex: CSV1 has following field: Age,Name,Address,Value1,Value2,Value3

(Note:Value3 is not present in all CSV everytime, sometime it is present and sometime the column does not exist)

CSV2 will have following: Age,Name,Address,Value1,Value2

And in database The table skeleton is as follows: Age,Name,Address,Value1,Value2,Value3

I don't want to have nulls followed by some value in the database. Is there a way my flow in NiFi can handle "null" and replace it with 0 before inserting into the database in an automated way (everytime if a specific field given in schema is not present in the CSV the flow should assign 0 to that field and ingest everything else in the database)

Please help me achieve this. Thank you!

Shrads
  • 883
  • 19
  • 39

1 Answers1

2

For this case you have to define avro schema with default values in it.

Sample avro schema:

{
"type": "record",
"name": "SQLSchema",
"fields" : [
{"name": "Age", "type": ["null","string"]},
{"name": "Name", "type": ["null","string"]},
{"name": "Address", "type": ["null","int"]},
{"name": "Value1", "type": ["null","int"]},
{"name": "Value2", "type": ["null","int"]},
{"name": "value3", "type": ["null","int"],"default": 0}
]
}

For value3 field we have defined the type as null (or) int and defined default value as 0.

If there is no data for value3 field then it will replace the value as 0.

notNull
  • 30,258
  • 4
  • 35
  • 50
  • Is there a way we can avoid entering that field into the database in order to store space. – Shrads Nov 06 '18 at 01:24
  • @shrads,i'm not sure what do u mean by entering the field into database, As per the question my understanding is **you are thinking to replace null with some other default value**.If you don't want to occupy a space then use **null value** (https://stackoverflow.com/questions/3731172/how-much-size-null-value-takes-in-sql-server) but it depends on `data type` of the column also. – notNull Nov 06 '18 at 02:39