I have a custom Spark data source whose data is provided by a Java library. Some fields are `ArrayType` and are occasionally NULL. I've tried setting the array field to `None`, `null`, `lit(null)`, `Option(null)`, and probably several other variants, and in every case Catalyst throws an NPE when attempting to resolve the array field.

As near as I can tell, Catalyst doesn't check for null in the `toCatalystImpl()` method of `ArrayConverter` (in `CatalystTypeConverters.scala`). Is this a Catalyst bug, or is there some other null encoding for DataFrame `ArrayType` fields?

1 Answer

My bad, I found the issue (I had hacked the wrong piece of code). The final answer seems to be `Option(null)`. I tried `None` and `lit(null: )`, but both threw exceptions.
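For reference, a minimal sketch of the working encoding, assuming a custom relation whose `buildScan` emits `Row` objects (the schema and field names here are illustrative, not from the original question). Behavior may vary by Spark version; this just mirrors what the answer above reports:

```scala
import org.apache.spark.sql.Row

// Hypothetical row layout for the custom data source:
//   (id: Int, tags: Array[String], nullable)
val rowWithTags    = Row(1, Seq("a", "b"))   // array value present
val rowMissingTags = Row(2, Option(null))    // null array field, per the answer above
```

Note that in Scala, `Option(null)` evaluates to `None` (via `Option.apply`), so if `Option(null)` works where a literal `None` did not, the difference likely lies in where each value was constructed (e.g. row construction vs. a DataFrame expression) rather than in the runtime value itself.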