I'm completely new to Spark and Scala, and I'm trying to work with a dataset in Databricks.
I loaded a CSV file as a DataFrame. Now I want to see the percentage of null values in each column. Later I want to either replace the null values or drop the column, depending on that percentage.
I think R has some packages for analyzing missing values (e.g. the MICE package), but I can't find anything similar for Spark and Scala.
I've been trying to filter the DataFrame for null values, but this doesn't seem to work. The code below just returns the cabins that are not null, and swapping the == for != doesn't help either.
val train = sqlContext.sql("SELECT * FROM titanic_test")
train.show()
val filtered = train.filter("Cabin==null")
filtered.show()
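To make the goal concrete, here is the per-column null-percentage computation I'm after, sketched in plain Scala without Spark (each column is a hand-built `Seq[Option[String]]`, with `None` standing in for a null cell; the column names and sample values are made up):

```scala
// Sketch of the null-percentage logic I'm trying to reproduce in Spark.
// None represents a null cell in that column.
object NullStats {
  def nullPercentage(column: Seq[Option[String]]): Double =
    if (column.isEmpty) 0.0
    else 100.0 * column.count(_.isEmpty) / column.size

  def main(args: Array[String]): Unit = {
    val columns: Map[String, Seq[Option[String]]] = Map(
      "Cabin" -> Seq(Some("C85"), None, None, Some("E46")),
      "Age"   -> Seq(Some("22"), Some("38"), None, Some("35"))
    )
    // Print the percentage of nulls per column.
    columns.foreach { case (name, values) =>
      println(f"$name: ${nullPercentage(values)}%.1f%% null")
    }
  }
}
```

My guess is that the Spark equivalent of the filter would be something like `train.filter(train("Cabin").isNull)` and the percentage would come from comparing counts, but I haven't been able to get that working.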
Does anyone know of a package that could help, or how to fix my code above so I can filter manually?