I have a tab-delimited CSV file containing latitude and longitude columns, which I'm loading into a DataFrame through a SQLContext.
val sqlContext = new org.apache.spark.sql.SQLContext(sc);
val df = sqlContext.read.format("com.databricks.spark.csv").option("header", "false").option("delimiter","\t").schema(customSchema).load(inputFile);
CSV example:
metro_code, resolved_lat, resolved_lon
602, 40.7201, -73.2001
I'm trying to figure out the best way to add a new column that holds the GeoHex for each row. Encoding the lat and lon is easy with the geohex package. What I'm unsure about is the Spark side: do I need to use parallelize, or can I pass a function to withColumn? I've seen examples of both. Something like the sketch below is what I have in mind.
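This is a rough sketch of the withColumn approach, assuming the geohex library exposes something like GeoHex.encode(lat, lon, level) returning a String code (I may have the exact signature wrong):

import org.apache.spark.sql.functions.udf

// Wrap the geohex encoder in a UDF; level 9 here is just an example precision
val geohexUdf = udf((lat: Double, lon: Double) => GeoHex.encode(lat, lon, 9))

// Add the computed column from the existing resolved_lat/resolved_lon columns
val dfWithHex = df.withColumn("geohex", geohexUdf(df("resolved_lat"), df("resolved_lon")))

Is this the right pattern, or should I be mapping over the underlying RDD instead?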