I either don't know what I'm looking for or the documentation is lacking. The latter seems to be the case, given this description of the options parameter of to_json:
"options - options to control how the struct column is converted into a json string. accepts the same options and the json data source."
Great! So, what are my options?
I'm doing something like this:
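```java
import static org.apache.spark.sql.functions.*;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

// Add constant id/timestamp columns and serialize record_count into "data"
Dataset<Row> formattedReader = reader
    .withColumn("id", lit(id))
    .withColumn("timestamp", lit(timestamp))
    .withColumn("data", to_json(struct("record_count")));
```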
...and I get this result:
```json
{
  "id": "ABC123",
  "timestamp": "2018-11-16 20:40:26.108",
  "data": "{\"record_count\": 989}"
}
```
I'd like this instead, with the backslashes and surrounding quotes removed so that "data" is a nested object:
```json
{
  "id": "ABC123",
  "timestamp": "2018-11-16 20:40:26.108",
  "data": {"record_count": 989}
}
```
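The backslashes appear because to_json returns a plain string column. When the whole row is later written out as JSON, "data" is just another string value, so its inner quotes get escaped. A struct column left unwrapped would instead be written as a nested object. The Java below is a hypothetical mini JSON writer (NestedJsonDemo, write, quote and row are illustrative names, not Spark APIs) that reproduces both outputs shown above. In Spark terms it suggests, as an assumption rather than a confirmed fix, keeping struct("record_count") without to_json and letting the final JSON output (for example Dataset.toJSON() or write().json(...)) serialize the struct.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical mini JSON writer for illustration only; this is NOT Spark's
// serializer, just enough to show string-vs-object behaviour.
public class NestedJsonDemo {

    // Write a value as JSON: maps become objects, numbers are written bare,
    // and everything else is written as an escaped string literal.
    static String write(Object value) {
        if (value instanceof Map) {
            StringBuilder out = new StringBuilder("{");
            boolean first = true;
            for (Map.Entry<?, ?> e : ((Map<?, ?>) value).entrySet()) {
                if (!first) out.append(", ");
                first = false;
                out.append(write(String.valueOf(e.getKey())))
                   .append(": ")
                   .append(write(e.getValue()));
            }
            return out.append('}').toString();
        }
        if (value instanceof Number) {
            return value.toString();
        }
        return quote(String.valueOf(value));
    }

    // Escape a string per JSON rules: quotes, backslashes, control characters.
    static String quote(String s) {
        StringBuilder out = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            if (c == '"' || c == '\\') {
                out.append('\\').append(c);
            } else if (c < 0x20) {
                out.append(String.format("\\u%04x", (int) c));
            } else {
                out.append(c);
            }
        }
        return out.append('"').toString();
    }

    // Build a row whose "data" field is either a pre-serialized string
    // (like the output of to_json) or a nested map (like a struct column).
    static Map<String, Object> row(Object data) {
        Map<String, Object> r = new LinkedHashMap<>();
        r.put("id", "ABC123");
        r.put("timestamp", "2018-11-16 20:40:26.108");
        r.put("data", data);
        return r;
    }

    public static void main(String[] args) {
        Map<String, Object> nested = new LinkedHashMap<>();
        nested.put("record_count", 989);

        // Current behaviour: data is already a JSON string, so it gets escaped.
        System.out.println(write(row("{\"record_count\": 989}")));
        // Desired behaviour: data stays structured and is written as an object.
        System.out.println(write(row(nested)));
    }
}
```

The first line printed matches the escaped result above; the second matches the desired nested form. The only difference is whether "data" reaches the writer as a string or as a structure.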
Is this one of the options, by chance? Is there a better guide to Spark out there? The most frustrating part of Spark hasn't been getting it to do what I want; it's been the lack of good information on what it can do.