I have a json string and a different string I'd like to create a dataframe of.
val body = """{
| "time": "2020-07-01T17:17:15.0495314Z",
| "ver": "4.0",
| "name": "samplename",
| "iKey": "o:something",
| "random": {
| "stuff": {
| "eventFlags": 258,
| "num5": "DHM",
| "num2": "something",
| "flags": 415236612,
| "num1": "4004825",
| "seq": 44
| },
| "banana": {
| "id": "someid",
| "ver": "someversion",
| "asId": 123
| },
| "something": {
| "example": "somethinghere"
| },
| "apple": {
| "time": "2020-07-01T17:17:37.874Z",
| "flag": "something",
| "userAgent": "someUserAgent",
| "auth": 12,
| "quality": 0
| },
| "loc": {
| "country": "US"
| }
| },
| "EventEnqueuedUtcTime": "2020-07-01T17:17:59.804Z"
|}
|""".stripMargin
val offset = "10"
I tried
val data = Seq(body, offset)
val columns = Seq("body","offset")
import sparkSession.sqlContext.implicits._
val df = data.toDF(columns:_*)
As well as
val data = Seq(body, offset)
val rdd = sparkSession.sparkContext.parallelize((data))
val dfFromRdd = rdd.toDF("body", "offset")
dfFromRdd.show(20, false)
but for both I get this an error: "value toDF is not a member of org.apache.spark.RDD[String]"
Is there a different way I can create a dataframe that will have one column with my json body data, and another column with my offset string value?
Edit: I've also tried the following:
val offset = "1000"
val data = Seq(body, offset)
val rdd = sparkSession.sparkContext.parallelize((data))
val dfFromRdd = rdd.toDF("body", "offset")
dfFromRdd.show(20, false)
and get an error of column mismatch : "The number of columns doesn't match. Old column names (1): value New column names (2): body, offset"
I dont understand why my data
has the column name of "value"