I'm working through an example of Spark Structured Streaming on Spark 3.0.0, using Twitter data. I've pushed the Twitter data into Kafka; a single record looks like this:
2020-07-21 10:48:19|1265200268284588034|RT @narendramodi: Had an extensive interaction with CEO of @IBM, Mr. @ArvindKrishna. We discussed several subjects relating to technology,…|Hyderabad, India
Every field is separated by '|', and the fields are:
Date time
User ID
Tweet Text
Location
Reading this message in Spark, I get a DataFrame like this:
key | value
-----+-------------------------
| 2020-07-21 10:48:19|1265200268284588034|RT @narendramodi: Had an extensive interaction with CEO of @IBM, Mr. @ArvindKrishna. We discussed several subjects relating to technology,…|Hyderabad, India
Following this answer, I added the following block of code to my app:
split_col = pyspark.sql.functions.split(df['value'], '|')
df = df.withColumn("Tweet Time", split_col.getItem(0))
df = df.withColumn("User ID", split_col.getItem(1))
df = df.withColumn("Tweet Text", split_col.getItem(2))
df = df.withColumn("Location", split_col.getItem(3))
df = df.drop("key")
But it gives me output like this:
A | B | C | D | E |
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------+---------+--------+-----+
2020-07-21 10:48:19|1265200268284588034|RT @narendramodi: Had an extensive interaction with CEO of @IBM, Mr. @ArvindKrishna. We discussed several subjects relating to technology,…|Hyderabad, India|2 | 0 | 2 | 0 |
But I want output like this:
Tweet Time | User ID | Tweet text | Location |
-----------------------+-------------------------+----------------------------------------------------------------------------------------------------------------------------------------------+-------------------+
2020-07-21 10:48:19 | 1265200268284588034 | RT @narendramodi: Had an extensive interaction with CEO of @IBM, Mr. @ArvindKrishna. We discussed several subjects relating to technology,… | Hyderabad, India |