This question is similar to one already asked for pandas here. I am running my function on Google Cloud Dataproc clusters, so I can't convert the DataFrame to pandas.
I would like to convert the following:
+----+----------------------------------+-----+---------+------+--------------------+-------------+
| key|                             value|topic|partition|offset|           timestamp|timestampType|
+----+----------------------------------+-----+---------+------+--------------------+-------------+
|null|["sepal_length","sepal_width",...]| iris|        0|   289|2021-04-11 22:32:...|            0|
|null|["5.0","3.5","1.3","0.3","setosa"]| iris|        0|   290|2021-04-11 22:32:...|            0|
|null|["4.5","2.3","1.3","0.3","setosa"]| iris|        0|   291|2021-04-11 22:32:...|            0|
|null|["4.4","3.2","1.3","0.2","setosa"]| iris|        0|   292|2021-04-11 22:32:...|            0|
|null|["5.0","3.5","1.6","0.6","setosa"]| iris|        0|   293|2021-04-11 22:32:...|            0|
|null|["5.1","3.8","1.9","0.4","setosa"]| iris|        0|   294|2021-04-11 22:32:...|            0|
|null|["4.8","3.0","1.4","0.3","setosa"]| iris|        0|   295|2021-04-11 22:32:...|            0|
+----+----------------------------------+-----+---------+------+--------------------+-------------+
Into something like this:
+--------------+-------------+--------------+-------------+-------+
| sepal_length | sepal_width | petal_length | petal_width | class |
+--------------+-------------+--------------+-------------+-------+
| 5.0          | 3.5         | 1.3          | 0.3         | setosa|
| 4.5          | 2.3         | 1.3          | 0.3         | setosa|
| 4.4          | 3.2         | 1.3          | 0.2         | setosa|
| 5.0          | 3.5         | 1.6          | 0.6         | setosa|
| 5.1          | 3.8         | 1.9          | 0.4         | setosa|
| 4.8          | 3.0         | 1.4          | 0.3         | setosa|
+--------------+-------------+--------------+-------------+-------+
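To make the intended mapping concrete, here is a plain-Python sketch of the per-message logic: each `value` is a JSON array string, the first message supplies the column names, and every later message is one row. This only illustrates the transformation on a small list and is not something to run on the cluster; the goal is the equivalent using DataFrame operations. The full header list is an assumption taken from the target columns (the display above truncates it), and `values_to_records` is a hypothetical helper name.

```python
import json

# Sample `value` strings copied from the first table. The header message is
# truncated in the display, so its full contents are assumed from the
# column names of the desired output.
values = [
    '["sepal_length","sepal_width","petal_length","petal_width","class"]',
    '["5.0","3.5","1.3","0.3","setosa"]',
    '["4.5","2.3","1.3","0.3","setosa"]',
]


def values_to_records(values):
    """Parse JSON-array strings; the first one names the columns."""
    header, *rows = [json.loads(v) for v in values]
    # Pair each row's items with the header names; values stay as strings.
    return [dict(zip(header, row)) for row in rows]


records = values_to_records(values)
for record in records:
    print(record)
```

The values are kept as strings here; casting the four measurement columns to a numeric type would be a separate step.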
How do I go about doing this? Any help would be greatly appreciated!