I'm using Spark 2.3.
I have a DataFrame like this (in other situations _c0 may contain 20 inner fields):
_c0            | _c1
---------------------
1.1 1.2 4.55   | a
4.44 3.1 9.99  | b
1.2 99.88 10.1 | x
I want to split _c0 and create a new DataFrame like this:
col1 | col2  | col3 | col4
--------------------------
1.1  | 1.2   | 4.55 | a
4.44 | 3.1   | 9.99 | b
1.2  | 99.88 | 10.1 | x
I know how to solve this using getItem():
import re

# split _c0 on one or more spaces; x[0] is _c0, x[1] is _c1
df = originalDf.rdd.map(lambda x: (re.split(" +", x[0]), x[1])).toDF()
# now df[0] is an array of strings, and df[1] is a string
df = df.select(df[0].getItem(0), df[0].getItem(1), df[0].getItem(2), df[1])
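For reference, the regex split on its own behaves like this (plain Python, no Spark needed; the sample value mirrors the first row of the table above):

```python
import re

# a single raw value from _c0
row = "1.1 1.2 4.55"

# split on one or more spaces, matching the map() step above
parts = re.split(" +", row)
print(parts)  # ['1.1', '1.2', '4.55']
```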
But I was hoping to find a different way to solve this, because _c0 may contain more than 3 inner columns.
Is there a way to use flatMap to generate the DataFrame?
Is there a way to insert df[1] as an inner field of df[0]?
Is there a way to use df[0].getItem() so that it returns all inner fields?
Is there a simpler way to generate the DataFrame?
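To show the shape I'm after without Spark, here is the same split done dynamically in plain Python (the sample rows are made up to mirror the table above):

```python
import re

# in-memory stand-in for the DataFrame rows (_c0, _c1)
rows = [("1.1 1.2 4.55", "a"),
        ("4.44 3.1 9.99", "b"),
        ("1.2 99.88 10.1", "x")]

# split _c0 into however many fields it has, then append _c1,
# with no hard-coded getItem(0), getItem(1), ... calls
flat_rows = [re.split(" +", c0) + [c1] for c0, c1 in rows]
print(flat_rows[0])  # ['1.1', '1.2', '4.55', 'a']
```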
Any help would be appreciated.
Thanks!