How to split a PySpark dataframe column with a dot (.) as the separator? It doesn't seem to work for me when I use split on a dot.
E.g. a column with value abcd.efgh should be split into two columns with values abcd and efgh.
This is the df based on your example:
from pyspark.sql import SparkSession, functions as F
spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([('abcd.efgh',)], ['c1'])
df.show()
#+---------+
#|       c1|
#+---------+
#|abcd.efgh|
#+---------+
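For splitting, one can use split. Its second argument is a regular expression, and an unescaped dot in a regex matches any character, which is why splitting on a plain '.' does not give the expected result. The dot has to be made literal, for example with the character class [.]. The optional third argument (limit, available since Spark 3.0) caps the result at two elements: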
splitCol = F.split('c1', '[.]', 2)
df = df.select(
    splitCol[0].alias('c1_0'),
    splitCol[1].alias('c1_1'),
)
df.show()
#+----+----+
#|c1_0|c1_1|
#+----+----+
#|abcd|efgh|
#+----+----+
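The reason a bare dot fails can be illustrated without a Spark session. The sketch below uses Python's built-in re module as a stand-in for the Java regex engine that Spark uses; this is an assumption made for illustration, and the two engines are only expected to agree on simple patterns like these. Note that maxsplit=1 in re.split plays roughly the role of limit=2 in F.split.

```python
import re

value = "abcd.efgh"

# An unescaped dot is a regex wildcard: every character is treated as a
# separator, so only empty strings are left between the matches.
wildcard_parts = re.split(".", value)

# Inside a character class, or escaped with a backslash, the dot is literal.
# maxsplit=1 allows at most one split, i.e. at most two resulting parts.
class_parts = re.split("[.]", value, maxsplit=1)
escaped_parts = re.split(r"\.", value, maxsplit=1)

print(wildcard_parts)
print(class_parts)
print(escaped_parts)
```

Both literal forms produce the two parts abcd and efgh, so in the PySpark call an escaped dot written as a raw string, r'\.', should work as an alternative to '[.]'.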