
How can I split a PySpark DataFrame column using a dot (.) as the separator? It doesn't seem to work when I use split on a dot.

For example, a column with the value abcd.efgh should be split into two columns with the values abcd and efgh.
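A likely reason a plain dot does not work: Spark's split treats the separator as a Java regular expression, and in a regex a bare "." matches any character. The sketch below uses Python's re module only to illustrate that behaviour outside Spark, assuming (as holds for these simple patterns) that Python and Java regex agree on ".", "\." and "[.]".

```python
import re

value = 'abcd.efgh'

# A bare '.' is a regex metacharacter matching ANY character, so every
# character acts as a separator and only empty strings are left over.
naive = re.split('.', value)

# Escaping the dot, or wrapping it in a character class, makes it a literal dot.
escaped = re.split(r'\.', value)
char_class = re.split('[.]', value)

print(naive)       # ten empty strings
print(escaped)
print(char_class)
```

The same escaping applies to the pattern argument of pyspark.sql.functions.split.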

ZygD
Indra
    Does this answer your question? [Split Spark Dataframe string column into multiple columns](https://stackoverflow.com/questions/39235704/split-spark-dataframe-string-column-into-multiple-columns) – mazaneicha Oct 31 '21 at 00:48
  • Please provide enough code so others can better understand or reproduce the problem. – Community Oct 31 '21 at 11:28

1 Answer


This is the DataFrame based on your example:

from pyspark.sql import SparkSession, functions as F
spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame([('abcd.efgh',)], ['c1'])
df.show()
#+---------+
#|       c1|
#+---------+
#|abcd.efgh|
#+---------+

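split interprets its second argument as a (Java) regular expression, and in a regex a bare . matches any character, so splitting on '.' does not split on the literal dot. Put the dot in a character class ('[.]') or escape it ('\\.'). The optional third argument, limit=2 (available since Spark 3.0), caps the result at two elements, so everything after the first dot ends up in the second column: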

splitCol = F.split('c1', '[.]', 2)
df = df.select(
    splitCol[0].alias('c1_0'),
    splitCol[1].alias('c1_1'),
)
df.show()
#+----+----+
#|c1_0|c1_1|
#+----+----+
#|abcd|efgh|
#+----+----+
ZygD