
I have tried the below in pandas and it works, but I'm wondering how I might do it in PySpark.

The input is

news.bbc.co.uk

It should be split at the '.' so that index equals:

[['news', 'bbc', 'co', 'uk'], ['next', 'domain', 'name']]

index = df2.domain.str.split('.').tolist() 
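
For reference, a runnable version of the pandas approach (assuming a two-row df2, with the second row made up to match the expected output above):

import pandas as pd

df2 = pd.DataFrame({'domain': ['news.bbc.co.uk', 'next.domain.name']})
index = df2.domain.str.split('.').tolist()
print(index)  # [['news', 'bbc', 'co', 'uk'], ['next', 'domain', 'name']]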

Does anyone know how I'd do this in Spark rather than pandas?

Thanks

  • Possible duplicate of [Split Contents of String column in PySpark Dataframe](https://stackoverflow.com/questions/41283478/split-contents-of-string-column-in-pyspark-dataframe) and [Splitting a column in pyspark](https://stackoverflow.com/questions/48790246/splitting-a-column-in-pyspark) and [Pyspark Split Columns](https://stackoverflow.com/questions/46835882/pyspark-split-columns?rq=1) – pault Oct 24 '18 at 14:20

3 Answers


Using a plain '.' doesn't behave as expected, because split treats the pattern as a regular expression and a bare dot matches any character. Escaping it as '\.' actually works:

import pyspark.sql.functions as F
df = df.withColumn('col_name', F.split(F.col('col_name'), r'\.'))
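
For example, a minimal sketch against the sample input from the question (this assumes an active SparkSession named spark; domain and domain_parts are just illustrative column names):

import pyspark.sql.functions as F

df = spark.createDataFrame([('news.bbc.co.uk',), ('next.domain.name',)], ['domain'])
df = df.withColumn('domain_parts', F.split(F.col('domain'), r'\.'))
df.first()['domain_parts']  # ['news', 'bbc', 'co', 'uk']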

You can use pyspark.sql.functions.split to split the string column. Note that the pattern is interpreted as a regular expression, so the dot has to be escaped:

import pyspark.sql.functions as F

df = df.withColumn('col_name', F.split(F.col('col_name'), r'\.'))
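
If you also want the nested Python list that the pandas .tolist() in the question produces, a sketch of one way is to collect the split column back to the driver (fine for small data only):

# equivalent of df2.domain.str.split('.').tolist() in pandas
index = [row['col_name'] for row in df.select('col_name').collect()]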
mayank agrawal
from pyspark.sql.functions import split
df.select(split("col_name", r'[\.]'))

or

df.selectExpr(r"split(col_name, '[\.]')")
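
Both forms return the array under an auto-generated column name; a small sketch of giving it an explicit name (parts is just an illustrative alias):

from pyspark.sql.functions import split

df.select(split('col_name', r'[\.]').alias('parts'))
# or, with the expression syntax:
df.selectExpr(r"split(col_name, '[\.]') AS parts")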
Akshat Chaturvedi