
I have:

Customerkeycode
B01:B14:110083

I want:

PlanningCustomerSuperGroupCode, DPGCode, APGCode
B01,                            B14,     110083
Does this answer your question? [Split Spark dataframe string column into multiple columns](https://stackoverflow.com/questions/39235704/split-spark-dataframe-string-column-into-multiple-columns) – ZygD Oct 20 '22 at 08:50

3 Answers

import pandas as pd

df = pd.DataFrame(
    {
        "Customerkeycode": [
            "B01:B14:110083",
            "B02:B15:110084"
        ]
    }
)

# Split "B01:B14:110083" on ":" and assign each part to its own column
df['PlanningCustomerSuperGroupCode'] = df['Customerkeycode'].apply(lambda x: x.split(":")[0])
df['DPGCode'] = df['Customerkeycode'].apply(lambda x: x.split(":")[1])
df['APGCode'] = df['Customerkeycode'].apply(lambda x: x.split(":")[2])

df_rep = df.drop("Customerkeycode", axis = 1)

print(df_rep)

   PlanningCustomerSuperGroupCode DPGCode APGCode
0                            B01     B14  110083
1                            B02     B15  110084
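As an aside, pandas can also do the split in a single vectorized step with `Series.str.split(expand=True)`, avoiding the three separate `apply` calls. A minimal sketch, assuming the same input frame as above:

```python
import pandas as pd

df = pd.DataFrame({"Customerkeycode": ["B01:B14:110083", "B02:B15:110084"]})

# expand=True returns one column per ":"-separated part,
# so all three target columns can be assigned at once
cols = ["PlanningCustomerSuperGroupCode", "DPGCode", "APGCode"]
df[cols] = df["Customerkeycode"].str.split(":", expand=True)

df_rep = df.drop(columns="Customerkeycode")
print(df_rep)
```

This produces the same result, and scales better on large frames because the split happens once per row rather than once per row per column.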
Ashish Jain

In PySpark, first split the string into an array with `split`, then use the `getItem` method to project each element into its own column.

import pyspark.sql.functions as F

...  # df is an existing DataFrame with a 'Customerkeycode' column
cols = ['PlanningCustomerSuperGroupCode', 'DPGCode', 'APGCode']
arr_cols = [F.split('Customerkeycode', ':').getItem(i).alias(cols[i]) for i in range(3)]
df = df.select(*arr_cols)
df.show(truncate=False)

过过招

Split the string on ':' into three columns named 'PlanningCustomerSuperGroupCode', 'DPGCode', and 'APGCode', then drop the original column:

import pyspark.sql.functions as F

df.withColumn('PlanningCustomerSuperGroupCode', F.split(F.col('Customerkeycode'), ':')[0]) \
    .withColumn('DPGCode', F.split(F.col('Customerkeycode'), ':')[1]) \
    .withColumn('APGCode', F.split(F.col('Customerkeycode'), ':')[2]) \
    .drop('Customerkeycode') \
    .show()