How to split a dataframe string column into multiple columns?

Question

I have a pandas dataframe. This dataframe consists of a single column. I want to parse this column according to the '&' sign and add the data to the right of the "=" sign as a new column. Examples are below.

The dataframe I have;

                               tags
0                 letter1=A&letter2=B&letter3=C
1                 letter1=D&letter2=E&letter3=F
2                 letter1=G&letter2=H&letter3=I
3                 letter1=J&letter2=K&letter3=L
4                 letter1=M&letter2=N&letter3=O
5                 letter1=P&letter2=R&letter3=S
.                  .
.                  .

dataframe that I want to convert;

     letter1    letter2   letter3
0      A           B         C
1      D           E         F
2      G           H         I
3      J           K         L
4      M           N         O
.
.

I tried to do something with this code snippet.

columnname= df["tags"][0].split("&")[i].split("=")[0]
value =df["tags"][0].split("&")[i].split("=")[1]

But I'm not sure how I can do it for the whole dataframe. I am looking for a faster and stable way.

Thanks in advance,

See here: https://stackoverflow.com/questions/14745022/how-to-split-a-dataframe-string-column-into-two-columns/21296915#21296915 — Raspberry PyCharm, Jun 23 '21 at 13:50

score 3 · Accepted Answer · answered Jun 23 '21 at 14:36

do this..

import pandas as pd

tags = [
    "letter1=A&letter2=B&letter3=C",
    "letter1=D&letter2=E&letter3=F",
    "letter1=G&letter2=H&letter3=I",
    "letter1=J&letter2=K&letter3=L",
    "letter1=M&letter2=N&letter3=O",
    "letter1=P&letter2=R&letter3=S"
]
df = pd.DataFrame({"tags": tags})

df["letter1"] = df["tags"].apply(lambda x: x.split("&")[0].split("=")[-1])
df["letter2"] = df["tags"].apply(lambda x: x.split("&")[1].split("=")[-1])
df["letter3"] = df["tags"].apply(lambda x: x.split("&")[2].split("=")[-1])
df = df[["letter1", "letter2", "letter3"]]
df

score 1 · Answer 2 · answered Jun 23 '21 at 20:43

Split into separate columns, via str.split, using & :

step1 = df.tags.str.split("&", expand=True)

Get the new columns from the first row of step1:

new_columns = step1.loc[0, :].str[:-2].array

Get rid of the letter1= prefix in each column, set the new_columns as the header:

step1.set_axis(new_columns, axis='columns').transform(lambda col: col.str[-1])

  letter1 letter2 letter3
0       A       B       C
1       D       E       F
2       G       H       I
3       J       K       L
4       M       N       O
5       P       R       S

Tanmay Shrivastava · Answer 3 · 2021-06-23T14:23:36.420

0

d=list(df["tags"])
r={}
for i in d:                                            
    for ele in i.split("&"):                           
        if ele.split("=")[0] in r.keys():              
            r[ele.split("=")[0]].append(ele.split("=")[1])
        else:
            r[ele.split("=")[0]]=[]
            r[ele.split("=")[0]].append(ele.split("=")[1])

df = pd.DataFrame({i:pd.Series(r[i]) for i in r})
print (df)

edited Jun 23 '21 at 14:23

answered Jun 23 '21 at 14:18

Tanmay Shrivastava

553
4
9

2

I don't think that is what the PO wants... He is probably searching for the 'pandas' way to do it (which is most likely not looping through every single value of the dataframe) – Jun 23 '21 at 14:27
Please add a description to your answer/code and use consistent coding style. – buhtz Jun 23 '21 at 14:31

score 0 · Answer 4 · answered Jun 23 '21 at 16:22

Using regex

import pandas as pd
import re
tags = [
    "letter1=A&letter2=B&letter3=C",
    "letter1=D&letter2=E&letter3=F",
    "letter1=G&letter2=H&letter3=I",
    "letter1=J&letter2=K&letter3=L",
    "letter1=M&letter2=N&letter3=O",
    "letter1=P&letter2=R&letter3=S"
]

df = pd.DataFrame({"tags": tags})
pattern=re.compile("\=(\w+)") # Look for pattern
df['letter1'], df['letter3'],df["letter2"] = zip(*df["tags"].apply(lambda x: pattern.findall(x)))

Output

                            tags letter1 letter2 letter3
0  letter1=A&letter2=B&letter3=C       A       B       C
1  letter1=D&letter2=E&letter3=F       D       E       F
2  letter1=G&letter2=H&letter3=I       G       H       I
3  letter1=J&letter2=K&letter3=L       J       K       L
4  letter1=M&letter2=N&letter3=O       M       N       O
5  letter1=P&letter2=R&letter3=S       P       R       S

How to split a dataframe string column into multiple columns?

4 Answers4