2

I have a pandas dataframe. This dataframe consists of a single column. I want to parse this column according to the '&' sign and add the data to the right of the "=" sign as a new column. Examples are below.

The dataframe I have;

                               tags
0                 letter1=A&letter2=B&letter3=C
1                 letter1=D&letter2=E&letter3=F
2                 letter1=G&letter2=H&letter3=I
3                 letter1=J&letter2=K&letter3=L
4                 letter1=M&letter2=N&letter3=O
5                 letter1=P&letter2=R&letter3=S
.                  .
.                  .

dataframe that I want to convert;

     letter1    letter2   letter3
0      A           B         C
1      D           E         F
2      G           H         I
3      J           K         L
4      M           N         O
.
.

I tried to do something with this code snippet.

columnname= df["tags"][0].split("&")[i].split("=")[0]
value =df["tags"][0].split("&")[i].split("=")[1]

But I'm not sure how I can do it for the whole dataframe. I am looking for a faster and stable way.

Thanks in advance,

tarik
  • 49
  • 7

4 Answers4

3

do this..

import pandas as pd

tags = [
    "letter1=A&letter2=B&letter3=C",
    "letter1=D&letter2=E&letter3=F",
    "letter1=G&letter2=H&letter3=I",
    "letter1=J&letter2=K&letter3=L",
    "letter1=M&letter2=N&letter3=O",
    "letter1=P&letter2=R&letter3=S"
]
df = pd.DataFrame({"tags": tags})

df["letter1"] = df["tags"].apply(lambda x: x.split("&")[0].split("=")[-1])
df["letter2"] = df["tags"].apply(lambda x: x.split("&")[1].split("=")[-1])
df["letter3"] = df["tags"].apply(lambda x: x.split("&")[2].split("=")[-1])
df = df[["letter1", "letter2", "letter3"]]
df

enter image description here

1

Split into separate columns, via str.split, using & :

step1 = df.tags.str.split("&", expand=True)

Get the new columns from the first row of step1:

new_columns = step1.loc[0, :].str[:-2].array

Get rid of the letter1= prefix in each column, set the new_columns as the header:

step1.set_axis(new_columns, axis='columns').transform(lambda col: col.str[-1])

  letter1 letter2 letter3
0       A       B       C
1       D       E       F
2       G       H       I
3       J       K       L
4       M       N       O
5       P       R       S
sammywemmy
  • 27,093
  • 4
  • 17
  • 31
0
d=list(df["tags"])
r={}
for i in d:                                            
    for ele in i.split("&"):                           
        if ele.split("=")[0] in r.keys():              
            r[ele.split("=")[0]].append(ele.split("=")[1])
        else:
            r[ele.split("=")[0]]=[]
            r[ele.split("=")[0]].append(ele.split("=")[1])

df = pd.DataFrame({i:pd.Series(r[i]) for i in r})
print (df)
  • 2
    I don't think that is what the PO wants... He is probably searching for the 'pandas' way to do it (which is most likely not looping through every single value of the dataframe) –  Jun 23 '21 at 14:27
  • Please add a description to your answer/code and use consistent coding style. – buhtz Jun 23 '21 at 14:31
0

Using regex

import pandas as pd
import re
tags = [
    "letter1=A&letter2=B&letter3=C",
    "letter1=D&letter2=E&letter3=F",
    "letter1=G&letter2=H&letter3=I",
    "letter1=J&letter2=K&letter3=L",
    "letter1=M&letter2=N&letter3=O",
    "letter1=P&letter2=R&letter3=S"
]

df = pd.DataFrame({"tags": tags})
pattern=re.compile("\=(\w+)") # Look for pattern
df['letter1'], df['letter3'],df["letter2"] = zip(*df["tags"].apply(lambda x: pattern.findall(x)))

Output

                            tags letter1 letter2 letter3
0  letter1=A&letter2=B&letter3=C       A       B       C
1  letter1=D&letter2=E&letter3=F       D       E       F
2  letter1=G&letter2=H&letter3=I       G       H       I
3  letter1=J&letter2=K&letter3=L       J       K       L
4  letter1=M&letter2=N&letter3=O       M       N       O
5  letter1=P&letter2=R&letter3=S       P       R       S
mpx
  • 3,081
  • 2
  • 26
  • 56