1

An extension to following question : Split (explode) pandas dataframe string entry to separate rows

df :

    STATE CITY  ALT_NAMES
0   S1    C1    A1@A2
1   S2    C2    A3@A4@A5 

How can I achieve following result:

out_df :

    STATE  CITY  CITY_VAR
0   S1     C1    A1
1   S1     C1    A2
2   S2     C2    A3
3   S2     C2    A4
4   S2     C2    A5

sample data :

    STATE CITY            ALT_NAMES
    FL    FT. MYERS       FORT MYERS@FT MYERS
    FL    NORTH FT MYERS  N.FT.MYERS@N. FORT MYERS@NORTH FORT MYERS
Avinash Clinton
  • 543
  • 1
  • 8
  • 19

2 Answers2

2

For me working:

df = explode(df.assign(ALT_NAMES=df.ALT_NAMES.str.split('@')), 'ALT_NAMES')
print (df)
  STATE CITY ALT_NAMES
0    S1   C1        A1
1    S1   C1        A2
2    S2   C2        A3
3    S2   C2        A4
4    S2   C2        A5

Another pure pandas solution:

df = (df.join(df.pop('ALT_NAMES')
                .str.split('@', expand=True)
                .stack()
                .reset_index(level=1, drop=True)
                .rename('ALT_NAMES'))
        .reset_index(drop=True ))
print (df)
  STATE            CITY         ALT_NAMES
0    FL       FT. MYERS        FORT MYERS
1    FL       FT. MYERS          FT MYERS
2    FL  NORTH FT MYERS        N.FT.MYERS
3    FL  NORTH FT MYERS     N. FORT MYERS
4    FL  NORTH FT MYERS  NORTH FORT MYERS
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • @AvinashClinton - not yet tested, maybe some change is necessary. Need some time. – jezrael Jul 16 '18 at 06:18
  • @AvinashClinton - Added another solution , please check it. – jezrael Jul 16 '18 at 07:36
  • 1
    Thanks a lot!!! Second solution works great for me. With first solution and with the other solutions provided in previous link, I was getting following error : Cannot cast array data from dtype('int64') to dtype('int32') according to the rule 'safe'. Thanks once again!!! – Avinash Clinton Jul 16 '18 at 07:57
1

This is an optimised version for your data.

from itertools import chain
v = df.pop('ALT_NAMES').str.split('@')  

df = pd.DataFrame(
    df.values.repeat(v.str.len(), axis=0), columns=df.columns)
df['ALT_NAMES'] = list(chain.from_iterable(v))

df
  STATE CITY ALT_NAMES
0    S1   C1        A1
1    S1   C1        A2
2    S2   C2        A3
3    S2   C2        A4
4    S2   C2        A5
cs95
  • 379,657
  • 97
  • 704
  • 746