0

I have a dataset with header level rows

header_col1    header_col2    item_col1
A              111            apple | banana
B              222            apple
C              333            pear | grapes
D              444            grapes

I would like to transform this dataset so that each row in the table represents an item instead a header. That is:

header_col1    header_col2    item_col1
A              111            apple
A              111            banana
B              222            apple
C              333            pear
C              333            grapes
D              444            grapes

Is there any simple way to do this with pandas? All I can think of are

yxteh
  • 126
  • 1
  • 8
  • It looks like this might be the answer: https://stackoverflow.com/questions/12680754/split-explode-pandas-dataframe-string-entry-to-separate-rows – Nick W May 19 '21 at 09:30

2 Answers2

0

You can use .str.split() to split the strings into lists and then use .explode() to expand the lists into separate rows (one row per string). Finally, strip out extra white spaces around the strings during split with .str.strip(), as follows:

df['item_col1'] = df['item_col1'].str.split('|')
df = df.explode('item_col1').reset_index()
df['item_col1'] = df['item_col1'].str.strip()


print(df)

   index header_col1  header_col2 item_col1
0      0           A          111     apple
1      0           A          111    banana
2      1           B          222     apple
3      2           C          333      pear
4      2           C          333    grapes
5      3           D          444    grapes
SeaBean
  • 22,547
  • 3
  • 13
  • 25
0
import pandas as pd
data = pd.DataFrame({"header_col1":['A','A','B','C','C','D'],
                     'header_col2':[111,111,222,333,333,444],
                      'item_col1':['apple','banana','apple','pear','grapes','grapes']})
data.explode('header_col1')

Link to explode documentation click here

Mohan
  • 46
  • 3