0

I have a pandas dataframe of the form:

  col1  |  col2              | col3  |
   1      dog$cat                73
   2      Inherited pig/dog      21
   3      Inherited cat          99
   4      Inherited weasel$dog   33

what I want is to remove Inherited from col2, since it is just an irrelevant artifact and break the rest of the elements at the dollar sign. For example a row containing dog$cat should be duplicated, into a row that contains just dog and one that contains just cat. For the above, the desired result is:

  col1  |  col2              | col3  |
   1      dog                    73
   1      cat                    73
   2      pig/dog                21
   3      cat                    99
   4      weasel                 33
   4      dog                    33

How can I do this in a fast way?

Qubix
  • 4,161
  • 7
  • 36
  • 73
  • Possible duplicate of [Pandas column of lists, create a row for each list element](https://stackoverflow.com/questions/27263805/pandas-column-of-lists-create-a-row-for-each-list-element) – shaik moeed Sep 05 '19 at 08:39

2 Answers2

1

Use pandas.Series.str.replace and split, then do pandas.DataFrame.explode:

df['col2'] = df['col2'].str.replace('Inherited ', '', regex=True).str.split('$')
new_df = df.explode('col2')
print(new_df)

Output:

   col1     col2  col3
0     1      dog    73
0     1      cat    73
1     2  pig/dog    21
2     3      cat    99
3     4   weasel    33
3     4      dog    33
Chris
  • 29,127
  • 3
  • 28
  • 51
  • This is a bit strange, but I get a : AttributeError: 'DataFrame' object has no attribute 'explode' error. But pandas dataframes seem to have the attribute explode. Any idea why? – Qubix Sep 05 '19 at 08:26
  • Its intoduced in pandas 0.25. Perhaps you have old version installed? – Chris Sep 05 '19 at 08:27
  • Glad to hear that :) – Chris Sep 05 '19 at 08:32
0

Alternative.

df2 = df.pop('col2')
df.join(pd.DataFrame(df2.str.replace('Inherited','').str.split(r'[$/]').tolist())).set_index(['col1','col3']).stack().reset_index(name='col2').drop('level_2',axis=1)

Output

   col1  col3     col2
0     1    73      dog
1     1    73      cat
2     2    21      pig
3     2    21      dog
4     3    99      cat
5     4    33   weasel
6     4    33      dog
iamklaus
  • 3,720
  • 2
  • 12
  • 21