4

I have a pandas dataframe as shown here:

id    pos      value                sent
1     a/b/c    test/test2/test3     21
2     d/a      test/test5           21

I would like to split (=explode)df['pos'] and df['token'] so that the dataframe looks like this:

id      pos          value         sent
1       a            test          21
1       b            test2         21 
1       c            test3         21 
2       d            test          21
2       a            test5         21

It doesn't work if I split each column and then concat them à la

pos = df.token.str.split('/', expand=True).stack().str.strip().reset_index(level=1, drop=True)

df1 = pd.concat([pos,value], axis=1, keys=['pos','value'])

Any ideas? I'd really appreciate it.

EDIT:

I tried using this solution here : https://stackoverflow.com/a/40449726/4219498

But I get the following error: TypeError: Cannot cast array data from dtype('int64') to dtype('int32') according to the rule 'safe'

I suppose this is a numpy related issue although I'm not sure how this happens. I'm using Python 2.7.14

Mi.
  • 510
  • 1
  • 4
  • 20
  • 2
    check the following page: @piRSquared 's solution can be easily extended to many similar situations like the one you have: https://stackoverflow.com/questions/49923145/pandas-records-with-lists-to-separate-rows/49923384#49923384 – jxc Apr 29 '18 at 01:35
  • Agreed. This solution is easily extendable and solved my problem. – Mi. Apr 29 '18 at 17:49

1 Answers1

3

I tend to avoid the stack magic in favour of building a new dataframe from scratch. This is usually also more efficient. Below is one way.

import numpy as np
from itertools import chain

lens = list(map(len, df['pos'].str.split('/')))

res = pd.DataFrame({'id': np.repeat(df['id'], lens),
                    'pos': list(chain.from_iterable(df['pos'].str.split('/'))),
                    'value': list(chain.from_iterable(df['value'].str.split('/'))),
                    'sent': np.repeat(df['sent'], lens)})

print(res)

   id pos  sent  value
0   1   a    21   test
0   1   b    21  test2
0   1   c    21  test3
1   2   d    21   test
1   2   a    21  test5
jpp
  • 159,742
  • 34
  • 281
  • 339