
I have a DataFrame with a column, Sent_tokenize, whose values are lists, like below:

   Book  Chapter                            Text                           Sent_tokenize   Length
0     1        1                        'Sen 1.'                              ['Sen 1.']        1
1     1        1   'Sen 1. Sen 2. Sen 3. Sen 4.'   ['Sen 1.','Sen 2.','Sen 3.','Sen 4.']        4
2     1        1                        'Sen 1.'                              ['Sen 1.']        1
3     1        1                 'Sen 1. Sen 2.'                     ['Sen 1.','Sen 2.']        2
4     1        1                        'Sen 1.'                              ['Sen 1.']        1

I'd like to unpivot that column so that each list element gets its own row, as below:

   Book  Chapter                            Text     Sent_tokenize   Length
0     1        1                        'Sen 1.'        ['Sen 1.']        1
1     1        1   'Sen 1. Sen 2. Sen 3. Sen 4.'        ['Sen 1.']        4
2     1        1   'Sen 1. Sen 2. Sen 3. Sen 4.'        ['Sen 2.']        4
3     1        1   'Sen 1. Sen 2. Sen 3. Sen 4.'        ['Sen 3.']        4
4     1        1   'Sen 1. Sen 2. Sen 3. Sen 4.'        ['Sen 4.']        4
5     1        1                        'Sen 1.'        ['Sen 1.']        1
6     1        1                 'Sen 1. Sen 2.'        ['Sen 1.']        2
7     1        1                 'Sen 1. Sen 2.'        ['Sen 2.']        2
8     1        1                        'Sen 1.'        ['Sen 1.']        1

I was thinking about doing it with a loop over the source list before creating the DataFrame, but maybe there is a quicker solution. Any ideas? Thanks in advance!

1 Answer


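Use DataFrame.explode (available since pandas 0.25). Each list element gets its own row, and the original index label is repeated for rows that came from the same list: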

df.explode('Sent_tokenize')

   Book  Chapter                         text Sent_tokenize  Length
0     1        1                       Sen 1.        Sen 1.       1
1     1        1  Sen 1. Sen 2. Sen 3. Sen 4.        Sen 1.       4
1     1        1  Sen 1. Sen 2. Sen 3. Sen 4.        Sen 2.       4
1     1        1  Sen 1. Sen 2. Sen 3. Sen 4.        Sen 3.       4
1     1        1  Sen 1. Sen 2. Sen 3. Sen 4.        Sen 4.       4
2     1        1                       Sen 1.        Sen 1.       1
3     1        1                Sen 1. Sen 2.        Sen 1.       2
3     1        1                Sen 1. Sen 2.        Sen 2.       2
4     1        1                       Sen 1.        Sen 1.       1
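Note that the output above keeps the original index labels (0, 1, 1, 1, 1, ...), while the desired result in the question is numbered 0 to 8. A minimal, self-contained sketch of one way to renumber the rows follows; the sample data is made up to mirror the question's table, and the names `exploded` and `renumbered` are only illustrative.

```python
import pandas as pd

# Hypothetical sample data mirroring the shape of the question's table.
df = pd.DataFrame({
    "Book": [1, 1, 1],
    "Chapter": [1, 1, 1],
    "Text": ["Sen 1.", "Sen 1. Sen 2. Sen 3.", "Sen 1. Sen 2."],
    "Sent_tokenize": [
        ["Sen 1."],
        ["Sen 1.", "Sen 2.", "Sen 3."],
        ["Sen 1.", "Sen 2."],
    ],
})
df["Length"] = df["Sent_tokenize"].map(len)

# One row per list element; index labels of the source rows are repeated.
exploded = df.explode("Sent_tokenize")

# Drop the repeated labels and number the rows 0..n-1 instead.
renumbered = exploded.reset_index(drop=True)
print(renumbered)
```

On pandas 1.1 or later, `df.explode("Sent_tokenize", ignore_index=True)` should give the same renumbered result in a single call.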

If you want each value to remain a one-element list:

new_df = df.explode('Sent_tokenize')
# Wrap each exploded sentence back into a one-element list.
new_df['Sent_tokenize'] = new_df['Sent_tokenize'].apply(lambda x: [x])
print(new_df)


   Book  Chapter                         text Sent_tokenize  Length
0     1        1                       Sen 1.      [Sen 1.]       1
1     1        1  Sen 1. Sen 2. Sen 3. Sen 4.      [Sen 1.]       4
1     1        1  Sen 1. Sen 2. Sen 3. Sen 4.      [Sen 2.]       4
1     1        1  Sen 1. Sen 2. Sen 3. Sen 4.      [Sen 3.]       4
1     1        1  Sen 1. Sen 2. Sen 3. Sen 4.      [Sen 4.]       4
2     1        1                       Sen 1.      [Sen 1.]       1
3     1        1                Sen 1. Sen 2.      [Sen 1.]       2
3     1        1                Sen 1. Sen 2.      [Sen 2.]       2
4     1        1                       Sen 1.      [Sen 1.]       1
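One caveat with the wrapping step: explode turns an empty list into a single NaN row, so wrapping blindly would produce `[nan]` for texts with no sentences. The sketch below is one possible guard, assuming you would rather get an empty list back in that case; the two-row sample frame is hypothetical.

```python
import pandas as pd

# Hypothetical data: the second row has no sentences at all.
df = pd.DataFrame({
    "Text": ["Sen 1. Sen 2.", ""],
    "Sent_tokenize": [["Sen 1.", "Sen 2."], []],
})

new_df = df.explode("Sent_tokenize").reset_index(drop=True)

# Wrap real sentences in a list; map the NaN placeholder that explode
# creates for an empty list back to an empty list.
new_df["Sent_tokenize"] = new_df["Sent_tokenize"].map(
    lambda s: [] if pd.isna(s) else [s]
)
print(new_df)
```

If empty lists cannot occur in your data, the plain `apply(lambda x: [x])` version is enough.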
ansev