1

I have a large dataframe, but to make this easy it looks something like this one below

             A  B  C
0    [a, b, c]  1 22
1       [d, e]  2 45
2       [f, g]  3 32
3       [h, i]  4 64
4 [j, k, l, m]  5 76

Now I have used:

df.explode('A')

To explode the dataframe like this:

    A  B  C
0   a  1 22
0   b  1 22
0   c  1 22
1   d  2 45
1   e  2 45
2   f  3 32
2   g  3 32
3   h  4 64
3   i  4 64
4   j  5 76
4   k  5 76
4   l  5 76
4   m  5 76

And now I want to drop every last row from the exploded column 'A'. This means the code will remove the elements: c, e, g, i, m. The output should look something like this:

    A  B  C
0   a  1 22
0   b  1 22
1   d  2 45
2   f  3 32
3   h  4 64
4   j  5 76
4   k  5 76
4   l  5 76

Any idea how I can do this? (note: it is a very large dataframe so I can't just select the rows manually)

1 Answers1

2

Use boolean indexing with Index.duplicated:

df1 = df.explode('A')
df1 = df1[df1.index.duplicated(keep='last')]
print (df1)
    A  B  C
0   a  1 22
0   b  1 22
1   d  2 45
2   f  3 32
3   h  4 64
4   j  5 76
4   k  5 76
4   l  5 76

Or remove last value of lists first by indexing:

df1 = df.assign(A = df.A.str[:-1]).explode('A')
print (df1)
   A  B   C
0  a  1  22
0  b  1  22
1  d  2  45
2  f  3  32
3  h  4  64
4  j  5  76
4  k  5  76
4  l  5  76

Difference is if one element list(s):

print (df)
           A  B   C
0    [a,b,c]  1  22
1      [d,e]  2  45
2      [f,g]  3  32
3        [h]  4  64
4  [j,k,l,m]  5  76

df1 = df.explode('A')
df1 = df1[df1.index.duplicated(keep='last')]
print (df1)
   A  B   C
0  a  1  22
0  b  1  22
1  d  2  45
2  f  3  32
4  j  5  76
4  k  5  76
4  l  5  76

df1 = df.assign(A = df.A.str[:-1]).explode('A')
print (df1)
     A  B   C
0    a  1  22
0    b  1  22
1    d  2  45
2    f  3  32
3  NaN  4  64
4    j  5  76
4    k  5  76
4    l  5  76
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252