I have a dataframe that originally looked like this:

|student_name|subject                    |
|------------|---------------------------|
|smith       |['maths', 'english']       |
|jones       |['maths', 'english']       |
|alan        |['art', 'maths', 'english']|

I used explode to get the following table:

|student_name|subject|
|------------|-------|
|smith       |maths  |
|smith       |english|
|jones       |maths  |
|jones       |english|
|alan        |art    |
|alan        |maths  |
|alan        |english|

I then reset the index as I want to delete all rows containing the string 'maths'. However, instead of just deleting the rows containing maths it deletes all rows as if they hadn't been exploded/reindexed.

Here's my code:

student_df = pd.DataFrame(data)
student_df = student_df.explode('subject')
student_df = student_df.reset_index(drop=True)
student_df = student_df[student_df["subject"].str.contains("maths") == False]

What am I doing wrong?

Andy

1 Answer

The ideal way to do this is to avoid multiple assignments and to use a pipeline.

A few remarks:

  • You can pass a function/lambda to loc to refer to the dataframe itself.
  • Use ~ to invert the value of str.contains.
  • If you want to check for an exact match, use eq/ne (equal/not equal) instead of str.contains.

student_df2 = (student_df
 .explode('subject')
 .loc[lambda d: ~d['subject'].str.contains("maths")]
)

output:

  student_name  subject
0        smith  english
1        jones  english
2         alan      art
2         alan  english
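
To make the remarks above concrete, here is a minimal self-contained sketch. The question does not show `data`, so the dictionary below is an assumption reconstructed from the first table (with `subject` holding real Python lists), and the value "applied maths" is a hypothetical used only to show how str.contains differs from eq/ne. The trailing reset_index(drop=True) is optional and only renumbers the rows.

```python
import pandas as pd

# Assumed input: reconstructed from the question's first table,
# since the original `data` is not shown.
data = {
    "student_name": ["smith", "jones", "alan"],
    "subject": [
        ["maths", "english"],
        ["maths", "english"],
        ["art", "maths", "english"],
    ],
}
student_df = pd.DataFrame(data)

student_df2 = (
    student_df
    .explode("subject")
    # ne("maths") keeps rows whose subject is not exactly "maths"
    .loc[lambda d: d["subject"].ne("maths")]
    # optional: renumber the rows 0..n-1 after filtering
    .reset_index(drop=True)
)
print(student_df2)

# str.contains matches substrings, eq matches the whole value.
# "applied maths" is a hypothetical value for illustration.
s = pd.Series(["maths", "applied maths", "art"])
print(s.str.contains("maths").tolist())  # [True, True, False]
print(s.eq("maths").tolist())            # [True, False, False]
```

With this assumed input, four rows remain (english, english, art, english). If `subject` instead held strings that merely look like lists, explode would leave the rows unchanged, so it may be worth checking the column's actual contents.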
mozway
  • OP says *However, instead of just deleting the rows containing maths it deletes all rows as if they hadn't been exploded/reindexed.* Your output is exactly the same as what OP's code produces. – Ynjxsjmh May 10 '22 at 08:19
  • @Ynjxsjmh I read it the other way around: that OP does **not** want to remove all rows. OP should provide the full explicit output for clarity. – mozway May 10 '22 at 08:25