
I want to delete every row that contains Copy 1 in column copy_nb:

I tried the pandas method Series.str.contains, like so:

df=df[~df.copy_nb.str.contains("Copy 1", na=False)]

Unfortunately, it deletes not only the rows that contain Copy 1, but also those that contain Copy 10, Copy 11, etc.
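A minimal sketch of why this happens, using hypothetical values (the real data frame was only posted as an image): str.contains looks for the pattern anywhere inside each string, so "Copy 1" is also found inside "Copy 10" and "Copy 11".

```python
import numpy as np
import pandas as pd

# Hypothetical sample values standing in for the copy_nb column.
s = pd.Series(["Copy 1", "Copy 10", "Copy 11", "Copy 2", np.nan])

# Substring search: "Copy 1" occurs at the start of "Copy 10" and "Copy 11",
# so those values match too. na=False turns NaN into False instead of NaN.
substring_mask = s.str.contains("Copy 1", na=False)

print(pd.DataFrame({"value": s, "contains 'Copy 1'": substring_mask}))
```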

Here is a sample of the data frame I want to clean:

[Image of the sample data frame not available]

rahlf23
SBN
  • Check the docs for `contains` https://pandas.pydata.org/pandas-docs/version/0.22/generated/pandas.Series.str.contains.html . It is doing exactly what you are asking, as "Copy 11" does indeed contain "Copy 1". Are you looking to remove rows where the column has exactly the value "Copy 1"? – danielR9 Apr 25 '19 at 21:33
  • Possible duplicate of [How to drop rows from pandas data frame that contains a particular string in a particular column?](https://stackoverflow.com/questions/28679930/how-to-drop-rows-from-pandas-data-frame-that-contains-a-particular-string-in-a-p) – Edeki Okoh Apr 25 '19 at 21:34
  • You should avoid posting sample dataframes as images, post them as formatted text. – rahlf23 Apr 25 '19 at 22:45

2 Answers


You can select the rows where column copy_nb does not equal the value "Copy 1", as in the example below:

import numpy as np
import pandas as pd

df = pd.DataFrame({
    "copy_nb": [np.nan, np.nan, "Copy 1", "Copy 2"],
    "other_column": [1, 2, 3, 4]
})

print(df)

df_copy1_removed = df.loc[df.copy_nb != "Copy 1", :]  # Here the selection happens

print(df_copy1_removed)
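If you would rather stay with the .str accessor, a sketch of an alternative (assuming pandas 1.1 or later, where Series.str.fullmatch was added) is to require the whole value to match the pattern rather than just part of it. The "Copy 10" and "Copy 11" rows below are hypothetical, added only to show that they are kept:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "copy_nb": [np.nan, np.nan, "Copy 1", "Copy 2", "Copy 10", "Copy 11"],
    "other_column": [1, 2, 3, 4, 5, 6],
})

# fullmatch succeeds only if the entire string matches "Copy 1",
# so "Copy 10" and "Copy 11" are not flagged. na=False makes NaN
# count as "no match", so those rows are kept after negation.
is_copy1 = df["copy_nb"].str.fullmatch("Copy 1", na=False)
df_copy1_removed = df[~is_copy1]

print(df_copy1_removed)
```

Unlike a plain != comparison, this form also accepts regex patterns and flags (for example case-insensitive matching), which may help if the values are not written consistently.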
Scriddie

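Try the following. The \b word boundary at the end of the pattern stops it from also matching "Copy 10", "Copy 11", etc.:

df[~df['copy_nb'].fillna('').str.contains(r'Copy 1\b')]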
rahlf23