How do I delete whole rows from a dataframe based on specific criteria using Pandas and RegEx?

Question

I am new to Pandas and am working with a dataset of 8000 rows. Here is a snippet from it:

The screenshot shows some of the rows: (https://i.stack.imgur.com/8ftng.png). I have imported the file into a DataFrame named 'df'.

I have been trying to delete every row in the dataset whose 'source' column contains a link.

Here is my code so far:

import re

def cleanLinks(col):
    if re.search(r'http\S+', col):
        return index(col)

df = df.drop(df.index[df['source'].apply(cleanLinks)])

I have no idea where to go from here, so I would greatly appreciate any help.

Did you try searching Stack Overflow for similar problems? For example: https://stackoverflow.com/questions/57237193/delete-rows-in-pandas-given-a-regex or https://stackoverflow.com/questions/15325182/how-to-filter-rows-in-pandas-by-regex - tturbo, Dec 15 '22 at 15:10

score 2 · Accepted Answer · answered Dec 15 '22 at 14:50

If I understand you correctly:

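# Keep only rows whose 'source' does not contain 'http' (na=False treats missing values as "no link")
df = df[~df['source'].str.contains('http', na=False)]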
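
Since the question asks specifically about RegEx, here is a minimal, self-contained sketch of the same filtering with an explicit regular expression. The sample DataFrame is made up for illustration, and the pattern r"https?://\S+" is an assumption about what counts as a "link"; adjust it to match the real data.

```python
import pandas as pd

# Hypothetical sample data; the real 'source' column comes from the asker's file.
df = pd.DataFrame({
    "source": [
        "plain text with no link",
        "see https://example.com/page for details",
        None,  # a missing value
        "HTTP://EXAMPLE.ORG in upper case",
    ]
})

# Regex matching http:// or https:// followed by non-whitespace characters.
# case=False also catches upper-case schemes; na=False treats missing values as "no link".
has_link = df["source"].str.contains(r"https?://\S+", regex=True, case=False, na=False)

# Keep only the rows that do NOT contain a link.
df = df[~has_link]
print(df)
```

Only rows 0 and 2 survive. Append `.reset_index(drop=True)` if a contiguous 0..n-1 index is needed afterwards.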

gtomer
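
The `apply`/`drop` attempt in the question can also be made to work. The main bug is that the helper has to return a boolean for every row (the undefined `index(col)` call raises a NameError, and non-matching rows return None). The sketch below uses a hypothetical helper `has_link` and made-up sample data; boolean masking is usually simpler, but this shows how `drop` fits in.

```python
import re

import pandas as pd

# Hypothetical sample data standing in for the asker's 'df'.
df = pd.DataFrame({"source": ["no link here", "visit http://example.com", "another plain row"]})


def has_link(text):
    """Return True if text contains 'http' followed by non-whitespace characters."""
    # str() guards against non-string values such as NaN.
    return re.search(r"http\S+", str(text)) is not None


# Boolean Series: True for rows whose 'source' contains a link.
mask = df["source"].apply(has_link)

# drop() expects index labels, so select the labels where the mask is True.
df = df.drop(df.index[mask.to_numpy()])
print(df)
```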