Python Pandas Delete Row If Condition

Question

I am a complete beginner in Python. I've imported a CSV file into Python; it is 1618 rows x 1 column. Essentially, I want to keep two kinds of rows that recur throughout the data frame. I would like to do this by deleting all rows that do not match either of the following:

1) the row starts with a space followed by 9 digits (e.g. a space, then "123456789")

2) the row contains any of the years "2000", "2001", ..., "2020"

So I would be left with only these two types of rows, however many times each appears in the data frame:

1) rows starting with a space and 9 digits

2) rows containing any year from "2000" up to "2020"
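The year condition can be written either as an explicit list of alternatives or as a compact regular expression. A short sketch, using only Python's built-in re module, that checks both forms accept exactly the same four-digit strings. The names explicit and compact are illustrative, and the compact pattern 20(?:[01]\d|20) is one possible choice, not something from the question:

```python
import re

# Explicit alternation "2000|2001|...|2020", generated from range() instead of typed by hand.
explicit = "|".join(str(year) for year in range(2000, 2021))

# Compact equivalent: "20" followed by 00-19, or "20" followed by "20".
compact = r"20(?:[01]\d|20)"

# Check every four-digit string from "0000" to "9999": both patterns must agree.
for n in range(10000):
    s = f"{n:04d}"
    assert bool(re.fullmatch(explicit, s)) == bool(re.fullmatch(compact, s))
```

The compact form is easier to read inside a longer pattern; the generated list is easier to change if the year range ever moves.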

Any help writing this would be amazing and greatly appreciated. I am looking to learn more and be able to do all of this independently.

UPDATE: Thank you all for the help! Here are some lines printed from the CSV, for clarification:

11 XXXXXX ...
12 NAME: ABC
13 ----------------------------------------------...
14 XXX...
15 123456789 - - .0000 ...
16 -------------------------------------...
17 G52 0000000000000000000000...
18 G53 XXX 09132017 ...

NOTE: Please disregard the strange lines with X's and dashes; the data comes from another program. Line 18 contains the date, which would be identified by the year "2017", and line 15 contains the leading space and 9 digits. If any more information would help, feel free to let me know. Thank you!
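The reproducible-example guide linked in the comments asks for sample data that others can copy and paste. A minimal sketch of the printed rows as a small DataFrame, assuming the column name "col1" (hypothetical; the real CSV header is unknown) and a leading space on the nine-digit row, as the question describes. The trailing "..." marks text cut off in the printout and is kept as-is:

```python
import pandas as pd

# Sample rows copied from the printout above. "col1" is a placeholder column name,
# and the leading space on the nine-digit row is assumed from the question text.
lines = [
    "XXXXXX ...",
    "NAME: ABC",
    "----------------------------------------------...",
    "XXX...",
    " 123456789 - - .0000 ...",
    "-------------------------------------...",
    "G52 0000000000000000000000...",
    "G53 XXX 09132017 ...",
]

# Reuse the printed row labels 11-18 as the index so rows line up with the post.
df = pd.DataFrame({"col1": lines}, index=range(11, 19))
print(df)
```

With this frame, the desired result would be the rows labelled 15 and 18.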

Please provide a small set of sample data as text that we can copy and paste. Include the corresponding desired result. Check out the guide on [how to make good reproducible pandas examples](https://stackoverflow.com/a/20159305/3620003). — timgeb, Jun 06 '20 at 17:07
Thank you for the help! Just updated the post, I'll take a look at the link. — nick_zam, Jun 07 '20 at 20:49

score 0 · Answer 1 · answered Jun 06 '20 at 17:13


This is a two-condition filter using str.match and str.contains:

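# 'col1' is a placeholder for your column name; na=False treats empty cells as non-matching
con1 = df['col1'].str.match(r'\s*\d{9}', na=False)            # optional whitespace, then 9 digits at the start
con2 = df['col1'].str.contains(r'20(?:[01]\d|20)', na=False)   # any year from 2000 to 2020
yourdf = df[con1 | con2]                                        # keep rows that meet either condition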
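A self-contained sketch of the same two-mask idea wrapped in a function and run on a few of the sample rows from the question. The function name filter_rows and the default column name "col1" are hypothetical. The None entry shows why na=False matters: without it, a missing cell yields NaN in the mask instead of False, which breaks boolean indexing.

```python
import pandas as pd


def filter_rows(df: pd.DataFrame, col: str = "col1") -> pd.DataFrame:
    """Keep rows whose text starts with optional whitespace plus nine digits,
    or that contains a year from 2000 to 2020 anywhere."""
    text = df[col]
    # str.match anchors at the start of each string; str.contains searches anywhere.
    con1 = text.str.match(r"\s*\d{9}", na=False)
    con2 = text.str.contains(r"20(?:[01]\d|20)", na=False)
    # Combine with | so a row survives if either condition holds.
    return df[con1 | con2]


sample = pd.DataFrame({"col1": [
    "NAME: ABC",
    " 123456789 - - .0000",
    "G52 0000000000000000000000",
    "G53 XXX 09132017",
    None,  # a missing cell, treated as "no match"
]})
print(filter_rows(sample))
```

Only the nine-digit row and the row containing "2017" are kept.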

Answered by BENY

Hey, thank you! I will give it a go as soon as I get a chance. I gave an update to my initial post so maybe you'd want to take a look. Appreciate it! – nick_zam Jun 07 '20 at 20:48

Grzegorz Skibinski · Answer 2 · 2020-06-07T21:08:46.447


Try:

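# optional whitespace + 9 digits at the start, or a year 2000-2020 anywhere; na=False skips empty cells
df = df.loc[df["x"].str.match(r"\s*(?:\d{9}|.*20(?:[01]\d|20))", na=False)]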

Here x is your input column and df is your DataFrame.
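A self-contained check that a single anchored pattern keeps the same rows as two separate masks. This is a sketch on a few sample rows; the last row, with "2021", is a made-up test input to show that years after 2020 are excluded. The pattern has no trailing $, so text after the nine digits (as in line 15 of the question) does not prevent a match.

```python
import pandas as pd

# One pattern for both conditions; str.match anchors it at the start of each string:
#   \s*\d{9}            -> optional whitespace, then nine digits at the start
#   .*20(?:[01]\d|20)   -> a year 2000-2020 anywhere in the line
ONE_PATTERN = r"\s*(?:\d{9}|.*20(?:[01]\d|20))"

rows = pd.Series([
    "NAME: ABC",
    " 123456789 - - .0000",
    "G52 0000000000000000000000",
    "G53 XXX 09132017",
    "G53 XXX 09132021",  # hypothetical input: 2021 is outside the range
])

single = rows.str.match(ONE_PATTERN, na=False)

# The same two conditions written as separate masks, for comparison.
two_masks = (
    rows.str.match(r"\s*\d{9}", na=False)
    | rows.str.contains(r"20(?:[01]\d|20)", na=False)
)

print(rows[single])
```

Both approaches select the same rows; the single pattern is more compact, while the two masks are easier to debug one condition at a time.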

answered Jun 07 '20 at 21:02, edited Jun 07 '20 at 21:08
