How to delete continuous four digits from a column value in pandas dataframe

Question

I have a data frame like this:

col1         col2                col3
 A        12134 tea2014           2
 B        2013 coffee 1           1
 C        green 2015 tea          4

I want to remove runs of exactly four consecutive digits. Shorter or longer runs of digits should stay.

The result will look like:

 col1         col2                col3
 A        12134 tea                 2
 B         coffee 1                 1
 C        green tea                 4

What is the best way to do this in Python?

cs95 · Accepted Answer · 2019-01-20T09:24:53.410


You will need str.replace with a carefully applied regex pattern:

# Thanks to @WiktorStribiżew for the improvement!
# regex=True is required on pandas >= 2.0, where str.replace defaults to regex=False
df['col2'] = df['col2'].str.replace(r'(?<!\d)\d{4}(?!\d)', '', regex=True)
df

  col1        col2  col3
0    A   12134 tea     2
1    B    coffee 1     1
2    C  green  tea     4
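For a self-contained version, here is a minimal sketch that rebuilds the question's frame (values copied from the question) and applies the same replacement. Removing a four-digit run can leave stray spaces such as `green  tea` or a leading space in ` coffee 1`. The second replace, which collapses whitespace and trims the ends, is my addition and not part of the original answer; it brings the result in line with the output the question asked for.

```python
import pandas as pd

# Rebuild the sample frame from the question
df = pd.DataFrame({
    'col1': ['A', 'B', 'C'],
    'col2': ['12134 tea2014', '2013 coffee 1', 'green 2015 tea'],
    'col3': [2, 1, 4],
})

# Remove runs of exactly four digits (regex=True is needed on pandas >= 2.0)
df['col2'] = df['col2'].str.replace(r'(?<!\d)\d{4}(?!\d)', '', regex=True)
print(df['col2'].tolist())  # ['12134 tea', ' coffee 1', 'green  tea']

# Optional tidy-up: collapse repeated whitespace and strip the ends
df['col2'] = df['col2'].str.replace(r'\s+', ' ', regex=True).str.strip()
print(df['col2'].tolist())  # ['12134 tea', 'coffee 1', 'green tea']
```

Skip the tidy-up step if the original spacing in `col2` is meaningful.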

Regex Breakdown
The pattern (?<!\d)\d{4}(?!\d) matches exactly four digits that are neither preceded nor followed by another digit, so runs of fewer or more than four digits are left alone.

(?<!\d)   # negative lookbehind: not preceded by a digit
\d{4}     # match exactly 4 digits
(?!\d)    # negative lookahead: not followed by a digit
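To see the boundary behaviour in isolation, here is a small sketch using the standard `re` module on a few hand-picked strings (the sample strings are my own, not from the thread). It shows that a five-digit run yields no match, because every four-digit window inside it has a digit on at least one side.

```python
import re

# The answer's pattern: exactly four digits, no digit immediately before or after
FOUR_DIGITS = re.compile(r'(?<!\d)\d{4}(?!\d)')

for s in ['123', '1234', '12345', 'tea2014', 'x1234y56789']:
    # findall lists every standalone four-digit run in s
    print(repr(s), FOUR_DIGITS.findall(s))

# '123'          []          too short
# '1234'         ['1234']    exactly four digits
# '12345'        []          five digits, so no window has clean boundaries
# 'tea2014'      ['2014']    letter before, end of string after
# 'x1234y56789'  ['1234']    the five-digit run 56789 is skipped
```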

edited Jan 20 '19 at 09:24

answered Jan 03 '19 at 10:54

cs95


that is some kicka** regex. :O also +1 for the breakdown – anky Jan 03 '19 at 10:55

@anky_91 Huh, if that impressed you, you should take a look at [this](https://stackoverflow.com/a/53915214/4909087)... – cs95 Jan 03 '19 at 10:57
yeah, speechless already. :D – anky Jan 03 '19 at 10:58
@coldspeed What if the string starts with five consecutive digits and I want to drop the first four and keep the fifth? For example, 12345abc would become 5abc. – Kallol Jan 03 '19 at 11:01
`r'((?<=\D)|(?<=^))\d{4}(?=\D|$)'` = `r'(?<!\d)\d{4}(?!\d)'`. This is a dupe of https://stackoverflow.com/a/3532970/3832970 – Wiktor Stribiżew Jan 03 '19 at 11:03
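One way to spot-check the equivalence claimed in the comment above is to run both patterns through `re.sub` on the same inputs. This is only a check on a handful of strings I picked (the question's rows plus a few edge cases), not a proof. The reasoning behind it: "preceded by a non-digit or at the start" is the same condition as "not preceded by a digit", and likewise for the lookahead.

```python
import re

# Longer pattern and the shorter form suggested in the comment
LONG = r'((?<=\D)|(?<=^))\d{4}(?=\D|$)'
SHORT = r'(?<!\d)\d{4}(?!\d)'

samples = ['12134 tea2014', '2013 coffee 1', 'green 2015 tea',
           '1234', 'a12345b', '9999x8888']
for s in samples:
    long_result = re.sub(LONG, '', s)
    short_result = re.sub(SHORT, '', s)
    # Both patterns should remove exactly the same four-digit runs
    print(repr(s), '->', repr(short_result), long_result == short_result)
```

The shorter form also avoids the capturing group, which matters if you later switch from `re.sub` to `re.findall`: with a group present, `findall` returns the group's (empty) contents instead of the matched digits.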
@KallolSamanta You can use `((?<=\D)|(?<=^))\d{4}`. – cs95 Jan 03 '19 at 11:03
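A minimal sketch of the follow-up suggestion, using the standard `re` module. Because this pattern drops the lookahead, it removes the first four digits of *any* run of four or more digits that is not preceded by a digit, not only at the start of the string. Applied to the question's `col2`, it would turn `12134 tea2014` into `4 tea`, so it only fits if that behaviour is wanted for every row. The sample strings are my own choice.

```python
import re

# Pattern from the comment: four digits not preceded by a digit, no lookahead,
# so longer digit runs lose their first four digits as well
LEADING_FOUR = r'((?<=\D)|(?<=^))\d{4}'

for s in ['12345abc', '12134 tea2014', 'green 2015 tea', '123456789']:
    print(repr(s), '->', repr(re.sub(LEADING_FOUR, '', s)))

# '12345abc'       -> '5abc'
# '12134 tea2014'  -> '4 tea'
# 'green 2015 tea' -> 'green  tea'
# '123456789'      -> '56789'
```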
