
I have a data frame like this:

col1         col2                col3
 A        12134 tea2014           2
 B        2013 coffee 1           1
 C        green 2015 tea          4

I want to remove every occurrence of exactly four consecutive digits from col2.

The result will look like:

 col1         col2                col3
 A        12134 tea                 2
 B         coffee 1                 1
 C        green tea                 4

What is the best way to do this in Python?

Kallol


Use str.replace with a carefully chosen regex pattern:

# Thanks to @WiktorStribiżew for the improvement!
# regex=True is required in pandas >= 2.0, where the default changed to regex=False
df['col2'] = df['col2'].str.replace(r'(?<!\d)\d{4}(?!\d)', '', regex=True)
df

  col1        col2  col3
0    A   12134 tea     2
1    B    coffee 1     1
2    C  green  tea     4
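For a self-contained version, here is a minimal sketch that rebuilds the sample frame from the question and passes regex=True explicitly. Note that the output above still contains stray whitespace ("green  tea", and a leading space before "coffee 1"), whereas the question's expected output has single spaces. The second replace below, which collapses repeated whitespace and trims the ends, is an addition for illustration and not part of the original answer.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "col1": ["A", "B", "C"],
    "col2": ["12134 tea2014", "2013 coffee 1", "green 2015 tea"],
    "col3": [2, 1, 4],
})

# Remove runs of exactly four digits; longer or shorter digit runs are untouched
df["col2"] = df["col2"].str.replace(r"(?<!\d)\d{4}(?!\d)", "", regex=True)

# Added cleanup step: collapse repeated whitespace and strip leading/trailing spaces
df["col2"] = df["col2"].str.replace(r"\s+", " ", regex=True).str.strip()

print(df)
```

After both steps, col2 holds "12134 tea", "coffee 1" and "green tea", matching the expected output in the question.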

Regex Breakdown
The pattern (?<!\d)\d{4}(?!\d) matches exactly four digits that are neither preceded nor followed by another digit, so runs of fewer or more than four digits (such as 12134) are left alone.

(?<!\d)   # negative lookbehind: the preceding character is not a digit
\d{4}     # match exactly 4 digits
(?!\d)    # negative lookahead: the following character is not a digit
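To see why lookarounds are used here rather than word boundaries, here is a small sketch using Python's standard re module. It compiles the same pattern in re.VERBOSE mode, annotated like the breakdown, and compares it with \b\d{4}\b. The word-boundary version misses 2014 in "tea2014", because "a" and "2" are both word characters, so there is no boundary between them. The last sample string is made up for illustration.

```python
import re

# Same pattern as the answer, written in verbose mode with comments
PATTERN = re.compile(
    r"""
    (?<!\d)   # negative lookbehind: previous character is not a digit
    \d{4}     # exactly four digits
    (?!\d)    # negative lookahead: next character is not a digit
    """,
    re.VERBOSE,
)

# Alternative for comparison: requires word boundaries around the four digits
WORD_BOUNDARY = re.compile(r"\b\d{4}\b")

samples = ["12134 tea2014", "2013 coffee 1", "green 2015 tea", "123 abc 12345"]

for s in samples:
    # findall lists every substring each pattern would remove
    print(f"{s!r}: lookaround={PATTERN.findall(s)} word_boundary={WORD_BOUNDARY.findall(s)}")
```

Only the lookaround version finds 2014 in the first sample; neither pattern touches the 3-digit or 5-digit runs.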
cs95