
I am working on a machine learning problem and am trying to write a lambda function that removes punctuation from a pandas column. Unfortunately, the lambda expression is not working as expected:

combi['tidy_tweet'] = combi['tidy_tweet'].apply(lambda x: x.replace("[^a-zA-Z#]", " "))

The above expression leaves the column unchanged, while I expect it to remove the punctuation.

Does anybody have any idea what is wrong with the lambda expression above?

Max
  • You are trying to use `replace` with a regex, but unfortunately `str.replace` does not support regular expressions. You need to use the `re` module. This question has an answer here: [How to input a regex in string.replace?](https://stackoverflow.com/questions/5658369/how-to-input-a-regex-in-string-replace) – CodeIt Aug 06 '19 at 15:39
  • Can you provide the `tidy_tweet` dataframe? – Tony Montana Aug 06 '19 at 15:43

2 Answers


If you need to replace using a regular expression, import re and use re.sub() instead of str.replace(), which only matches literal substrings:

 ...lambda x: re.sub("[^a-zA-Z#]", "", x)
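
For completeness, here is a minimal runnable sketch of the full statement. The sample DataFrame is made up for illustration (the real `combi` data is not shown in the question), and it uses a space as the replacement, as in the question, rather than the empty string used in the fragment above:

```python
import re

import pandas as pd

# Hypothetical sample data standing in for the asker's `combi` DataFrame.
combi = pd.DataFrame(
    {"tidy_tweet": ["Hello, world! #python", "Good morning... #ml"]}
)

# Compile the pattern once so it is not re-parsed for every row.
pattern = re.compile(r"[^a-zA-Z#]")

# Replace every character that is not an ASCII letter or '#' with a space.
combi["tidy_tweet"] = combi["tidy_tweet"].apply(lambda x: pattern.sub(" ", x))

print(combi["tidy_tweet"].tolist())
```

Each removed character becomes its own space, so runs of punctuation leave runs of spaces; pass "" as the replacement instead if the characters should simply be dropped.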
Israel Unterman

The x argument in your lambda function is a plain Python string, so x.replace calls the built-in str.replace method, which treats its first argument as a literal substring and does not operate on regex patterns.
Instead, you can apply the pandas.Series.replace function with regex=True:

combi['tidy_tweet'] = combi['tidy_tweet'].replace(r'[^a-zA-Z#]', ' ', regex=True)
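
As a quick check, the sketch below runs that call on made-up sample data (the real `combi` contents are an assumption here) and compares it with the `.str.replace` accessor, which is another vectorized way to do the same thing. `regex=True` is passed explicitly in both calls because the default for `.str.replace` has differed between pandas versions:

```python
import pandas as pd

# Hypothetical sample data standing in for the asker's `combi` DataFrame.
combi = pd.DataFrame(
    {"tidy_tweet": ["Hello, world! #python", "Good morning... #ml"]}
)

# Series.replace with regex=True substitutes every regex match inside each string.
via_replace = combi["tidy_tweet"].replace(r"[^a-zA-Z#]", " ", regex=True)

# The .str accessor offers an equivalent vectorized string method.
via_str_replace = combi["tidy_tweet"].str.replace(r"[^a-zA-Z#]", " ", regex=True)

print(via_replace.tolist())
print(via_replace.equals(via_str_replace))
```

Both avoid a Python-level lambda, which tends to read more clearly on a whole column.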
RomanPerekhrest