The Reviews column contains 1500 rows of text. I am using this code, but it does not remove the ellipsis ("..."):

Df["Reviews"] = Df["Reviews"].apply(lambda x: " ".join(re.findall(r'[\w\.]+', x)))

example text would be: "dealer said it does not reimburse dealers for loaners or rentals... so why even be a dealership if they make faulty cars and you re on the line to help customers"

Saud

3 Answers

You can try any of the approaches below. Note that the first two collapse a run of dots into a single dot rather than deleting it.

With pandas replace and a regex

import pandas as pd
pd.set_option('display.max_colwidth', 400)
df = pd.DataFrame({'Reviews':['dealer said it does not reimburse dealers for loaners or rentals... so why even be a dealership if they make faulty cars and you re on the line to help customers']})
# Collapse any run of one or more dots into a single dot
df['Reviews'] = df['Reviews'].replace(r'\.+', '.', regex=True)
print(df)

With re.sub

import re
regex = r"[.]+"
test_str = "dealer said it does not reimburse dealers for loaners or rentals... so why even be a dealership if they make faulty cars and you re on the line to help customers"
subst = "."
# Collapse any run of one or more dots into a single dot
result = re.sub(regex, subst, test_str)
print(result)

With re.sub and a backreference (collapses any repeated non-word character, including repeated spaces)

import re
regex = r"(\W)\1+"
test_str = "dealer said it does not reimburse dealers for loaners or rentals... so why even be a dealership if they make faulty cars and you re on the line to help customers"
subst = r"\1"
# Replace each run of the same non-word character with one copy of it
result = re.sub(regex, subst, test_str)
print(result)
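If the goal is to delete the ellipsis entirely while keeping ordinary sentence-ending periods, the following is a minimal sketch. It first shows why the pattern from the question keeps the dots ("." is inside the character class, so "rentals..." is matched as one token), then removes runs of two or more dots before tokenizing. The helper name strip_ellipses and the shortened sample text are assumptions made for illustration, not part of the question.

```python
import re

text = "loaners or rentals... so why. ok"

# The question's pattern allows "." inside a token, so the dots survive.
tokens = re.findall(r"[\w.]+", text)
print(tokens)

def strip_ellipses(s):
    """Hypothetical helper: drop runs of 2+ dots and the single
    character ellipsis, but keep single periods."""
    s = re.sub(r"\.{2,}|…", " ", s)
    # Same tokenizing step as in the question, applied after the cleanup.
    return " ".join(re.findall(r"[\w.]+", s))

print(strip_ellipses(text))
```

On a DataFrame this could presumably be applied the same way as in the question, for example Df["Reviews"].apply(strip_ellipses).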
Always Sunny

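Series.str.replace should work for simple expressions. Pass regex=False so that "..." is matched as three literal dots rather than as the regex "any three characters" (regex=True was the default before pandas 2.0), and assign the result back:

df["Reviews"] = df["Reviews"].str.replace("...", "", regex=False)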
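To see why it matters whether "..." is treated as a literal string or as a regular expression, here is a small sketch using only the standard re module and plain str.replace. The sample text is shortened from the question, and the variable names are illustrative.

```python
import re

text = "rentals... so why"

# As a regex, "..." means "any three characters", so it eats ordinary text too.
as_regex = re.sub("...", "", text)

# Escaping the pattern, or using plain str.replace, matches three literal dots.
escaped = re.sub(re.escape("..."), "", text)
literal = text.replace("...", "")

print(repr(as_regex))
print(repr(escaped))
print(repr(literal))
```

The same distinction applies to pandas Series.str.replace, where the regex argument selects between the two behaviors.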
Kyle

If you want to remove this specific word from each row, then you don't need to use RegEx. You can use str.replace as indicated here: How to strip a specific word from a string?

Df["Reviews"] = Df["Reviews"].apply(lambda x: x.replace("ellipsis", ""))
Loïc L.