
I am currently working with email data. When I extract the emails from Outlook, the body of each email still contains all of the escape characters within the string.

I'm trying to remove them with the re package in Python, but to no avail.

Here's an example of text I'm trying to rid the escape characters from:

I am completely in agreement with that. \r\n\r\n\rbest regards.

Expected:

I'd like this to read: "I am completely in agreement with that. best regards."

I've tried the following to extract the unwanted text:

re.findall(r'\\\w+', string)
re.findall(r'\\*\w+', string)
re.findall(r'\\[a-z]+', string)

None of these are doing the trick. I'd appreciate any help!

Thanks!

sophros

4 Answers


You can try this, which replaces every newline or carriage return with an empty string:

import re

re.sub(r'\n|\r', '', string)


'I am completely in agreement with that. best regards.'
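
A minimal sketch of the same substitution wrapped in a helper, assuming the email body contains real carriage-return and line-feed characters. The names `strip_line_breaks` and `body` are hypothetical and used only for illustration:

```python
import re

def strip_line_breaks(text):
    # Remove every carriage return and line feed; all other characters are kept.
    return re.sub(r'[\r\n]+', '', text)

body = "I am completely in agreement with that. \r\n\r\n\rbest regards."
print(strip_line_breaks(body))
```

The character class `[\r\n]` matches the same characters as the alternation `\n|\r`. If line breaks separate words without any surrounding spaces, replacing them with a single space instead of an empty string may be the better choice.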
Billy Bonaros
    For Python 2.x and unicode strings it may be necessary to first compile the pattern with flag `re.UNICODE` for this to work. – sophros Sep 06 '19 at 14:20

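You are confusing whitespace characters with their escaped representation (please read more about them here). In your string, \r and \n are single carriage-return and line-feed characters, not a backslash followed by a letter, so patterns that start with \\ have nothing to match.

You should rather look for the \r and \n characters themselves, like this: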

re.findall(r'\n\w+', string)

or

re.findall(r'\r\w+', string)
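
The difference between a control character and its escaped representation can be seen in a short sketch. The variables `actual` and `literal` are hypothetical stand-ins for the two forms the email body might take:

```python
import re

actual = "with that. \r\n\r\n\rbest regards."    # real CR/LF characters
literal = r"with that. \r\n\r\n\rbest regards."  # backslash followed by a letter

# A carriage return is one character; its escaped spelling is two.
print(len("\r"), len(r"\r"))

# A pattern for a literal backslash finds nothing in the first string ...
print(re.findall(r'\\[a-z]+', actual))
# ... and in the second string it also swallows the word that follows.
print(re.findall(r'\\[a-z]+', literal))

# A pattern for the control characters themselves matches the first string.
print(re.findall(r'[\r\n]+', actual))
```

Printing `repr(string)` on the real data is a quick way to check which of the two forms it actually contains.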
sophros

It seems you want to get rid of the line breaks. If so, you don't need the re module; just use:

string.replace("\r", "").replace("\n", "")
Guillaume Adam

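You can write a function yourself. Note that this only helps if the string contains literal backslashes rather than actual control characters:

def remove_escapes(string):
    # Drop each literal backslash together with the character that follows it.
    while '\\' in string:
        ind = string.find('\\')
        string = string[:ind] + string[ind+2:]

    return string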
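
If the body really does contain literal backslash sequences, a narrower variant might be preferable, since it leaves other backslashes alone. This is only a sketch under that assumption, and `strip_literal_escapes` and `raw` are hypothetical names:

```python
import re

def strip_literal_escapes(text):
    # Remove only the two-character sequences backslash+r and backslash+n.
    return re.sub(r'\\[rn]', '', text)

raw = r"I am completely in agreement with that. \r\n\r\n\rbest regards."
print(strip_literal_escapes(raw))
```

Text in which a backslash is legitimately followed by `r` or `n`, such as some Windows paths, would still be altered, so this is best applied only to fields known to hold escaped line breaks.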
ARD