Removing a backslash from a string

Question

I have a string that is a sentence like `I don't want it, there'll be others`.

So the text looks like this: `I don\'t want it, there\'ll be other`

For some reason a \ comes with the text next to the '. It was read in from another source. I want to remove it, but can't. I've tried:

sentence.replace("\'","'")

sentence.replace(r"\'","'")

sentence.replace("\\","")

sentence.replace(r"\\","")

sentence.replace(r"\\\\","")

I know the \ is there to escape something, so I'm not sure how to handle it with the quotes.
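To see what each of those attempts actually does, here is a minimal sketch that assumes the backslash really is a character in the string (simulated with a raw string literal, since the real file contents aren't shown):

```python
# Simulate text read from a file that really contains backslashes;
# the raw string keeps each backslash as a literal character.
sentence = r"I don\'t want it, there\'ll be other"

# "\'" is just a single quote, so this swaps ' for ' and changes nothing.
a = sentence.replace("\'", "'")

# r"\'" is two characters (backslash + quote), so this strips the backslashes.
b = sentence.replace(r"\'", "'")

# "\\" is one backslash character, so this strips them too.
c = sentence.replace("\\", "")

# r"\\" is two backslashes in a row, which never occurs here: no change.
d = sentence.replace(r"\\", "")

print(a)  # I don\'t want it, there\'ll be other
print(b)  # I don't want it, there'll be other
print(c)  # I don't want it, there'll be other
print(d)  # I don\'t want it, there\'ll be other
```

If the forms that should match (`b` and `c`) still seem to do nothing on the real data even when their result is printed, the backslash is probably not in the string at all.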

Do you have the actual text `'I don\'t want it, there\'ll be other'` in the source code? Or do you read the text from some file or input from the user? — Some programmer dude, Oct 16 '15 at 11:51
How do you write text? Backslashes are automatically removed on print. — Eugene Soldatov, Oct 16 '15 at 11:51
@JoachimPileborg It was read in from some file, not inputted — jason, Oct 16 '15 at 11:53
crap, when I `print` that variable it doesn't show up, so it is an `nltk` problem then? It is splitting `don\'t`; all I see is `don` — jason, Oct 16 '15 at 12:00

score 9 · Accepted Answer · answered Oct 16 '15 at 11:51


The \ is just there to escape the ' character. It is only visible in the representation (repr) of the string; it's not actually a character in the string. See the following demo:

>>> repr("I don't want it, there'll be others")
'"I don\'t want it, there\'ll be others"'

>>> print("I don't want it, there'll be others")
I don't want it, there'll be others
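One way to confirm this for a string you already have (for example, one read from a file) is to test for the backslash directly instead of relying on how the REPL echoes the value. A minimal sketch; `has_backslash` is a hypothetical helper name, and the raw literal stands in for text that genuinely contains backslashes:

```python
def has_backslash(text: str) -> bool:
    """Return True if the string really contains a backslash character."""
    return "\\" in text

# A normal literal: \' is just an escaped quote, no backslash is stored.
plain = "I don\'t want it, there\'ll be others"
# A raw literal: the backslashes are stored as real characters.
escaped = r"I don\'t want it, there\'ll be others"

print(has_backslash(plain))    # False
print(has_backslash(escaped))  # True
# The raw version is two characters longer (one backslash per apostrophe).
print(len(escaped) - len(plain))  # 2
```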

answered Oct 16 '15 at 11:51

Cory Kramer


this doesn't help me, because I feed the string through `nltk` and it thinks `don` is a separate word, cutting off the word `don't` – jason Oct 16 '15 at 11:56
I think this is an `nltk` problem then, thanks for the help – jason Oct 16 '15 at 12:06

It's not an nltk "problem". The backslashes are how python is showing you that the string doesn't end at the apostrophe, as everyone has said. The usual NLTK tokenization intentionally breaks up words at the apostrophe; this has nothing to do with the backslashes. – alexis Oct 16 '15 at 20:49
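If the goal is simply to keep contractions such as `don't` together, a plain regular expression is one option. This is a standard-library sketch swapped in for illustration, not NLTK's tokenizer; the pattern is an assumption that only handles simple word-apostrophe-word forms, and `tokenize_keep_contractions` is a hypothetical name:

```python
import re
from typing import List

# A run of word characters, optionally followed by an apostrophe and more
# word characters, so "don't" and "there'll" stay as single tokens.
TOKEN_RE = re.compile(r"\w+(?:'\w+)?")

def tokenize_keep_contractions(text: str) -> List[str]:
    """Split text into word tokens without breaking simple contractions."""
    return TOKEN_RE.findall(text)

print(tokenize_keep_contractions("I don't want it, there'll be others"))
# ['I', "don't", 'want', 'it', "there'll", 'be', 'others']
```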

score 2 · Answer 2 · answered Oct 16 '15 at 11:57


Try to use:

sentence.replace("\\", "")

You need two backslashes because the first one acts as an escape character and the second is the character you want to replace.
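A small sketch of that escaping: in a normal string literal `"\\"` is a single backslash character. Because Python strings are immutable, `replace` returns a new string, so the result has to be assigned back. The sample text is a raw string (an assumption) so that it really contains backslashes:

```python
backslash = "\\"
print(len(backslash))  # 1: the first backslash only escapes the second

# Raw string so the backslashes are real characters, as if read from a file.
sentence = r"I don\'t want it, there\'ll be other"

sentence.replace("\\", "")             # returns a new string; sentence is unchanged
sentence = sentence.replace("\\", "")  # assign the result to keep it
print(sentence)  # I don't want it, there'll be other
```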

answered Oct 16 '15 at 11:57

Eugene Soldatov


score 1 · Answer 3 · answered Oct 16 '15 at 12:08


It is better to use a regular expression to remove the backslash:

>>> import re
>>> re.sub(r"\\'", "'", r"I don\'t want it, there\'ll be other")
"I don't want it, there'll be other"
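If the text may also contain escaped double quotes, one pattern with a capture group can handle both. A sketch, assuming the only escapes in the input are `\'` and `\"`; `unescape_quotes` is a hypothetical helper name:

```python
import re

# Match a literal backslash followed by ' or ", capturing the quote.
ESCAPED_QUOTE = re.compile(r"""\\(['"])""")

def unescape_quotes(text: str) -> str:
    """Turn backslash-escaped quotes back into plain quotes."""
    # \1 re-inserts the captured quote without the preceding backslash.
    return ESCAPED_QUOTE.sub(r"\1", text)

print(unescape_quotes(r'She said \"I don\'t want it\"'))
# She said "I don't want it"
```

Other backslashes (for example in `C:\temp`) are left untouched, since the pattern only fires when a quote follows.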

answered Oct 16 '15 at 12:08

Mayur Koshti


score 0 · Answer 4 · edited May 23 '17 at 12:10

If your text was crawled from web pages and you didn't clean it up by unescaping it before processing it with NLP tools, you can easily unescape the HTML markup, e.g.:

In Python 2.x:

>>> import sys; sys.version
'2.7.6 (default, Jun 22 2015, 17:58:13) \n[GCC 4.8.2]'
>>> import HTMLParser
>>> txt = """I don\'t want it, there\'ll be other"""
>>> HTMLParser.HTMLParser().unescape(txt)
"I don't want it, there'll be other"

In Python 3:

>>> import sys; sys.version
'3.4.0 (default, Jun 19 2015, 14:20:21) \n[GCC 4.8.2]'
>>> import html
>>> txt = """I don\'t want it, there\'ll be other"""
>>> html.unescape(txt)
"I don't want it, there'll be other"
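Note that `html.unescape` converts HTML character references such as `&#39;`; it does not remove backslashes. In the examples above the `\'` is only a Python escape inside the literal, so the string never held a backslash to begin with. A Python 3 sketch of the difference, with made-up sample strings:

```python
import html

# Crawled HTML often encodes the apostrophe as a character reference.
crawled = "I don&#39;t want it, there&#39;ll be other"
print(html.unescape(crawled))  # I don't want it, there'll be other

# A real backslash is not an HTML entity, so unescape leaves it in place.
with_backslash = r"I don\'t want it"
print(html.unescape(with_backslash) == with_backslash)  # True

# For that case a plain replace is still needed.
print(with_backslash.replace("\\", ""))  # I don't want it
```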

See also: How do I unescape HTML entities in a string in Python 3.1?
