How do you remove backslashes and the word attached to the backslash in Python?

Question

I understand that to remove a single backslash, we might do something like the answers to "Removing backslashes from a string in Python".

I'd like to know how to remove, from the list below, all the words like '\ue606' (a backslash with other characters attached to it):

A = ['Historical Notes 1996',
'\ue606',
'The Future of farms 2012',
'\ch889',
'\8uuuu',]

to transform it into

['Historical Notes 1996',
'The Future of farms 2012',]

I tried:

A = ['Historical Notes 1996',
'\ue606',
'The Future of farms 2012',
'\ch889',
'\8uuuu',]

for y in A:
      y.replace("\\", "")
A

It returns:

['Historical Notes 1996',
 '\ue606',
 'The Future of farms 2012',
 '\\ch889',
 '\\8uuuu']

I'm not sure how to handle the characters that follow the '\', or why it added a second '\' rather than removing it.

What have you tried? Where are you getting stuck? Stack Overflow generally expects you to show a good-faith attempt at meeting your requirements on your own before posting here, in accordance with How to Ask. — esqew, Jun 10 '21 at 18:30
Thanks @esqew for your feedback. I added my attempt at this. I'm quite new to python so I know my attempt is incorrect, but hopefully it offers some insight into where I am — Katie Melosto, Jun 10 '21 at 18:40
The question isn't clear at all. The issue is that `'\ue606'` means a string with **one** character in it (which Python represents with a Unicode escape, but will print as the character itself), but `"\ch889"` means a string with **six** characters in it - a backslash, lowercase c, etc. It is **necessary** to understand *what the data actually is*, and then show a minimal reproducible example that clarifies the problem properly. — Karl Knechtel, Aug 07 '22 at 06:29
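A quick way to see the one-character versus six-character distinction described in this comment is to check the length of each literal. This is a small illustrative sketch; the variable names are hypothetical, and '\\ch889' is written with a doubled backslash, which has the same value as the question's '\ch889' but avoids the invalid-escape warning recent Python versions emit.

```python
# '\ue606' is a Unicode escape: the literal produces ONE character, U+E606
single = '\ue606'
# '\\ch889' is six characters: a real backslash, then c, h, 8, 8, 9
six = '\\ch889'

print(len(single))       # 1
print(len(six))          # 6
print(hex(ord(single)))  # 0xe606
print(six[0] == '\\')    # True: the first character is an actual backslash
```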
Anyway, there seem to be two separate questions here: 1) why nothing was removed (see https://stackoverflow.com/questions/9189172/why-doesnt-calling-a-string-method-do-anything-unless-its-output-is-assigned); 2) why the backslashes are doubled up (see https://stackoverflow.com/questions/24085680/why-do-backslashes-appear-twice). — Karl Knechtel, Aug 07 '22 at 06:32
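Both points can be seen in a few lines. This is a minimal sketch, not taken from either answer: str.replace returns a new string, so its result has to be stored, and the doubled backslash is only how repr() displays a single backslash inside a list. Note that this only strips the backslashes; it does not drop the unwanted entries, which is what the answers address. The list uses '\\ch889' and '\\8uuuu', which have the same values as the question's literals.

```python
A = ['Historical Notes 1996', '\ue606', 'The Future of farms 2012', '\\ch889', '\\8uuuu']

# str.replace never modifies a string in place; it returns a new one,
# so the loop in the question discarded every result.
cleaned = [y.replace('\\', '') for y in A]
print(cleaned)  # ['Historical Notes 1996', '\ue606', 'The Future of farms 2012', 'ch889', '8uuuu']

# The doubled backslash is only the repr(); the string holds one backslash.
print(repr('\\ch889'))  # '\\ch889'
print('\\ch889')        # \ch889
print(len('\\ch889'))   # 6
```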

mcsoini · Accepted Answer · 2021-06-10T18:51:49.210

6

Python is somewhat hard to convince to just ignore unicode characters. Here is a somewhat hacky attempt:

l = ['Historical Notes 1996',
'\ue606',
'The Future of farms 2012',
'\ch889',
'\8uuuu',]


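def not_unicode_or_backslash(x):
    # 'unicode-escape' writes non-ASCII characters such as '\ue606' as the
    # literal text \ue606 and doubles real backslashes, so every unwanted
    # entry now starts with a backslash character
    x = x.encode('unicode-escape').decode()
    return not x.startswith("\\")
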

[x for x in l if not_unicode_or_backslash(x)]

# Output: ['Historical Notes 1996', 'The Future of farms 2012']

The problem is that you can't directly check whether the string starts with a backslash, because '\ue606' is not the six-character string \ue606 but a single character (U+E606). Because of this, it does not start with a backslash, and for

[x for x in l if not x.startswith("\\")]

you get

['Historical Notes 1996', '\ue606', 'The Future of farms 2012']
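To see why the unicode-escape round trip in not_unicode_or_backslash works, it helps to look at what each element becomes. A small illustrative sketch (the escaped dict is only for demonstration, and '\\ch889' / '\\8uuuu' have the same values as the literals above):

```python
l = ['Historical Notes 1996', '\ue606', 'The Future of farms 2012', '\\ch889', '\\8uuuu']

# 'unicode-escape' writes non-ASCII characters such as U+E606 as the literal
# text \ue606 and doubles every real backslash, so after the round trip
# both kinds of unwanted entry begin with a backslash character.
escaped = {x: x.encode('unicode-escape').decode() for x in l}

for original, text in escaped.items():
    print(repr(original), '->', text, '| starts with backslash:', text.startswith('\\'))
```

Plain ASCII strings such as 'Historical Notes 1996' pass through unchanged, which is why only the unwanted entries are filtered out.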

edited Jun 10 '21 at 18:51

answered Jun 10 '21 at 18:44

mcsoini


OK -- that's helpful. It seems like python is reading something like \ue606 as a unicode character – Katie Melosto Jun 10 '21 at 18:55

@KatieMelosto Python 3 strings are *always* Unicode by definition. – BoarGules Jun 14 '21 at 22:05
@BoarGules - ok - thanks I did not know that! – Katie Melosto Jun 15 '21 at 18:35
"Unicode characters" does not make very much sense and certainly isn't a meaningful distinction in the way that this answer implies. – Karl Knechtel Aug 07 '22 at 06:30
@KarlKnechtel I see what you mean. Do you have a suggestion on how to rephrase? – mcsoini Aug 07 '22 at 07:30
I don't think the question was well posed in the first place, so I don't know that I can give useful advice. – Karl Knechtel Aug 07 '22 at 07:37

angel_dust · Answer 2 · 2021-06-10T19:18:01.307

3

You can use this: isprintable() drops strings that contain non-printable characters such as '\ue606', and checking for '\\' drops strings that start with a backslash.

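words = ['Historical Notes 1996','\ue606','The Future of farms 2012','\ch889','\8uuuu',]
# keep strings that don't start with a backslash and contain only printable characters
# (startswith also handles an empty string, where x[0] would raise IndexError)
print([x for x in words if not x.startswith('\\') and x.isprintable()])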
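The two conditions in this answer can be checked element by element. A minimal sketch, using unicodedata.category only to show why isprintable() rejects '\ue606': U+E606 lies in a Private Use Area (category 'Co'), and str.isprintable() treats characters in the Unicode "Other" and "Separator" categories, apart from the ASCII space, as non-printable. The literals '\\ch889' and '\\8uuuu' have the same values as the answer's.

```python
import unicodedata

words = ['Historical Notes 1996', '\ue606', 'The Future of farms 2012', '\\ch889', '\\8uuuu']

for w in words:
    # show printability, the backslash test, and the category of the first character
    print(repr(w), w.isprintable(), w.startswith('\\'), unicodedata.category(w[0]))

# the answer's filter: both conditions must hold for a string to be kept
kept = [w for w in words if not w.startswith('\\') and w.isprintable()]
print(kept)
```

A backslash itself is printable (category 'Po'), which is why the separate startswith check is needed for '\\ch889' and '\\8uuuu'.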

edited Jun 10 '21 at 19:18

answered Jun 10 '21 at 18:54

angel_dust



Thanks so much, @Osamu Zenji. I also found this helpful website: https://java2blog.com/remove-unicode-characters-python/#:~:text=There%20are%20many%20ways%20to%20to%20remove%20unicode,%28%29%20method%20to%20decode%20%28%29%20it%20back.%201 about removing unicode – Katie Melosto Jun 10 '21 at 19:10
