
I understand that to remove a single backslash we might do something like what is described in "Removing backslashes from a string in Python".

I'd like to know how to remove from the list below all the words like '\ue606':

A = ['Historical Notes 1996',
'\ue606',
'The Future of farms 2012',
'\ch889',
'\8uuuu',]

to transform it into

['Historical Notes 1996',
'The Future of farms 2012',]

I tried:

A = ['Historical Notes 1996',
'\ue606',
'The Future of farms 2012',
'\ch889',
'\8uuuu',]

for y in A:
      y.replace("\\", "")
A

It returns:

['Historical Notes 1996',
 '\ue606',
 'The Future of farms 2012',
 '\\ch889',
 '\\8uuuu']

I'm not sure how to address the string following the '\' or why it added a new '\' rather than remove it.

halfer
Katie Melosto
  • What have you tried? Where are you getting stuck? Stack Overflow generally expects you to show a good-faith attempt at meeting your requirements on your own before posting here, in accordance with How to Ask. – esqew Jun 10 '21 at 18:30
  • Thanks @esqew for your feedback. I added my attempt at this. I'm quite new to python so I know my attempt is incorrect, but hopefully it offers some insight into where I am – Katie Melosto Jun 10 '21 at 18:40
  • The question isn't clear at all. The issue is that `'\ue606'` means a string with **one** character in it (which Python represents with a Unicode escape, but which prints as a single glyph), but `"\ch889"` means a string with **six** characters in it - a backslash, lowercase c, etc. It is **necessary** to understand *what the data actually is*, and then show a minimal reproducible example that clarifies the problem properly. – Karl Knechtel Aug 07 '22 at 06:29
  • Anyway, there seem to be two separate questions here: 1) why nothing was removed (see https://stackoverflow.com/questions/9189172/why-doesnt-calling-a-string-method-do-anything-unless-its-output-is-assigned); 2) why the backslashes are doubled up (see https://stackoverflow.com/questions/24085680/why-do-backslashes-appear-twice). – Karl Knechtel Aug 07 '22 at 06:32
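The two points raised in the comments above can be seen in a short sketch. It assumes the list holds the strings from the question, written here with an explicit `\\` so that each backslash is unambiguous; the variable names are illustrative only.

```python
# Strings are immutable: str.replace returns a new string and leaves the
# original untouched, so the result has to be stored somewhere.
A = ['Historical Notes 1996', '\\ch889', '\\8uuuu']

for y in A:
    y.replace("\\", "")          # result is discarded, A is unchanged

unchanged = list(A)
cleaned = [y.replace("\\", "") for y in A]   # keep the results instead

# The doubled backslash is only how repr() displays a single backslash.
single = '\\ch889'
shown_by_repr = repr(single)     # includes the quotes and the escaped backslash
print(single)                    # prints \ch889
print(len(single))               # prints 6
```

Note that this only strips the backslash character from each string. Dropping the whole item from the list needs a filter, as in the answers.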

2 Answers


Python is somewhat hard to convince to just ignore unicode characters. Here is a somewhat hacky attempt:

l = ['Historical Notes 1996',
     '\ue606',
     'The Future of farms 2012',
     '\ch889',
     '\8uuuu']


def not_unicode_or_backslash(x):
    # unicode-escape spells non-ASCII characters out as text such as \ue606
    # and doubles literal backslashes, so both cases then start with "\"
    escaped = x.encode('unicode-escape').decode()
    return not escaped.startswith("\\")


[x for x in l if not_unicode_or_backslash(x)]

# Output: ['Historical Notes 1996', 'The Future of farms 2012']

The problem is that you can't directly check whether the string starts with a backslash, because '\ue606' is not treated as a six-character string but as a single Unicode character. It therefore does not start with a backslash, and for

[x for x in l if not x.startswith("\\")]

you get

['Historical Notes 1996', '\ue606', 'The Future of farms 2012']
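To see why the plain `startswith` check misses '\ue606', it can help to compare the lengths of the strings and their `unicode-escape` forms. This is only a small sketch; the names `one_char` and `six_chars` are made up for illustration.

```python
one_char = '\ue606'      # a single character, code point U+E606
six_chars = '\\ch889'    # backslash, c, h, 8, 8, 9

print(len(one_char))     # prints 1
print(len(six_chars))    # prints 6

# unicode-escape spells the single character out as ASCII text and
# doubles a literal backslash, so both results begin with a backslash.
escaped_one = one_char.encode('unicode-escape').decode()
escaped_six = six_chars.encode('unicode-escape').decode()

print(escaped_one.startswith('\\'))   # prints True
print(escaped_six.startswith('\\'))   # prints True
print(one_char.startswith('\\'))      # prints False
```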
mcsoini

You can use this.
Use isprintable() to filter out the Unicode string, and a comparison with '\\' to filter out the strings that start with a backslash.

List = ['Historical Notes 1996','\ue606','The Future of farms 2012','\ch889','\8uuuu',]
print([x for x in List if x[0] != '\\' and x.isprintable()])
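A slightly more defensive variant of the same idea, as a sketch: `startswith` avoids the `IndexError` that `x[0]` would raise on an empty string. The function name `keep` and the extra empty string in the sample data are assumptions added for illustration, not part of the original answer.

```python
def keep(text):
    """Return True for printable strings that do not start with a backslash."""
    return text.isprintable() and not text.startswith('\\')


items = ['Historical Notes 1996', '\ue606', 'The Future of farms 2012',
         '\\ch889', '\\8uuuu', '']

filtered = [x for x in items if keep(x)]
print(filtered)
```

With this predicate the empty string is kept, because `''.isprintable()` is true. Add `and text` to the condition if empty entries should be dropped as well.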
angel_dust
  • Thanks so much, @Osamu Zenji. I also found this helpful website: https://java2blog.com/remove-unicode-characters-python/#:~:text=There%20are%20many%20ways%20to%20to%20remove%20unicode,%28%29%20method%20to%20decode%20%28%29%20it%20back.%201 about removing unicode – Katie Melosto Jun 10 '21 at 19:10