0

I have a list in python

['Deals', "\\\\xe2\\\\x98\\\\x85What\\\\\\'s New\\\\xe2\\\\x98\\\\x85", '\\\\xc3\\\\x80la carte & Value Meals', 'Crispy Chicken', 'Share Box', 'Happy Meals', 'Desserts', 'McCafe\\\\xc3\\\\xa9', 'Beverages', 'Side Lines', 'Snack Time']

I want to decode it like "\\\\xe2\\\\x98\\\\x85What\\\\\\'s New\\\\xe2\\\\x98\\\\x85", '\\\\xc3\\\\x80la carte & Value Meals','McCafe\\\\xc3\\\\xa9' to 'What's New, la carte & Value Meals,McCafe' The desired output would be without any endcoding (\\x98) Desired output:

['Deals', "What's New", 'la carte & Value Meals', 'Crispy Chicken', 'Share Box', 'Happy Meals', 'Desserts', 'McCafe', 'Beverages', 'Side Lines', 'Snack Time']
Red
  • 26,798
  • 7
  • 36
  • 58
  • Could you please be more specific? You can take your list and replace string.replace('\\x98','') This will replace '\\x98' with ''. It will effectively delete it. – Valentyn Jun 12 '20 at 21:05
  • 2
    Possible duplicate of [Regular expression that finds and replaces non-ascii characters with Python](https://stackoverflow.com/questions/2758921/regular-expression-that-finds-and-replaces-non-ascii-characters-with-python) (per "please see this answer" link in second answer) – pppery Jun 12 '20 at 22:14
  • 1
    Really, your list is a bit garbled UTF-8 with result of `★What's New★`, `Àla carte` and `McCafeé`. Accented letters are letters, and _Black Star_ could matter as well… – JosefZ Jun 12 '20 at 22:15
  • It seems like what you're asking for is to replace / delete non-ASCII literals (e.g., `\xe2`) but keep any other spaces letters/spaces. [This answer](https://stackoverflow.com/questions/2758921/regular-expression-that-finds-and-replaces-non-ascii-characters-with-python) seems to be the general-purpose solution for finding/replacing non-ASCII characters in a string. – Professor Nuke Jun 12 '20 at 21:07

1 Answers1

0

You can use the built in module, re:

import re
l = ['Deals', "\\\\xe2\\\\x98\\\\x85What\\\\\\'s New\\\\xe2\\\\x98\\\\x85", '\\\\xc3\\\\x80la carte & Value Meals', 'Crispy Chicken', 'Share Box', 'Happy Meals', 'Desserts', 'McCafe\\\\xc3\\\\xa9', 'Beverages', 'Side Lines', 'Snack Time']
l = [re.sub(r'x\S\d',' ',s).replace('\\','').strip() for s in l]
print(l)

Output:

['Deals', "What's New", 'la carte & Value Meals', 'Crispy Chicken', 'Share Box', 'Happy Meals', 'Desserts', 'McCafe', 'Beverages', 'Side Lines', 'Snack Time']

re.sub(r'x\S\d',' ',s) will return the string s with every 3 neighboring characters that are in this pattern:

x, any character that's not a space, number

replaced with a space.

Red
  • 26,798
  • 7
  • 36
  • 58
  • 1
    I'm downvoting because it's not universal solution. For instance, _Latin Small Letter E With Diaeresis_ with UTF-8 byte sequence `0xC3`, `0xAB` does not match your pattern, see also [this comment](https://stackoverflow.com/questions/62352305/decoding-a-list-in-python#comment110277581_62352305). – JosefZ Jun 12 '20 at 22:40