
I have a list of unicode strings. I want to strip ')', '\n' and blank space from each element, essentially creating a "clean" copy of the list.

My attempts are based on the SO answer "Remove specific characters from a string in python" and the Python 2.7 docs on strings.

I create my list using bs4 (imports removed to minimise size).

def isNotBlank(myString):
    # True only if the string is non-empty and not just whitespace
    return bool(myString and myString.strip())

names = soup.find_all('span', class_="TextLarge")
bucket_list = []

for name in names:
    for item in name.contents:
        for value in item.split('('):
            if isNotBlank(value):
                bucket_list.append(value)

translation_table = dict.fromkeys(map(ord, ')(@\\n#$'), None)
[x.translate(translation_table) for x in bucket_list ]

print(names) returns

[<span class="TextLarge">Mossfun (11) (Rtg:103)</span>, <span class="TextLarge">58.0</span>, <span class="TextLarge scratched">Atmospherical (8)
      (Rtg:99)</span>, <span class="TextLarge">56.5</span>, <span class="TextLarge scratched">Chloe In Paris (7)
      (Rtg:97)</span>, <span class="TextLarge">55.5</span>, <span class="TextLarge">Bound For Earth (5) (Rtg:92)</span>, <span class="TextLarge">55.5</span>, <span class="TextLarge">Fine Bubbles (4) (Rtg:91)</span>, <span class="TextLarge">55.5</span>, <span class="TextLarge">Brook Road (9) (Rtg:90)</span>, <span class="TextLarge">55.5</span>, <span class="TextLarge">Shamalia (10) (Rtg:89)</span>, <span class="TextLarge">55.5</span>, <span class="TextLarge scratched">Tawteen (6) (Rtg:88)</span>, <span class="TextLarge">55.5</span>, <span class="TextLarge">Ygritte (2) (Rtg:77)</span>, <span class="TextLarge">55.5</span>, <span class="TextLarge">Tahni Dancer (1) (Rtg:76)</span>, <span class="TextLarge">55.5</span>, <span class="TextLarge">All Salsa (3) (Rtg:72)</span>, <span class="TextLarge">55.5</span>]

and bucket_list returns as

[u'Mossfun ', u'11) ', u'Rtg:103)', u'58.0', u'Atmospherical ', u'8) \n      ', u'Rtg:99)', u'56.5', u'Chloe In Paris ', u'7) \n      ', u'Rtg:97)', u'55.5', u'Bound For Earth ', u'5) ', u'Rtg:92)', u'55.5', u'Fine Bubbles ', u'4) ', u'Rtg:91)', u'55.5', u'Brook Road ', u'9) ', u'Rtg:90)', u'55.5', u'Shamalia ', u'10) ', u'Rtg:89)', u'55.5', u'Tawteen ', u'6) ', u'Rtg:88)', u'55.5', u'Ygritte ', u'2) ', u'Rtg:77)', u'55.5', u'Tahni Dancer ', u'1) ', u'Rtg:76)', u'55.5', u'All Salsa ', u'3) ', u'Rtg:72)', u'55.5']

I am hoping for

[['Mossfun', 11, 103, 58.0], ['Atmospherical', 8, 99, 56.5]]

Currently the strings come through the translation with all characters still in place.

sayth

1 Answer

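You are ignoring the return value here. The translation itself works, although it does not handle newlines: '\\n' is a backslash followed by the letter n, so those two characters are removed instead (note 'Mossfu' in the output):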

>>> bucket_list = [u'Mossfun ', u'11) ', u'Rtg:103)', u'58.0', u'Atmospherical ', u'8) \n      ', u'Rtg:99)', u'56.5', u'Chloe In Paris ', u'7) \n      ', u'Rtg:97)', u'55.5', u'Bound For Earth ', u'5) ', u'Rtg:92)', u'55.5', u'Fine Bubbles ', u'4) ', u'Rtg:91)', u'55.5', u'Brook Road ', u'9) ', u'Rtg:90)', u'55.5', u'Shamalia ', u'10) ', u'Rtg:89)', u'55.5', u'Tawteen ', u'6) ', u'Rtg:88)', u'55.5', u'Ygritte ', u'2) ', u'Rtg:77)', u'55.5', u'Tahni Dancer ', u'1) ', u'Rtg:76)', u'55.5', u'All Salsa ', u'3) ', u'Rtg:72)', u'55.5']
>>> translation_table = dict.fromkeys(map(ord, ')(@\\n#$'), None)
>>> [x.translate(translation_table) for x in bucket_list ]
['Mossfu ', '11 ', 'Rtg:103', '58.0', 'Atmospherical ', '8 \n      ', 'Rtg:99', '56.5', 'Chloe I Paris ', '7 \n      ', 'Rtg:97', '55.5', 'Boud For Earth ', '5 ', 'Rtg:92', '55.5', 'Fie Bubbles ', '4 ', 'Rtg:91', '55.5', 'Brook Road ', '9 ', 'Rtg:90', '55.5', 'Shamalia ', '10 ', 'Rtg:89', '55.5', 'Tawtee ', '6 ', 'Rtg:88', '55.5', 'Ygritte ', '2 ', 'Rtg:77', '55.5', 'Tahi Dacer ', '1 ', 'Rtg:76', '55.5', 'All Salsa ', '3 ', 'Rtg:72', '55.5']

but the results are stored in a new list; the original strings are not changed in-place as they are immutable. Assign the result back to bucket_list, and fix that newline problem by using \n, not \\n:

translation_table = dict.fromkeys(map(ord, ')(@\n#$'), None)
bucket_list = [x.translate(translation_table) for x in bucket_list ]
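To see why the original table also removed the letter n, it can help to look at which characters each string literal actually contains. This is a minimal sketch, assuming Python 3 (where str is already Unicode); the names wrong_chars, right_chars and sample are only illustrative:

```python
# '\\n' is two characters: a backslash and the letter 'n'.
# '\n' is one character: a newline.
wrong_chars = ')(@\\n#$'
right_chars = ')(@\n#$'

# Map each character's code point to None so translate() deletes it.
wrong_table = dict.fromkeys(map(ord, wrong_chars), None)
right_table = dict.fromkeys(map(ord, right_chars), None)

sample = 'Chloe In Paris (7)\n'
wrong_result = sample.translate(wrong_table)
right_result = sample.translate(right_table)

print(repr(wrong_result))  # 'Chloe I Paris 7\n'  (the 'n' is gone, the newline stays)
print(repr(right_result))  # 'Chloe In Paris 7'   (the newline is gone, the 'n' stays)
```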

You may want to throw in a str.strip() to get rid of the remaining whitespace; the result would be:

>>> [x.translate(translation_table).strip() for x in bucket_list ]
['Mossfun', '11', 'Rtg:103', '58.0', 'Atmospherical', '8', 'Rtg:99', '56.5', 'Chloe In Paris', '7', 'Rtg:97', '55.5', 'Bound For Earth', '5', 'Rtg:92', '55.5', 'Fine Bubbles', '4', 'Rtg:91', '55.5', 'Brook Road', '9', 'Rtg:90', '55.5', 'Shamalia', '10', 'Rtg:89', '55.5', 'Tawteen', '6', 'Rtg:88', '55.5', 'Ygritte', '2', 'Rtg:77', '55.5', 'Tahni Dancer', '1', 'Rtg:76', '55.5', 'All Salsa', '3', 'Rtg:72', '55.5']

The str.strip() would take care of the newlines too in this specific case.
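The question also asks for a nested list with numeric values. One way to get there from the cleaned list is to walk it in groups of four and convert each field. This is only a sketch: it assumes every entry contributes exactly four consecutive items in the order shown, and the function name group_entries and the field names are hypothetical:

```python
# First eight items of the cleaned list from the comprehension above.
cleaned = ['Mossfun', '11', 'Rtg:103', '58.0',
           'Atmospherical', '8', 'Rtg:99', '56.5']

def group_entries(items, size=4):
    """Group a flat list into [name, number, rating, value] rows.

    Assumes the list length is a multiple of `size` and that the
    third field always has the form 'Rtg:<integer>'.
    """
    rows = []
    for i in range(0, len(items), size):
        name, number, rating, value = items[i:i + size]
        rating_number = int(rating.split(':', 1)[1])  # 'Rtg:103' -> 103
        rows.append([name, int(number), rating_number, float(value)])
    return rows

rows = group_entries(cleaned)
print(rows)
# [['Mossfun', 11, 103, 58.0], ['Atmospherical', 8, 99, 56.5]]
```

If the page ever yields an entry with a missing field, the tuple unpacking raises a ValueError, which is probably preferable to silently misaligning the rows.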

Martijn Pieters