Unable to replace/delete \xa0 from a string in Python (parsing from Excel)

Question

    from collections import Counter
    from openpyxl import load_workbook

    nomefile = 'SerieA18_19.xlsx'

    wb = load_workbook(nomefile)
    ws = wb.worksheets
    sheet = wb.active
    max_row = sheet.max_row

    results = []
    for i in range(1, max_row + 1):
      cell_obj = sheet.cell(i, 1).value
      cell_obj.strip()
      cell_obj.replace('\\xa0', ' ')
      if cell_obj[2:3] == '-':
         results.append(cell_obj)
      if cell_obj[3:4] == '-' and cell_obj[:1] != '(':
         results.append(cell_obj)


    results_counter = Counter()
    for response in results:
       results_counter.update(response.split(','))

    print(results_counter)

The output is as follows: Counter({'1\xa0-\xa01': 44, '2\xa0-\xa01': 39, '1\xa0-\xa00': 35, '0\xa0-\xa00': 34, '2\xa0- \xa00': 28, '0\xa0-\xa01':

I am not able to delete or replace these '\xa0' characters, which are probably coming from the Excel file.

The strange issue is that if I print(results[0]) the output is correct '1-1' — Al Pan, Apr 26 '20 at 14:29

Nandu Raj · Answer 1 · 2020-04-26T14:28:24.147

1

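Strings in Python are immutable: strip() and replace() return a new string and leave the original untouched, so you need to assign the result back to a variable. Also note that '\\xa0' is a literal backslash followed by the characters xa0, not the non-breaking space character '\xa0'. Replace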

cell_obj.strip()
cell_obj.replace('\\xa0', ' ')

with

  cell_obj = cell_obj.strip().replace(u'\xa0', u' ')
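
As a minimal sketch of why the original two lines had no effect, the snippet below runs both versions on a made-up sample string (`raw` is a hypothetical value for illustration, not read from the spreadsheet):

```python
# Hypothetical sample resembling a score cell as returned by openpyxl.
raw = '1\xa0-\xa01 '

# str methods return new strings; the results here are simply discarded.
raw.strip()
raw.replace('\xa0', ' ')
unchanged = raw  # still has the trailing space and both \xa0 characters

# '\\xa0' is four characters (backslash, x, a, 0), so it never matches
# the single non-breaking space character '\xa0'.
escaped_len = len('\\xa0')
real_len = len('\xa0')
wrong = raw.replace('\\xa0', ' ')

# Assigning the result back, using the real character, cleans the value.
cleaned = raw.strip().replace('\xa0', ' ')
print(repr(cleaned))
```

Printing with `repr()` is deliberate: a plain `print()` renders the non-breaking space like an ordinary space, which can hide the problem.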

\xa0 is the non-breaking space, code point U+00A0, which is also chr(160); in Latin-1 (ISO 8859-1) it is the single byte 0xA0. Calling .encode('utf-8') converts the str into bytes, where each code point takes 1 to 4 bytes. In this case, \xa0 is represented by the 2 bytes \xc2\xa0.
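
The byte representations can be checked directly. The sketch below also shows, as an assumption about what happened rather than a confirmed diagnosis, why encoding the cell value to bytes could leave the Counter empty as reported in the comments: in Python 3 a bytes slice never compares equal to a str such as '-'.

```python
nbsp = '\xa0'

code_point = ord(nbsp)                 # numeric code point of the character
utf8_bytes = nbsp.encode('utf-8')      # two bytes in UTF-8
latin1_bytes = nbsp.encode('latin-1')  # one byte in Latin-1

# Hypothetical sample value, encoded the way it was tried in the comments.
encoded = '1\xa0-\xa01'.encode('utf-8')

# Comparing a bytes slice with a str is always False in Python 3,
# so a check like cell_obj[3:4] == '-' can no longer succeed.
matches_str = encoded[3:4] == '-'
matches_bytes = encoded[3:4] == b'-'

print(code_point, utf8_bytes, latin1_bytes, matches_str, matches_bytes)
```

Keeping the value as str and replacing '\xa0' avoids this, since the slicing checks in the question compare against str literals.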

Read up on http://docs.python.org/howto/unicode.html.

edited Apr 26 '20 at 14:28

answered Apr 26 '20 at 14:19

Nandu Raj


You are right, but it doesn't work either – Al Pan Apr 26 '20 at 14:20
I've just tried but nothing changes in the output – Al Pan Apr 26 '20 at 14:22
Try: cell_obj = cell_obj.strip().replace(r"\xa0", ' ') @AlPan – Nandu Raj Apr 26 '20 at 14:23
It doesn't change – Al Pan Apr 26 '20 at 14:24
Check the edited answer @AlPan – Nandu Raj Apr 26 '20 at 14:28
I changed the above line to cell_obj = cell_obj.strip().encode('utf-8') but the Counter in the output is empty now – Al Pan Apr 26 '20 at 14:36
Yeah, it needs to be encoded to utf-8. So did my answer work? It also changes it to utf-8. Did you try this? cell_obj = cell_obj.strip().replace(u'\xa0', u' ') – Nandu Raj Apr 26 '20 at 14:38
not really, output now is -> Counter() – Al Pan Apr 26 '20 at 14:40
GREEEEAAAAT!!! Now it is perfect, thanks – Al Pan Apr 26 '20 at 14:40
OK, cool. I had actually updated the answer to this about 12 minutes ago. I guess you didn't see my updated answer, which is why I asked you to check again. Happy that it worked. Kindly accept my answer – Nandu Raj Apr 26 '20 at 14:42
I did it, but my reputation is still too low – Al Pan Apr 26 '20 at 14:44
Try now. Either upvoting or accepting the answer should work. If it doesn't, ignore it. Happy that the problem is solved. – Nandu Raj Apr 26 '20 at 14:46
