1
    from collections import Counter
    from openpyxl import load_workbook

    nomefile = 'SerieA18_19.xlsx'

    wb = load_workbook(nomefile)
    ws = wb.worksheets
    sheet = wb.active
    max_row = sheet.max_row

    results = []
    for i in range(1, max_row + 1):
      cell_obj = sheet.cell(i, 1).value
      cell_obj.strip()
      cell_obj.replace('\\xa0', ' ')
      if cell_obj[2:3] == '-':
         results.append(cell_obj)
      if cell_obj[3:4] == '-' and cell_obj[:1] != '(':
         results.append(cell_obj)


    results_counter = Counter()
    for response in results:
       results_counter.update(response.split(','))

    print(results_counter)

OUTPUT is as follows : Counter({'1\xa0-\xa01': 44, '2\xa0-\xa01': 39, '1\xa0-\xa00': 35, '0\xa0-\xa00': 34, '2\xa0- \xa00': 28, '0\xa0-\xa01':

I am not able to delete/replace these '\xa0' that are probably coming from the Excel file

Al Pan
  • 39
  • 7

1 Answers1

1

String in python is immutable. You need to assign the value to a variable. Replace

cell_obj.strip()
cell_obj.replace('\\xa0', ' ')

with

  cell_obj = cell_obj.strip().replace(u'\xa0', u' ')

\xa0 is actually non-breaking space in Latin1 (ISO 8859-1), also chr(160). When .encode('utf-8'), it will encode the unicode to utf-8, that means every unicode could be represented by 1 to 4 bytes. For this case, \xa0 is represented by 2 bytes \xc2\xa0.

Read up on http://docs.python.org/howto/unicode.html.

Nandu Raj
  • 2,072
  • 9
  • 20