
When I try to extract data from an xlsx file, I get the encoding details along with the data.

Consider the code shown below:

column_number = 0
column_headers = []
# column_headers = sheet.row_values(row_number)
while column_number <= sheet.ncols - 1:
    column_headers.append(sheet.cell(row_number, column_number).value)
    column_number += 1

return column_headers

The output is:

[u'Rec#', u'Cyc#', u'Step', u'TestTime', u'StepTime', u'Amp-hr', u'Watt-hr', u'Amps', u'Volts', u'State', u'ES', u'DPt Time', u'ACR', u'DCIR']

I just want to extract the cell value, which is the data without the "u'" attached to it. How can I get just that?

golldy
  • The main thing I have to ask is: **WHY** do you need the data without the `u`? I ask because it really sounds like you don't know what the `u` means. You haven't mentioned Unicode at all, you say you are getting "encoding details" when in fact the `u` means just the opposite, and in most cases a Unicode string compares equal to its "look-alike" ASCII-encoded byte string, so I'm really curious to find out what you need to get rid of the `u`s for. – John Y Oct 04 '13 at 03:04
  • It creates problems when I want to match data from a cell directly against a hard-coded value. Also, when I have to create a dictionary from this data and load it into a Mongo collection, the mapping of data that comes from xlsx files differs from that of csv files. So, as far as I can tell, this is xlrd-specific; the Python csv module does not do this. I hope that resolves your curiosity. – golldy Oct 04 '13 at 03:23
  • The Python `csv` module doesn't support Unicode, but **ALL** character data stored in an Excel file **is** Unicode. What hard-coded values are you trying to match? `u'foo'` is equal to `'foo'` in Python. – John Y Oct 04 '13 at 03:31
  • Say the value of sheet.cell(4,0) is 3, and I want to match this with 3. It doesn't really work. – golldy Oct 04 '13 at 16:07
  • Well, you accepted an answer already, but I'm still really confused. 3 is equal to 3 in Python. 3 is even equal to 3.0 in Python. So I have no idea why you are having any trouble. – John Y Oct 04 '13 at 17:35
  • The answer that I accepted works just the way I wanted, so the encode ascii ignore helps the 'u' disappear! – golldy Oct 04 '13 at 17:58
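
The point made in the comments above can be checked with a small sketch. It uses a few header values copied from the question; the `row` dictionary and its numbers are made up purely for illustration. In Python 2 the `u` appears only when a list of Unicode strings is displayed, and it is not part of the value itself.

```python
# A few header values as shown in the question's output (Unicode text strings).
headers = [u'Rec#', u'Cyc#', u'Step']

# A Unicode string compares equal to its look-alike plain literal,
# so matching against a hard-coded value works without stripping anything.
assert headers[0] == 'Rec#'

# The same strings work as dictionary keys (the numbers here are hypothetical).
row = dict(zip(headers, [1, 1, 2]))
step = row['Step']

# Joining or printing the individual strings shows the text without any prefix.
joined = ", ".join(headers)
print(joined)
```

Printing each value on its own, rather than printing the whole list, is usually enough if the goal is only to avoid seeing the prefix.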

2 Answers


Have you tried the following:

print data.value

In the new code could you try this:

import unicodedata
...
output = []
for cell in column_headers:
    output.append(unicodedata.normalize('NFKD', cell))
return output

Please see this for more info: https://stackoverflow.com/a/1207479/2168278
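
As a rough sketch of what normalization does and does not do: `unicodedata.normalize` on its own still returns a Unicode string, so an ASCII byte string requires an `encode` call afterwards. The sample value `text` below is hypothetical and not taken from the question's spreadsheet.

```python
import unicodedata

# Hypothetical cell value containing one non-ASCII character (e with acute accent).
text = u'caf\u00e9'

# NFKD splits the accented letter into a base letter plus a combining mark.
# The result is still a Unicode string, one character longer than the input.
decomposed = unicodedata.normalize('NFKD', text)

# Encoding directly with 'ignore' silently drops the accented letter.
plain = text.encode('ascii', 'ignore')

# Normalizing first keeps the base letter and drops only the combining mark.
folded = decomposed.encode('ascii', 'ignore')

# Note: in Python 3, encode() returns a bytes object rather than a str.
print(plain, folded)
```

Either way, characters with no ASCII equivalent are lost without warning, so this is only safe when the data is known to be ASCII-compatible.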

Community

You can use string encoding to convert the Unicode string to ASCII, so your updated code should be:

column_headers.append((sheet.cell(row_number, column_number).value).encode('ascii','ignore'))

You can get the content of a cell by using data.value. Also note that integers are imported as floats by default, so you may end up with an additional .0 at the end, which you can remove by casting the value with int(data.value).
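
One possible way to handle the float issue without truncating genuine decimals is sketched below. The helper name `as_int_if_whole` and the sample values are assumptions made for illustration; they are not part of xlrd.

```python
def as_int_if_whole(value):
    """Return int(value) for whole-number floats; leave other values unchanged."""
    if isinstance(value, float) and value.is_integer():
        return int(value)
    return value


# Hypothetical cell values: a whole-number float, a real decimal, and some text.
values = [3.0, 2.5, u'Step']
cleaned = [as_int_if_whole(v) for v in values]
print(cleaned)
```

Calling `int()` unconditionally would turn 2.5 into 2 and fail on text cells, which is why the sketch checks the type and `is_integer()` first.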

lionelmessi
  • It might work directly... but please take a look at the updated code in my question; it still gives me the same output. No luck. – golldy Oct 04 '13 at 02:07