2

I am using xlrd to parse a .xlsx file to a JSON format. I encountered a problem with commas and dots..

The value which I entered into my Excel file: 152,203. I would like to parse this into a value of: 152.203, but no luck yet. It saves the value as 152 which is a problem. In Dutch commas are often used as dots, so this mistake has to be caught.

Partial code example:

key3 = OrderedDict()
row_values3 =  sheet.col_values(1, 0, 60)

comma_to_dot = row_values3[36]  # cell with 152,203
key3['Value'] = comma_to_dot.replace(",",".")

data.append(key3)    

j = json.dumps(data)

with open(full_path, 'w') as datafile:
    datafile.write(j)

print(full_path)

This code give the error: AttributeError: 'float' object has no attribute 'replace'

I have also tried:

key3['Value'] = int(row_values3[36])

But this also returns 152

jonrsharpe
  • 115,751
  • 26
  • 228
  • 437
Anna Jeanine
  • 3,975
  • 10
  • 40
  • 74
  • JSON requires floats to be `x.y`, and doesn't allow thousand separators in numbers. How you *display* that is a locale issue, which should be handled by whatever UI your users see the data through. – jonrsharpe Nov 14 '16 at 11:35
  • I get that, but I just want to change commas to dots in one value. These could only be one or two commas. – Anna Jeanine Nov 14 '16 at 11:37
  • It seems like you need to set the locale in Excel, this isn't really anything to do with XLRD. As the error message tells you, you are getting a floating point *number* back, not a string. – jonrsharpe Nov 14 '16 at 11:37
  • I can't change this in Excel because this script handles Excel files that are uploaded by a user. – Anna Jeanine Nov 14 '16 at 11:39
  • Then you can't do anything about this, the user will have to enter numbers according to the locale they have set. That information is already lost by the time you receive the data. – jonrsharpe Nov 14 '16 at 11:39
  • So no way that I can change a comma to a dot in my Python parsing script?? – Anna Jeanine Nov 14 '16 at 11:40
  • It's super easy to switch dots to commas, and your code would work perfectly *if* you were receiving a string representing a number, like `'152,203'`. **But you are not.** You are receiving a number, like `152`, and there's no way for you to tell whether that's the number the original user meant to enter or not. The way that number is displayed, whether `152,000` or `150.000` or `150000`, is outside of your control, it's up to the user to set their OS and Excel to use an appropriate locale. – jonrsharpe Nov 14 '16 at 11:42
  • But the number entered is 152,203? – Anna Jeanine Nov 14 '16 at 11:44
  • Please listen to what I'm telling you: *this does not appear to be a problem you can solve in Python*. By the time XLRD gets involved, the number is already `152`. If that's not correct, you need to enter `152.203` into Excel, or change your locale to one that uses `,` as the thousands separator. Look into the cell formatting options within Excel, and ensure that the number being stored *in the spreadsheet* is the one you actually want. What is `repr(row_values3[36])`? It's a float apparently, but with what value? `152.0`? – jonrsharpe Nov 14 '16 at 11:48
  • Oke thank you for elaboration on our comment. I just wanted to confirm that this is a integer with a comma, and I am suprissed that Excel/XLRD disregards to ,203. – Anna Jeanine Nov 14 '16 at 11:49
  • @jonrsharpe Are you sure about this? If the excel file is able to be opened again and still has the full 152,203 value, then it is a problem with the python library misunderstanding foreign locales. Now it may be that the library was only ever designed to understand locales which use a period, but that doesn't change the fact that it is a python / python library issue not an Excel issue. I don't think we've had any indication as to whether a saved file when reopened in excel still has the full data in or not. – Jon Taylor Nov 14 '16 at 11:52
  • @JonTaylor XLRD gets the underlying cell value, it's not trying to parse strings or something like that. This is why people get surprised by e.g. floats coming out of date cells: http://stackoverflow.com/questions/13962837/reading-date-as-a-string-not-float-from-excel-using-python-xlrd – jonrsharpe Nov 14 '16 at 11:53
  • @jonrsharpe Yup I understand that, but if it is getting the underlying cell value and that cell does indeed contain the value 152,203 and is understood with the locale of the xls file as being a float then it is XLRD that is misunderstanding the file therefore is a python library issue. It would be completely ridiculous to suggest that files can only be passed in from some locals due to the way they represent floats. Obviously as I said, they may state in XLRD that it only supports certain locals, but this still comes down to being a python side issue surely? – Jon Taylor Nov 14 '16 at 12:01
  • *"It would be completely ridiculous to suggest that files can only be passed in from some locals"* - indeed, and that's exactly the **opposite** of what I'm saying. The value Excel stores behind the scenes does not include the idea of a locale, it's just a floating point number; it's neither `152,203` nor `152.203` (or even `152203`, for that matter!) Whether it's shown with thousands separators, and what character is used for that, is purely an interest of the UI layer, and unrelated to how the float values are represented in memory. XLRD gets the *value*, not the *representation*. – jonrsharpe Nov 14 '16 at 12:10
  • @jonrsharpe Then I am confused as to why you suggest this is not something that can be fixed in python and must be changed by the creators of the excel files? If the number is indeed stored as a floating point in the background then surely XLRD should be able to read it? What am I missing? – Jon Taylor Nov 14 '16 at 12:13
  • 1
    @JonTaylor that the number being stored is apparently `152.203` (where the dot is the decimal point, not the thousands separator), **not** `152203`, because the locale settings the Excel sheet is using say that `,` is the decimal point. – jonrsharpe Nov 14 '16 at 12:19
  • @jonrsharpe haha thanks :) – Jon Taylor Nov 14 '16 at 12:21
  • @OP sorry about all of that spam, but hopefully that's made things a bit clearer! If not, or if further investigation demonstrates that I got completely the wrong end of the stick, do let me know. – jonrsharpe Nov 15 '16 at 10:37

1 Answers1

0

How do you know that it saves the value as 152?

After this:

comma_to_dot = row_values3[36]  # cell with 152,203

insert this:

print(type(comma_to_dot), repr(comma_to_dot))

This will show you the exact value in the cell. Interpret a dot as a decimal point. Locale is irrelevant. Please show (copy/paste) what is printed. If it's not what you want, please tell us what you do want.

John Machin
  • 81,303
  • 11
  • 141
  • 189