Read excel and correctly convert decimal numbers in multiple formats

Question

I am trying to read a table from an Excel file with python3.7 in the following way (due to dependencies I cannot use another alternative to this function):

df = pd.read_excel(ruta_origen, sheet_name=sheet, thousands='.', dtype=object)

The problem comes with the numeric columns, that although I specify the parameter dtype=object, python interprets those columns as numeric and performs an internal conversion. The problem comes because these numeric columns have values in multiple formats, for example: 10000,25 and 10.000,25 . I'm trying to eliminate those problems by writing the following code:

df[col] = (
    df[col].astype(str)
    .apply(lambda x: x.replace('.', ''))
    .apply(lambda x: x.replace(',', '.'))
    .astype(float).fillna(0)
)

but when I write the contents of the table to a CSV file, the result is not as I expected, which would be without the point as thousands separator and with the comma as the decimal separator (example 10000,25).

df.to_csv('file.csv', sep='|', index=False, header=True, decimal= ',')

The above fragment of code converts the number 46420370,25 to 4642037025.0, and the number 4,197,214.82 to 4197214,82 Can anyone help me?

Can you provide a reproducible sample of `df`? See [reproducible pandas examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples). — BigBen, Aug 02 '23 at 17:04
The conversion of 4,197,214.82 to 4197214,82 is exactly what I would expect. Why do you think that's wrong? — Tim Roberts, Aug 02 '23 at 19:43
It looks like this numeric column, although containing numbers, is text. If this is the case, maybe converting it to numbers first would be helpful, as you would have to deal with a single format then. — mits, Aug 02 '23 at 21:28
@BigBen a reproducible sample would be the following: `df = pd.DataFrame({'col': ['46420370,25', '4528923,00', '3623.538,40'], 'col1': ['4.197.214,82', '3004913,95', '608238,71']})` and the expected result would be: `result = pd.DataFrame({'col': ['46420370,25', '4528923,00', '3623538,40'], 'col1': ['4197214,82', '3004913,95', '608238,71']})` — J. Mendoza, Aug 03 '23 at 06:23
@TIMRoberts the number conversion of 4,197,214.82 is correctly, but the above code converts the number 46420370,25 to 4642037025.0 and thats wrong — J. Mendoza, Aug 03 '23 at 06:30

Read excel and correctly convert decimal numbers in multiple formats

0 Answers0