1

I am trying to convert the values in a Pandas DataFrame from string to float but am running into an error.

The column looks like this:

PIB (IBGE/2005)
---------------
 71.638.000
114.250.000
 44.373.000
462.258.000
186.812.000

Here the `.` characters are digit-group (thousands) separators, so `71.638.000` should become the float 71638000.

But I am getting the error:

ValueError: could not convert string to float: '71.638.000'

Here is an image of my full DataFrame:

[Image: DataFrame screenshot]

How can I convert this column from string to float?

Henry Woody
  • 4
    What is `71.638.000` supposed to be as a float? – Mark May 16 '20 at 22:17
  • 1
    It looks like that the character `.` in your dataset is a thousand separator, which confuses your parsing library (probably set by default to understand `.` as the English decimal point). I'd recommend cleaning your data to remove those `.` – Pac0 May 16 '20 at 22:21
  • @Pac0 maybe the problem is really the thousand separator, I'll check that out. thank you – Geraldo Britto May 16 '20 at 22:30
  • use `pd.read_csv(yourfile, thousands='.')` or `pd.to_numeric(df[column])` – Umar.H May 16 '20 at 22:38
  • Does this answer your question? [Pandas: convert dtype 'object' to int](https://stackoverflow.com/questions/39173813/pandas-convert-dtype-object-to-int) – Umar.H May 16 '20 at 22:39
  • It worked, the problem was really the period as a thousands separator ... thank you very much to everyone who contributed – Geraldo Britto May 16 '20 at 22:56
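If the data comes from a CSV file, the `thousands` option of `pd.read_csv` mentioned in the comments can handle the separators at load time. A minimal sketch, assuming a semicolon-separated file; the file contents and the `Municipio` column are made up for illustration, and `decimal=','` is an assumption about the number format:

```python
import io

import pandas as pd

# Hypothetical CSV contents standing in for the real file.
csv_text = "Municipio;PIB (IBGE/2005)\nA;71.638.000\nB;114.250.000\n"

# thousands='.' tells the parser to treat '.' as a digit-group separator.
# decimal=',' is an assumption, so that '.' is not also the decimal mark.
df = pd.read_csv(io.StringIO(csv_text), sep=";", thousands=".", decimal=",")

print(df["PIB (IBGE/2005)"].tolist())
print(df["PIB (IBGE/2005)"].dtype)
```

The column is parsed as a numeric dtype directly; call `.astype(float)` afterwards if floats are specifically required.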

2 Answers

1

The values in column 'PIB (IBGE/2005)' appear to use a period as the thousands separator. You'll have to remove those periods before converting the values to floats. Something along these lines strips the separators and converts the column, assigning the result back to the column rather than overwriting the whole DataFrame:

df1['PIB (IBGE/2005)'] = df1['PIB (IBGE/2005)'].apply(lambda x: float(x.replace('.', '')))

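EDIT: As suggested in the comments below, pandas' vectorized string methods are a better fit than apply. Pass regex=False so that '.' is treated as a literal period rather than a regex wildcard:

df1['PIB (IBGE/2005)'] = df1['PIB (IBGE/2005)'].str.replace('.', '', regex=False).astype(float)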
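A self-contained sketch of the string-method approach, assuming the column holds strings like those in the question. The sample DataFrame below is built by hand from the values shown there; it is not the asker's full data:

```python
import pandas as pd

# Sample values copied from the question; the real DataFrame has more columns.
col = "PIB (IBGE/2005)"
df1 = pd.DataFrame({col: ["71.638.000", "114.250.000", "44.373.000"]})

# Remove the literal '.' separators, then convert the cleaned strings to float.
# The result is assigned back to the column so the DataFrame itself is kept.
df1[col] = df1[col].str.replace(".", "", regex=False).astype(float)

print(df1[col].tolist())  # [71638000.0, 114250000.0, 44373000.0]
```

If the values carry leading or trailing spaces, adding `.str.strip()` before the conversion is a cautious extra step.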
Chris Greening
  • 1
    Or use one of pandas' string methods for that: df1 = df1['PIB (IBGE/2005)'].str.replace('.', '') – Arne May 16 '20 at 22:31
  • 1
    don't use [apply](https://stackoverflow.com/questions/54432583/when-should-i-ever-want-to-use-pandas-apply-in-my-code) when you have methods can you use in its place. – Umar.H May 16 '20 at 22:42
  • 1
    It worked, the problem was really the period as a thousands separator ... thank you very much to everyone who contributed – Geraldo Britto May 16 '20 at 22:55
0

Replace '.' with '' and convert the resulting string to a float:

s = "123.567.000"
d = float(s.replace('.', ''))  # "123567000" -> 123567000.0
print(d)
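The same idea can be wrapped in a small function and applied to several values. This is only a sketch: `parse_grouped_number` is a hypothetical helper name, and it assumes '.' never appears as a decimal point in the input:

```python
def parse_grouped_number(text: str) -> float:
    """Parse a string whose '.' characters are digit-group separators only."""
    # strip() drops surrounding spaces; replace() removes the separators.
    return float(text.strip().replace(".", ""))


values = ["71.638.000", "114.250.000", " 44.373.000"]
parsed = [parse_grouped_number(v) for v in values]
print(parsed)  # [71638000.0, 114250000.0, 44373000.0]
```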
Hirusha Fernando