2

I'm using a dataset that contains a column "Streams" (dtype: object), and I just need to replace "," with "." so I can later use pandas.to_numeric() to convert the strings to float64. Is there a way to replace only those characters and keep the digits?

Example: 48,633,449 to 48.633.449

Code:

import pandas as pd
import numpy as np

dados = pd.read_csv("spotify_dataset.csv")

dados.dropna()
dados['Streams'].replace(",", ".")
dados['Streams'] = pd.to_numeric(dados['Streams'])
dados.head()

and got this:

ValueError: Unable to parse string "48,633,449" at position 0


1

eshirvana
    If you change `48,633,449` to `48.633.449`, how can you convert it to numeric later? What number is `1.2.3`? Instead, consider changing `48,633,449` to `48633449`. – Onyambu Jun 02 '22 at 00:54

3 Answers

1

The result of your replace is thrown away because you never assign it to anything. Unless you explicitly pass inplace=True, pandas methods return a new object and do not modify the existing Series or DataFrame. The same applies to the dados.dropna() call.

You can pass the result of replace directly to the to_numeric function:

import pandas as pd
import numpy as np

dados = pd.read_csv("spotify_dataset.csv")

dados = dados.dropna()
# .str.replace works on substrings; removing the commas leaves a parseable number
dados['Streams'] = pd.to_numeric(dados['Streams'].str.replace(",", ""))
dados.head()
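
The point about assignment can be checked on a small example. This is only a sketch: the two-row Series below is made up for illustration and is not taken from spotify_dataset.csv.

```python
import pandas as pd

# Made-up sample values standing in for the "Streams" column
s = pd.Series(["48,633,449", "1,234"])

# Calling the method without assigning the result leaves `s` untouched
s.str.replace(",", "", regex=False)
unchanged = s.tolist()

# Assigning the result keeps the cleaned strings
cleaned = s.str.replace(",", "", regex=False)

print(unchanged)
print(cleaned.tolist())
```

The first print still shows the commas, while the second shows the strings with the commas removed.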
1

You should be using .str.replace instead of just .replace.

dados['Streams'] = pd.to_numeric(dados['Streams'].str.replace(",", ""))

Also, I don't think you actually want to replace the commas with decimal points. That would produce the same error, since a number with more than one decimal point is invalid.
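
As a rough illustration of the difference, here is a sketch on made-up sample values (not the real dataset). It assumes the column holds plain strings with commas as thousands separators.

```python
import pandas as pd

# Made-up sample values standing in for the "Streams" column
streams = pd.Series(["48,633,449", "47,248,719"])

# Series.replace compares whole cell values, so a lone "," never matches here
whole_value = streams.replace(",", "")

# .str.replace edits substrings inside each string
substring = streams.str.replace(",", "", regex=False)
numbers = pd.to_numeric(substring)

# Swapping commas for dots still fails: "48.633.449" is not a valid number
try:
    pd.to_numeric(streams.str.replace(",", ".", regex=False))
    dotted_ok = True
except ValueError:
    dotted_ok = False

print(whole_value.tolist())
print(numbers.tolist())
print(dotted_ok)
```

With whole-number strings like these, to_numeric yields an integer dtype. If float64 is specifically needed, numbers.astype("float64") converts it afterwards.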

StevenS
-1
import pandas as pd
import numpy as np

dados = pd.read_csv("spotify_dataset.csv")

dados = dados.dropna()
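# Remove the thousands separators; .str.replace works on substrings,
# while plain .replace only matches whole cell values
dados['Streams'] = dados['Streams'].str.replace(",", "")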
dados['Streams'] = pd.to_numeric(dados['Streams'])
dados.head()
    even if the answer is correct, it should include an explanation in some form to say what the code is doing – rv.kvetch Jun 02 '22 at 00:39