
I'm using pandas to read in a CSV file that has a column titled 'Funding', showing the total amount of funding each company received during its start-up phase. The column is stored as strings and looks similar to the following:

import pandas as pd

# create ID column ranging from 1 to 10
id_col = list(range(1, 11))

# create Funding column
funding_col = ['$5M', '$20M', '$5M', 'Unknown', '$20M', 'Unknown', 'Unknown', '$5M', '$20M', 'Unknown']

# create dictionary with column names as keys and column data as values
data = {'ID': id_col, 'Funding': funding_col}

# create DataFrame from dictionary
df = pd.DataFrame(data)

# print DataFrame
print(df)

My question is: How can I convert the 'Funding' column into integers, and how should I deal with the 'Unknown' values? Note that the funding is given in millions ($M) or billions ($B).

I tried stripping the $ and B or M using the .strip function. I also tried to coerce into numeric format using something similar to this:

df['A'] = pd.to_numeric(df['A'], errors='coerce')
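For reference, here is a minimal sketch of what that coercion does when the strings still carry the '$' and the suffix. The `funding` series is a shortened stand-in for my real column, not the actual data:

```python
import pandas as pd

# Shortened stand-in for the real 'Funding' column (assumption for illustration)
funding = pd.Series(['$5M', '$20M', 'Unknown'])

# None of these strings is a plain number, so errors='coerce' turns each into NaN
coerced = pd.to_numeric(funding, errors='coerce')
print(coerced.isna().all())  # True
```

So the symbol and suffix apparently need to be removed, and the suffix turned into a multiplier, before `pd.to_numeric` can return anything useful.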
  • What was the problem with your tried solution? – Michael Butscher May 12 '23 at 00:18
  • coercing generated NaN values for the entire 'Funding' column – zacramer May 12 '23 at 00:21
  • Welcome to Stack Overflow! Check out the [tour] and [How to ask a good question](/help/how-to-ask) for tips. Have you done any research? I did a vague google, `pandas convert number suffix`, and found this, which looks helpful: [Convert the string 2.90K to 2900 or 5.2M to 5200000 in pandas dataframe](https://stackoverflow.com/q/39684548/4518341) – wjandrea May 12 '23 at 00:27
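Along the lines of the suffix-conversion idea in the last comment, one possible approach is sketched below. It is not taken from the linked question. The `multipliers` table, the regular expression, and the `Funding_usd` column name are all assumptions made for illustration. Rows that don't match the pattern, such as 'Unknown', end up as missing values rather than raising an error.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': list(range(1, 11)),
    'Funding': ['$5M', '$20M', '$5M', 'Unknown', '$20M',
                'Unknown', 'Unknown', '$5M', '$20M', 'Unknown'],
})

# Hypothetical suffix table; extend it if other suffixes appear in the data
multipliers = {'K': 10**3, 'M': 10**6, 'B': 10**9}

# Capture the numeric part and an optional suffix; non-matching rows give NaN
parts = df['Funding'].str.extract(r'^\$(\d+(?:\.\d+)?)([KMB])?$')

amount = pd.to_numeric(parts[0], errors='coerce')
factor = parts[1].map(multipliers).fillna(1)  # no suffix means plain dollars

# Nullable Int64 keeps integers while still allowing missing values
df['Funding_usd'] = (amount * factor).round().astype('Int64')
print(df)
```

The nullable `Int64` dtype is used here because a regular `int64` column cannot hold missing values. Whether 'Unknown' should stay missing, be dropped, or be imputed depends on what the analysis needs.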
