I keep running into the same error no matter what I try. I downloaded population data from the United States Census Bureau and loaded it into a pandas DataFrame. I want to compute summary statistics across all the states and draw a line chart for one of the states (any state will do). The problem is that I keep getting "ValueError: invalid literal for int() with base 10: '989,415'".
Link to the data: https://data.census.gov/cedsci/table?q=population%20by%20state&tid=PEPPOP2019.PEPANNRES
This is what I have so far. (The commented-out lines at the end are the solutions I've tried.)
```
import numpy as np
import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt

# open file
with open('/Users/Documents/population data.csv', 'r') as data_file:
    # create a dataframe and read the file into it
    df = pd.read_csv(data_file)

# rename/clean up headers
df = df.set_axis(['state', '4/1/2010 census popolation',
                  '4/1/2010',
                  '7/1/2010',
                  '7/1/2011',
                  '7/1/2012',
                  '7/1/2013',
                  '7/1/2014',
                  '7/1/2015',
                  '7/1/2016',
                  '7/1/2017',
                  '7/1/2018',
                  '7/1/2019'], axis='columns')

# index by 'state'
df.set_index('state', inplace=True)

# change values to integer
df.dtypes
df.describe()
# attempts so far; none of these converted the columns
#df["4/1/2010 census popolation"] = df["4/1/2010 census popolation"].astype("int")
#df.astype({"4/1/2010 census popolation":'int', "4/1/2010":'int'})
#df['4/1/2010 census popolation']= df['4/1/2010 census popolation'].apply(np.int64)
#pd.to_numeric(df)
```
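
For reference, here is a minimal sketch that should reproduce the error without the CSV file. It assumes the population columns are read in as strings containing thousands separators, which is what the '989,415' in the error message suggests. The names `raw`, `cleaned`, `parsed`, and `csv_text`, and every sample value other than '989,415', are made up for illustration. It also shows two conversions that might work: stripping the commas before calling `pd.to_numeric`, or passing `thousands=","` to `pd.read_csv`.

```python
import io

import pandas as pd

# Hypothetical sample data; only '989,415' comes from the error message.
raw = pd.DataFrame(
    {"7/1/2018": ["980,000", "1,200,000"], "7/1/2019": ["989,415", "1,234,567"]},
    index=pd.Index(["State A", "State B"], name="state"),
)

# Converting a comma-formatted string column directly raises ValueError.
error_message = None
try:
    raw["7/1/2019"].astype("int")
except ValueError as err:
    error_message = str(err)

# Option 1: remove the commas in every column, then convert to numbers.
cleaned = raw.apply(
    lambda col: pd.to_numeric(col.str.replace(",", "", regex=False))
)

# Option 2: let read_csv interpret the commas as thousands separators.
csv_text = 'state,7/1/2019\nState A,"989,415"\nState B,"1,234,567"\n'
parsed = pd.read_csv(io.StringIO(csv_text), thousands=",", index_col="state")
```

If the real file behaves like this sample, `cleaned.describe()` and something like `cleaned.loc["State A"].plot()` should then work on numeric values, but I have not confirmed that the actual Census file has no other non-numeric cells.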
[screenshot of the data frame][1]
[1]: https://i.stack.imgur.com/xRkl9.jpg