A noob question (revised): I read in a .csv file and tried to specify the dtypes as follows:
import csv
import pandas as pd

cdc = pd.read_csv(
    'myFile.csv',
    dtype={
        'Phenotype': str,
        'State': str,
        'EventType': str,
        'EventYear': str,
        'AgeCategory': str,
        'NumberTested': str,
        'NumberResistant': str,
        'PercentResistant': str,
    },
)
But after reading the file in, I get:
cdc.dtypes
Phenotype object
State object
EventType object
EventYear object
AgeCategory object
NumberResistant object
PercentResistant object
dtype: object
I thought I'd get a string dtype for each column instead.
I'd like each column to be read as a string because some columns have a mixture of numbers and strings as you'll see below in the .csv example file. Once the file is read in I can start manipulating the gosh darn thing!
Bottom line: I want to clean up the data rows and replace "None Tested" and "Not Defined" with NaNs or zeroes. As far as I can figure out, I can't do that with 'object' columns.
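To make the goal concrete, here is a minimal sketch of the kind of cleanup I mean. It assumes a few rows copied from the sample file further down, fed in through io.StringIO as a stand-in for myFile.csv. It also assumes that 'object' is simply how pandas has traditionally reported columns of Python strings, in which case DataFrame.replace should work on them directly. Treat it as an illustration under those assumptions, not a verified fix.

```python
import io

import numpy as np
import pandas as pd

# Inline stand-in for myFile.csv (assumption): three rows from the sample data.
sample = io.StringIO(
    "Phenotype,State,EventType,EventYear,AgeCategory,"
    "NumberTested,NumberResistant,PercentResistant\n"
    "Acinetobacter,AK,All HAIs,2011, 1-18,2,1,0.5\n"
    "Acinetobacter,AK,CAUTI,2011, 1-18,0,None Tested,Not Defined\n"
    "Acinetobacter,AK,All HAIs,2011,19-64,(1-19),Insufficient Data,Insufficient Data\n"
)

# dtype=str asks pandas to keep every column as text.
cdc = pd.read_csv(sample, dtype=str)

# Swap the two placeholder strings for NaN; every other value is left alone.
placeholders = ["None Tested", "Not Defined"]
cleaned = cdc.replace(placeholders, np.nan)

print(cleaned[["NumberTested", "NumberResistant", "PercentResistant"]])
```

In this sketch "Insufficient Data" is left untouched, since I only listed the two placeholders above; adding it to the placeholders list would blank it out as well.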
FYI, I've also read in the .csv as a df with no 'dtype' parameter, but I get the same problem. After reading the file with no dtypes specified, I tried creating a new column of integers from existing columns, but the 'object' dtype seems to get in the way of that, too.
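For the integer-column part, this is roughly what I'm trying to end up with, sketched with pd.to_numeric and errors="coerce" on a tiny hand-built frame. The column name NumberSusceptible is hypothetical and only there for illustration, and the three rows are an assumption modelled on the sample data.

```python
import pandas as pd

# Tiny stand-in frame (assumption) with the same mix of numbers and text.
cdc = pd.DataFrame({
    "NumberTested": ["2", "0", "(1-19)"],
    "NumberResistant": ["1", "None Tested", "Insufficient Data"],
})

# errors="coerce" turns anything that does not parse as a number into NaN.
tested = pd.to_numeric(cdc["NumberTested"], errors="coerce")
resistant = pd.to_numeric(cdc["NumberResistant"], errors="coerce")

# Hypothetical derived column. "Int64" is pandas' nullable integer dtype,
# so rows with missing inputs stay missing instead of forcing floats.
cdc["NumberSusceptible"] = (tested - resistant).astype("Int64")

print(cdc)
```

Only the first row has two parseable numbers here, so only that row gets an integer; the other two come out as missing values.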
I'm stuck. I've looked around and I can't seem to figure this out myself.
Sample input .csv file (the actual file has no extra lines between rows; I just tried to make the rows more readable):
Phenotype,State,EventType,EventYear,AgeCategory,NumberTested,NumberResistant,PercentResistant
Acinetobacter,AK,All HAIs,2011, 1-18,2,1,0.5
Acinetobacter,AK,CAUTI,2011, 1-18,0,None Tested,Not Defined
Acinetobacter,AK,CLABSI,2011, 1-18,0,None Tested,Not Defined
Acinetobacter,AK,SSI,2011, 1-18,0,None Tested,Not Defined
Acinetobacter,AK,All HAIs,2011,<1,2,2,1.0
Acinetobacter,AK,CAUTI,2011,<1,0,None Tested,Not Defined
Acinetobacter,AK,CLABSI,2011,<1,0,None Tested,Not Defined
Acinetobacter,AK,SSI,2011,<1,0,None Tested,Not Defined
Acinetobacter,AK,All HAIs,2011,19-64,(1-19),Insufficient Data,Insufficient Data