I have a Data Frame that I read in as,
df = pd.read_csv(r'path\file.csv', encoding = "ISO-8859-1")
This is how it looks,
Machine ID Machine June July August
0 100 ABC 10 12 nan
1 100 ABC nan 15 15
2 101 CDQ 12 20
3 101 CDQ 15 32 11
And data types:
Machine ID int 64
Machine object
June float64
July object
August float64
When I try to groupby
like this,
machine_group = df.groupby(['Machine ID','Machine'])\['June', 'July', 'August'].sum()\
.reset_index()
I only get June
and August
as July
has an empty space/ empty string.
ID Machine June August
0 100 ABC 10 15
1 101 CDQ 27 31
Therefore, I tried the fllowoing,
df = df.apply(pd.to_numeric, errors = 'ignore')
This did not convert my July
column to numeric/float64.
Next, I tried this,
df.replace(r'\s+', np.nan, regex=True)
This also, did not work. I still have the empty space in my data frame. Not sure what to do.
I was reading this post, seems like I have the inverse issue of this.
How can I make sure I have nan
instead of empty string? because that empty string in July
column makes the column to be object
and it doesn't count for aggregation in groupby
clause.
(I checked the original .csv file and that exact line, it is normal empty cell as others, where other empty cells get read in as nan
and this particular one is not)
Any suggestions would be nice.