
I selected the date range for my dataset using this post. But now, when I use pandas groupby and sum, some of the data seems to be missing.

The dates between 2020-04-07 and 2020-04-12 are missing from the result, and the data for those dates is being added to 2020-04-06.

Code:

covid19India['Date'] = pd.to_datetime(covid19India['Date'], infer_datetime_format=True)
covid19India_new = covid19India[(covid19India['Date'] >= '2020-03-25') & (covid19India['Date'] <= '2020-05-31')].sort_values('Date')
df1 = covid19India_new.groupby('Date').sum()
df1.reset_index(inplace=True)
df1.head(20)
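
One way to confirm exactly which dates drop out of the grouped result is to compare it against a full calendar range. This is only a minimal sketch: the small frame `sample` and its `Confirmed` column are hypothetical stand-ins for covid19India, whose contents are not shown here.

```python
import pandas as pd

# Hypothetical stand-in for covid19India; "Confirmed" is an assumed column name.
sample = pd.DataFrame({
    "Date": pd.to_datetime(["2020-04-05", "2020-04-06", "2020-04-06", "2020-04-13"]),
    "Confirmed": [10, 20, 30, 40],
})

# Same aggregation as in the question: one row per date, values summed.
grouped = sample.groupby("Date", as_index=False).sum()

# Every calendar day between the first and last grouped date.
expected = pd.date_range(grouped["Date"].min(), grouped["Date"].max(), freq="D")

# Days inside that range that have no row after grouping.
missing = expected.difference(pd.DatetimeIndex(grouped["Date"]))
print(missing)
```

If `missing` lists dates that do exist in the raw file, the rows were probably assigned to a different date during parsing rather than lost in the groupby.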


Saurav

1 Answer


You can drop rows with missing values from your dataframe before applying the groupby() and sum().

covid19India['Date'] = pd.to_datetime(covid19India['Date'], infer_datetime_format=True)
# keep only the rows inside the date range
in_range = (covid19India['Date'] >= '2020-03-25') & (covid19India['Date'] <= '2020-05-31')
# drop any rows with at least one missing value
covid19India_new = covid19India[in_range].sort_values('Date').dropna()
df1 = covid19India_new.groupby('Date').sum()
df1.reset_index(inplace=True)
Derek O
  • There are no missing values. The data for the dates between 2020-04-07 and 2020-04-13 is present, but it's being merged into 2020-04-06 – Saurav Jun 22 '20 at 04:00
  • That's strange... I'll see if I can figure out an alternate solution – Derek O Jun 22 '20 at 04:52
  • I made changes and it worked, but I don't understand what the issue was. Can you explain? I replaced infer_datetime_format=True with dayfirst=True, and groupby('Date') with groupby(covid19India_new['Date']): covid19India['Date'] = pd.to_datetime(covid19India['Date'], dayfirst=True) and df1=covid19India_new.groupby(covid19India_new['Date']).sum() df1.reset_index(inplace=True) df1=df1.drop('Sno',axis=1) – Saurav Jun 22 '20 at 07:33
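
One plausible explanation for why dayfirst=True helped, not confirmed anywhere in the thread: if the Date column holds day-first strings (DD/MM/YYYY), any date whose day is 12 or less is ambiguous, and pandas reads it month-first by default. The sketch below assumes that format. The sample strings are hypothetical, since the raw file is not shown.

```python
import pandas as pd

# Hypothetical day-first strings (DD/MM/YYYY); the real file's format is an assumption.
raw = pd.Series(["06/04/2020", "07/04/2020", "12/04/2020"])

# Default parsing treats an ambiguous string as month/day/year.
month_first = pd.to_datetime("07/04/2020")               # 2020-07-04 (4 July)

# dayfirst=True treats the same string as day/month/year.
day_first = pd.to_datetime("07/04/2020", dayfirst=True)  # 2020-04-07 (7 April)

# Stating the format explicitly removes the guesswork for the whole column.
parsed = pd.to_datetime(raw, format="%d/%m/%Y")
print(month_first.date(), day_first.date())
print(parsed.dt.strftime("%Y-%m-%d").tolist())
```

If the format is known, passing format= is usually the safer choice, because a mismatched string raises an error instead of being silently reinterpreted.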