Edit: solution at the bottom.
I am working on code that reads in multiple CSV data sets and then visualizes the data on a single graph. The data set that is not working correctly has 365 rows and 2 columns (a date column 'yyyymmdd' and a value column 'extent'). I am trying to replicate a function I have already used twice in the code (with new function/variable names), but it does not work for this data set.
I have tried a few things to troubleshoot. First, I tried not using a function, which seemed to circumvent the problem, but this will not work for what I need the code to do overall (I need to be able to graph specific date ranges, and the df.extent option didn't seem to allow this). I have also looked through the data set for any errors that would prevent data from being read and haven't found any. The data set being read in here was generated by a separate script, and I briefly read that this may have been the problem, but I also tried saving the data to a new Excel workbook to check and that did not help, so I believe the problem is in my code.
The functions I created are as follows:
def DOI_CDR_18(start, end):
    cdr_date = cdr18.loc[(cdr18['yyyymmdd'] >= start) & (cdr18['yyyymmdd'] < end)]
    cdr_drop_18 = cdr_date.drop('extent', axis=1)
    return cdr_drop_18
date_cdr18 = DOI_CDR_18('1/1/2018', '12/31/2018')
def CDR_extent_18(start, end):
    cdr_extent = cdr18.loc[(cdr18['yyyymmdd'] >= start) & (cdr18['yyyymmdd'] < end)]
    cdr_extent_drop = cdr_extent.drop(['yyyymmdd'], axis=1)
    return cdr_extent_drop
cdr18_ext = CDR_extent_18('1/1/2018', '12/31/2018')
plt.plot(date_cdr18, cdr18_ext, color='green', label='NRT CDR')
plt.legend()
An example of my data format:
yyyymmdd extent
1/1/2018 12672693
1/2/2018 12758550
1/3/2018 12885867
I was expecting 365 data points, both date and extent, to be output. Instead, the variable explorer lists 116 data points (from rows 1-16 and then 273-363) as having been read in, and even those 116 points will not plot (the error is "unhashable type: numpy.ndarray").
Solution: I found I needed to use the pd.to_datetime() function.
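A minimal sketch of why the conversion likely helps, assuming the 'yyyymmdd' column was being read in as plain text in m/d/yyyy form (as in the sample rows). Text is compared character by character, so '2/1/2018' sorts after '12/31/2018' and gets filtered out. The data below is synthetic (one row per day of 2018), not the real file:

```python
import pandas as pd

# Synthetic stand-in for the real data: every day of 2018 as m/d/yyyy text
# (assumed format, matching the sample rows in the question).
days = pd.date_range("2018-01-01", "2018-12-31", freq="D")
text_dates = pd.Series([f"{d.month}/{d.day}/{d.year}" for d in days])

start, end = "1/1/2018", "12/31/2018"

# On text, >= and < compare character by character, not chronologically.
text_mask = (text_dates >= start) & (text_dates < end)

# After conversion to datetimes, the same comparison is chronological.
real_dates = pd.to_datetime(text_dates, format="%m/%d/%Y")
date_mask = (real_dates >= pd.Timestamp(2018, 1, 1)) & (
    real_dates < pd.Timestamp(2018, 12, 31)
)

print(text_mask.sum(), date_mask.sum())
```

Under these assumptions the text comparison keeps 116 rows, which matches the count in the variable explorer, while the datetime comparison keeps 364. The last day is still missing because `< end` excludes 12/31/2018 itself; `<= end` would include it.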
My specific code is:
cdr18= pd.read_csv("index.csv",parse_dates=True, nrows=366)
cdr18['yyyymmdd'] = pd.to_datetime(cdr18['yyyymmdd'], infer_datetime_format=True)
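With the dates converted, the two near-identical functions could probably be folded into one helper that takes the date range as arguments. This is only a sketch: `select_range` is a hypothetical name, the inline sample stands in for index.csv (assumed comma-separated), and the end date is deliberately included with `<=`, which differs from the `<` in the original functions. An explicit `format` is passed instead of `infer_datetime_format`, which newer pandas versions deprecate.

```python
import io
import pandas as pd

def select_range(df, start, end, date_col="yyyymmdd"):
    """Return the rows with start <= date <= end (both ends included)."""
    start, end = pd.Timestamp(start), pd.Timestamp(end)
    mask = (df[date_col] >= start) & (df[date_col] <= end)
    return df.loc[mask]

# Tiny inline sample standing in for index.csv (values from the rows above;
# the comma-separated layout is an assumption).
sample = io.StringIO(
    "yyyymmdd,extent\n"
    "1/1/2018,12672693\n"
    "1/2/2018,12758550\n"
    "1/3/2018,12885867\n"
)
cdr18 = pd.read_csv(sample)
cdr18["yyyymmdd"] = pd.to_datetime(cdr18["yyyymmdd"], format="%m/%d/%Y")

subset = select_range(cdr18, "2018-01-02", "2018-01-03")

# Pull out 1-D columns (Series) rather than one-column DataFrames.
dates, extent = subset["yyyymmdd"], subset["extent"]
print(subset)
```

The two Series can then be passed straight to the plotting call, for example plt.plot(dates, extent, color='green', label='NRT CDR'). Passing Series rather than one-column DataFrames may also avoid the "unhashable type" error, though that part is a guess.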