I have multiple CSV files, each containing timeseries data indexed by date. Is there a way to create a single dataframe containing all the data, with the index adjusting for new dates that may not have been seen in the prior files? For example, say I read in timeseries 1:
03/01/2001 2.984
04/01/2001 3.016
05/01/2001 2.891
08/01/2001 2.527
09/01/2001 2.445
11/01/2001 2.648
12/01/2001 2.803
15/01/2001 2.943
The dataframe would look pretty much like the data above. But then I read in another file, say timeseries 2:
02/01/2001 24.75
03/01/2001 24.35
04/01/2001 25.1
08/01/2001 23.5
09/01/2001 23.6
10/01/2001 24.5
11/01/2001 24.7
12/01/2001 24.4
You can see that timeseries 1 has a value for 05/01/2001 and timeseries 2 does not, while timeseries 2 has data points for 02/01/2001 and 10/01/2001 that timeseries 1 lacks. So is there a way to end up with the following:
02/01/2001 null 24.75 ..etc
03/01/2001 2.984 24.35 ..etc
04/01/2001 3.016 25.1 ..etc
05/01/2001 2.891 null ..etc
08/01/2001 2.527 23.5 ..etc
09/01/2001 2.445 23.6 ..etc
10/01/2001 null 24.5 ..etc
11/01/2001 2.648 24.7 ..etc
12/01/2001 2.803 24.4 ..etc
15/01/2001 2.943 null ..etc
where the index adjusts for the new dates, and any timeseries without data for a given day is set to null or some such value?
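For reference, this kind of index alignment is exactly what `pd.concat` with `axis=1` does: it outer-joins on the index and fills the gaps with `NaN`. A minimal sketch with two toy series (the names `ts1`/`ts2` and the sample values are made up for illustration):

```python
import pandas as pd

# Two short series indexed by date. pd.concat(axis=1) takes the union of
# the two indexes and inserts NaN where a series has no value for a date.
s1 = pd.Series([2.984, 3.016],
               index=pd.to_datetime(["2001-01-03", "2001-01-04"]),
               name="ts1")
s2 = pd.Series([24.75, 24.35],
               index=pd.to_datetime(["2001-01-02", "2001-01-03"]),
               name="ts2")

combined = pd.concat([s1, s2], axis=1)
print(combined)
```

The result has three rows (02, 03, and 04 January), with `NaN` in `ts1` on the 2nd and in `ts2` on the 4th.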
My code so far is fairly basic: I can walk through a directory, open the .csv files, and read them into dataframes, but I do not know how to combine the dataframes in the way outlined above.
import glob
import os
import pandas as pd

def getTimeseriesData(DataPath, columnNum, startDate, endDate):
    #print('startDate: ', startDate, ' endDate: ', endDate)
    colNames = ['date']
    path = DataPath
    print('DataPath: ', DataPath)
    allfiles = glob.glob(os.path.join(path, "*.csv"))
    for fname in allfiles:
        # use each file's base name (without extension) as its column name
        name = os.path.splitext(fname)[0]
        name = os.path.split(name)[1]
        colNames.append(name)
    dataframes = [pd.read_csv(fname, header=None, usecols=[0, columnNum])
                  for fname in allfiles]
    #not sure of the next bit