I have 20 dataframes of the following format but with 140000 or so lines. The data format is '%Y/%m/%d' or YYYY/MM/DD
In [1]: data1 = pd.DataFrame({'Day': ['2020-04-07','2020-04-07', '2020-04-07','2020-08-11','2020-08-11','2020-08-11','2020-06-14','2020-06-14','2020-06-14'],
'Time': ['23:41:18', '23:42:56', '23:44:34','10:23:10','15:24:46','10:24:13','23:41:18','23:42:56','23:44:34'],
'V': [1044.865, 1044.889, 1044.914,320.014,320.033,320.018,1044.865,1044.889,1044.914]})
data2 = pd.DataFrame({'Day': ['2020-04-07','2020-04-07', '2020-04-07','2020-08-11','2020-08-11','2020-08-11','2020-06-14','2020-06-14','2020-06-14'],
'Time': ['23:41:18', '23:42:56', '23:44:34','10:23:10','15:24:46','10:24:13','23:41:18','23:42:56','23:44:34'],
'V': [1044.865, 1044.887, 1044.914,320.014,320.033,320.018,1044.865,1044.889,1044.914]})
data3 = pd.DataFrame({'Day': ['2020-04-07','2020-04-07', '2020-04-07','2020-08-11','2020-08-11','2020-08-11','2020-06-14','2020-06-14','2020-06-14'],
'Time': ['23:41:18', '23:42:56', '23:44:34','10:23:10','15:24:46','10:24:13','23:41:18','23:42:56','23:44:34'],
'V': [1044.865, 1044.888, 1044.914,320.014,320.033,320.018,1044.865,1044.889,1044.914]})
In [2]:data2.head(15)
Out[2]:
Day Time V
0 2020-04-07 23:41:18 1044.865
1 2020-04-07 23:42:56 1044.887
2 2020-04-07 23:44:34 1044.914
3 2020-08-11 10:23:10 320.014
4 2020-08-11 15:24:46 320.033
5 2020-08-11 10:24:13 320.018
6 2020-06-14 23:41:18 1044.865
7 2020-06-14 23:42:56 1044.889
8 2020-06-14 23:44:34 1044.914
And and I'm using the following loop to try and sort the dataframes by the dates in the column 'Day' and then by the 'Time' column. In my actual dataframe I can have around 3 measures for each minute.
My purpose is to not have to type the dataframe names 20 times, and this is the solution I found that works quite well with .index .drop() .reset_index() attributes. But not with the .sort_values() as seen in this loop for some reason:
In [3]:for n in range(1,4,1):
globals()["data" + str(n)]['Day'] = pd.to_datetime(globals()["data" + str(n)]['Day'],
format = '%Y/%m/%d')
globals()["data" + str(n)].sort_values(by=['Day','Time'])
globals()["data" + str(n)]['Day'] = globals()["data" + str(n)]['Day'].astype(str)
Out[3]:
Day Time V
0 2020-04-07 23:41:18 1044.865
1 2020-04-07 23:42:56 1044.887
2 2020-04-07 23:44:34 1044.914
3 2020-08-11 10:23:10 320.014
4 2020-08-11 15:24:46 320.033
5 2020-08-11 10:24:13 320.018
6 2020-06-14 23:41:18 1044.865
7 2020-06-14 23:42:56 1044.889
8 2020-06-14 23:44:34 1044.914
But if i just use the loop to make the 'Date' column a 'datetime' and then use the .sort_values() by manually typing the dataframe name it works.
In [4]:for n in range(1,4,1):
globals()["data" + str(n)]['Day'] = pd.to_datetime(globals()["data" + str(n)]['Day'],
format = '%Y/%m/%d')
data2.sort_values(by=['Day','Time'])
Out[3]:
Day Time V
0 2020-04-07 23:41:18 1044.865
1 2020-04-07 23:42:56 1044.887
2 2020-04-07 23:44:34 1044.914
6 2020-06-14 23:41:18 1044.865
7 2020-06-14 23:42:56 1044.889
8 2020-06-14 23:44:34 1044.914
3 2020-08-11 10:23:10 320.014
4 2020-08-11 10:24:13 320.018
5 2020-08-11 15:24:46 320.033
Any suggestions on how to make this work in a more dynamic way?