Sorting datetime by looping through multiple dataframes using globals() as a variable reference

Question

I have 20 dataframes of the following format but with 140000 or so lines. The data format is '%Y/%m/%d' or YYYY/MM/DD

In [1]:  data1 = pd.DataFrame({'Day': ['2020-04-07','2020-04-07', '2020-04-07','2020-08-11','2020-08-11','2020-08-11','2020-06-14','2020-06-14','2020-06-14'],
                           'Time': ['23:41:18', '23:42:56', '23:44:34','10:23:10','15:24:46','10:24:13','23:41:18','23:42:56','23:44:34'],
                           'V': [1044.865, 1044.889, 1044.914,320.014,320.033,320.018,1044.865,1044.889,1044.914]})
        
        data2 = pd.DataFrame({'Day': ['2020-04-07','2020-04-07', '2020-04-07','2020-08-11','2020-08-11','2020-08-11','2020-06-14','2020-06-14','2020-06-14'],
                           'Time': ['23:41:18', '23:42:56', '23:44:34','10:23:10','15:24:46','10:24:13','23:41:18','23:42:56','23:44:34'],
                           'V': [1044.865, 1044.887, 1044.914,320.014,320.033,320.018,1044.865,1044.889,1044.914]})
        
        data3 = pd.DataFrame({'Day': ['2020-04-07','2020-04-07', '2020-04-07','2020-08-11','2020-08-11','2020-08-11','2020-06-14','2020-06-14','2020-06-14'],
                           'Time': ['23:41:18', '23:42:56', '23:44:34','10:23:10','15:24:46','10:24:13','23:41:18','23:42:56','23:44:34'],
                           'V': [1044.865, 1044.888, 1044.914,320.014,320.033,320.018,1044.865,1044.889,1044.914]})

    In [2]:data2.head(15)
    Out[2]: 
          Day         Time        V
    0  2020-04-07   23:41:18   1044.865
    1  2020-04-07   23:42:56   1044.887 
    2  2020-04-07   23:44:34   1044.914
    3  2020-08-11   10:23:10   320.014
    4  2020-08-11   15:24:46   320.033
    5  2020-08-11   10:24:13   320.018
    6  2020-06-14   23:41:18   1044.865
    7  2020-06-14   23:42:56   1044.889
    8  2020-06-14   23:44:34   1044.914

And and I'm using the following loop to try and sort the dataframes by the dates in the column 'Day' and then by the 'Time' column. In my actual dataframe I can have around 3 measures for each minute.

My purpose is to not have to type the dataframe names 20 times, and this is the solution I found that works quite well with .index .drop() .reset_index() attributes. But not with the .sort_values() as seen in this loop for some reason:

 In [3]:for n in range(1,4,1):
            globals()["data" + str(n)]['Day'] = pd.to_datetime(globals()["data" + str(n)]['Day'], 
            format = '%Y/%m/%d')
            globals()["data" + str(n)].sort_values(by=['Day','Time'])
            globals()["data" + str(n)]['Day'] = globals()["data" + str(n)]['Day'].astype(str)
 Out[3]: 
          Day         Time        V
    0  2020-04-07   23:41:18   1044.865
    1  2020-04-07   23:42:56   1044.887 
    2  2020-04-07   23:44:34   1044.914
    3  2020-08-11   10:23:10   320.014
    4  2020-08-11   15:24:46   320.033
    5  2020-08-11   10:24:13   320.018
    6  2020-06-14   23:41:18   1044.865
    7  2020-06-14   23:42:56   1044.889
    8  2020-06-14   23:44:34   1044.914

But if i just use the loop to make the 'Date' column a 'datetime' and then use the .sort_values() by manually typing the dataframe name it works.

 In [4]:for n in range(1,4,1):
            globals()["data" + str(n)]['Day'] = pd.to_datetime(globals()["data" + str(n)]['Day'], 
            format = '%Y/%m/%d')
        data2.sort_values(by=['Day','Time'])

 Out[3]: 
          Day         Time        V
    0  2020-04-07   23:41:18   1044.865
    1  2020-04-07   23:42:56   1044.887 
    2  2020-04-07   23:44:34   1044.914
    6  2020-06-14   23:41:18   1044.865
    7  2020-06-14   23:42:56   1044.889
    8  2020-06-14   23:44:34   1044.914
    3  2020-08-11   10:23:10   320.014
    4  2020-08-11   10:24:13   320.018
    5  2020-08-11   15:24:46   320.033

Any suggestions on how to make this work in a more dynamic way?

First thing first: **stop using `globals()`!** It’s almost *never* appropriate. For your use-case a dictionary or a regular list might be appropriate. — Konrad Rudolph, Oct 20 '20 at 20:38
https://stackoverflow.com/questions/19158339/why-are-global-variables-evil — polkas, Oct 21 '20 at 07:01
You have to use temp variable or inplace argument inside sort_values. — polkas, Oct 22 '20 at 21:13

Sorting datetime by looping through multiple dataframes using globals() as a variable reference

0 Answers0