0

I have the following dataframe:

             ID             Datetime    Measurement            A    B       C
0          1000  2009-01-01 00:29:59          0.117            0    0       1
1          1000  2009-01-01 00:59:59          0.050            0    0       1
2          1000  2009-01-01 01:29:59          0.025            0    0       1
3          1000  2009-01-01 01:59:59          0.025            0    0       1
4          1000  2009-01-01 02:29:59          0.049            0    0       1
         ...                  ...            ...          ...  ...     ...
157992991  7444  2010-04-06 21:59:59          1.457            1    0       0
157992992  7444  2010-04-06 22:29:59          1.811            1    0       0
157992993  7444  2010-04-06 22:59:59          1.054            1    0       0
157992994  7444  2010-04-06 23:29:59          0.925            1    0       0
157992995  7444  2010-04-06 23:59:59          1.003            1    0       0
[157992996 rows x 6 columns]

The first column represents the ID of a consumer. The second represents the Datetime when the Measurement of the third column was taken. This measurements are taken half-hourly every day. And the three last columns, A, B and Cis an one-hot-encode information regarding the type of consumer.

What i need here is to change the format of this dataframe, to one where in each row it should have 48 measurements (1 day of half-hour measurements) followed by the one-hot-encode information. The columns should be only the HH:MM:SS (the YY-MM-DD does not matter) and the one-hot-encode variables A, B and C. For example, it should looke like this for the first three rows (that would be related to measurements from the ID 1000 in 2009-01-01 until 2009-01-03) and the last row (related the measurements from the ID 7444 in 2010-04-06):

            00:29:59 00:59:59 01:29:59 01:29:59 ... 23:29:59 23:59:59 A B C
0              0.117    0.050    0.025    0.025 ...    0.280    0.133 0 0 1 <--- 2009-01-01 from ID 1000
1              0.237    0.251    0.028    0.172 ...    0.223    0.111 0 0 1 <--- 2009-01-02 from ID 1000
2              0.157    0.122    0.039    0.101 ...    0.321    0.139 0 0 1 <--- 2009-01-03 from ID 1000
       ...      ...      ...      ...   ...    ...      ...   ...
3,291,520      0.433    0.541    0.829    0.901 ...    0.925    1.003 1 0 0 <--- 2010-04-06 from ID 7444

So far, i get the column names like this:

cols_name = df['Datetime'][0:48].dt.strftime('%H:%M:%S').values

Not really sure how can i proceed from here, any ideas?

EDIT: None of the examples in the suggesed answer solves my problem. It is a different kind of pivot that is not there.

Murilo
  • 533
  • 3
  • 15

0 Answers0