How to transform one DataFrame into a new DataFrame with a different format?

Question

I have a DataFrame, TrainLabelModified, of shape 120538 x 3, shown below:

        user_id  video_id  operation_times
0             0        10                3
1             0        15                3
2             0        19                7
3             0        21                3
4             0        28                5
5             0        30                9
6             0        39                3
7             0        40                3
8             0        45                3
9             0        47                2
10            0        58                3
...         ...       ...              ...
120526     5048         7                1
120527     5048        37                2
120528     5048        40               12
120529     5048        49                2
120530     5048        52                6
120531     5049         3               49
120532     5049        25               14
120533     5049        35               21
120534     5049        36                1
120535     5049        37                4
120536     5049        46               25
120537     5049        53               10
120538     5049        56                5

And I want a new DataFrame TrainDataFinal, a 5050 x 64 DataFrame like this:

        user_id  video_0_operation_times v1_ot v2_ot ... v61_ot  v62_ot
0             0        ...                 ...  ...       ...      ...
1             1        ...                 ...  ...       ...      ...
2             2        ...                 ...  ...       ...      ...
3             3        ...                 ...  ...       ...      ...
4             4        ...                 ...  ...       ...      ...
5             5        ...                 ...  ...       ...      ...
...         ...        ...                 ...  ...       ...      ...
5048       5048        ...                 ...  ...       ...      ...
5049       5049        ...                 ...  ...       ...      ...

For example, for user 0 in the sample data, the v(n)_ot values are: v10_ot = 3, v15_ot = 3, v19_ot = 7, ..., v58_ot = 3, and every other v(n)_ot = 0.

My idea is to create TrainDataFinal = np.zeros([5050,64]) and assign values to it one by one according to TrainLabelModified. But since the DataFrame is quite large, this might take too much time. Is there a better solution?
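For reference, the zero-array idea does not need a row-by-row loop. Below is a minimal sketch that fills the array in one vectorized step. It assumes user ids run from 0 to 5049 and video ids from 0 to 62 (inferred from the 5050 x 64 target, not confirmed), and it uses a tiny stand-in frame built from a few of the sample rows above.

```python
import numpy as np
import pandas as pd

# Small stand-in for TrainLabelModified (rows copied from the sample above).
TrainLabelModified = pd.DataFrame({
    "user_id": [0, 0, 0, 5049, 5049],
    "video_id": [10, 15, 19, 3, 25],
    "operation_times": [3, 3, 7, 49, 14],
})

# Assumption: user ids are 0..5049 and video ids are 0..62.
n_users, n_videos = 5050, 63

counts = np.zeros((n_users, n_videos), dtype=np.int64)

# np.add.at accumulates in place, so repeated (user_id, video_id)
# pairs are summed rather than overwritten.
np.add.at(
    counts,
    (TrainLabelModified["user_id"].to_numpy(),
     TrainLabelModified["video_id"].to_numpy()),
    TrainLabelModified["operation_times"].to_numpy(),
)

# Wrap the array in a DataFrame with v0_ot .. v62_ot columns plus user_id.
TrainDataFinal = pd.DataFrame(
    counts, columns=[f"v{i}_ot" for i in range(n_videos)]
)
TrainDataFinal.insert(0, "user_id", np.arange(n_users))
```

Summing repeated pairs is a choice made for this sketch; whether duplicates should be summed, averaged, or dropped depends on what the data means.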

@Vinnton read this answer from piRsquared: https://stackoverflow.com/questions/47152691/how-to-pivot-a-dataframe/47152692#47152692. It will help you understand pivot better. - Bharath M Shetty, Nov 27 '17 at 12:38
@COLDSPPED Unfortunately I got an error: `ValueError: Index contains duplicate entries, cannot reshape`. Is there anything I can do to fix the error? - Vinnton, Nov 27 '17 at 12:40
Okay, try this - `df.pivot_table(index='user_id', columns='video_id', values='operation_times')` - cs95, Nov 27 '17 at 13:02
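To make the suggestion above concrete, here is a rough sketch of how `pivot_table` could produce the requested layout. The `ValueError` from `pivot` comes from repeated (user_id, video_id) pairs, which `pivot_table` handles by aggregating; its default aggregation is the mean, so `aggfunc="sum"` is used here as an assumption about what the duplicates should mean. The duplicate row in the stand-in data is hypothetical, and the 0..5049 user range and 0..62 video range are inferred from the 5050 x 64 target rather than confirmed.

```python
import pandas as pd

# Stand-in data; the repeated (0, 19) row is hypothetical and only
# there to show how duplicates are aggregated.
df = pd.DataFrame({
    "user_id": [0, 0, 0, 0, 5049],
    "video_id": [10, 15, 19, 19, 3],
    "operation_times": [3, 3, 7, 1, 49],
})

wide = df.pivot_table(
    index="user_id",
    columns="video_id",
    values="operation_times",
    aggfunc="sum",   # assumption: sum duplicates instead of averaging them
    fill_value=0,    # missing (user, video) pairs become 0
)

# Add users and videos that never appear in the data (assumed id ranges).
wide = wide.reindex(index=range(5050), columns=range(63), fill_value=0)

# Rename columns to v0_ot .. v62_ot and turn user_id back into a column.
wide.columns = [f"v{c}_ot" for c in wide.columns]
TrainDataFinal = wide.rename_axis(index="user_id").reset_index()
```

The `reindex` step matters only if some user or video id is absent from the input; without it the result would have fewer than 5050 rows or 63 video columns.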
