Now I have a DataFrame TrainLabelModified, a 120538 x 3 DataFrame, as below:
user_id video_id operation_times
0 0 10 3
1 0 15 3
2 0 19 7
3 0 21 3
4 0 28 5
5 0 30 9
6 0 39 3
7 0 40 3
8 0 45 3
9 0 47 2
10 0 58 3
... ... ... ...
120526 5048 7 1
120527 5048 37 2
120528 5048 40 12
120529 5048 49 2
120530 5048 52 6
120531 5049 3 49
120532 5049 25 14
120533 5049 35 21
120534 5049 36 1
120535 5049 37 4
120536 5049 46 25
120537 5049 53 10
120538 5049 56 5
And I want a new DataFrame TrainDataFinal, a 5050 x 64 DataFrame, like this:
user_id video_0_operation_times v1_ot v2_ot ... v61_ot v62_ot
0 0 ... ... ... ... ...
1 1 ... ... ... ... ...
2 2 ... ... ... ... ...
3 3 ... ... ... ... ...
4 4 ... ... ... ... ...
5 5 ... ... ... ... ...
... ... ... ... ... ... ...
5048 5048 ... ... ... ... ...
5049 5049 ... ... ... ... ...
For example, for user 0 in the sample data, the v(n)_ot values are v10_ot = 3, v15_ot = 3, v19_ot = 7, ..., v58_ot = 3, and every other v(n)_ot = 0.
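To make the target layout concrete, here is a minimal sketch that reshapes just the user-0 rows from the sample above into one wide row. It uses pandas `pivot` (long to wide) plus `reindex` to add the videos the user never touched as 0. The `vN_ot` column names and the assumption that video_id runs from 0 to 62 (63 video columns plus user_id = 64 columns) are inferred from the question, not confirmed.

```python
import pandas as pd

# Rows for user 0, copied from the sample printout in the question
sample = pd.DataFrame({
    "user_id": [0] * 11,
    "video_id": [10, 15, 19, 21, 28, 30, 39, 40, 45, 47, 58],
    "operation_times": [3, 3, 7, 3, 5, 9, 3, 3, 3, 2, 3],
})

# Long -> wide: one row per user_id, one column per video_id.
# pivot raises if a (user_id, video_id) pair occurs more than once.
wide = (
    sample.pivot(index="user_id", columns="video_id", values="operation_times")
    .reindex(columns=range(63), fill_value=0)  # assumed video_id range 0..62
    .fillna(0)                                  # users missing some videos get NaN from pivot
    .astype(int)
)
wide.columns = [f"v{n}_ot" for n in wide.columns]  # hypothetical naming scheme
wide = wide.reset_index()  # move user_id from the index into a regular column

print(wide.shape)
print(wide.loc[0, ["user_id", "v10_ot", "v19_ot", "v58_ot", "v0_ot"]].tolist())
```

On the full data, the same chain would produce one row per user that appears in TrainLabelModified; a user with no rows at all would also need `.reindex(range(5050), fill_value=0)` on the index before `reset_index()`.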
My idea is to create TrainDataFinal = np.zeros([5050, 64]) and assign the values into it one by one according to TrainLabelModified. But since the DataFrame is fairly large (about 120,000 rows), looping like this might take too much time. Is there a faster way to do this?
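The np.zeros idea does not actually need a Python loop: NumPy integer-array ("fancy") indexing can write every (user_id, video_id) cell in one vectorized assignment. The sketch below assumes user_id is 0..5049 and video_id is 0..62, keeps user_id in column 0 (so video k lands in column k + 1), and uses a tiny hypothetical stand-in for TrainLabelModified built from a few sample rows.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature stand-in for TrainLabelModified (rows taken from the sample)
TrainLabelModified = pd.DataFrame({
    "user_id": [0, 0, 0, 5049, 5049],
    "video_id": [10, 15, 19, 3, 56],
    "operation_times": [3, 3, 7, 49, 5],
})

n_users, n_videos = 5050, 63  # assumed: user_id in 0..5049, video_id in 0..62

# Column 0 holds user_id; column k + 1 holds operation_times for video k.
arr = np.zeros((n_users, n_videos + 1), dtype=np.int64)
arr[:, 0] = np.arange(n_users)

rows = TrainLabelModified["user_id"].to_numpy()
cols = TrainLabelModified["video_id"].to_numpy() + 1
# One vectorized write instead of ~120k Python-level assignments
arr[rows, cols] = TrainLabelModified["operation_times"].to_numpy()

TrainDataFinal = pd.DataFrame(
    arr, columns=["user_id"] + [f"v{n}_ot" for n in range(n_videos)]
)
print(TrainDataFinal.loc[[0, 5049], ["user_id", "v3_ot", "v19_ot", "v56_ot"]])
```

If a (user_id, video_id) pair can appear more than once, plain fancy assignment keeps only one of the repeated values; `np.add.at(arr, (rows, cols), values)` would accumulate them instead.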