I have a pandas dataframe like below:
MSuite TCase KWord
0 MS1 Nan Nan
1 NaN T1 NaN
2 NaN NaN K1
3 NaN NaN K4
4 NaN NaN K8
5 NaN NaN V3
6 NaN T2 NaN
7 NaN NaN K7
8 NaN NaN K12
9 NaN NaN V10
10 MS2 NaN NaN
11 NaN T3 NaN
12 NaN NaN K22
13 NaN NaN K15
14 NaN NaN V3
15 NaN T4 NaN
16 NaN NaN K10
17 NaN NaN K4
18 NaN NaN K12
19 NaN NaN K2
20 NaN NaN V6
And I want to split it into the following matricies:
Matrix 1 - K Matrix
[[1, 4, 8, 7, 12, 22, 15, 10, 4, 12, 2],
[1, 1, 1, 2, 2, 3, 3, 4, 4, 4, 4],
[1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2]]
For the top row
Each K# is stored in a separate dataframe/array that I want to get the index of for each K in chronological order.
For the middle row
I want to denote under which TCase each K is stored in, i.e. T1 is consisting of K1, K4, K8...
Bottom Row
Similar to middle row, but with the MSuite denoting.
Matrix 2 - V Matrix
[[3, 10, 3, 6],
[4, 3, 3, 5],
[1, 1, 2, 2]]
Top Row
Similar to top row in Matrix 1, where I have a separate file with a list of all the various V# available, and I would like to just grab the index of the file V# that is the same as the one in the dataframe.
Middle Row
The index of the V# within each group of T#, i.e in T1, V3 is the 4th KWord, and for T2, V10 is the 3rd KWord.
Bottom Row
Denoting which MSuite the V# is falling under. i.e V3 in T1 as well as V10 in T2 are both falling under MS1 -> 1 while V3 in T3 and V6 in T4 are both falling under MS2 -> 2.
Attempt So Far
I began by looping through the dataframe, but came across this word of advice. So now I am wondering if I need to somehow implement vectorization, but I really do not know where to start with that.
If not vectorization, I would continue to loop through the data frame, while caching at each MSuite and TCase increase, and appending the cached values.