So I have a Pandas DataFrame with [207684 rows x 3 columns]:
Row Column Coef
0 obj F0000010042600010020 552.261551
1 ARE000001 F0000010042600010020 1.000000
2 IDA00000126004 F0000010042600010020 1.000000
3 MOA26004 F0000010042600010020 60.600000
4 POL26004 F0000010042600010020 6.744780
5 FIB26004 F0000010042600010020 439.350000
6 DIS26004 F0000010042600010020 -727.200000
7 TR0001004 F0000010042600010020 0.006313
8 FR0020004 F0000010042600010020 0.007481
9 DIF0020004 F0000010042600010020 -4666.200000
10 obj F0000010052600010020 693.506264
11 ARE000001 F0000010052600010020 1.000000
... ... ... ...
I have to create a matrix from this data, using the values in the first two columns as indices and the values in the third column as the matrix entries. My first idea was to get the unique values in the first two columns and loop over them, pulling the matching value from the DataFrame like this:
>>> import numpy as np
>>> from tqdm import tqdm_notebook
>>> Rows = Dados["Row"].unique()
>>> Cols = Dados["Column"].unique()
>>> ProblemM = np.zeros((len(Rows), len(Cols)))
>>> for index1, i in enumerate(tqdm_notebook(Rows)):
...     for index2, j in enumerate(Cols):
...         Data = Dados.loc[(Dados.Row == i) & (Dados.Column == j), 'Coef'].values
...         ProblemM[index1, index2] = Data[0] if len(Data) > 0 else np.nan
But as expected this takes ages, since the resulting matrix has dimensions [6813 x 21683]. Is there some way to significantly improve the performance of this task?
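For context, here is a vectorized sketch of the kind of approach I'm imagining might work, using a small hypothetical frame in place of the real `Dados` (the labels and coefficients below are made up), and assuming each (Row, Column) pair appears at most once:

```python
import numpy as np
import pandas as pd

# Small stand-in for the real Dados frame (hypothetical values).
Dados = pd.DataFrame({
    "Row":    ["obj", "ARE000001", "obj", "MOA26004"],
    "Column": ["F01", "F01", "F02", "F02"],
    "Coef":   [552.26, 1.0, 693.51, 60.6],
})

# Option 1: pivot builds the Row x Column matrix in one call.
# Missing (Row, Column) pairs become NaN; pivot sorts the labels,
# and it raises if a pair appears more than once (pivot_table can
# aggregate duplicates instead).
ProblemM = Dados.pivot(index="Row", columns="Column", values="Coef").to_numpy()

# Option 2: integer-encode the labels and fill a preallocated array
# in one fancy-indexing step. factorize keeps first-appearance order,
# matching what .unique() would give for Rows and Cols.
i, row_labels = pd.factorize(Dados["Row"])
j, col_labels = pd.factorize(Dados["Column"])
ProblemM2 = np.zeros((len(row_labels), len(col_labels)))
ProblemM2[i, j] = Dados["Coef"].to_numpy()
```

Since roughly 207,684 entries in a 6813 x 21683 matrix is only about 0.14% density, a sparse representation (e.g. `scipy.sparse.coo_matrix` built from the same `(i, j, values)` triplets) might also be worth considering if the dense array is too large.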