So I have a Pandas DataFrame with [207684 rows x 3 columns]:
Row Column Coef
0 obj F0000010042600010020 552.261551
1 ARE000001 F0000010042600010020 1.000000
2 IDA00000126004 F0000010042600010020 1.000000
3 MOA26004 F0000010042600010020 60.600000
4 POL26004 F0000010042600010020 6.744780
5 FIB26004 F0000010042600010020 439.350000
6 DIS26004 F0000010042600010020 -727.200000
7 TR0001004 F0000010042600010020 0.006313
8 FR0020004 F0000010042600010020 0.007481
9 DIF0020004 F0000010042600010020 -4666.200000
10 obj F0000010052600010020 693.506264
11 ARE000001 F0000010052600010020 1.000000
... ... ... ...
I have to create a matrix from this data, using the values in the first two columns as indices and the values in the third column as the matrix entries. My first idea was to get the unique values in the first two columns and loop over them, pulling the matching value from the DataFrame like this:
>>> import numpy as np
>>> from tqdm import tqdm_notebook
>>> Rows = Dados["Row"].unique()
>>> Cols = Dados["Column"].unique()
>>> ProblemM = np.zeros((len(Rows), len(Cols)))
>>> for index1, i in enumerate(tqdm_notebook(Rows)):
...     for index2, j in enumerate(Cols):
...         Data = Dados.loc[(Dados.Row == i) & (Dados.Column == j), 'Coef'].values
...         ProblemM[index1, index2] = Data[0] if len(Data) > 0 else np.nan
But as expected this takes ages, since the resulting matrix has dimensions [6813 x 21683]. Is there some way to significantly improve the performance of this task?
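For context, here is a vectorized sketch of the kind of approach I'm imagining might work, using a small hypothetical frame in place of the real `Dados` (the labels and coefficients below are made up), and assuming each (Row, Column) pair appears at most once:

```python
import numpy as np
import pandas as pd

# Small stand-in for the real Dados frame (hypothetical values).
Dados = pd.DataFrame({
    "Row":    ["obj", "ARE000001", "obj", "MOA26004"],
    "Column": ["F01", "F01", "F02", "F02"],
    "Coef":   [552.26, 1.0, 693.51, 60.6],
})

# Option 1: pivot builds the Row x Column matrix in one call.
# Missing (Row, Column) pairs become NaN; pivot sorts the labels,
# and it raises if a pair appears more than once (pivot_table can
# aggregate duplicates instead).
ProblemM = Dados.pivot(index="Row", columns="Column", values="Coef").to_numpy()

# Option 2: integer-encode the labels and fill a preallocated array
# in one fancy-indexing step. factorize keeps first-appearance order,
# matching what .unique() would give for Rows and Cols.
i, row_labels = pd.factorize(Dados["Row"])
j, col_labels = pd.factorize(Dados["Column"])
ProblemM2 = np.zeros((len(row_labels), len(col_labels)))
ProblemM2[i, j] = Dados["Coef"].to_numpy()
```

Since roughly 207,684 entries in a 6813 x 21683 matrix is only about 0.14% density, a sparse representation (e.g. `scipy.sparse.coo_matrix` built from the same `(i, j, values)` triplets) might also be worth considering if the dense array is too large.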