How to efficiently transform df?

Question

I have the following df:

          CUI1      CUI2  tot
0     C0000699  C3894683    2
1     C0000699  C0101725    1
2     C0000699  C1882413    3
...        ...       ...  ...
9995  C0000715  C0026382   56
9996  C0000715  C0010334   101
...

which I need to transform into a co-occurrence matrix:

          C0000699 C3894683 C0101725 C1882413 ... C0026382
C0000699  0        2        1        3            m
...
C3894683  2        0        n        p        ... q
...
etc..

The df is extremely large (~11 million rows). I have tried looping through it to set columns (using unstack, get_level, etc.), but that takes an inordinate amount of time and memory. Also note that pairs with a tot of 0 are not present in the initial stacked df.

Are there any vectorized solutions that would speed this up and lower the memory footprint? The closest I could find was "Constructing a co-occurrence matrix in python pandas", but the df there is structured differently.

How is performance with [`pivot`](https://pandas.pydata.org/docs/reference/api/pandas.pivot.html)? — Parfait, Feb 12 '22 at 17:38
Quite well! To be honest, I didn't even think of the simple solution: `cooc.pivot_table(values='tot', index=['CUI1'], columns="CUI2")` — horcle_buzz, Feb 12 '22 at 19:41

score 0 · Answer 1 · answered Feb 12 '22 at 21:09

You can try this:

df.pivot_table(index='CUI1',columns='CUI2',values='tot')
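As a rough sketch of how that call could be extended to the symmetric, zero-filled matrix described in the question: the toy rows below are made up to mirror the question's layout, and the names `wide`, `labels`, `square` and `cooc` are only illustrative. It assumes each unordered pair appears in the df only once (in one direction) and that no CUI is paired with itself; otherwise the mirroring step would double those counts. Note that `pivot_table` aggregates with the mean by default, so `aggfunc="sum"` is passed explicitly in case a pair is repeated.

```python
import pandas as pd

# Toy data in the same long ("stacked") layout as the question.
df = pd.DataFrame({
    "CUI1": ["C0000699", "C0000699", "C0000699", "C0000715"],
    "CUI2": ["C3894683", "C0101725", "C1882413", "C0026382"],
    "tot": [2, 1, 3, 56],
})

# Long -> wide. fill_value=0 covers pairs that are absent from the df;
# aggfunc="sum" adds up duplicates instead of averaging them (the default).
wide = df.pivot_table(index="CUI1", columns="CUI2", values="tot",
                      aggfunc="sum", fill_value=0)

# Make the matrix square: the same labels on both axes, in the same order.
labels = wide.index.union(wide.columns)
square = wide.reindex(index=labels, columns=labels, fill_value=0)

# Mirror the counts across the diagonal (assumes each pair is stored in
# one direction only, so nothing is counted twice).
cooc = square + square.T

print(cooc)
```

The result is dense, so its size grows with the square of the number of distinct CUIs; whether that fits in memory for ~11 million rows depends on how many distinct codes there are.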

AvivSar

See above, I already got the solution. – horcle_buzz Feb 12 '22 at 21:40
