I am working on a large table in Python (using the pandas library).
I would like to perform various kinds of vector operations, such as correlation, between the rows of the table.
It might be a simple problem, but I find the DataFrame structure difficult to deal with. I do not know how to convert each row (or column) into a list (or NumPy array).
Even counting the number of rows does not seem to be simple, because functions like df.count() seem to ignore null data.
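For reference, here is a minimal sketch of the two basic operations mentioned above: counting rows regardless of nulls, and pulling a row out as a list or NumPy array. The toy DataFrame and its values are made up for illustration and are not the real table.

```python
import numpy as np
import pandas as pd

# Toy table with one missing value (illustrative data only)
df = pd.DataFrame(
    {"Col1": [1.0, 3.0, 5.0], "Col2": [2.0, np.nan, 6.0]},
    index=["Row1", "Row2", "Row3"],
)

# shape (or len(df)) counts every row, including rows that contain NaN
n_rows, n_cols = df.shape

# count() returns the number of non-null cells per column instead
non_null = df.count()

# One row as a 1-D NumPy array, and the same row as a plain Python list
row_as_array = df.loc["Row1"].to_numpy()
row_as_list = df.loc["Row1"].tolist()

# The whole table as a 2-D NumPy array (one inner array per row)
all_rows = df.to_numpy()
```

Here `df.count()` reports 2 for `Col2` because of the missing value, while `df.shape` and `len(df)` still report 3 rows.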
A simple data table and the expected result table are shown below. In this case, I would like to calculate the sum of each pair of rows.
The real table is much bigger (more than 1000 rows and columns) and contains some null values.
Data.csv:
Label Col1 Col2
Row1 1 2
Row2 3 4
Row3 5 6
Output.csv:
Label Col3
Row1,Row2 4,6
Row1,Row3 6,8
Row2,Row3 8,10
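One possible way to get the pairwise sums, sketched under the assumption that Data.csv is whitespace-separated as shown and that `Label` can serve as the index. The sample is read from an in-memory string so the snippet is self-contained, and the sums are kept in separate `Col1`/`Col2` columns rather than joined into a single `Col3` string.

```python
import io
from itertools import combinations

import pandas as pd

# Stand-in for Data.csv (assumed to be whitespace-separated)
data = io.StringIO("Label Col1 Col2\nRow1 1 2\nRow2 3 4\nRow3 5 6\n")
df = pd.read_csv(data, sep=r"\s+", index_col="Label")

# Build one summed Series per unordered pair of row labels
pairs = {
    f"{a},{b}": df.loc[a] + df.loc[b]
    for a, b in combinations(df.index, 2)
}

# Each dict entry becomes a column, so transpose to get one row per pair
result = pd.DataFrame(pairs).T
```

With null values, the element-wise `+` yields NaN wherever either row has a missing cell. For 1000 rows there are 499,500 pairs, so an explicit loop like this may be slow. For correlation specifically, `df.T.corr()` computes the pairwise correlation between rows in one call and excludes null values pair by pair.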