I am struggling to get something like a matrix dataframe in pandas/NumPy.
I have the following dataframe
Value | Source | Placement | First |
---|---|---|---|
A | GooFeed | 1 | GooFeed |
A | CyberCyber | 2 | GooFeed |
A | GooFeed | 3 | GooFeed |
B | CyberCyber | 1 | CyberCyber |
B | GooFeed | 2 | CyberCyber |
C | WowFeed | 1 | WowFeed |
C | WowFeed | 2 | WowFeed |
C | CyberCyber | 3 | WowFeed |
C | GooFeed | 4 | WowFeed |
C | CyberCyber | 5 | WowFeed |
D | GooFeed | 1 | GooFeed |
What I want to achieve is a percentage of the feeds having data from each other. The feeds with the 1 in Placement got the data (value A) first. Sometimes the same values appear in multiple data feeds and so are multiple placements from the same feed at one value, like with value C and CyberCyber. But the values should be unique and not count as 2 times data from WowFeed.
I calculated total values in a separate dataframe like the following:
Source | Firsts | TotalDistinctValues | FirstSources |
---|---|---|---|
GooFeed | 2 | 4 | {'CyberCyber':1;'WowFeed':1} |
CyberCyber | 1 | 3 | {'GooFeed':1;'WowFeed':1} |
WowFeed | 1 | 1 |
And the matrix I am looking for is the following:
source | GooFeed | CyberCyber | WowFeed | Total |
---|---|---|---|---|
GooFeed | 25% | 25% | 50% | |
CyberCyber | 33% | 33% | 66% | |
WowFeed | ||||
Average | 33% | 25% | 58% |
Average (bottom row) of the column values excluding the value source matching itself. Like GooFeed at GooFeed.
Total (outer row on the right) should be to total of its values from its own distinct values.
I really do not get my hand around how this should work. Am I even right to try it in pandas/NumPy?
I am so glad for every help.
Best regards
Maurice