I have a two-column dataframe where each row represents a pair.
import pandas as pd
x = pd.DataFrame([['dog', 'cat'], ['fish', 'parrot'], ['dog', 'llama'], ['pig', 'sloth']])
My goal is to convert this into a square matrix whose index and column headers both contain every unique value from the original dataframe.
Using the helpful answer here, I can make a matrix based on the values:
df6 = x.pivot_table(index=0, columns=1, values=1, aggfunc='size', fill_value=0)
This is not quite what I want because it is not square: some values, such as 'dog', appear in the index but not in the columns.
I altered the above to manually type in the items for the columns and rows:
df7 = df6.reindex(index=["dog","cat","fish","pig","llama","parrot","sloth"], columns=["dog","cat","fish","pig","llama","parrot","sloth"], fill_value=0)
Again, this is not quite what I want because it is time-consuming to construct. So I tried adding a line to get the unique list of values:
listOfItems = pd.unique(df6.values.ravel('K'))
This doesn't work because it gives me 0 and 1 rather than the string values. So I tried combining the column and index labels of the pivoted table instead:
listOfColumns = df6.columns
listOfIndex = df6.index
joinedlist = listOfColumns + listOfIndex
but I get an error message: operands could not be broadcast together with shapes (4,) (3,)
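As far as I can tell, `+` on two Index objects is attempted element by element rather than as concatenation, which would explain the shape mismatch. Below is a minimal sketch of combining the labels with `Index.union` instead. The label values are hard-coded to stand in for `df6.columns` and `df6.index`, and `all_labels` is just a name I made up for illustration:

```python
import pandas as pd

# Stand-ins for df6.columns and df6.index (hard-coded for illustration).
list_of_columns = pd.Index(['cat', 'llama', 'parrot', 'sloth'])
list_of_index = pd.Index(['dog', 'fish', 'pig'])

# union() collects the labels found in either Index, without duplicates.
all_labels = list_of_columns.union(list_of_index)
print(list(all_labels))
```

This seems to give me all seven names in a single Index, which looks like what I would need to pass to `reindex`.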
Does anyone have a good way to make a square matrix?
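To make the target concrete, here is a rough sketch of the shape I am after, assuming the labels are taken from the original frame `x` rather than from the pivoted one. It uses `pd.crosstab` for the counting step in place of my `pivot_table` call, and `labels`, `counts` and `square` are placeholder names, so treat it as an illustration rather than a settled approach:

```python
import pandas as pd

x = pd.DataFrame([['dog', 'cat'], ['fish', 'parrot'], ['dog', 'llama'], ['pig', 'sloth']])

# Every distinct animal name, regardless of which column it came from.
labels = pd.unique(x.values.ravel())

# Count each (first, second) pair, then force the same labels onto both axes.
counts = pd.crosstab(x[0], x[1])
square = counts.reindex(index=labels, columns=labels, fill_value=0)
print(square)
```

Note that this only marks pairs in the direction they appear in `x` (row 'dog', column 'cat'), not the mirrored cell.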