I have a df and I would like to use 2 of the columns (sample, var) so that the unique values from var column become the index and the unique values from sample other col become the new column names. I would then like to populate the table with 'True' wherever there was a row where a given sample and var co-occured in the original df and populate with na where sample and var did not co-occur.
sample, var
s1, v1
s1, v2
s2, v1
s2, v3
Would become:
s1, s2
v1, T, T
v2, T, na
v3, na, T
I apologise if there is already a question that answers this already. I am new to pandas and am not sure the technical words to search. I tried this possible anser but it didn't work for me and returned a table with samples still in a column rather than as col-headers like this:
pivot = df_all.pivot(index='VAR', columns='SAMPLE').stack(dropna=False)
print(pivot.head(20))
var, sample
v1, s1
v1, s2
v2, s1
v3, s2
Empty dataframe
Columns: []
Index: []
I also tried this:
df_all['MUT']=True
pivot = df_all.pivot(index='VAR', columns='SAMPLE', values='MUT').stack(dropna=False)
print(pivot.head(20))
This returned the right information but the samples were still in a column rather than being the names of the columns as I expected.
The aim is to ultimately make a heatmap so if there is a better solution to achieve this then that would also be fine. I figure the answer is something simple that I missed but I've tried searching and can't find it. I could probably solve this using iteration but am really looking for a vector/pandas type approach. Many thanks.