I have a dataset of 1/0 values that indicate whether a given mineral (M) is present (1) or absent (0) in a sample (S). A small example is below; the real dataset has about 100 minerals across 160 samples.
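import numpy as np
import pandas as pd

# Presence (1) / absence (0) of each mineral per sample
data = np.array([['S1', '1', '1', '0', '0'],
                 ['S2', '0', '1', '0', '1'],
                 ['S3', '1', '1', '1', '1'],
                 ['S4', '0', '0', '0', '1']])
minerals = ['Sample', 'M1', 'M2', 'M3', 'M4']
# np.array stores these mixed rows as strings, so cast the flags to int
df = pd.DataFrame(data, columns=minerals).set_index('Sample').astype(int)
# Empty mineral-by-mineral frame to hold the pairwise totals
co_occurrence = pd.DataFrame(columns=minerals[1:], index=minerals[1:])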
For every pair of minerals, I need to record how often they co-occur in a separate dataframe called co_occurrence. That is, I need to compare every pair of columns in df, count the samples in which both minerals are present (1), and enter that total in the matching cell of co_occurrence.
In the example given, the value for the pair M1:M2 in co_occurrence should be 2, as they occur together in two samples (S1 and S3).
How do I go about doing this?
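One possible approach, sketched here under a few assumptions: if the presence flags are stored as integers (not the strings that np.array produces from mixed rows), the pairwise totals fall out of a single matrix product of the transposed frame with itself. The name pairs_only and the choice to zero the diagonal are illustrative additions, not part of the original setup.

```python
import numpy as np
import pandas as pd

# Same toy data as in the question, stored as integers (an assumption;
# the original example holds the flags as strings).
df = pd.DataFrame(
    [[1, 1, 0, 0],
     [0, 1, 0, 1],
     [1, 1, 1, 1],
     [0, 0, 0, 1]],
    index=pd.Index(['S1', 'S2', 'S3', 'S4'], name='Sample'),
    columns=['M1', 'M2', 'M3', 'M4'],
)

# (minerals x samples) dot (samples x minerals): cell [i, j] counts the
# samples in which mineral i and mineral j are both present.
co_occurrence = df.T.dot(df)

# The diagonal holds each mineral's own occurrence count. If only true
# pairs are wanted, zero it out on a copy (pairs_only is a hypothetical name).
values = co_occurrence.to_numpy().copy()
np.fill_diagonal(values, 0)
pairs_only = pd.DataFrame(values,
                          index=co_occurrence.index,
                          columns=co_occurrence.columns)

print(co_occurrence.loc['M1', 'M2'])
```

The result is symmetric, so each pair appears twice (M1:M2 and M2:M1). For roughly 100 minerals across 160 samples this stays a small 100 x 100 frame, so an explicit loop over column pairs should not be needed.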