How to draw a Venn diagram from a dummy variable in Python matplotlib_venn?

Question

I have the following code to draw the Venn diagram.

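import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import matplotlib_venn as vplt

# Ten random respondents with three 0/1 answer columns
x = np.random.randint(2, size=(10, 3))
df = pd.DataFrame(x, columns=['A', 'B', 'C'])
print(df)
# Placeholder region sizes: all seven regions are set to 1
v = vplt.venn3(subsets=(1, 1, 1, 1, 1, 1, 1))
plt.show()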

and the output is a three-circle Venn diagram in which all seven regions are labeled 1.

I actually want to compute the numbers passed to subsets() from the dataset. How can I do that? Or is there an easier way to make this Venn diagram directly from the dataset? I also want to draw a box around the circles and label the remaining area as the people for whom A, B and C are all 0. Then I want to calculate the percentage of people in each region and use it as the label. I'm not sure how to achieve this.

Background of the Problem:

I have a dataset of more than 500 observations, and these three columns were recorded from a single multiple-choice variable where respondents could select several answers. I want to visualize how many people chose the 1st, 2nd, etc. option, as well as how many chose both the 1st and 2nd, the 1st and 3rd, and so on.
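For multiple-choice data shaped like this, the seven region sizes (plus the number of people who chose nothing) can be counted in one pass by turning each row into a three-character key. A minimal sketch, assuming the three answer columns hold 0/1 or True/False values; the function name `region_counts` is hypothetical:

```python
import pandas as pd


def region_counts(df: pd.DataFrame) -> dict:
    """Count the rows in every on/off combination of three 0/1 columns.

    Keys are strings such as '110' (in the 1st and 2nd column, not the 3rd),
    which is also the region-id scheme used by matplotlib_venn's venn3;
    '000' counts the rows where all three columns are 0.
    """
    assert df.shape[1] == 3, "expected exactly three columns"
    # Normalise to '0'/'1' strings, whatever the original dtype was
    digits = df.astype(bool).astype(int).astype(str)
    # Concatenate the three digits per row, e.g. (1, 1, 0) -> '110'
    keys = digits.iloc[:, 0] + digits.iloc[:, 1] + digits.iloc[:, 2]
    counts = keys.value_counts()
    # venn3's subsets order first, then the "none of them" cell
    order = ['100', '010', '110', '001', '101', '011', '111', '000']
    return {k: int(counts.get(k, 0)) for k in order}
```

With `c = region_counts(df)`, `tuple(c.values())[:7]` is in the order `venn3(subsets=...)` expects, and `c['000']` is the count for the area outside the circles.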

CT Zhu · Accepted Answer · 2019-09-09T23:43:52.937


Use numpy.argwhere to get the row indices of the 1s in each column, turn each column's indices into a set, and pass the three sets to venn3, which computes the region sizes itself.

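For example, with this df:

   A  B  C
0  0  1  1
1  1  1  0
2  1  1  0
3  0  0  1
4  1  1  0
5  1  1  0
6  0  0  0
7  0  0  0
8  1  1  0
9  1  0  0

import matplotlib.pyplot as plt
import numpy as np
from matplotlib_venn import venn3

# One set per column: the row positions where that column is 1
sets = [set(np.argwhere(v.to_numpy()).ravel()) for _, v in df.items()]
venn3(sets, set_labels=list(df.columns))
plt.show()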
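When venn3 is given a list of three sets instead of counts, it works out the seven region sizes from the sets. The set arithmetic involved is roughly the following; this is an illustrative sketch, not matplotlib_venn's actual source, and `regions_from_sets` is a hypothetical name:

```python
def regions_from_sets(a: set, b: set, c: set) -> tuple:
    """Return the seven Venn region sizes in venn3's subsets order:
    (A only, B only, A&B only, C only, A&C only, B&C only, A&B&C)."""
    return (
        len(a - b - c),      # A only
        len(b - a - c),      # B only
        len((a & b) - c),    # A and B, not C
        len(c - a - b),      # C only
        len((a & c) - b),    # A and C, not B
        len((b & c) - a),    # B and C, not A
        len(a & b & c),      # all three
    )
```

With the ten-row df above, the three sets are {1, 2, 4, 5, 8, 9}, {0, 1, 2, 4, 5, 8} and {0, 3}, and the function returns (1, 0, 5, 1, 0, 1, 0).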

Note: if you also want a box around the diagram showing the number of items that are in none of the categories, add these lines before plt.show():

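ax = plt.gca()
# axis('on') makes the axes frame (the box) visible and returns (xmin, xmax, ymin, ymax)
xmin, _, ymin, _ = ax.axis('on')
# Write the number of rows where A, B and C are all 0 in the bottom-left corner
ax.text(xmin, ymin, str((df == 0).all(axis=1).sum()), ha='left', va='bottom')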
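The question also asks for percentages as region labels. A minimal sketch that combines the box, the "none" count and percentage labels, assuming a matplotlib_venn version whose venn3 accepts the `subset_label_formatter` and `ax` arguments; the helper names `pct_label` and `plot_venn_with_box` are hypothetical, and percentages here are taken relative to all rows, including people who chose nothing:

```python
def pct_label(count, total):
    """Format a region size as 'count' over '(pct%)' on two lines, relative to all rows."""
    return f"{count}\n({count / total:.1%})"


def plot_venn_with_box(df):
    """Draw a framed three-set Venn diagram of three 0/1 columns.

    Each region is labelled with its count and its share of all rows
    (including rows where every column is 0); the all-zero count is written
    in the bottom-left corner of the box.
    """
    # Imported inside the function so pct_label works without matplotlib_venn
    import matplotlib.pyplot as plt
    from matplotlib_venn import venn3

    flags = df.astype(bool)
    total = len(flags)
    # One set of row labels per column (same idea as the accepted answer)
    sets = [set(flags.index[flags[col].to_numpy()]) for col in flags.columns]
    none = int((~flags).all(axis=1).sum())

    fig, ax = plt.subplots()
    venn3(sets, set_labels=list(flags.columns), ax=ax,
          subset_label_formatter=lambda n: pct_label(n, total))
    # Re-enable the axes frame so it acts as the surrounding box, without ticks
    ax.set_axis_on()
    ax.set_frame_on(True)
    ax.set_xticks([])
    ax.set_yticks([])
    xmin, _ = ax.get_xlim()
    ymin, _ = ax.get_ylim()
    ax.text(xmin, ymin, f" none: {none} ({none / total:.1%})", ha="left", va="bottom")
    return fig
```

Usage would be `plot_venn_with_box(df)` followed by `plt.show()`. If "percentage in each circle" should instead be relative to the people who chose at least one option, replace `total` with `total - none`.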

edited Sep 09 '19 at 23:43

answered Sep 08 '19 at 22:59


Awesome, exactly what I was looking for. Is there an easy way to get the complement and show it outside the three circles, with a square around them? – David Sep 09 '19 at 12:49

emskiphoto · Answer 2 · 2023-03-03T15:47:08.580

This function plots a three-circle Venn diagram from a three-column pandas DataFrame containing boolean values. (Inspired by the question "Python Matplotlib Venn diagram".)

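import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from matplotlib_venn import venn3

def venn_diagram3_from_df(df):
    """Plots a three-circle Venn diagram from an input df containing
    exactly three columns with True/False (or 0/1) values.
    Uses venn3 from matplotlib_venn."""
    assert df.shape[1] == 3

    # Cast to bool so that ~ is a logical NOT (on 0/1 ints it would be bitwise)
    a_bool = df.iloc[:, 0].to_numpy(dtype=bool)
    b_bool = df.iloc[:, 1].to_numpy(dtype=bool)
    c_bool = df.iloc[:, 2].to_numpy(dtype=bool)

    # Each "only" region must exclude the other two columns
    only_a = (a_bool & ~b_bool & ~c_bool).sum()
    only_b = (b_bool & ~a_bool & ~c_bool).sum()
    only_c = (c_bool & ~a_bool & ~b_bool).sum()

    only_a_b = (a_bool & b_bool & ~c_bool).sum()
    only_a_c = (a_bool & c_bool & ~b_bool).sum()
    only_b_c = (b_bool & c_bool & ~a_bool).sum()

    a_b_c = (a_bool & b_bool & c_bool).sum()

    # venn3 expects the regions in the order (Abc, aBc, ABc, abC, AbC, aBC, ABC)
    venn3(subsets=(only_a, only_b, only_a_b, only_c, only_a_c, only_b_c, a_b_c),
          set_labels=df.columns.tolist())


x = np.random.randint(2, size=(10, 3))
df_ = pd.DataFrame(x, columns=['A', 'B', 'C']).astype(bool)

venn_diagram3_from_df(df_)
plt.show()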

[Figure: Three Variable Venn Diagram]
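Each "only" region has to exclude the other two columns because the four regions inside circle A must add up to the column total of A; using the plain column sum for "A only" counts the overlap rows a second time. A small numpy check, using arrays copied from the ten-row table in the accepted answer:

```python
import numpy as np

# Columns A, B, C of the ten-row example table, as booleans
a = np.array([0, 1, 1, 0, 1, 1, 0, 0, 1, 1], dtype=bool)
b = np.array([1, 1, 1, 0, 1, 1, 0, 0, 1, 0], dtype=bool)
c = np.array([1, 0, 0, 1, 0, 0, 0, 0, 0, 0], dtype=bool)

only_a = int((a & ~b & ~c).sum())
a_b = int((a & b & ~c).sum())
a_c = int((a & c & ~b).sum())
a_b_c = int((a & b & c).sum())

# The four regions inside circle A partition the rows where A is True
regions_in_a = only_a + a_b + a_c + a_b_c
column_total_a = int(a.sum())
print(only_a, regions_in_a, column_total_a)  # 1 6 6
```

If "A only" were set to the column total, that region would be labeled 6 instead of 1.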
