How can I find the correlation between two categorical series? I need the correlation between HAD_CPOX (whether a child has had chickenpox) and P_NUMVRC (the number of varicella vaccine doses given) among children. The main catch is that the HAD_CPOX column has 4 unique values:
- 1: has had chickenpox
- 2: has not had chickenpox
- 99: may be NULL
- 77: don't know
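Because 77 and 99 are codes rather than quantities, leaving them in would let a handful of rows dominate any numeric correlation. Here is a minimal sketch, assuming HAD_CPOX lives in a DataFrame named df and that "don't know" and NULL answers should be treated as missing. The toy values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the real survey columns.
df = pd.DataFrame({
    "HAD_CPOX": [1, 2, 2, 77, 99, 2, 1],
    "P_NUMVRC": [0, 1, 2, 1, np.nan, 1, 0],
})

# Assumption: only 1 (had chickenpox) and 2 (did not) are real answers.
# where() keeps values that pass the test and turns everything else into NaN.
df["HAD_CPOX"] = df["HAD_CPOX"].where(df["HAD_CPOX"].isin([1, 2]))

# dropna=False makes the number of newly missing rows visible.
counts = df["HAD_CPOX"].value_counts(dropna=False)
print(counts)
```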
The unique values in df['P_NUMVRC'] are [1, 2, 3, 0, NaN].
These are two different series, so how do I put them together and find the correlation between them?
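Assuming both columns come from the same DataFrame (one row per child), putting them together just means lining them up by index. The sketch below uses hypothetical toy Series: pd.concat with axis=1 aligns them on their shared index, and dropna() keeps only children who have an answer in both columns.

```python
import numpy as np
import pandas as pd

# Hypothetical toy Series; in practice these would be df["HAD_CPOX"] and df["P_NUMVRC"].
had_cpox = pd.Series([1, 2, 2, np.nan, 2, 1], name="HAD_CPOX")
num_vrc = pd.Series([0, 1, 2, 1, np.nan, 0], name="P_NUMVRC")

# concat(axis=1) aligns the two Series on their index, giving one row per child.
pairs = pd.concat([had_cpox, num_vrc], axis=1)

# Keep only children with a valid value in both columns.
pairs = pairs.dropna()
print(pairs)
```

Series.corr already excludes missing pairs by itself, so the explicit dropna() is mainly useful for checking how many children remain in the analysis.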
I used value_counts() to get the frequency of each value. For P_NUMVRC:
1 13781
2 213
3 1
Name: P_NUMVRC, dtype: int64
For the HAD_CPOX column:
2 27955
1 402
77 105
99 3
Name: HAD_CPOX, dtype: int64
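Separate value_counts() calls describe each column on its own and, by default, hide NaN. A cross-tabulation instead shows how the two columns vary together, which is what a correlation coefficient summarizes. The following sketch runs on made-up toy data.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; replace with the real survey DataFrame.
df = pd.DataFrame({
    "HAD_CPOX": [1, 2, 2, 77, 99, 2, 1, 2],
    "P_NUMVRC": [0, 1, 2, 1, np.nan, 1, 0, np.nan],
})

# value_counts() drops NaN by default; dropna=False also counts missing values.
dose_counts = df["P_NUMVRC"].value_counts(dropna=False)
print(dose_counts)

# crosstab counts each (HAD_CPOX, P_NUMVRC) combination;
# rows with NaN in either column are left out of the table.
table = pd.crosstab(df["HAD_CPOX"], df["P_NUMVRC"])
print(table)
```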
The requirement is as follows:
A positive correlation (i.e., corr > 0) means that an increase in had_chickenpox_column (i.e., more "no" answers, since 2 means not having had chickenpox) is accompanied by an increase in num_chickenpox_vaccine_column (i.e., more vaccine doses). A negative correlation (i.e., corr < 0) indicates that having had chickenpox is associated with a higher number of vaccine doses.
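One way to get a signed coefficient that matches this requirement is sketched below. It assumes the 1/2 coding is kept, so a higher HAD_CPOX means "no chickenpox", and that 77 and 99 are dropped as missing. Because HAD_CPOX is binary after cleaning, Pearson's r here is the point-biserial correlation. Spearman's rho is shown as well, since dose counts are ordinal. The toy data are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; replace with the real survey DataFrame.
df = pd.DataFrame({
    "HAD_CPOX": [1, 1, 2, 2, 2, 77, 99],
    "P_NUMVRC": [0, 0, 1, 2, 1, 1, np.nan],
})

# Keep the 1/2 coding (higher = "no chickenpox"), as the requirement assumes.
# Assumption: 77 ("don't know") and 99 (NULL) are treated as missing.
had_cpox = df["HAD_CPOX"].where(df["HAD_CPOX"].isin([1, 2]))
pairs = pd.concat([had_cpox, df["P_NUMVRC"]], axis=1).dropna()

# Pearson on a two-valued variable is the point-biserial correlation.
pearson = pairs["HAD_CPOX"].corr(pairs["P_NUMVRC"])

# Spearman's rho = Pearson correlation of the ranks (ties get average ranks).
ranks = pairs.rank()
spearman = ranks["HAD_CPOX"].corr(ranks["P_NUMVRC"])

print(f"Pearson (point-biserial): {pearson:.3f}")
print(f"Spearman: {spearman:.3f}")
```

With this coding, a positive value means children who have not had chickenpox tend to have received more doses, and a negative value means the reverse. If both columns should be treated as purely nominal, a chi-square test of independence on the crosstab (scipy.stats.chi2_contingency) is a common alternative, although it does not give a direction.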