Question updated!!
I have 15 columns of categorical variables and I want to find the correlation among them. The data set has 20,000+ rows and looks like this:
state | job | hair_color | car_color | marital_status
NY | cs | brown | blue | s
FL | mt | black | blue | d
NY | md | blond | white | m
NY | cs | brown | red | s
Notice that in the 1st row and the last row, NY, cs, and s repeat together. I want to find that kind of pattern: here NY and cs are highly correlated. I need to rank the combinations of values across the columns. I hope the question makes sense now. Please note that this is NOT about counting NY or cs on their own. It is about finding out how many times NY and blond appear together in the same row. I need to do that for all combinations of values, row by row.
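To make the goal concrete, here is a minimal sketch in R of the kind of counting described above, using only the four sample rows. The names `df`, `col_pairs`, and `pair_counts` are hypothetical, and this only tallies how often two values share a row; it is not a correlation measure.

```r
# Toy data copied from the sample rows above; `df` is a hypothetical name.
df <- data.frame(
  state          = c("NY", "FL", "NY", "NY"),
  job            = c("cs", "mt", "md", "cs"),
  hair_color     = c("brown", "black", "blond", "brown"),
  car_color      = c("blue", "blue", "white", "red"),
  marital_status = c("s", "d", "m", "s"),
  stringsAsFactors = FALSE
)

# Co-occurrence counts for a single pair of columns.
table(df$state, df$job)

# The same idea for every pair of columns, stacked into one long table.
col_pairs <- combn(names(df), 2, simplify = FALSE)
pair_counts <- do.call(rbind, lapply(col_pairs, function(p) {
  counts <- as.data.frame(table(df[[p[1]]], df[[p[2]]]),
                          stringsAsFactors = FALSE)
  data.frame(col1 = p[1], value1 = counts$Var1,
             col2 = p[2], value2 = counts$Var2,
             n = counts$Freq, stringsAsFactors = FALSE)
}))

# Drop value pairs that never share a row, then rank by frequency.
pair_counts <- pair_counts[pair_counts$n > 0, ]
pair_counts <- pair_counts[order(-pair_counts$n), ]
head(pair_counts)
```

On the real data the same code would run over all 15 columns, which gives 105 column pairs, so the output would need filtering to stay readable.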
I tried to use cor() in R, but since these are categorical variables the function doesn't work. How can I work with this data set to find the correlation among the columns?