I have a dataframe with multiple-choice questions, each with up to 25 options. In the SurveyMonkey download, every option gets its own column, so a question has as many columns as it has options, and the option's string appears in its column if the respondent ticked that option.

Question: Select the three populations from this list that apply to you. The 20 options are: "We serve everyone", "Children", "Youth", "Older Adults", .....

My dataframe looks like this. The responses are stored under the variable names Popn1 .... Popn20:

ID    Popn1                Popn2             Popn3 ......... Popn20
A     We serve everyone    Children          NA              NA 
B     NA                   Children          Youth           "String" 
C     We serve everyone    NA                NA              "String"
D     NA                   Children          Youth           "String"

...

In SAS, I would turn these into binary variables (1/0, 1 if they selected the option, 0 if they did not). Each variable from Popn1 - Popn20 would be in the form of 1/0. I would then use a proc tabulate to get a sum (N) and mean (percent) and get one frequency table of these values.

In R, is there a way I can run a frequency table of all these variables, to get the count, as well as percent, by number of responses received for the question, without turning the columns into binary variables? All the options I've seen suggest to do this first, but I'm looking for an alternative solution if one exists, of somehow counting the number of instances of each string.

Preferred output:

  Frequency counts for question: Select the 3 populations that apply to you.
                       n    %  
  We serve everyone    6    10%
  Children             10   16%
  Youth                5    xx% 
  ....
  Label of Popn20      X    xx%

Hope this is a bit clearer than my earlier post.

Pre
  • It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. The code you posted isn't valid R. Are those values supposed to be quoted? What exactly do you want to take the mean of? – MrFlick Nov 18 '20 at 04:50
  • Your question needs more explanation please! Please include a minimal data and a desired sample output so that your problem can be understood – AnilGoyal Nov 18 '20 at 05:15
  • Thank you for the suggestions - I have updated my original post – Pre Nov 18 '20 at 14:04
  • Why don't you want to convert it to a binary variable first? It's not difficult to do.... At the very least, you should rename your columns from "Popn1 ... Popn20" to the actual choices they'd contain, then you can use a function like [`multi.table`](https://juba.github.io/questionr/reference/multi.table.html) from the ["questionr" package](https://juba.github.io/questionr/index.html) or roll out your own solution with [@omri-newett's answer](https://stackoverflow.com/a/64887393/1270695) as a starting point. – A5C1D2H2I1M1N2O1R2T1 Nov 18 '20 at 18:16
  • I named them popn1...popn20 etc so I wouldn't have to write out each variable name if I could create an object with the paste function. The package (https://www.rdocumentation.org/packages/questionr/versions/0.7.2/topics/multi.table) looks useful for this! thanks for pointing it out. This is also my first big R project so I'm basically swimming against the current here. – Pre Nov 18 '20 at 19:29
  • @A5C1D2H2I1M1N2O1R2T1 is it true that this package wouldn't work for 2way frequency tables with multiple choice variables? I'd have to do a couple two way tables as well. – Pre Nov 18 '20 at 21:35
  • @Pre, What would be an example of a 2-way frequency table's input and output from your data? The "questionr" package also has a `cross.multi.table` function that you might want to look at. – A5C1D2H2I1M1N2O1R2T1 Nov 18 '20 at 22:10
  • For the column names, if you are sure there would be at least one non-NA value in each column, recreating names shouldn't be tough... – A5C1D2H2I1M1N2O1R2T1 Nov 18 '20 at 22:12
  • @A5C1D2H2I1M1N2O1R2T1 Yes i think few, if any columns are empty. Most have an option. I think i was thinking about the variable names differently, because I was thinking I would use variable labels to make the output meaningful. I actually spent a couple hours creating variable names in Excel, then read them in to align with each column. I read in the excel dataset without the first two obs bc the survey monkey headers are messy. Doesn't the package work with labels? – Pre Nov 18 '20 at 23:04
  • @A5C1D2H2I1M1N2O1R2T1 How do I convert each of those variables into binomial? https://www.researchgate.net/post/How-to-analyse-multiple-choice-questions-using-R is this the best way? – Pre Nov 19 '20 at 02:13
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/224769/discussion-between-a5c1d2h2i1m1n2o1r2t1-and-pre). – A5C1D2H2I1M1N2O1R2T1 Nov 19 '20 at 02:59
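
The binary conversion asked about in the comments can be sketched in base R as follows. This is only an illustration under stated assumptions: the sample `df` is made up to mirror the layout in the question, the option columns are assumed to share the `Popn` prefix, and each column is assumed to hold at least one non-NA value so that its label can be recovered from the data. The names `popn_cols`, `binary`, `labels`, and `summary_tab` are hypothetical.

```r
# Hypothetical sample data modelled on the layout shown in the question
df <- data.frame(
  ID    = c("A", "B", "C", "D"),
  Popn1 = c("We serve everyone", NA, "We serve everyone", NA),
  Popn2 = c("Children", "Children", NA, "Children"),
  Popn3 = c(NA, "Youth", NA, "Youth"),
  stringsAsFactors = FALSE
)

# Assumption: every option column starts with "Popn"
popn_cols <- grep("^Popn", names(df), value = TRUE)

# 1 if the respondent ticked the option, 0 if the cell is NA
binary <- +(!is.na(df[popn_cols]))

# Recover each column's label from its first non-NA value
labels <- vapply(df[popn_cols], function(x) x[!is.na(x)][1], character(1))
colnames(binary) <- unname(labels)

# Sum gives n, mean gives the share of respondents (as in proc tabulate)
summary_tab <- data.frame(
  n   = colSums(binary),
  pct = round(100 * colMeans(binary), 1)
)
print(summary_tab)
```

Here the percentage is taken over all rows of `df`; if respondents who skipped the question should be excluded, the denominator would need to change accordingly.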

1 Answer

If I understand your question correctly, here is a simple if inelegant solution.

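# Count non-NA cells per option column (the first column, ID, is dropped)
counts <- colSums(!is.na(df[, -1]))
# Share of all respondents who ticked each option
percents <- counts / nrow(df)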
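
The counts above are keyed by column name (Popn1 ... Popn20). As a rough sketch of counting the strings themselves, without any binary conversion, one option is to stack the option columns into a single vector and tabulate it. The sample `df` below is made up to mirror the question's layout, and the names `opts`, `n_resp`, and `freq` are illustrative only. The percentage is computed over respondents who ticked at least one option, which is one reading of "by number of responses received for the question".

```r
# Hypothetical sample data modelled on the layout shown in the question
df <- data.frame(
  ID    = c("A", "B", "C", "D"),
  Popn1 = c("We serve everyone", NA, "We serve everyone", NA),
  Popn2 = c("Children", "Children", NA, "Children"),
  Popn3 = c(NA, "Youth", NA, "Youth"),
  stringsAsFactors = FALSE
)

# Keep only the option columns (assumed to share the "Popn" prefix)
opts <- df[, grep("^Popn", names(df)), drop = FALSE]

# Respondents who answered the question, i.e. ticked at least one option
n_resp <- sum(rowSums(!is.na(opts)) > 0)

# Stack all option columns into one vector; table() ignores NA by default
counts <- table(unlist(opts, use.names = FALSE))

freq <- data.frame(
  option = names(counts),
  n      = as.vector(counts),
  pct    = round(100 * as.vector(counts) / n_resp, 1)
)
print(freq)
```

Because the strings are tabulated directly, this relies on each option being spelled identically wherever it appears.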
Austin Graves