I am trying to recode a variable that indicates total number of responses to a multiple response survey question. Question 4 has options 1, 2, 3, 4, 5, 6, and participants may choose one or more options when submitting a response. The data is currently coded as binary outputs for each option: var Q4___1 = yes or no (1/0), var Q4___2 = yes or no (1/0), and so forth.

This is the tabstat of all yes (1) responses to the 6 Q4___* variables

    Variable |       Sum
-------------+----------
      q4___1 |        63
      q4___2 |        33
      q4___3 |         7
      q4___4 |         2
      q4___5 |         3
      q4___6 |         7
------------------------
total = 115

I would like to create a new variable that encapsulates these values.

Can someone help me figure out how to create this variable, and if coding a variable in this manner for a multiple option survey question is valid?

When I used the replace command, the total number of responses did not add up, as shown below:

    gen q4 = .
    replace q4 = 1 if q4___1 == 1
    replace q4 = 2 if q4___2 == 1
    replace q4 = 3 if q4___3 == 1
    replace q4 = 4 if q4___4 == 1
    replace q4 = 5 if q4___5 == 1
    replace q4 = 6 if q4___6 == 1
    label values q4 primarysource

         q4 |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |         46       48.94       48.94
          2 |         31       32.98       81.91
          3 |          6        6.38       88.30
          4 |          1        1.06       89.36
          5 |          3        3.19       92.55
          6 |          7        7.45      100.00
------------+-----------------------------------
      Total |         94      100.00

UPDATE: To clarify, I am trying to create a new variable that captures the column sum of each question, not the row total across all questions. I know that 63 participants responded yes to question 4 a) and 33 to question 4 b), so I want my new variable to reflect that.

This is what I want my new variable's values to look like.

q4
-------------+----------
      q4___1 |        63
      q4___2 |        33
      q4___3 |         7
      q4___4 |         2
      q4___5 |         3
      q4___6 |         7
------------------------
total = 115   
devlex

1 Answer

The fallacy here is ignoring the possibility of multiple 1s as answers to the various Q4???? variables. For example if someone answers 1 1 1 1 1 1 to all questions, they appear in your final variable only in respect of their answer to the 6th question. Otherwise put, your code overwrites and so ignores all positive answers before the last positive answer.
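To make the overwriting concrete, here is a minimal sketch on made-up data (three hypothetical respondents and only three options; none of these values come from the question). It applies the same sequence of replace commands as in the question.

```stata
* Toy data, invented for illustration: 3 respondents, 3 options
clear
input q4___1 q4___2 q4___3
1 1 0
1 0 0
0 1 1
end

* Same pattern as in the question: each later replace
* overwrites whatever an earlier replace stored
gen q4 = .
forval j = 1/3 {
    replace q4 = `j' if q4___`j' == 1
}

list

* There are 5 yes answers in the data, but q4 can hold only
* one value per respondent, so only 3 of them survive
count if !missing(q4)
```

Respondent 1 chose options 1 and 2 but ends up with q4 equal to 2, and respondent 3 chose options 2 and 3 but ends up with 3. A tabulation of such a variable therefore counts respondents with at least one yes, not yes answers, which would presumably explain a total of 94 rather than 115.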

What is likely to be more useful are

(1) the total across all 6 questions which is just

egen Q4_total = rowtotal(Q4????)

where each ? matches exactly one character, and the 4 instances of ? mean that by eye I count 3 underscores and 1 numeral.

(2) a concatenation of responses that is just

egen Q4_concat = concat(Q4????)

(3) a variable that is a concatenation of questions with positive responses, so 246 if those questions were answered 1 and the others were answered 0.

gen Q4_pos = "" 

forval j = 1/6 { 
    replace Q4_pos = Q4_pos + "`j'" if Q4___`j' == 1 
}
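As a rough illustration of what the three variables hold, here is a sketch on made-up data (three hypothetical respondents). It assumes the variables are named with an upper-case Q and three underscores; Stata variable names are case-sensitive, so use q4___1 and so on if that is how they are stored. The wildcard Q4___? is used instead of Q4???? only because Q4???? would also match Q4_pos once that variable exists.

```stata
* Toy data, invented for illustration
clear
input Q4___1 Q4___2 Q4___3 Q4___4 Q4___5 Q4___6
0 1 0 1 0 1
1 0 0 0 0 0
0 0 0 0 0 0
end

* (1) number of yes answers per respondent
egen Q4_total = rowtotal(Q4___?)

* (2) all six answers pasted together as a string
egen Q4_concat = concat(Q4___?)

* (3) only the option numbers answered yes
gen Q4_pos = ""
forval j = 1/6 {
    replace Q4_pos = Q4_pos + "`j'" if Q4___`j' == 1
}

list Q4_total Q4_concat Q4_pos
```

For the first respondent this should give 3, "010101" and "246"; the third respondent, who chose nothing, gets 0, "000000" and an empty string.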

EDIT

Here is a test script giving concrete examples.

clear 
set obs 6 
forval j = 1/6 {
    gen Q`j' = _n <= `j'
}

list 

egen rowtotal = rowtotal(Q?)

su rowtotal, meanonly 

di r(sum)

* install from tab_chi on SSC
tabm Q? 

Results:

. list 

     +-----------------------------+
     | Q1   Q2   Q3   Q4   Q5   Q6 |
     |-----------------------------|
  1. |  1    1    1    1    1    1 |
  2. |  0    1    1    1    1    1 |
  3. |  0    0    1    1    1    1 |
  4. |  0    0    0    1    1    1 |
  5. |  0    0    0    0    1    1 |
     |-----------------------------|
  6. |  0    0    0    0    0    1 |
     +-----------------------------+
 
. egen rowtotal = rowtotal(Q?)
. su rowtotal, meanonly 
. di r(sum)
21

. tabm Q? 

           |        values
  variable |         0          1 |     Total
-----------+----------------------+----------
        Q1 |         5          1 |         6 
        Q2 |         4          2 |         6 
        Q3 |         3          3 |         6 
        Q4 |         2          4 |         6 
        Q5 |         1          5 |         6 
        Q6 |         0          6 |         6 
-----------+----------------------+----------
     Total |        15         21 |        36 
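
If the column sums themselves are wanted as data rather than as a display, one possible sketch is to collapse to sums and reshape to one row per question. This uses the same toy data as the test script; the names id, option and n_yes are made up for illustration. Note that collapse replaces the data in memory, so wrap it in preserve and restore if the original observations are still needed.

```stata
* Same toy data as in the test script above
clear
set obs 6
forval j = 1/6 {
    gen Q`j' = _n <= `j'
}

* Display only: column sums, one row per variable
tabstat Q?, statistics(sum) columns(statistics)

* As a dataset: one observation holding the six sums ...
collapse (sum) Q?

* ... reshaped to one row per question
gen id = 1
reshape long Q, i(id) j(option)
rename Q n_yes

list option n_yes, noobs
```

The resulting dataset has six observations, one per question, with n_yes running from 1 to 6 and summing to 21, in line with the tabm table above. It is a separate summary dataset, not a new variable alongside the original responses.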
Nick Cox
  • I tried using the code, but unfortunately the numbers shown here don't match the initial tabstat table I included above: ```egen Q4_total = rowtotal(Q4___*)``` q4 | Freq. Percent Cum. ------------+----------------------------------- 0 | 8 7.84 7.84 1 | 77 75.49 83.33 2 | 14 13.73 97.06 3 | 2 1.96 99.02 4 | 1 0.98 100.00 ------------+----------------------------------- Total | 102 100.00 – devlex Nov 02 '22 at 16:48
  • That's hardly readable or reproducible. See EDIT of answer. – Nick Cox Nov 02 '22 at 20:03
  • Hi, I see your edit. What I am looking for is the total of 1s and 0s in each column, not each row. I want my new variable to hold the sum of 1s in each column/question. From my data I have calculated that 63 individuals responded yes to question 1, 33 to question 2, 7 to question 3, 2 to question 4, 3 to question 5 and 7 to question 6, so my new variable should look like q4 `1------63 2------33 3------7 4------2 5------3 6------7 total ----115` Sorry, I am new to Stack Overflow and confused about the formatting for replies. – devlex Nov 03 '22 at 16:16
  • Here you should not post answers unless you have one but you can and should always edit your question if you wish or need to make it clearer. The idea of putting summaries of variables in new variables makes no obvious sense in Stata without a specific reason for wanting them to be variables — because in this case they would just be six constants if I get your gist. Otherwise put, your new variable is in effect a row vector whereas Stata variables are all column vectors. – Nick Cox Nov 03 '22 at 17:55