
This is essentially the same problem as the question "fastest way to detect if a value is in a set of values in Javascript", but in Stata and for integer values.

Run the following code in Stata:

set obs 9

generate var1 = 1 in 1
replace var1 = 2 in 2
replace var1 = 3 in 3
replace var1 = 4 in 4
replace var1 = 5 in 5
replace var1 = 6 in 6
replace var1 = 7 in 7

generate var2 = 6 in 1
replace var2 = 5 in 2
replace var2 = 4 in 3
replace var2 = 3 in 4
replace var2 = 2 in 5
replace var2 = 1 in 6
replace var2 = 58 in 7
replace var2 = 69 in 8
replace var2 = 51 in 9

The idea is simple. If a value in var1 (for example, 5) occurs anywhere in the set of values contained in var2, I want a new variable var3 to hold "yes" in that observation, and "no" otherwise. For example, var3 would be "no" next to the 7 in var1, because 7 is not among the values contained in var2.

Community

2 Answers


A brute force method for doing this, if I understand correctly, is:

gen var3="no" 
local N=_N 

forval i=1/`N' { 
    replace var3 = "yes" if var1 == var2[`i']
}

If you're not wed to your current data structure, it will be more efficient just to loop through all of the values you're looking for:

gen var3="no" 
foreach i in 1 2 3 4 5 6 58 69 51 {
    replace var3 = "yes" if var1==`i'
}
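
The list of values in the loop above does not have to be typed by hand. As a minimal sketch, assuming the example data from the question, `levelsof` can collect the distinct non-missing values of var2 into a local macro and the loop can run over that. The macro name `vals` and the loop index `v` are illustrative names, not part of the original answer.

```stata
clear
input float(var1 var2)
1  6
2  5
3  4
4  3
5  2
6  1
7 58
. 69
. 51
end

* Collect the distinct non-missing values of var2 in the local macro vals
* (vals is an illustrative name)
levelsof var2, local(vals)

* Start from "no" everywhere, then flag each value of var1 found in var2
generate var3 = "no"
foreach v of local vals {
    replace var3 = "yes" if var1 == `v'
}

list var1 var2 var3
```

This makes one pass over the data per distinct value of var2 rather than one per observation. Observations where var1 is missing keep "no"; if they should be blank instead, a final `replace var3 = "" if missing(var1)` would handle that.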
atkat12

An alternative method if a loop is somehow difficult to construct would be to use a merge:

clear
input float(var1 var2)
1  6
2  5
3  4
4  3
5  2
6  1
7 58
. 69
. 51
end

tempfile original
save `original'          // Save data (presumably you have this on disk already)

drop var1                // Keep only the key variable for the merge
rename var2 var1         // Rename for merge

tempfile set2
save `set2'              // Save file with only the values from var2 in the original set

use `original', clear

merge m:1 var1 using `set2'

drop if _merge == 2      // Drop new observations created for values only in var2

list, sepby(_merge)

Here, you will notice that the _merge variable created during the merge records which values of var1 exist in var2: _merge == 3 (matched) means the value also occurs in var2, while _merge == 1 (master only) means it does not. Because _merge is a labeled numeric variable, no destring is needed; from this point it is straightforward to create a new variable containing "yes" and "no" conditional on the values of _merge.
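
As a hedged sketch of that last step, the following self-contained variant builds the lookup file with preserve/restore and derives var3 from _merge. It departs slightly from the code above: it drops missing and duplicate values of var2 before saving (m:1 requires the key to be unique in the using file), and it uses the `keep(master match)` option of merge in place of the separate `drop if _merge == 2`. These choices are assumptions about what is wanted, not part of the original answer.

```stata
clear
input float(var1 var2)
1  6
2  5
3  4
4  3
5  2
6  1
7 58
. 69
. 51
end

* Build a lookup file holding the distinct non-missing values of var2,
* renamed to var1 so it can serve as the merge key
preserve
keep var2
drop if missing(var2)
duplicates drop
rename var2 var1
tempfile set2
save `set2'
restore

* keep(master match) discards observations that exist only in the lookup file
merge m:1 var1 using `set2', keep(master match)

* _merge == 3 marks values of var1 that also occur in var2
generate var3 = cond(_merge == 3, "yes", "no")
drop _merge

list
```

With the example data, var3 is "yes" for the values 1 through 6 of var1 and "no" for 7 and for the two observations where var1 is missing.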

Note also that the merge is likely to be significantly faster than a loop if you have a large dataset.

ander2ed