Unique combinations of variables in Stata

Question

I need help writing Stata code that produces the unique combinations of a set of variables. I have 7 variables, and I want output in which every row is a unique combination of all 7 variables.

An example:

V1: A, B, C
V2: 1, 2, 3

Combinations: A1 A2 A3, B1 B2 B3, C1 C2 C3

Unique combination of all variables - total 9 combinations.

I have 15000 observations. I have code that does this in R, but R fails with a memory error on data this large, so I would like to do it in Stata.

Thanks everyone, here is the link for this same question I need help with in R. http://stackoverflow.com/questions/27264952/unique-combinations-of-all-variables/27265123?noredirect=1#comment43038726_27265123 — Freewill, Dec 05 '14 at 03:30
I guess that's an improvement but only if a Stata user has knowledge of or doesn't mind studying R. @Nick and I have mentioned some options. Did you try them? If yes, why didn't they work for you? If no, why not? — Roberto Ferrer, Dec 05 '14 at 03:40
Roberto - I'm the one who posted the question. I'm not very familiar with Stata, so I really don't understand what you mean when you say "check out egen, group()" or "try installing groups from SSC" or "sounds more like fillin". These are all alien words to me, so I need direction on what it is that I should be trying. – Freewill, Dec 05 '14 at 14:43
It's difficult to advise if you have not moved beyond the very basics. Only people with a lot of free time (and other properties) will try to explain the fundamentals. The shaded text that appears throughout is, by definition, code, so you can type it into the Stata command window and see what it does. `help ` is one way of getting help in Stata. The recommendation is to read the first chapters of the **Stata user's guide**, which comes bundled with your Stata installation. Go to **Help > PDF Documentation** in the menu bar to get started. – Roberto Ferrer, Dec 05 '14 at 15:05
It works both ways: if you ask a poor question that can't be decoded, you won't get much attention. People are **very willing** to answer specific, well explained questions **if you put in some work first**. — Nick Cox, Dec 06 '14 at 12:36

Nick Cox · Answer 1 · 2014-12-05T09:07:13.860

score 3

It is not especially clear what you want created or done. There is no code here, not even R code showing how what you want is done in R. There is no reproducible example.

You might want to check out `egen, group()`. (A previous answer to this effect from @Dimitriy V. Masterov, an experienced user of Stata, was twice incorrectly deleted as spam, presumably by people not knowing Stata.)

Alternatively, try installing `groups` from SSC.
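
As a rough sketch of what these two suggestions do, the following numbers and tags the distinct combinations in a toy dataset. The variable names `v1`, `v2`, `combo` and `first` are assumptions made for illustration; the question does not show the real data.

```stata
* Hypothetical toy data: v1 and v2 stand in for the asker's variables
clear
input str1 v1 byte v2
"A" 1
"A" 1
"B" 2
"C" 3
end

* Give each distinct combination of v1 and v2 its own integer code
egen combo = group(v1 v2)

* Flag exactly one observation per distinct combination
egen first = tag(v1 v2)

* Show each combination that occurs in the data once
list v1 v2 combo if first

* Optional: the user-written groups command tabulates the same combinations
* ssc install groups    // one-time install, needs internet access
* groups v1 v2
```

Both approaches report only combinations that actually occur in the data; they do not create combinations that are absent.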

UPDATE: The answer sounds more like `fillin`. For "unique" read "distinct".
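
A minimal sketch of `fillin` on the question's two-variable example, assuming the data start with only three of the nine possible pairs (the names `v1` and `v2` are illustrative):

```stata
* Hypothetical starting data: only A1, B2 and C3 are present
clear
input str1 v1 byte v2
"A" 1
"B" 2
"C" 3
end

* Add observations so that every combination of v1 and v2 exists;
* fillin also creates _fillin, equal to 1 for the observations it added
fillin v1 v2

sort v1 v2
list v1 v2 _fillin, sepby(v1)
```

With 7 variables the same call would list all 7 in the varlist. The number of observations then grows to the product of the numbers of distinct values, and any other variables are missing in the added observations.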

edited Dec 05 '14 at 09:07

answered Dec 04 '14 at 11:09

Nick Cox


It isn't that we are ignorant of anything, it is that *six words* padded with nonsense to beat the spam filter don't constitute a valid, quality answer on [SO]. The system keeps flagging the answer as low quality and putting it in the review queue. It would have been just as easy to write something less terse and superficially less spammy, as you have done, and the problem would disappear. – talonmies Dec 04 '14 at 19:21
@talonmies My point is just that a human, acting in good faith and with good intentions, made the wrong decision, and I tried my best to correct that. No moderator can know all of the languages covered here, which is where other users can help. – Nick Cox Dec 04 '14 at 19:42

score 0 · Answer 2 · answered Dec 22 '14 at 23:28

Bit of a late response, but I just stumbled across this today. If I understand the question, something like this should do the trick, although I'm not sure it's easily applied to more complex data or whether it is even the best way.

    * Create Sample Data
    clear
    set obs 3
    gen str var1 = "a" in 1
    replace var1="b" in 2
    replace var1="c" in 3
    gen var2= _n

    * Find number of Unique Groupings to set obs
    by var1 var2, sort: gen groups=_n==1
    keep if groups==1
    drop groups
    di _N^2
    set obs 9

    * Create New Variable
    forvalues i = 4(3)9 {
        forvalues j = 5(3)9 {
            forvalues k = 6(3)9 {
                replace var1="a" if _n==`i'
                replace var1="b" if _n==`j'
                replace var1="c" if _n==`k'
            }
        }
    }

    sort var1
    egen i=seq(), f(1) t(3)

    tostring i, replace
    gen NewVar=var1+i
    list NewVar


     +--------+
     | NewVar |
     |--------|
  1. |     a1 |
  2. |     a2 |
  3. |     a3 |
  4. |     b1 |
  5. |     b2 |
     |--------|
  6. |     b3 |
  7. |     c1 |
  8. |     c2 |
  9. |     c3 |
     +--------+

Unfortunately, as far as I know, there is no easy way to do this; it will require a fair amount of code. That said, I saw another answer or comment that mentioned `cross`, which could be very useful here. Another command worth checking out is `joinby`. But even with either of these methods, you will have to split your data into 7 different sets based on the variables you want to 'cross combine'.
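
For what it's worth, here is a rough sketch of the `cross` idea for just two variables. The names `v1` and `v2` and the temporary file `letters` are assumptions for illustration, not taken from the question.

```stata
* Distinct values of the first variable, saved to a temporary file
clear
input str1 v1
"A"
"B"
"C"
end
tempfile letters
save `letters'

* Distinct values of the second variable, left in memory
clear
input byte v2
1
2
3
end

* Pair every observation in memory with every observation in the file
cross using `letters'

sort v1 v2
list v1 v2, sepby(v1)
```

Extending this to 7 variables would mean one small dataset of distinct values per variable and a `cross` for each additional one.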

Anyway, good luck if you haven't yet found your solution.

This is a chunk of code with no commentary on what it does or how it might be generalised. Note that `if _n ==` is a clumsy alternative to `in`. — Nick Cox, Dec 23 '14 at 09:24
Thanks a lot for the code. I ended up giving up on Stata and collapsing a few items in my input dataset using factors to come up with the unique combinations I was looking for. R was able to generate options that I could use. I used the following: `pairs=unique(expand.grid(V1,V2,V3,V4,V5,V6))` – Freewill, Dec 27 '14 at 21:54

score 0 · Answer 3 · answered Apr 17 '17 at 22:15

If you just want the combinations of those 7 variables that occur in your data, you can do it like this:

    keep v1 v2 v3 v4 v5 v6 v7
    duplicates drop
    list

Then you will get the list of unique combinations of those 7 variables. You can save the result under a different name from the original dataset. Please make sure that you do not save over the original file; otherwise you will lose your original data.
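
One way to guard against overwriting the original, sketched here under assumptions, is to wrap the steps in `preserve` and `restore`. The toy data, the extra variable `x` and the temporary file `combos` are hypothetical, and only two of the seven variables are shown.

```stata
* Hypothetical toy data: x stands in for the other columns of the dataset
clear
input str1 v1 byte v2 float x
"A" 1 0.5
"A" 1 1.5
"B" 2 2.5
end

* Take a snapshot of the data in memory
preserve

keep v1 v2
duplicates drop
list

* Save the distinct combinations to a separate (here temporary) file
tempfile combos
save `combos'

* Bring the original data, including x, back into memory
restore
describe, short
```

The `contract` command is another option worth a look: `contract v1 v2` reduces the data to the distinct combinations and adds a frequency variable, so it also belongs inside `preserve` and `restore`.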
