I am trying to write a Stata program that does some calculation by an identifier, and I want it to work whether the identifier is a string variable or an integer (numeric) variable.

A grossly simplified version of what I am trying to do:

clear all

***** test data
input str10 id1 id2 x y
a   1   20  40
a   1   140 20
a   1   0   70
b   2   50  25
b   2   25  50
b   2   40  42
end

*****
capture program drop myprog
program define myprog
    version 14.2
    syntax using, ID(varname) Mean(varname)
    tempname postname

    quietly levelsof `id', local(ids)
    local idtype: type `id'

    postfile `postname' `idtype' `id' `mean' `using', replace


    foreach i of local ids {
        quietly summarize `mean' if `id'==`i'
        post `postname' (`i') (`r(mean)')
    }

    postclose `postname'
end

I would like both of the following calls to work, but with this version only the numeric identifier (id2) does:

myprog using "test1.dta", id(id1) mean(x)
myprog using "test2.dta", id(id2) mean(x)

Any advice?

Glen Ng

2 Answers

Just use an if / else statement to distinguish between the two cases:

capture program drop myprog
program define myprog
    version 14.2
    syntax using, ID(varname) Mean(varname)
    tempname postname

    quietly levelsof `id', local(ids)
    local idtype: type `id'

    postfile `postname' `idtype' `id' `mean' `using', replace

    if substr("`idtype'", 1, 3) == "str" {
        foreach i of local ids {
            summarize `mean' if `id'=="`i'", meanonly 
            post `postname' ("`i'") (`r(mean)')
        }
    } 
    else {
        foreach i of local ids { 
            summarize `mean' if `id'==`i', meanonly 
            post `postname' (`i') (`r(mean)')       
        }
    }

    postclose `postname'
end

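Incidentally, note the use of the meanonly option of summarize: it suppresses the displayed output, so quietly is not needed, and it skips calculating the variance.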
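As a small illustration of that option, here is a sketch that reuses the test data from the question. The only thing it is meant to show is that nothing is printed by summarize itself and that r(mean) is still left behind for later use.

```stata
clear
input str10 id1 id2 x y
a   1   20  40
a   1   140 20
a   1   0   70
b   2   50  25
b   2   25  50
b   2   40  42
end

* meanonly: no table is displayed and the variance is not calculated
summarize x if id1 == "a", meanonly

* the saved results are still available afterwards
display r(mean)
display r(N)
```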

Nick Cox
  • Thanks! Turns out I was only 10 more minutes of googling away from my answer and I ended up doing something similar to what you did. – Glen Ng Mar 13 '19 at 10:44

This is what I ended up doing:

capture program drop myprog
program define myprog
    version 14.2
    syntax using, ID(varname) Mean(varname)
    tempname postname

    quietly levelsof `id', local(ids)
    local idtype: type `id'
    postfile `postname' `idtype' `id' `mean' `using', replace

    capture confirm string variable `id'

    if !_rc {
        foreach i of local ids {
            quietly summarize `mean' if `id'=="`i'"
            post `postname' ("`i'") (`r(mean)')
        }
    }
    else {
        foreach i of local ids {
            quietly summarize `mean' if `id'==`i'
            post `postname' (`i') (`r(mean)')
        }
    }

    postclose `postname'
end

The two near-identical loops look ugly, but I guess that's fine.
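If the duplication bothers you, one possibility is to build the quoted or unquoted value once per iteration and keep a single loop. The following is only a minimal sketch, not a tested replacement: the program name myprog2 and the locals isstr and v are made up for illustration, and it assumes that string identifiers never contain embedded double quotes.

```stata
capture program drop myprog2
program define myprog2
    version 14.2
    syntax using, ID(varname) Mean(varname)
    tempname postname

    quietly levelsof `id', local(ids)
    local idtype: type `id'
    postfile `postname' `idtype' `id' `mean' `using', replace

    * 1 if the identifier is a string variable, 0 otherwise
    capture confirm string variable `id'
    local isstr = (_rc == 0)

    foreach i of local ids {
        * v holds the level as it must appear in an expression:
        * wrapped in double quotes for strings, bare for numbers
        * (assumption: string levels contain no double quotes)
        if `isstr' {
            local v `""`i'""'
        }
        else {
            local v `i'
        }
        summarize `mean' if `id' == `v', meanonly
        * r(mean) is passed as an expression rather than as a macro
        post `postname' (`v') (r(mean))
    }

    postclose `postname'
end
```

It would be called exactly like the original, for example myprog2 using "test1.dta", id(id1) mean(x), and the posted identifier keeps the storage type of the input variable because postfile is still declared with `idtype'.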

Glen Ng
  • See the revision to @Pearly Spencer's answer, which got closer to yours. You could also use `egen newid = group(id), label` and then post the value labels of `newid`, which are always string. That way, there would be just one loop. – Nick Cox Mar 13 '19 at 10:46
  • Good suggestion. Unfortunately in my actual use-case I will be merging the postfile results back to another "main" dataset and I have to preserve the data type. – Glen Ng Mar 13 '19 at 10:48