
I wish to draw the same random numbers in Stata and R. Specifically, I want to obtain the same series of random numbers from sample in R and rdiscrete in Stata. I have tried to provide a complete, but small, reproducible example in each language.

I think the sample function does the same thing as the rdiscrete function, but I am not certain. Assuming it does, I simply need the two functions to return the same random numbers.

I am using Stata 12.

Here is my R code:

set.seed(1234)

wave_of_cy  = 2
wave_obs = 20

fake_dat <- read.table(text = '
     nobs  p1   p2
      0   .20  .10
      1   .10  .15
      2   .10  .15
      3   .05  .10
      4   .05  .10
      5   .20  .05
      6   .10  .05
      7   .05  .05
      8   .05  .05
      9   .10  .20
', header = TRUE, stringsAsFactors = FALSE)

p_hrand  = fake_dat[, (wave_of_cy+1)]
pp_hrand = p_hrand / sum(p_hrand)

my_rdata = sample(nrow(fake_dat), wave_obs, prob=pp_hrand, replace = TRUE)
my_rdata

hrand    = fake_dat[my_rdata, 1]
hrand

Here is my Stata code:

clear
set seed 1234
global wave_of_cy  = 2
set obs 20
local wave_obs = _N

clear
input nobs p1 p2
0 .20 .10
1 .10 .15
2 .10 .15
3 .05 .10
4 .05 .10
5 .20 .05
6 .10 .05
7 .05 .05
8 .05 .05
9 .10 .20
end
list
save fake_dat, replace

clear

use "fake_dat.dta", clear
putmata fake_data = (nobs p1 p2), replace

mata:
     p_hrand  = fake_data[., $wave_of_cy+1]
     pp_hrand = p_hrand :/ sum(p_hrand)
     my_rdata = rdiscrete(`wave_obs', 1, pp_hrand)
     my_rdata
     hrand    = fake_data[my_rdata, 1]
     hrand
end
Mark Miller
    Unfortunately, you cannot reproduce random seeds even with same ### across platforms. This has been asked with Python and R, Matlab and Python, SAS and Stata, etc. All use different algorithms. Try saving data to disk (csv, txt, etc.) or pass via command line i/o if needing to reuse same data. – Parfait Oct 04 '19 at 22:02
    See www.random.org for a cross-platform solution. –  Oct 05 '19 at 00:17
    Practically speaking, @Parfait's suggestion is what folks would do 99% of the time. If for some reason you really can't or won't pass data via IO, you need to have R & Stata call the same external code, or else have R call Stata code or vice versa. – JohnE Oct 05 '19 at 13:35

1 Answer


As mentioned in the comments, random number generation is not easily replicated across software packages or languages, because each runs a different algorithm even with the same seed number. To reproduce the same random draws, you will need to interface the two platforms in one of these ways:

  • Use a dual-language API (e.g., rpy2 to run R inside Python, reticulate to run Python inside R, or twister to run Python's random.random() inside Matlab);

  • Call a lower-level language such as C/C++ from the application layer of both programs, as has been done for SAS and Stata;

    This approach is possible here because R is written in C, Fortran, and R, while Stata (a software package rather than a language) is written in C, so both can call the same random number algorithm;

  • Run one platform from the other's command line and export/import the resulting data with I/O text processing.

The examples below demonstrate the last option.


R (calling Stata in batch mode, assumes no blank lines after very last end line)

setwd("C:\\Path\\To\\Working\\Directory")
# RUN DO SCRIPT WHICH OUTPUTS LOG OF SAME NAME
system("C:\\Path\\To\\StataMP-64.exe /e do myStataScript.do")

# READ IN LOG FILE TO CHARACTER VECTOR
stata_log <- readLines("myStataScript.log")

# EXTRACT NEEDED hrand OUTPUT LINES (N=20)
stata_data <- stata_log[(length(stata_log)-26):(length(stata_log)-7)]

# MATRIX BUILD OF EXTRACT AND RETURN SECOND ROW (TO MIRROR STATA'S RESULTS)
sapply(strsplit(stata_data, "\\|"), as.integer)[2,]
# [1] 9 9 1 9 0 9 4 1 0 2 2 2 0 6 2 7 1 5 3 1
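
The fixed line offsets used above depend on the exact layout of the log, so they break if the script prints anything extra. A slightly more defensive sketch, assuming the Mata rows appear in the log in the `   1 |  9  |` form, picks the rows out by pattern instead. `parse_mata_column` is a hypothetical helper name, not part of R or Stata, and the log fragment is made up for illustration rather than copied from a real run.

```r
# Hypothetical helper: pull the values of a one-column Mata matrix out of a
# Stata log. Assumes each row is printed as "<row index> | <integer> |".
parse_mata_column <- function(log_lines, n = NULL) {
  row_pattern <- "^[[:space:]]*[0-9]+[[:space:]]*\\|[[:space:]]*(-?[0-9]+)[[:space:]]*\\|[[:space:]]*$"
  rows   <- grep(row_pattern, log_lines, value = TRUE)
  values <- as.integer(sub(row_pattern, "\\1", rows))
  # If several matrices were printed, keep only the last n rows
  if (is.null(n)) values else tail(values, n)
}

# Illustrative log fragment (invented for this example, not real Stata output)
example_log <- c(
  ":      my_rdata",
  "     +------+",
  "   1 |  10  |",
  "   2 |   4  |",
  "     +------+",
  ":      hrand",
  "     +-----+",
  "   1 |  9  |",
  "   2 |  3  |",
  "     +-----+",
  ": end"
)

all_rows   <- parse_mata_column(example_log)         # rows of both matrices
hrand_rows <- parse_mata_column(example_log, n = 2)  # only the last matrix
print(hrand_rows)
```

With the do-file from the question, which prints my_rdata before hrand, passing `n = 20` would keep only the hrand rows.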

Stata (calling Rscript automated executable)

First add needed lines in R sample script:

setwd("C:\\Path\\To\\Working\\Directory")

... original code ...

# SAVE hrand DATA TO DISK
write.csv(data.frame(hrand), "RandomSeedDataSample.csv", row.names = FALSE)

Then run Stata script:

* RUN R SCRIPT
shell "C:\Path\To\R\bin\Rscript.exe" "C:\Path\myRScript.R"

* IMPORT CSV FILE
import delimited using "C:\Path\To\Working\Directory\RandomSeedDataSample.csv", clear

* MATRIX BUILD (TO MIRROR R'S RESULTS)
putmata hrand = (hrand), replace

mata
    hrand
end

:         hrand
        1
     +-----+
   1 |  9  |
   2 |  3  |
   3 |  3  |
   4 |  3  |
   5 |  5  |
   6 |  3  |
   7 |  9  |
   8 |  2  |
   9 |  3  |
  10 |  4  |
  11 |  3  |
  12 |  4  |
  13 |  2  |
  14 |  8  |
  15 |  2  |
  16 |  6  |
  17 |  2  |
  18 |  2  |
  19 |  9  |
  20 |  2  |
     +-----+
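
A variation on the same export/import idea, offered as a sketch under assumptions rather than as part of the approach above: export the uniform draws instead of the final values, then apply the same inverse-CDF lookup on each side. The file name `SharedUniforms.csv` and the helper `draw_from_uniforms` are hypothetical. This will not reproduce what sample() or rdiscrete() return for a given seed; the point is only that both programs start from identical uniforms and apply an identical rule.

```r
set.seed(1234)

# Probabilities from column p2 of the question's fake_dat, paired with nobs 0..9
nobs     <- 0:9
pp_hrand <- c(.10, .15, .15, .10, .10, .05, .05, .05, .05, .20)
pp_hrand <- pp_hrand / sum(pp_hrand)

# Draw the uniforms once and write them to disk; both programs would read this file
u <- runif(20)
uniform_file <- file.path(tempdir(), "SharedUniforms.csv")  # hypothetical file name
write.csv(data.frame(u = u), uniform_file, row.names = FALSE)

# Inverse-CDF lookup: category k is chosen when cum_p[k-1] <= u < cum_p[k]
draw_from_uniforms <- function(u, prob) {
  cum_p <- cumsum(prob)
  idx   <- findInterval(u, cum_p) + 1L
  pmin(idx, length(prob))  # guard against rounding in the last cumulative sum
}

# Read the uniforms back, exactly as the other program would
u_shared <- read.csv(uniform_file)$u
hrand    <- nobs[draw_from_uniforms(u_shared, pp_hrand)]
hrand
```

On the Stata side, the same cumulative-probability comparison would have to be written against the imported uniforms; that part is not shown here.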
Parfait