0

I'm pretty new to R, and I'm trying to learn how to do some simulations. Currently I have a program that does the following:

  1. In one function, uses a DGP to create fake data, which is returned as a tibble
  2. In another function, randomly assign fake observations to treatment
  3. In the final function, merge random assignment results with fake data and run regression. I return a list that includes the estimate and p-value using the below code
    tauhat <- trobust %>% filter(term=="TTRUE") %>% pull(estimate)
    pvalue <- trobust %>% filter(term=="TTRUE") %>% pull(p.value)
    return(list(tauhat,pvalue))

If I run these functions once, I get something like the below

> finitepop(finiteN=20)
[[1]]
[1] 0.3730686

[[2]]
[1] 0.03445962

I then use replicate to repeat that process say 100 times. I end up with a 2X100 thing - perhaps it's an array? - and I'd like to turn that into a 100X2 tibble, that is, a tibble with columns for the estimate and p-value and simulation results stored as observations. The results from the sim look like

> finitesim <- (replicate(n=reps,finitepop(finiteN=20)))
> finitesim
     [,1]        [,2]      [,3]      [,4]       [,5]      [,6]      [,7]     
[1,] -0.03096849 0.206797  0.2386698 0.09374408 0.1462773 0.2479394 0.2177207
[2,] 0.8850678   0.2622687 0.2105784 0.5990369  0.3279901 0.1063231 0.2489028
     [,8]      [,9]       [,10]       [,11]     [,12]      [,13]     
[1,] 0.1661424 0.00977172 -0.08761129 0.1170922 -0.1559203 0.278062  
[2,] 0.2086819 0.9390261  0.6071284   0.472165  0.4214389  0.05973561

How should I convert the results to a nice tibble?

EDIT: Below is a MWE, where for convenience I changed the right hand side variable to x, and I didn't create the clustering structure for lm_robust

library(tidyverse)
library(lmerTest)                       #for lmer
library(merTools)                       #for lmer
library(estimatr)                       #for cluster robust se


finitepop <- function(finiteN){
    
    fakedata <- tibble(
        id=1:finiteN,
        x=rnorm(n=finiteN),
        y=rnorm(n=finiteN)
        )
    robust <- lm_robust(data=fakedata,y~x,cluster=id)
    trobust <- tidy(robust)
    tauhat <- trobust %>% filter(term=="x") %>% pull(estimate)
    pvalue <- trobust %>% filter(term=="x") %>% pull(p.value)
    return(list(tauhat,pvalue))
    }


finitesim <- (replicate(n=10,finitepop(finiteN=20),simplify=FALSE))

finitesim
PaulB
  • 299
  • 1
  • 11
  • This is a good start, but it's going to be very difficult for others to help you without a [reproducible example](https://stackoverflow.com/a/5963610/2572423). – JasonAizkalns Mar 26 '21 at 12:31
  • Right, thanks for the reminder. I've edited with a minimal working example – PaulB Mar 26 '21 at 12:50

1 Answers1

2

Use can use map_df from the purrr package (part of tidyverse):

finitepop <- function(finiteN){
  
  fakedata <- tibble(
    id=1:finiteN,
    x=rnorm(n=finiteN),
    y=rnorm(n=finiteN)
  )
  robust <- lm_robust(data=fakedata,y~x,cluster=id)
  trobust <- tidy(robust)
  tauhat <- trobust %>% filter(term=="x") %>% pull(estimate)
  pvalue <- trobust %>% filter(term=="x") %>% pull(p.value)
  print(pvalue)
  return(tibble(tauhat,pvalue))
}

finitesim <- replicate(n=10,finitepop(finiteN=20),simplify=FALSE) %>%
  purrr::map_df(as.data.frame)

> finitesim
         tauhat     pvalue
1   0.035057186 0.89818890
2  -0.248569087 0.24159959
3   0.111054217 0.75700470
4   0.596779950 0.00223398
5  -0.004052686 0.98418837
6  -0.105390590 0.67410417
7  -0.107913504 0.54778478
8  -0.021681712 0.89834059
9  -0.161811559 0.49091499
10  0.241477999 0.21281508
Will Hipson
  • 366
  • 2
  • 9
  • When I try this on my example my output is a mess; I get columns titled "X.0.27065" with most cells having "NA." Did you create this output using my example? – PaulB Mar 26 '21 at 13:22
  • Sorry, yes I forgot to include that I changed the return value from a `list` to a `tibble`. See revised answer. – Will Hipson Mar 26 '21 at 13:27