EDIT: I improved the question by adding a reproducible example and clarifying my issues.

Hi, I have to translate this Stata code to R so that it can be used on a large dataset:

sort UF UPA Ano Trimestre
* The macro i comes from an enclosing loop that is not shown in this excerpt;
* the final closing brace below belongs to that loop.
    loc j = 1
    loc stop = 0
    loc count = 0
    while `stop' == 0 {
        loc lastcount = `count'
        count if p201 == . & n_p == `i'+1
        loc count = r(N)
        if `count' == `lastcount' {
            loc stop = 1
        }
        else {
            if r(N) != 0 {
                replace p201 = p201[_n - `j'] if ///
                    UF == UF[_n - `j'] & ///
                    UPA == UPA[_n - `j'] & ///
                    n_p == `i'+1 & n_p[_n - `j'] == `i' & ///
                    p201 == . & forw[_n - `j'] != 1
                replace forw = 1 if UF == UF[_n + `j'] & ///
                    UPA == UPA[_n + `j'] & ///
                    p201 == p201[_n + `j'] & ///
                    n_p == `i' & n_p[_n + `j'] == `i'+1 & ///
                    forw != 1
                loc j = `j' + 1
            }
            else {
                loc stop = 1
            }
        }
    }
    replace back = p201 != . if n_p == `i'+1
    replace forw = 0 if forw != 1 & n_p == `i'
}

My dataset is huge and more complex than the example posted below. Mainly, I would like to understand what the while loop over j is for.

Here is a toy example and the desired result in R:

start <- data.frame(
  Ano = c(2012, 2012, 2012, 2012),
  Trimestre = c("1", "2", "3", "4"),
  UF = c(28, 28, 28, 28),
  UPA = c(280020150, 280020150, 280020150, 280020150),
  n_p = c(1, 2, 3, 4),
  p201 = c(1, NA, NA, NA),
  back = c(NA, NA, NA, NA),
  forw = c(NA, NA, NA, NA)
)

end <- data.frame(
  Ano = c(2012, 2012, 2012, 2012),
  Trimestre = c("1", "2", "3", "4"),
  UF = c(28, 28, 28, 28),
  UPA = c(280020150, 280020150, 280020150, 280020150),
  n_p = c(1, 2, 3, 4),
  p201 = c(1, 1, 1, 1),
  back = c(NA, 1, 1, 1),
  forw = c(1, 1, 1, 0)
)

In the full dataset there are many combinations of UF and UPA, and each combination identifies an individual. Ano and Trimestre denote the year and the quarter.

It seems as if the code simply matches all rows with the same UF-UPA pair by assigning them the first value of p201 in each group. The variables back and forw equal 1 if an observation is paired with another one at an earlier or later date, respectively.

My question, then, is whether someone can explain what the while loop and the j offsets are for. I am not sure whether the code could be greatly simplified in R by just using group_by from dplyr, or whether a for loop would even be required. I also cannot tell whether it only looks that simple because of the particular subset of the data I have posted here, or whether these parts are indeed necessary. Is there a clever way to find out by testing?
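One way to test this, sketched here under stated assumptions, is to emulate the Stata loop literally in base R and compare the result with `end`. The helper names `shift`, `eq`, `match_pass` and `emulate_stata` are made up for this illustration, and the range `1:4` for `i` is an assumption, since the enclosing Stata loop is not shown. `eq` mimics Stata's rule that two missing values compare as equal.

```r
start <- data.frame(
  Ano = c(2012, 2012, 2012, 2012), Trimestre = c("1", "2", "3", "4"),
  UF = c(28, 28, 28, 28), UPA = c(280020150, 280020150, 280020150, 280020150),
  n_p = c(1, 2, 3, 4), p201 = c(1, NA, NA, NA),
  back = c(NA, NA, NA, NA), forw = c(NA, NA, NA, NA)
)
end <- start
end$p201 <- c(1, 1, 1, 1)
end$back <- c(NA, 1, 1, 1)
end$forw <- c(1, 1, 1, 0)

# Value k rows earlier (k > 0) or later (k < 0); NA outside the data,
# like x[_n - k] in Stata.
shift <- function(x, k) {
  idx <- seq_along(x) - k
  idx[idx < 1 | idx > length(x)] <- NA
  x[idx]
}

# Stata-style equality: missing equals missing, and the result is never NA.
eq <- function(a, b) (is.na(a) & is.na(b)) | (!is.na(a) & !is.na(b) & a == b)

# One pass for a fixed interview number i; df must already be sorted
# by UF, UPA, Ano, Trimestre.
match_pass <- function(df, i) {
  j <- 1
  last_count <- 0
  repeat {
    count <- sum(is.na(df$p201) & eq(df$n_p, i + 1))
    # Stop when nothing is left to fill or the last lag filled nothing.
    if (count == 0 || count == last_count) break
    last_count <- count
    # Copy p201 from the row j positions earlier if it is interview i
    # of the same UF-UPA and has not been linked forward yet.
    fill <- eq(df$UF, shift(df$UF, j)) & eq(df$UPA, shift(df$UPA, j)) &
      eq(df$n_p, i + 1) & eq(shift(df$n_p, j), i) &
      is.na(df$p201) & !eq(shift(df$forw, j), 1)
    df$p201[fill] <- shift(df$p201, j)[fill]
    # Flag the earlier row as linked to a later interview.
    mark <- eq(df$UF, shift(df$UF, -j)) & eq(df$UPA, shift(df$UPA, -j)) &
      eq(df$p201, shift(df$p201, -j)) &
      eq(df$n_p, i) & eq(shift(df$n_p, -j), i + 1) & !eq(df$forw, 1)
    df$forw[mark] <- 1
    j <- j + 1
  }
  nxt <- eq(df$n_p, i + 1)
  df$back[nxt] <- as.numeric(!is.na(df$p201[nxt]))
  cur <- eq(df$n_p, i) & !eq(df$forw, 1)
  df$forw[cur] <- 0
  df
}

# Assumption: i runs over 1:4 in the enclosing loop.
emulate_stata <- function(df, i_values = 1:4) {
  for (i in i_values) df <- match_pass(df, i)
  df
}

result <- emulate_stata(start)
print(isTRUE(all.equal(result, end)))

# Same individual, but the third interview is missing.
gap_result <- emulate_stata(start[start$n_p != 3, ])
print(gap_result[, c("n_p", "p201", "back", "forw")])
```

In this emulation the toy data reproduce `end`, while in the gap case the row with `n_p == 4` keeps a missing `p201`, because each pass only links interview `i + 1` to interview `i`. If the emulation is faithful, that suggests `j` is the distance, in rows of the sorted data, between a row and its candidate match, and that a plain group-wise fill would give different results whenever an interview is skipped.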

  • It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. Just asking for code translation is off-topic. Say what the code needs to do and describe where exactly you are getting stuck during your translation. – MrFlick Jul 04 '20 at 01:40
  • I can't exactly do that, since I only have raw databases with and without running the whole code. Furthermore, I can't take a sample of it and paste it here – Arthur Carvalho Brito Jul 04 '20 at 02:03
  • @MrFlick Please refer now to the updated question – Arthur Carvalho Brito Jul 04 '20 at 04:02
  • @ArthurCarvalhoBrito In a minimal reproducible example we may use sample data; for help, see [how-to-make-a-great-r-reproducible-example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/5963610#5963610). You also need to focus on one problem to be on-topic. It's probably better to express what you want to do in R and show some attempts, rather than waiting for a translation from Stata to R. – jay.sf Jul 04 '20 at 06:38
  • @jay.sf The thing is that I've done exactly that by producing an example in R. I still think the Stata code is important, since I am not sure the example I've posted is general enough. I don't get the part with the j subscripts and the while loop – Arthur Carvalho Brito Jul 04 '20 at 07:15

1 Answer

I can't read Stata code, but from your text description it sounds like just a bit of dplyr will work for you:

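library(dplyr)

start %>% 
  group_by(UF, UPA) %>%          # one group per individual
  mutate(
    p201 = first(p201),          # copy the group's first p201 to every row
    back = row_number() > 1,     # TRUE if an earlier row exists in the group
    forw = row_number() < n()    # TRUE if a later row exists in the group
  )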
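For comparison, here is a rough base R sketch of the same grouping idea that needs no packages and converts the flags to the numeric coding used in `end`. It assumes the rows are already in time order within each UF-UPA group and, like `first(p201)`, it takes the group's first value even if that value is missing. The names `pos`, `size` and `res` are made up for this illustration.

```r
start <- data.frame(
  Ano = c(2012, 2012, 2012, 2012), Trimestre = c("1", "2", "3", "4"),
  UF = c(28, 28, 28, 28), UPA = c(280020150, 280020150, 280020150, 280020150),
  n_p = c(1, 2, 3, 4), p201 = c(1, NA, NA, NA),
  back = c(NA, NA, NA, NA), forw = c(NA, NA, NA, NA)
)
end <- start
end$p201 <- c(1, 1, 1, 1)
end$back <- c(NA, 1, 1, 1)
end$forw <- c(1, 1, 1, 0)

# Position of each row inside its UF-UPA group, and the size of that group.
rows <- seq_len(nrow(start))
pos  <- ave(rows, start$UF, start$UPA, FUN = seq_along)
size <- ave(rows, start$UF, start$UPA, FUN = length)

res <- start
# Spread the first p201 of each group over the whole group.
res$p201 <- ave(start$p201, start$UF, start$UPA, FUN = function(x) x[1])
# back: 1 if an earlier row exists, NA for the first row (as in `end`).
res$back <- ifelse(pos > 1, 1, NA)
# forw: 1 if a later row exists, 0 for the last row of the group.
res$forw <- as.numeric(pos < size)

print(isTRUE(all.equal(res, end)))
```

On the toy data this matches `end`. Whether it also matches the Stata output on the full data depends on details the toy example does not exercise, such as skipped interviews.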
Nick Cox
MrFlick
  • I've been ruminating for a while how to replicate SAS by group processing in R, and your answer to this question helped me figure it out. Upvoted. – Len Greski Jul 04 '20 at 13:00