
I'm translating Stata code to R code, but now I'm having some n00b troubles like this one.

This is my Stata code:

gen     aposentadofam=1 if proprendaposent  > 0 & proprendaposent ~=.;
replace aposentadofam=0 if proprendaposent == 0 | proprendaposent ==.;

And this is what I tried to do in R:

pemg <- mutate(pemg, aposentadofam = NA_real_)
pemg <- mutate(pemg, aposentadofam = case_when(proprendaposent > 0 & !is.na(proprendaposent) ~ 1, TRUE ~ aposentadofam))
pemg <- mutate(pemg, aposentadofam = case_when(proprendaposent == 0 | is.na(proprendaposent) ~ 0, TRUE ~ aposentadofam))

The line with is.na() seems to be running correctly, but the one with !is.na() does not. It gives me this error message:

LHS of case 1 (`proprendaposent > 0 & !is.na(proprendaposent) ~ 1`) must be a logical vector, not a `formula` object.

What should I do?

  • It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. – MrFlick Dec 02 '21 at 03:08
  • The code given isn't idiomatic Stata. Your lengthy variable names are no doubt informative and appropriate in your case, but to focus on principles let's use something shorter. The form `gen wanted = foo > 0 & foo != .` would get you there in one line. Although it may not bite in your context, your code would map extended missing values `.a` to `.z` to 0 too. `gen wanted = foo > 0 & !missing(foo)` would be fine too. Both forms carry over the implication that `foo` is never negative. Although this comment is entirely about Stata style, I doubt that you're obliged to use two lines in R. – Nick Cox Dec 02 '21 at 09:02
  • Thanks for the feedback! This Stata code is not mine, so `!missing()` is something I can't use much. And these two-line pairs repeat several times throughout the code, so... – Igor Mendonça Dec 02 '21 at 14:28
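
Since the same two-line pattern repeats many times, one option is to wrap the one-line form from the comment above in a small function and reuse it. This is only a sketch: `flag_positive()` is a hypothetical name, and the sample data is made up. Like the one-line Stata form, it returns 0 for negative values, so it assumes the variable is never negative.

```r
# flag_positive() is a hypothetical helper, not part of the original Stata code.
# It returns 1 where x is non-missing and positive, and 0 otherwise
# (including NA and, by assumption, any negative values).
flag_positive <- function(x) {
  as.numeric(!is.na(x) & x > 0)
}

# Made-up sample data for illustration
pemg <- data.frame(proprendaposent = c(1, 2.5, 0, NA))
pemg$aposentadofam <- flag_positive(pemg$proprendaposent)
pemg
```

The same call can then be repeated for each variable that needs the flag.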

2 Answers


Not enough reputation to comment (yet!) but I just ran the following using your example code (in R) with no issues. How exactly does your data/code differ?

library(dplyr)

pemg <- data.frame(c(1, 2, 3.1, 4, 5.5, 0, 0, 0, NA))
colnames(pemg) <- "proprendaposent"

pemg <- mutate(pemg, aposentadofam = NA_real_)
pemg <- mutate(pemg, aposentadofam = case_when(proprendaposent >0 & !is.na(proprendaposent) ~ 1, TRUE ~ aposentadofam))
pemg <- mutate(pemg, aposentadofam = case_when(proprendaposent==0 | is.na(proprendaposent) ~ 0, TRUE ~ aposentadofam))
pemg

which outputs:

  proprendaposent aposentadofam
1             1.0             1
2             2.0             1
3             3.1             1
4             4.0             1
5             5.5             1
6             0.0             0
7             0.0             0
8             0.0             0
9              NA             0
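
If it helps, the two `case_when()` steps can probably be folded into a single call, because `case_when()` treats an `NA` condition as "no match" and moves on to the next case. A minimal sketch on made-up data, assuming dplyr is installed:

```r
library(dplyr)

# Made-up sample data for illustration
pemg <- data.frame(proprendaposent = c(1, 2, 3.1, 0, NA))

pemg <- mutate(pemg, aposentadofam = case_when(
  proprendaposent > 0 ~ 1,                           # NA > 0 is NA, so NA rows fall through
  proprendaposent == 0 | is.na(proprendaposent) ~ 0  # zeros and missing values
))
pemg
```

Rows that match neither case (negative values, if there are any) are left as `NA`, which is also what the two-step Stata code does.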

Often, within() is most illustrative.

dat <- within(dat, {
  aposentadofam <- NA
  aposentadofam[proprendaposent > 0 & !is.na(proprendaposent)] <- 1
  aposentadofam[proprendaposent == 0 | is.na(proprendaposent)] <- 0
})

Or using transform().

dat <- transform(dat, aposentadofam=ifelse(proprendaposent %in% c(0, NA), 0, 1))

Both functions come with base R, so you won't need any extra packages (which you rarely need anyway).

#    proprendaposent aposentadofam
# 1                0             0
# 2                4             1
# 3                0             0
# 4                0             0
# 5                1             1
# 6                3             1
# 7                1             1
# 8                1             1
# 9                0             0
# 10              NA             0
# 11              NA             0
# 12               3             1
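
One caveat, in case it matters for your data: the two approaches above agree only when the variable is never negative. The small check below uses a made-up vector `x` with one negative value. The indexing approach leaves it as `NA`, while the `ifelse()` approach assigns it 1.

```r
# Made-up vector with one negative value for illustration
x <- c(-2, 0, 3, NA)

# ifelse() approach: anything that is not 0 or NA becomes 1
via_ifelse <- ifelse(x %in% c(0, NA), 0, 1)

# indexing approach: values matching neither condition stay NA
via_index <- rep(NA_real_, length(x))
via_index[x > 0 & !is.na(x)] <- 1
via_index[x == 0 | is.na(x)] <- 0

data.frame(x, via_ifelse, via_index)
```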

Data

dat <- structure(list(proprendaposent = c(0L, 4L, 0L, 0L, 1L, 3L, 1L, 
1L, 0L, NA, NA, 3L)), class = "data.frame", row.names = c(NA, 
-12L))
jay.sf