
How are you all doing?

I have a data frame (df1) in which I am generating 8 new columns (core_A to core_H) using conditions on other variables (partyid, vA to vH, E3019_A to E3019_H). Running each column individually works, but I would like to make the code more elegant by running it as a loop, and the loop doesn't work. First I tried:

df1$core_A <- ifelse(df1$partyid == df1$vA, df1$E3019_A, NA)
df1$core_B <- ifelse(df1$partyid == df1$vB, df1$E3019_B, NA)
df1$core_C <- ifelse(df1$partyid == df1$vC, df1$E3019_C, NA)
df1$core_D <- ifelse(df1$partyid == df1$vD, df1$E3019_D, NA)
df1$core_E <- ifelse(df1$partyid == df1$vE, df1$E3019_E, NA)
df1$core_F <- ifelse(df1$partyid == df1$vF, df1$E3019_F, NA)
df1$core_G <- ifelse(df1$partyid == df1$vG, df1$E3019_G, NA)
df1$core_H <- ifelse(df1$partyid == df1$vH, df1$E3019_H, NA)

Notice that the only difference between the lines is the letter at the end of the column names. So the previous code works. But since it is a repetitive operation and I wanted to make it more elegant, I tried to run the same code in a loop, and it only gives me an error. This is what I am trying:

seq <- LETTERS[seq(1,8)]

for(i in seq){
  df1$core_[[i]] <- ifelse(df1$partyid == df1$v[[i]], df1$E3019_[[i]], NA)
}

I don't understand what is wrong with the code. Can anyone tell me what I am doing wrong? Weren't they supposed to be equivalent?

EDIT: I apologize for not providing a sample of my data. Below is a sample, the first 10 rows of my data. To simplify, instead of going from A to H, I restricted the data to A through C. Let me know if this is enough, I couldn't figure out how to make

structure(list(E1004 = c("AUS_2019", "AUS_2019", "AUS_2019", 
"AUS_2019", "AUS_2019", "AUS_2019", "AUS_2019", "AUS_2019", "AUS_2019", 
"AUS_2019"), partyid = c(NA, NA, 36002, 36002, 36001, 36001, 
NA, NA, NA, 36001), vA = c(36001, 36001, 36001, 36001, 36001, 
36001, 36001, 36001, 36001, 36001), vB = c(36002, 36002, 36002, 
36002, 36002, 36002, 36002, 36002, 36002, 36002), vC = c(36003, 
36003, 36003, 36003, 36003, 36003, 36003, 36003, 36003, 36003
), E3019_A = c(10L, 7L, NA, 0L, NA, 6L, 8L, NA, 8L, 8L), E3019_B = c(4L, 
5L, NA, 9L, NA, 4L, 2L, 10L, 3L, 5L), E3019_C = c(2L, 3L, NA, 
5L, NA, 3L, 5L, NA, 6L, 0L)), row.names = c(NA, 10L), class = "data.frame")

Is this enough? I'm not sure, as this is the first time I am asking a question using my original dataset.

r2evans
Victor Shin
  • What do you expect `df1$core_[["A"]]` to represent? It will be much easier for people to help if you can provide some sample data so that we can run the code. – Jon Spring Apr 21 '23 at 21:05
  • I want core_A to equal the value of E3019_A if partyid equals vA; otherwise I want it to be a missing value. The code works when I run it column by column, but the loop doesn't work. Maybe I should edit to make it clear the first code works. – Victor Shin Apr 21 '23 at 21:08
  • VictorShin, possibly because we have no sample data, a common reason for downvotes (I did not give it). Can you provide 2-3 columns (for each letter) and 3-5 rows of sample data using `dput`, `data.frame`, or `read.table`? (see https://stackoverflow.com/q/5963269) – r2evans Apr 21 '23 at 21:29
  • Okay, I will get a sample! Sorry for not providing one. – Victor Shin Apr 21 '23 at 21:32

1 Answer


base R

Enms <- grep("E3019.*", names(df1), value = TRUE)
vnms <- sub("E3019_", "v", Enms)
stopifnot(all(vnms %in% names(df1)))
corenms <- sub("E3019_", "core_", Enms)
newcols <- setNames(Map(function(v, E) ifelse(df1$partyid == v, E, E[NA]), 
                        df1[vnms], df1[Enms]),
                    corenms)
cbind(df1, newcols)
#       E1004 partyid    vA    vB    vC E3019_A E3019_B E3019_C core_A core_B core_C
# 1  AUS_2019      NA 36001 36002 36003      10       4       2     NA     NA     NA
# 2  AUS_2019      NA 36001 36002 36003       7       5       3     NA     NA     NA
# 3  AUS_2019   36002 36001 36002 36003      NA      NA      NA     NA     NA     NA
# 4  AUS_2019   36002 36001 36002 36003       0       9       5     NA      9     NA
# 5  AUS_2019   36001 36001 36002 36003      NA      NA      NA     NA     NA     NA
# 6  AUS_2019   36001 36001 36002 36003       6       4       3      6     NA     NA
# 7  AUS_2019      NA 36001 36002 36003       8       2       5     NA     NA     NA
# 8  AUS_2019      NA 36001 36002 36003      NA      10      NA     NA     NA     NA
# 9  AUS_2019      NA 36001 36002 36003       8       3       6     NA     NA     NA
# 10 AUS_2019   36001 36001 36002 36003       8       5       0      8     NA     NA

The use of E[NA] ensures we get an NA of the right type: R has several typed NA values (logical NA, NA_integer_, NA_real_, NA_character_, NA_complex_), and since ifelse is not class-safe, it's best to enforce the type ourselves.
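As for the loop in the question: `$` takes a literal column name, so `df1$v[[i]]` looks for a column named `v` and then subsets it, rather than building the name `vA`. A minimal sketch of a working loop, assuming the columns follow the `v<letter>` / `E3019_<letter>` naming pattern, builds each name with `paste0` and indexes with `[[`. The small data frame here is a cut-down stand-in for the question's sample, not the real data.

```r
# Cut-down stand-in for df1 (columns A and B only, four rows)
df1 <- data.frame(
  partyid = c(NA, 36002, 36001, 36001),
  vA      = 36001,
  vB      = 36002,
  E3019_A = c(10L, 0L, NA, 6L),
  E3019_B = c(4L, 9L, NA, 4L)
)

for (i in c("A", "B")) {
  v <- df1[[paste0("v", i)]]        # e.g. df1[["vA"]]
  E <- df1[[paste0("E3019_", i)]]   # e.g. df1[["E3019_A"]]
  # E[NA] is an all-NA vector with the same type and length as E
  df1[[paste0("core_", i)]] <- ifelse(df1$partyid == v, E, E[NA])
}

class(NA)          # "logical"
class(df1$core_A)  # "integer", same as E3019_A
```

For the full data, `LETTERS[1:8]` in place of `c("A", "B")` would cover A through H.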

dplyr

library(dplyr)
df1 %>%
  mutate(
    across(
      vA:vC,
      ~ get(sub("^v", "E3019_", cur_column()))[if_else(partyid == ., TRUE, NA)], 
      .names = "{sub('^v','core_',.col)}"
    )
  )
#       E1004 partyid    vA    vB    vC E3019_A E3019_B E3019_C core_A core_B core_C
# 1  AUS_2019      NA 36001 36002 36003      10       4       2     NA     NA     NA
# 2  AUS_2019      NA 36001 36002 36003       7       5       3     NA     NA     NA
# 3  AUS_2019   36002 36001 36002 36003      NA      NA      NA     NA     NA     NA
# 4  AUS_2019   36002 36001 36002 36003       0       9       5     NA      9     NA
# 5  AUS_2019   36001 36001 36002 36003      NA      NA      NA     NA     NA     NA
# 6  AUS_2019   36001 36001 36002 36003       6       4       3      6     NA     NA
# 7  AUS_2019      NA 36001 36002 36003       8       2       5     NA     NA     NA
# 8  AUS_2019      NA 36001 36002 36003      NA      10      NA     NA     NA     NA
# 9  AUS_2019      NA 36001 36002 36003       8       3       6     NA     NA     NA
# 10 AUS_2019   36001 36001 36002 36003       8       5       0      8     NA     NA

The use of get(.) is so that we can reference another column based on "this" column's name, and [if_else(.., TRUE, NA)] is similar to the safe-NA trick above in the base-R method: indexing a vector with a logical NA returns an NA of that vector's own type.
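The indexing trick can be seen in isolation with a tiny example. This is only an illustrative sketch: it uses base `ifelse` in place of dplyr's `if_else` so that it runs without loading any package, and `E`, `cond`, and `hit` are made-up names.

```r
E    <- c(10L, 9L, 6L)
cond <- c(TRUE, FALSE, NA)        # stands in for partyid == v
hit  <- ifelse(cond, TRUE, NA)    # FALSE and NA both become NA
E[hit]                            # keeps E[1], NA elsewhere
class(E[hit])                     # "integer", same as E
```

Positions where the index is NA come back as NA of the vector's own type, so no separate typed NA is needed.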

r2evans
  • Also, unrelated to the question, I would like to learn more about Base R. Is there a book or online course you would recommend? – Victor Shin Apr 21 '23 at 22:56
  • I usually refer to https://adv-r.hadley.nz/, though I don't know that it's perfect for learning-from-scratch. – r2evans Apr 21 '23 at 22:58