Using vector for recoding variables in a dataframe

Question

In a recent project I have quite a large data frame, and I'd like to recode certain variables using a vector that I defined earlier.

I know there are many other ways to recode the data, but I was wondering if I could use the vector because it seems like an elegant solution.

df <- data.frame(
  A = c(1,2,2,1),
  B = c(1,1,1,2),
  C = c(2,2,1,2)
)


vector <- c(
  "A",
  "B"
)

Consider this example. Here I have created a vector that holds the names of two columns in the data set. Can I now use this vector to recode the data frame? For example, I'd like to change every 1 to a 0 in columns 'A' and 'B'.

I tried this:

df[df[,vector]==1] <- 0

Yet this code only works when I define the vector like this:

vector <- c(
  "A",
  "B",
  "C"
)

That is, it only works when the vector includes all the variables in the data frame.

If I use the same code but the vector only includes 'A' and 'B', I get the following error:

Error in `[<-.data.frame`(`*tmp*`, df[, vector] == 2, value = 1) : 
  unsupported matrix index in replacement

Do you have an idea how this might work?

Kind regards

`df[, vector] <- replace(df[, vector], df[, vector] == 1, 0)` — Maël, Feb 27 '23 at 13:15
That worked, thanks! Do you think it is also possible to use a vector to change the class of those columns? like so: ```ds[,varnames]<- as.numeric(ds[,varnames])``` That didn't work for me though... — Linus, Feb 27 '23 at 14:03
Nevermind, I figured it out: ```ds[,varnames] <- sapply(ds[,varnames],as.numeric)``` — Linus, Feb 27 '23 at 14:20
Have a look at [Replace all NA with FALSE in selected columns in R](https://stackoverflow.com/q/7279089/10488504) — GKi, Feb 28 '23 at 08:18
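
On the follow-up in the comments about changing the class of the selected columns: a minimal sketch using `lapply()` as an alternative to `sapply()`, which converts each column separately instead of first building a matrix. The data frame `ds` and the vector `varnames` below are made-up stand-ins for the commenter's objects, which are not shown in the thread.

```r
# Stand-in data (assumption): two character columns that should become numeric.
ds <- data.frame(
  x = c("1", "2"),
  y = c("3.5", "4"),
  z = c("a", "b"),
  stringsAsFactors = FALSE
)
varnames <- c("x", "y")

# ds[varnames] is a list of columns, so lapply() converts each one separately
# and the result can be assigned back to the same columns.
ds[varnames] <- lapply(ds[varnames], as.numeric)

sapply(ds, class)
```

Indexing with `ds[varnames]` rather than `ds[, varnames]` also keeps a data frame when `varnames` holds a single name.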

score 1 · Answer 1 · answered Feb 27 '23 at 13:32

You can use mutate(across()) from dplyr.

library(dplyr)
df <- mutate(df, across(all_of(vector), \(v) replace(v, v == 1, 0)))  # mutate() returns a new data frame, so assign the result

langtang


GKi · Answer 2 · 2023-02-27T15:52:04.297

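A base R approach is to first subset df by vector and then index that subset with the logical matrix df[,vector]==1, so that the index has the same dimensions as the object being modified.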

df[,vector][df[,vector]==1] <- 0
#df[vector][df[vector]==1] <- 0 #Alternative

df
#  A B C
#1 0 0 2
#2 2 0 2
#3 2 0 1
#4 0 2 2
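
Why the attempt in the question fails while this works can be seen from the shape of the logical index. A minimal sketch, using the question's `df` and `vector`; the names `idx` and `attempt` are introduced here only for illustration:

```r
df <- data.frame(
  A = c(1, 2, 2, 1),
  B = c(1, 1, 1, 2),
  C = c(2, 2, 1, 2)
)
vector <- c("A", "B")

# The comparison yields a logical matrix with one column per selected variable.
idx <- df[, vector] == 1
dim(idx)  # 4 rows, 2 columns
dim(df)   # 4 rows, 3 columns

# df[idx] <- 0 fails: a logical matrix index has to match dim(df).
attempt <- try(df[idx] <- 0, silent = TRUE)
inherits(attempt, "try-error")

# The subset df[vector] has the same dimensions as idx, so this works.
df[vector][idx] <- 0
df
```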

Another way could be to use a for loop.

for(i in vector) df[[i]][df[[i]]==1] <- 0
#for(i in vector) df[,i][df[,i]==1] <- 0 #Variant
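
If the same recode is needed in several places, the loop could be wrapped in a small helper. This is only a sketch: `recode_cols` and its arguments are hypothetical names introduced for illustration, not part of base R or of this answer.

```r
# Hypothetical helper: replace `from` with `to` in the named columns of `data`.
recode_cols <- function(data, cols, from, to) {
  for (col in cols) {
    hit <- data[[col]] == from
    # which() drops NA comparisons, so missing values stay untouched.
    data[[col]][which(hit)] <- to
  }
  data
}

df <- data.frame(A = c(1, 2, 2, 1), B = c(1, 1, 1, 2), C = c(2, 2, 1, 2))
vector <- c("A", "B")

# The function returns a modified copy, so the result is assigned back.
df <- recode_cols(df, vector, from = 1, to = 0)
df
```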

Benchmark

bench::mark(check=FALSE,
langtang = local({df <- dplyr::mutate(df,dplyr::across(all_of(vector),\(v) replace(v,v==1,0)))}),
"Maël" = local({df[, vector] <- replace(df[, vector], df[, vector] == 1, 0)}),
GKi = local({df[,vector][df[,vector]==1] <- 0}),
GKi2 = local(for(i in vector) df[,i][df[,i]==1] <- 0),
GKi3 = local(for(i in vector) df[[i]][df[[i]]==1] <- 0)
)
#  expression      min median itr/s…¹ mem_al…² gc/se…³ n_itr  n_gc total…⁴ result
#  <bch:expr> <bch:tm> <bch:>   <dbl> <bch:by>   <dbl> <int> <dbl> <bch:t> <list>
#1 langtang     2.66ms    3ms    299.   7.89KB    8.37   143     4   478ms <NULL>
#2 Maël       219.56µs  241µs   4017.     280B   12.3   1955     6   487ms <NULL>
#3 GKi        222.48µs  243µs   4013.     280B   12.3   1951     6   486ms <NULL>
#4 GKi2       106.96µs  116µs   8452.     280B   12.3   4119     6   487ms <NULL>
#5 GKi3        60.75µs   65µs  15217.     280B   14.4   7398     7   486ms <NULL>

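In this benchmark on the four-row example, the for loop using [[ (GKi3, median 65µs) is about 3.7 times faster than the vectorized base variants (Maël and GKi, about 240µs), about 1.8 times faster than the loop using [,i] (GKi2, 116µs), and about 46 times faster than the dplyr variant (3ms). All base variants allocate less memory (280B) than the dplyr variant (7.89KB).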
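
Because the benchmark runs on the four-row example with check=FALSE, it does not confirm that the variants return the same result, nor how they behave on more rows. A sketch, using only base R, that checks the base variants against each other on a larger data frame. The size `n`, the seed, and the names `big`, `r1`, `r2`, `r3` are arbitrary choices made for this illustration, and no timings are claimed here.

```r
set.seed(1)  # arbitrary seed, only for reproducibility
n <- 1e5     # arbitrary size, not taken from the original benchmark
big <- data.frame(
  A = sample(c(1, 2), n, replace = TRUE),
  B = sample(c(1, 2), n, replace = TRUE),
  C = sample(c(1, 2), n, replace = TRUE)
)
vector <- c("A", "B")

# Each variant works on its own copy so that the results can be compared.
r1 <- big; r1[, vector] <- replace(r1[, vector], r1[, vector] == 1, 0)
r2 <- big; r2[, vector][r2[, vector] == 1] <- 0
r3 <- big; for (i in vector) r3[[i]][r3[[i]] == 1] <- 0

# All three base variants should give the same data frame.
stopifnot(isTRUE(all.equal(r1, r2)), isTRUE(all.equal(r2, r3)))

# Rough single-run timing of the loop variant; the result depends on the machine.
system.time(for (i in vector) big[[i]][big[[i]] == 1] <- 0)
```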
