0

Another thread solved a similar problem very nicely.

What I would like to do is get rid of some redundancy in my own, similar problem.

Using their example:

df <- data.frame(name = rep(letters[1:3], each = 3), foo=rep(1:9),var1 = letters[1:3], var2 = rep(3:5, each = 3))

creates:

df
  name foo var1 var2
1    a   1    a    3
2    a   2    a    3
3    a   3    a    3
4    b   4    b    4
5    b   5    b    4
6    b   6    b    4
7    c   7    c    5
8    c   8    c    5
9    c   9    c    5

But what do I need to do to replace multiple characters, each with its own value?

a=1
b=2
c=3

I tried:

df[,c(4,6)] <- lapply(df[,c(4,6)], function(x) replace(x,x %in% "a", 1), 
                                                             replace(x,x %in% "b", 2),
                                                             replace(x,x %in% "c", 3))

and

z<- c("a","b","c")
y<- c(1,2,3)
df[,c(1,3)] <- lapply(df[,c(1,3)], function(x) replace(x,x %in% z, y))

But neither seems to work.

Thanks.

Taiku
  • You have only 4 columns, but your code tries to access columns 4 and 6? – deschen Oct 19 '21 at 22:24
  • And the input data you show is not what you'd get from running your code that creates `df`. – deschen Oct 19 '21 at 22:34
  • Sorry, I hurriedly copied and pasted from another piece of code and tried to adapt some things from my own. Thanks for your solution below! – Taiku Oct 19 '21 at 22:37
  • If your result data frame is the same even after you try all our solutions, then please provide a minimal reproducible example of your data, not just a toy data set. https://stackoverflow.com/help/minimal-reproducible-example – deschen Oct 20 '21 at 06:27

5 Answers

4

You can use dplyr::recode

df <- data.frame(name = rep(letters[1:3], each = 3), foo=rep(1:9),var1 = letters[1:3], var2 = rep(3:5, each = 3))


library(dplyr, warn.conflicts = FALSE)

df %>% 
  mutate(across(c(name, var1), ~ recode(., a = 1, b = 2, c = 3)))
#>   name foo var1 var2
#> 1    1   1    1    3
#> 2    1   2    2    3
#> 3    1   3    3    3
#> 4    2   4    1    4
#> 5    2   5    2    4
#> 6    2   6    3    4
#> 7    3   7    1    5
#> 8    3   8    2    5
#> 9    3   9    3    5

Created on 2021-10-19 by the reprex package (v2.0.1)

across applies the function defined by ~ recode(., a = 1, b = 2, c = 3) to both name and var1.

Using ~ and . is another way to define a function in across. This function is equivalent to the one defined by function(x) recode(x, a = 1, b = 2, c = 3), and you could use that code in across instead of the ~ form and it would give the same result. The only name I know for this is what it's called in ?across, which is "purrr-style lambda function", because the purrr package was the first to use formulas to define functions in this way.

If you want to see the actual function created by the formula, you can look at rlang::as_function(~ recode(., a = 1, b = 2, c = 3)), although it's a little more complex than the one above to support the use of ..1, ..2 and ..3 which are not used here.
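
As a quick check of the equivalence described above, the sketch below builds both forms and compares them on a small character vector. The names f_formula, f_plain and x are made up for illustration, and it assumes dplyr and rlang are installed.

```r
library(dplyr, warn.conflicts = FALSE)

# Hypothetical names, for illustration only.
# Formula form, converted to a real function by rlang.
f_formula <- rlang::as_function(~ recode(., a = 1, b = 2, c = 3))
# Ordinary anonymous-function form.
f_plain <- function(x) recode(x, a = 1, b = 2, c = 3)

x <- c("a", "b", "c", "a")

# Both forms should return the same recoded vector.
same <- identical(f_formula(x), f_plain(x))
same
```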

Now that R (since version 4.1.0) supports the shorter \(x) syntax for anonymous functions, used below, the purrr-style form is arguably no longer needed; writing it that way is just an old habit.

df <- data.frame(name = rep(letters[1:3], each = 3), foo=rep(1:9),var1 = letters[1:3], var2 = rep(3:5, each = 3))

library(dplyr, warn.conflicts = FALSE)

df %>% 
  mutate(across(c(name, var1), \(x) recode(x, a = 1, b = 2, c = 3)))
#>   name foo var1 var2
#> 1    1   1    1    3
#> 2    1   2    2    3
#> 3    1   3    3    3
#> 4    2   4    1    4
#> 5    2   5    2    4
#> 6    2   6    3    4
#> 7    3   7    1    5
#> 8    3   8    2    5
#> 9    3   9    3    5

Created on 2021-10-19 by the reprex package (v2.0.1)

IceCreamToucan
  • What does the '.' (period) do in `recode()`? – Taiku Oct 19 '21 at 22:41
  • Added an explanation to the answer – IceCreamToucan Oct 19 '21 at 22:51
  • Thanks... for some reason this solution doesn't seem to be working for my code. The data frame looks the same afterwards, with no replacement. – Taiku Oct 19 '21 at 22:54
  • This creates a copy. If you want to modify the original data frame, you have to reassign back to `df` (by putting `df <-` before `df %>%`). – IceCreamToucan Oct 19 '21 at 22:56
  • Right, I tried that, but it is still outputting the exact same data frame. I tried using `~` and `\(x)` and neither seems to work. No errors. – Taiku Oct 19 '21 at 22:59
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/238338/discussion-between-taiku-and-icecreamtoucan). – Taiku Oct 19 '21 at 23:02
1

A simple for loop would do the trick:

# z and y are the vectors from the question
z <- c("a", "b", "c")
y <- c(1, 2, 3)
for (i in seq_along(z)) {
  df[df == z[i]] <- y[i]
}

df

  name foo var1 var2
1    1   1    1    3
2    1   2    2    3
3    1   3    3    3
4    2   4    1    4
5    2   5    2    4
6    2   6    3    4
7    3   7    1    5
8    3   8    2    5
9    3   9    3    5
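
Since a loop over the whole data frame touches every column, here is a hedged sketch of a variant restricted to chosen columns. The vector cols is an assumption added for illustration, and the sketch assumes the columns are character (the default in R 4.0 and later, set explicitly here). It also converts the result to numeric, because assigning a number into a character column stores it as text.

```r
df <- data.frame(name = rep(letters[1:3], each = 3), foo = rep(1:9),
                 var1 = letters[1:3], var2 = rep(3:5, each = 3),
                 stringsAsFactors = FALSE)

z <- c("a", "b", "c")
y <- c(1, 2, 3)
cols <- c("name", "var1")  # hypothetical: the columns to recode

for (col in cols) {
  for (i in seq_along(z)) {
    # Replace matches in this column only; values are stored as text here
    df[[col]][df[[col]] == z[i]] <- y[i]
  }
  # Convert the recoded text ("1", "2", "3") to numbers
  df[[col]] <- as.numeric(df[[col]])
}

df
```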
denisafonin
1

You could use a lookup vector combined with apply:

z <- c("a","b","c")
y <- c(1,2,3)

lookup <- setNames(y, z)

df[,c(1,3)] <- apply(df[,c(1,3)], 2, function(x) lookup[x])
df

This returns

  name foo var1 var2
1    1   1    1    3
2    1   2    2    3
3    1   3    3    3
4    2   4    1    4
5    2   5    2    4
6    2   6    3    4
7    3   7    1    5
8    3   8    2    5
9    3   9    3    5
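
To see why the lookup vector works, here is a minimal sketch on a plain character vector: indexing a named vector by name returns the matching values in order, and names that are not in the lookup come back as NA. The vector x and the extra "d" value are assumptions added for illustration.

```r
# Named vector: names are the old values, elements are the new ones
lookup <- setNames(c(1, 2, 3), c("a", "b", "c"))

x <- c("a", "c", "b", "a", "d")  # "d" has no entry in the lookup

# Index by name, then drop the names from the result
mapped <- unname(lookup[x])
mapped
#> [1]  1  3  2  1 NA
```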
Martin Gal
1

If you are open to a tidyverse approach:

library(tidyverse)

df_new <- df %>%
  mutate(across(c(var1, name), ~case_when(. == 'a' ~ 1,
                                          . == 'b' ~ 2,
                                          . == 'c' ~ 3)))

df_new

  name foo var1 var2
1    1   1    1    3
2    1   2    2    3
3    1   3    3    3
4    2   4    1    4
5    2   5    2    4
6    2   6    3    4
7    3   7    1    5
8    3   8    2    5
9    3   9    3    5

Note that this code only works as intended if you recode every value in the column. For example, if there were a "d" in your var1 column that you don't turn into a number, it would be changed to NA.
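
A small sketch of that behaviour on a plain vector, with one possible way to handle it: a final TRUE condition acts as a catch-all. The fallback value -1 and the names x, strict and with_default are assumptions for illustration; the fallback has to be numeric, because all case_when outputs must share one type.

```r
library(dplyr, warn.conflicts = FALSE)

x <- c("a", "b", "d")  # "d" is not covered by any condition

# Without a catch-all, unmatched values become NA
strict <- case_when(x == "a" ~ 1,
                    x == "b" ~ 2,
                    x == "c" ~ 3)

# With a catch-all, unmatched values get the fallback instead
with_default <- case_when(x == "a" ~ 1,
                          x == "b" ~ 2,
                          x == "c" ~ 3,
                          TRUE ~ -1)

strict
#> [1]  1  2 NA
with_default
#> [1]  1  2 -1
```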

deschen
  • Thanks, deschen. Although, with many of these solutions, for some reason, the data frame looks the same as the original after reassignment. It's strange. The only one that seems to 'stick' is denisafonin's solution, although that one changes every column. – Taiku Oct 19 '21 at 23:54
  • Note that most of the answers here don't change your original `df` data frame; they just print the result to the console. So what you want to do is assign the result either to a new object or back to `df` to overwrite it (see update in my post). – deschen Oct 20 '21 at 06:25
0
# Import data: df => data.frame
df <- data.frame(name = rep(letters[1:3], each = 3), foo=rep(1:9),var1 = letters[1:3], var2 = rep(3:5, each = 3))

# Function performing a mapping replacement:
# replaceMultipleValues => function() 
replaceMultipleValues <- function(df, mapFrom, mapTo){
  # Extract the values in the data.frame: 
  # dfVals => named character vector
  dfVals <- unlist(df)
  
  # Get all values in the mapping & data 
  # and assign a name to them: tmp1 => named character vector 
  tmp1 <- c(
    setNames(mapTo, mapFrom), 
    setNames(dfVals, dfVals)
  )

  # Extract the unique values: 
  # valueMap => named character vector
  valueMap <- tmp1[!(duplicated(names(tmp1)))]
  
  # Recode the values, coerce vectors to appropriate
  # types: res => data.frame
  res <- type.convert(
    data.frame(
      matrix(
        valueMap[dfVals], 
        nrow = nrow(df),
        ncol = ncol(df),
        dimnames = dimnames(df)
      )
    ),
    # Keep strings as character vectors rather than factors
    as.is = TRUE
  )
  
  # Explicitly define the returned object: data.frame => env
  return(res)
}

# Recode values in data.frame: 
# res => data.frame
res <- replaceMultipleValues(
  df, 
  c("a", "b", "c"), 
  c("1", "2", "3")
)

# Print data.frame to console: 
# data.frame => stdout(console)
res
hello_friend