
I have a large data set in which I'm looking to create a new column that recodes the categorical values in an existing column. The existing column (called "Side") has three possible values: 'l', 'r', and 'c'. In the new column, I want observations labeled 'l' in the existing column to be labeled 'green', those labeled 'r' to be labeled 'red', and those labeled 'c' to be labeled 'yellow'.

I want this:

Individual  Side  
1            l
2            r
3            c
4            r
...

To become this:

Individual  Side     Code
1            l       green
2            r       red
3            c       yellow  
4            r       red
...

My apologies for the relatively basic question--I'm not all that good with loops, etc. Thanks in advance.

887

5 Answers


You could use case_when from the dplyr package:

library(dplyr)

df$Code <- case_when(
    df$Side == "l" ~ "green",
    df$Side == "r" ~ "red",
    df$Side == "c" ~ "yellow",
    TRUE ~ "unknown"   # fallback for any other value of Side
)
Tim Biegeleisen
  • How would I make that into a new column? Also what does the 'X' represent? When I try to do the code above I get the error "object 'x' not found". Apologies for my confusion. – 887 Aug 27 '20 at 17:31
  • @rogues77 Please check the updated answer. – Tim Biegeleisen Aug 27 '20 at 23:39

You can use ifelse() from base R:

#Data
df <- structure(list(Individual = 1:4, Side = c("l", "r", "c", "r")), class = "data.frame", row.names = c(NA, -4L))

The code:

#Create label
df$Code <- ifelse(df$Side=='l','green',
                  ifelse(df$Side=='r','red',
                         ifelse(df$Side=='c','yellow',NA)))

Output:

  Individual Side   Code
1          1    l  green
2          2    r    red
3          3    c yellow
4          4    r    red
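Before relying on the new column, it can help to confirm that every Side value was matched, since a value that none of the tests catch (for example an uppercase "L" when the tests check for "l") ends up as NA. A minimal base R check, reusing the example df from this answer:

```r
# Example data in the same shape as the answer above
df <- data.frame(Individual = 1:4, Side = c("l", "r", "c", "r"),
                 stringsAsFactors = FALSE)
df$Code <- ifelse(df$Side == "l", "green",
                  ifelse(df$Side == "r", "red",
                         ifelse(df$Side == "c", "yellow", NA)))

# Cross-tabulate input against output; useNA = "ifany" adds an NA column
# only if some Side value fell through every ifelse() test
print(table(df$Side, df$Code, useNA = "ifany"))

# Count the rows that were not recoded (should be 0)
n_unmatched <- sum(is.na(df$Code))
print(n_unmatched)
```

If n_unmatched is not zero, the table shows exactly which Side values were missed.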
Duck
  • Thanks for your answer. My table is several hundred rows long; would that change the code in the answer you provided (given that it wouldn't be possible to list out every value in 'Side')? – 887 Aug 27 '20 at 18:37
  • @rogues77 The code will work, but could you please tell me how many categories you have in `Side` variable? – Duck Aug 27 '20 at 18:42
  • Just 3 categories. I'm just confused why I would write 1:4 and then list out the first four rows that you wrote in the first segment ``` df <- structure(list(Individual = 1:4, Side = c("l", "r", "c", "r")), class = "data.frame", row.names = c(NA, -4L)) ``` Again sorry about the dumb questions – 887 Aug 27 '20 at 18:44
  • @rogues77 What you mentioned is only a dataframe for example. You don't have to write it. You can copy and paste into your console and `df` will appear in environment. In your real data `df` must be your dataframe. So, if you only have three classes the code will work. I hope this solution helped you :) – Duck Aug 27 '20 at 18:47
  • Ah, I think I see what you mean. I tried it again, but I'm getting this error: Error in `$<-.data.frame`(`*tmp*`, field, value = logical(0)) : replacement has 0 rows, data has 113. Any idea why that might be? – 887 Aug 27 '20 at 18:57
  • @rogues77 What is the name of your dataframe? – Duck Aug 27 '20 at 18:57
  • @rogues77 I mean the dataframe where you are going to apply the code and that is giving the error! – Duck Aug 27 '20 at 18:58
  • The initial data frame is named "dat." For the sake of the example, I renamed my vectors. The example vector "Side" was initially called 'pff_Hash', and the example variable 'Code' is actually called 'field.' So my actual code is: ``` dat$field <- ifelse(dat$pff_Hash=='L','Right', ifelse(dat$pff_Hash=='R','Left', ifelse(dat$pff_Hash=='C','Middle', NA))) ``` – 887 Aug 27 '20 at 19:05
  • @rogues77 That should work fine. Do you still have error? – Duck Aug 27 '20 at 19:52
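The "replacement has 0 rows" error in the comments is consistent with dat$pff_Hash returning NULL, for instance because the column name is spelled or capitalized differently in the data; this is an inference, not something confirmed in the thread. The sketch below uses a hypothetical dat without that column to show the chain of events, plus a name check that catches the problem early.

```r
# Hypothetical stand-in for `dat`; note that it has no pff_Hash column
dat <- data.frame(Individual = 1:3, Side = c("L", "R", "C"),
                  stringsAsFactors = FALSE)

# $ on a missing column returns NULL, so the comparison is zero-length
print(dat$pff_Hash)          # NULL
print(dat$pff_Hash == "L")   # logical(0)

# ifelse() with a zero-length test returns a zero-length result; assigning
# that to a new column is what raises "replacement has 0 rows"
res <- ifelse(dat$pff_Hash == "L", "Right", NA)
print(length(res))           # [1] 0

# Checking the column name first makes the real problem obvious
col_ok <- "pff_Hash" %in% names(dat)
print(col_ok)                # [1] FALSE
```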

Another way, without if-else, is to create a lookup table and read the string values out of it.

# some values as dataframe
dataset <- data.frame(
  Individual = 1:5,
  Side = c("l", "r", "c", "r", "l"),
  stringsAsFactors = FALSE  # keep Side as character (only the default since R 4.0.0)
)

# create lookup table
lookup <- list(
  l = "green",
  r = "red",
  c = "yellow"
)

# add column
dataset$Code <- unlist(lookup[dataset$Side])

# Produces:
#   Individual Side   Code
# 1          1    l  green
# 2          2    r    red
# 3          3    c yellow
# 4          4    r    red
# 5          5    l  green

lookup[dataset$Side] returns a list in which each element is the string from the lookup table that corresponds to that row's Side value. unlist then turns that list into a vector that can be added to dataset as a new column.
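A variant of the same idea, sketched here as an alternative rather than the answer's own code, uses a named character vector instead of a list. Indexing it by name returns NA for a code that is missing from the lookup, whereas with the list version unlist() drops such entries, so the result can end up shorter than the data frame. If Side is a factor (the data.frame() default before R 4.0.0), index with as.character(dataset$Side), because indexing by a factor uses its integer codes rather than its labels.

```r
dataset <- data.frame(
  Individual = 1:5,
  Side = c("l", "r", "c", "r", "l"),
  stringsAsFactors = FALSE  # keep Side as character so names are matched
)

# Lookup as a named character vector instead of a list
lookup <- c(l = "green", r = "red", c = "yellow")

# Indexing by name returns one element per row (NA for an unknown code);
# unname() drops the names carried over from lookup
dataset$Code <- unname(lookup[dataset$Side])
print(dataset)
```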

Felix Jassler

Here is a relatively simple way to do this using ifelse. Note that any non-missing value of Side other than "l" or "r" is labeled "yellow".

There are also functions such as relabel for similar tasks in R, which are probably more efficient but require the variable to be a factor.

exampleData <- data.frame(
  Individual = c(1:4),
  Side = c("l", "r", "c", "r")
)

exampleData$Code <- ifelse(exampleData$Side == "l", "green", 
       ifelse(exampleData$Side == "r", "red", "yellow"))
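For the factor-based route, one base R option (a swapped-in technique, not necessarily the function this answer had in mind) is factor() with levels and labels. Each level is replaced by the label in the same position, and any value not listed in levels becomes NA instead of falling through to "yellow".

```r
exampleData <- data.frame(
  Individual = 1:4,
  Side = c("l", "r", "c", "r"),
  stringsAsFactors = FALSE
)

# levels lists the codes that occur in Side; labels gives the replacement
# text for each level, in the same order. Unlisted values become NA.
exampleData$Code <- factor(exampleData$Side,
                           levels = c("l", "r", "c"),
                           labels = c("green", "red", "yellow"))

# Convert to character if a plain string column is preferred
exampleData$Code <- as.character(exampleData$Code)
print(exampleData)
```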
mikeHoncho

Another way to approach this is as a merge (join). A lookup table works very well when you only ever add a single column, but if you need to add more columns, you either have to do multiple lookups or you can do a single merge.

df1 <- structure(list(Individual = 1:4, Side = c("l", "r", "c", "r")), class = "data.frame", row.names = c(NA, -4L))
df2 <- structure(list(Side = c("l", "r", "c"), Code = c("green", "red", "yellow")), class = "data.frame", row.names = c(NA, -3L))

merge(df1, df2, by = "Side", all.x = TRUE)
#   Side Individual   Code
# 1    c          3 yellow
# 2    l          1  green
# 3    r          2    red
# 4    r          4    red

In the tidyverse, this can be done with left_join.
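Two things are worth knowing about the merge() approach: it sorts the result by the key column (visible in the output above), and for a single new column base R's match() performs the same lookup while keeping the original row order. A sketch of both, assuming Individual is a unique row identifier that can be used to restore the order:

```r
df1 <- data.frame(Individual = 1:4, Side = c("l", "r", "c", "r"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(Side = c("l", "r", "c"),
                  Code = c("green", "red", "yellow"),
                  stringsAsFactors = FALSE)

# Option 1: merge(), then restore the original order via Individual
# (assumes Individual uniquely identifies each row)
merged <- merge(df1, df2, by = "Side", all.x = TRUE)
merged <- merged[order(merged$Individual), c("Individual", "Side", "Code")]
rownames(merged) <- NULL

# Option 2: match() gives, for each Side in df1, its row position in df2,
# so the result is already in df1's order (NA where there is no match)
df1$Code <- df2$Code[match(df1$Side, df2$Side)]

print(merged)
print(df1)
```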

r2evans