I have a data frame in R. One of its columns (variables) is the tumor site, e.g. intestine, colon, lung, stomach, bladder. I need to categorize those sites by organ system. For example, if the site is stomach, intestine, or colon, I want a new column that labels it as gastrointestinal, so that I can reduce the number of options in the "site of tumor" variable.

[Image: part of the data]

  • Please provide some sample data for reproducibility; the `base::dput` function may help you with that. – Jonas Aug 31 '21 at 12:10
  • Please clarify your specific problem or provide additional details to highlight exactly what you need. As it's currently written, it's hard to tell exactly what you're asking. – Community Aug 31 '21 at 14:53

1 Answer

This type of problem can be resolved in a few ways:

  1. merge/join
  2. dplyr::case_when or data.table::fcase
  3. named-vector "dictionary" of sorts

Sample data

(Ideally you would have provided this; I'll make something up.)

dat <- data.frame(tumorsite = rep(c("intestine", "colon", "lung", "stomach", "bladder"),2))

Merge/join

(For background on joins, see "How to join (merge) data frames (inner, outer, left, right)", https://stackoverflow.com/a/6188334/3358272.)

newdat <- data.frame(tumorsite = c("intestine", "colon", "lung", "stomach", "bladder"), newcolumn = c("A", "A", "B", "A", "C"))
newdat
#   tumorsite newcolumn
# 1 intestine         A
# 2     colon         A
# 3      lung         B
# 4   stomach         A
# 5   bladder         C

### base R
merge(dat, newdat, by = "tumorsite", all.x = TRUE)
#    tumorsite newcolumn
# 1    bladder         C
# 2    bladder         C
# 3      colon         A
# 4      colon         A
# 5  intestine         A
# 6  intestine         A
# 7       lung         B
# 8       lung         B
# 9    stomach         A
# 10   stomach         A

### dplyr
dplyr::left_join(dat, newdat, by = "tumorsite")
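
Note that `merge` sorts the result by the key column by default, which is why the base R output above is alphabetical rather than in the original row order. If the original order matters, a base R lookup with `match()` is one option. This is a minimal sketch that recreates `dat` and `newdat` so it runs on its own; `idx` is just an illustrative helper name.

```r
# Sample data and lookup table, recreated so the snippet is self-contained
dat <- data.frame(
  tumorsite = rep(c("intestine", "colon", "lung", "stomach", "bladder"), 2),
  stringsAsFactors = FALSE
)
newdat <- data.frame(
  tumorsite = c("intestine", "colon", "lung", "stomach", "bladder"),
  newcolumn = c("A", "A", "B", "A", "C"),
  stringsAsFactors = FALSE
)

# For each row of dat, find the row of newdat with the same tumorsite
idx <- match(dat$tumorsite, newdat$tumorsite)

# Pull the category from the matched row; unmatched sites would become NA
dat$newcolumn <- newdat$newcolumn[idx]
```

Because nothing is reordered, `dat` keeps its original ten rows in place, much like the `left_join` result.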

case_when (fcase)

library(dplyr)
dat %>%
  mutate(newcolumn = case_when(
    tumorsite %in% c("intestine", "colon", "stomach") ~ "A", 
    tumorsite %in% c("lung", "pharynx") ~ "B", 
    tumorsite %in% c("bladder", "ureter") ~ "C", 
    TRUE ~ "unk")
  )
#    tumorsite newcolumn
# 1  intestine         A
# 2      colon         A
# 3       lung         B
# 4    stomach         A
# 5    bladder         C
# 6  intestine         A
# 7      colon         A
# 8       lung         B
# 9    stomach         A
# 10   bladder         C
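
The `data.table::fcase` variant mentioned in the heading could look roughly like the following. This is a sketch that assumes the data.table package (version 1.13.0 or later) is installed; `dt` is an illustrative name, and the categories simply mirror the `case_when` call above.

```r
library(data.table)

# Same sample data as above, converted to a data.table
dt <- as.data.table(data.frame(
  tumorsite = rep(c("intestine", "colon", "lung", "stomach", "bladder"), 2),
  stringsAsFactors = FALSE
))

# fcase takes alternating condition/value pairs; `default` plays the role
# of the `TRUE ~ "unk"` fallback in case_when
dt[, newcolumn := fcase(
  tumorsite %in% c("intestine", "colon", "stomach"), "A",
  tumorsite %in% c("lung", "pharynx"), "B",
  tumorsite %in% c("bladder", "ureter"), "C",
  default = "unk"
)]
```

As with `case_when`, conditions are evaluated in order and the first one that matches determines the value.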

Dictionary vector

## using previously-defined newdat, but can be made directly/manually
newdat_vec <- setNames(newdat$newcolumn, newdat$tumorsite)
newdat_vec
# intestine     colon      lung   stomach   bladder 
#       "A"       "A"       "B"       "A"       "C" 

dat$newcolumn <- newdat_vec[dat$tumorsite]
dat
#    tumorsite newcolumn
# 1  intestine         A
# 2      colon         A
# 3       lung         B
# 4    stomach         A
# 5    bladder         C
# 6  intestine         A
# 7      colon         A
# 8       lung         B
# 9    stomach         A
# 10   bladder         C
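
Applied to the question's wording, the dictionary can also be written out by hand, with a fallback for sites it does not cover. In this sketch only the "gastrointestinal" label comes from the question; the other system names, the "other" fallback, the extra "skin" site, and the name `site_to_system` are illustrative assumptions.

```r
# Hypothetical site-to-system dictionary: names are sites, values are systems
site_to_system <- c(
  stomach   = "gastrointestinal",
  intestine = "gastrointestinal",
  colon     = "gastrointestinal",
  lung      = "respiratory",   # assumed label
  bladder   = "urinary"        # assumed label
)

# Small example including a site ("skin") that is not in the dictionary
dat <- data.frame(
  tumorsite = c("colon", "lung", "skin", "stomach"),
  stringsAsFactors = FALSE
)

# as.character() guards against factor columns, which would otherwise be
# used as integer positions instead of names
dat$system <- unname(site_to_system[as.character(dat$tumorsite)])

# Sites missing from the dictionary come back as NA; give them a fallback
dat$system[is.na(dat$system)] <- "other"
```

Checking for `NA` before assigning the fallback is also a cheap way to spot misspelled or unexpected site names in a large data set.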
r2evans
  • Could you explain more? Should I create a reference for naming my data? – kirellos Aug 31 '21 at 21:21
  • Oh, thank you, very much appreciated. You are smart. I created a reference and imported it into my script, then applied it using your code, and it renamed 25,000 items according to my reference. Thank you. – kirellos Aug 31 '21 at 21:39