I have a data frame in R. One of its columns (variables) is the tumor site, e.g. intestine, colon, lung, stomach, bladder. I need to categorize those sites by organ system. For example, if the site is stomach, intestine, or colon, I want a new column that labels it as gastrointestinal, so that I can reduce the number of options in the "site of tumor" variable.
Please provide some sample data for reproducibility; the `base::dput` function may help you with that. – Jonas Aug 31 '21 at 12:10
Please clarify your specific problem or provide additional details to highlight exactly what you need. As it's currently written, it's hard to tell exactly what you're asking. – Community Aug 31 '21 at 14:53
1 Answer
This type of problem can be resolved in a few ways:
- merge/join
- `dplyr::case_when` or `data.table::fcase`
- named-vector "dictionary" of sorts
Sample data
(Ideally you would have provided this; I'll make something up.)
dat <- data.frame(tumorsite = rep(c("intestine", "colon", "lung", "stomach", "bladder"),2))
Merge/join
(This topic is well-informed by How to join (merge) data frames (inner, outer, left, right), https://stackoverflow.com/a/6188334/3358272.)
newdat <- data.frame(tumorsite = c("intestine", "colon", "lung", "stomach", "bladder"), newcolumn = c("A", "A", "B", "A", "C"))
newdat
# tumorsite newcolumn
# 1 intestine A
# 2 colon A
# 3 lung B
# 4 stomach A
# 5 bladder C
### base R
merge(dat, newdat, by = "tumorsite", all.x = TRUE)
# tumorsite newcolumn
# 1 bladder C
# 2 bladder C
# 3 colon A
# 4 colon A
# 5 intestine A
# 6 intestine A
# 7 lung B
# 8 lung B
# 9 stomach A
# 10 stomach A
### dplyr
dplyr::left_join(dat, newdat, by = "tumorsite")
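Note that `merge` sorts its result by the key by default (`sort = TRUE`), which is why the rows above come back in alphabetical order rather than in the original order. If row order matters, a `match`-based lookup is a base-R alternative. A minimal sketch, re-creating the sample data so it runs on its own (the helper name `idx` is just for illustration):

```r
dat <- data.frame(tumorsite = rep(c("intestine", "colon", "lung", "stomach", "bladder"), 2))
newdat <- data.frame(
  tumorsite = c("intestine", "colon", "lung", "stomach", "bladder"),
  newcolumn = c("A", "A", "B", "A", "C")
)

# For each row of dat, find the row of newdat holding the same site (NA if absent)
idx <- match(dat$tumorsite, newdat$tumorsite)

# Pull the category from the matched rows; the row order of dat is untouched
dat$newcolumn <- newdat$newcolumn[idx]
head(dat, 3)
```

Sites that do not appear in `newdat` end up as `NA`, the same as with `all.x = TRUE` in `merge`.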
case_when (fcase)
library(dplyr)
dat %>%
  mutate(newcolumn = case_when(
    tumorsite %in% c("intestine", "colon", "stomach") ~ "A",
    tumorsite %in% c("lung", "pharynx") ~ "B",
    tumorsite %in% c("bladder", "ureter") ~ "C",
    TRUE ~ "unk"
  ))
# tumorsite newcolumn
# 1 intestine A
# 2 colon A
# 3 lung B
# 4 stomach A
# 5 bladder C
# 6 intestine A
# 7 colon A
# 8 lung B
# 9 stomach A
# 10 bladder C
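For `data.table` users, `fcase` follows the same idea. A rough equivalent of the `case_when` call, assuming a reasonably recent `data.table` is installed; conditions and values alternate as arguments, and `default` plays the role of `TRUE ~ "unk"`. The name `DT` is only for illustration:

```r
library(data.table)

dat <- data.frame(tumorsite = rep(c("intestine", "colon", "lung", "stomach", "bladder"), 2))
DT <- as.data.table(dat)  # work on a data.table copy of the sample data

# Each condition is followed by the value to use when it is TRUE
DT[, newcolumn := fcase(
  tumorsite %in% c("intestine", "colon", "stomach"), "A",
  tumorsite %in% c("lung", "pharynx"),               "B",
  tumorsite %in% c("bladder", "ureter"),             "C",
  default = "unk"
)]
DT
```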
dictionary vector
## using previously-defined newdat, but can be made directly/manually
newdat_vec <- setNames(newdat$newcolumn, newdat$tumorsite)
newdat_vec
# intestine colon lung stomach bladder
# "A" "A" "B" "A" "C"
dat$newcolumn <- newdat_vec[dat$tumorsite]
dat
# tumorsite newcolumn
# 1 intestine A
# 2 colon A
# 3 lung B
# 4 stomach A
# 5 bladder C
# 6 intestine A
# 7 colon A
# 8 lung B
# 9 stomach A
# 10 bladder C
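The dictionary can also be written by hand with real system names. A sketch, where "gastrointestinal" comes from the question and everything else is my assumption for illustration: the labels "respiratory" and "urinary", the extra "kidney" row, and the names `site_to_system` and `system`. Sites missing from the dictionary come back as `NA`, which can then be replaced:

```r
# Names are the tumor sites, values are the organ systems
site_to_system <- c(
  intestine = "gastrointestinal",
  colon     = "gastrointestinal",
  stomach   = "gastrointestinal",
  lung      = "respiratory",  # assumed label
  bladder   = "urinary"       # assumed label
)

dat <- data.frame(tumorsite = c("intestine", "colon", "lung", "stomach", "bladder", "kidney"))

# as.character() guards against tumorsite being a factor, which would
# otherwise index the vector by integer code instead of by name
dat$system <- unname(site_to_system[as.character(dat$tumorsite)])

# "kidney" is not in the dictionary, so it is NA at this point
dat$system[is.na(dat$system)] <- "unk"
dat
```

With many sites, it may be easier to keep the mapping in a two-column reference table and build the vector with `setNames`, as done above with `newdat`.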

r2evans
Could you explain more? Do I need to create a reference for naming my data? – kirellos said Aug 31 '21 at 21:21
Oh, thank you, very much appreciated. You are smart. I created a reference and imported it into my script, then applied the list using your code, and it renamed 25000 items according to my reference. Thank you – kirellos said Aug 31 '21 at 21:39