
I have very little coding experience, so please forgive me.

My project involves an analysis of TNM clinical stages, a variable included in the National Cancer Data Base.

I am trying to clean up the variable coding so that, for example, the T stages "1A" and "c1B2" are both coded as T stage "1".

Currently, the T stage variable is stored as a column in a large data set I have loaded into R.

How can I proceed? I am thinking of using an `if` conditional to replace each of the character values.

Phil
  • Welcome to SO, user19113407! Questions on SO (especially in R) do much better if they are reproducible and self-contained. By that I mean including attempted code (please be explicit about non-base packages), sample representative data (perhaps via `dput(head(x))` or building data programmatically (e.g., `data.frame(...)`), possibly stochastically), perhaps actual output (with verbatim errors/warnings) versus intended output. Refs: https://stackoverflow.com/q/5963269, [mcve], and https://stackoverflow.com/tags/r/info. – r2evans May 14 '22 at 01:20
  • You might look at `dplyr::recode` or `dplyr::case_when`. Or if you have a large number of variables to recode, you might consider making a lookup table and joining that to the data as a simpler way to manage it all. For instance `df <- data.frame(animal = "cluckmeister", important_data = 3); df_lookup <- data.frame(animal = "cluckmeister", animal_grp = "chicken"); df2 <- merge(df, df_lookup)` – Jon Spring May 14 '22 at 04:30
  • @JonSpring: Thanks for your help! I should've clarified that the T stage variable includes many other values (up to 50 different values: "c1B2", "c3B1", etc) such that not all of the values correspond to stage 1 (they range from stage 1-4). How would my approach change in this case? – user19113407 May 16 '22 at 17:52
  • That's the case I'm trying to describe, it would just have more animals and more animal groups than my 1-item example. I would make a table of all the coded values for stages, with an accompanying column with the grouped stage, and then join that to your data. If there are dozens of values to maintain, it's cleaner to keep them in a table than in custom logic code. – Jon Spring May 16 '22 at 18:29
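
The lookup-table approach described in the comments could be sketched in base R roughly as follows. This is only an illustration under stated assumptions: the column names `id`, `t_stage`, and `t_stage_grp` are hypothetical, the codes "2" and "c4A" are invented to round out the sample, and a real lookup table would need one row for each of the ~50 codes actually present in the data.

```r
# Hypothetical sample data; column names and some codes are assumptions.
df <- data.frame(
  id      = 1:5,
  t_stage = c("1A", "c1B2", "c3B1", "2", "c4A"),
  stringsAsFactors = FALSE
)

# Lookup table: one row per raw code, with the grouped stage beside it.
stage_lookup <- data.frame(
  t_stage     = c("1A", "c1B2", "2", "c3B1", "c4A"),
  t_stage_grp = c("1", "1", "2", "3", "4"),
  stringsAsFactors = FALSE
)

# Left join: all.x = TRUE keeps rows whose code is missing from the
# lookup table, leaving t_stage_grp as NA so they are easy to spot.
df2 <- merge(df, stage_lookup, by = "t_stage", all.x = TRUE)

# merge() sorts by the join key, so restore the original row order.
df2 <- df2[order(df2$id), ]
```

Running `unique(df$t_stage)` on the real data first gives the full list of codes the lookup table has to cover, and `df2[is.na(df2$t_stage_grp), ]` afterwards shows any that were missed.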
