Add additional patient labels (control vs non-control) to data in data.frame based on ID

Question

I have a list of participants with unique IDs (P1-P35) in a clinical trial. Some were controls (P15, P16, P29-P35) and the rest were non-controls.

I have a data.frame with patient physiological responses such as SkinTemp and HeartRate. I've been plotting all the data together, but I would like to split it into control and non-control groups so that I can look at and plot each group separately.

Is there a way to add an extra column of Ns and Cs based on whether the participant was control or non-control?

EDIT: New data

    dput(head(data.frame(lp2),10))
    structure(list(id = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L), .Label = c("1", "10", "11", "12", "13", "14", "15", 
    "16", "17", "18", "19", "2", "20", "21", "22", "23", "24", "25", 
    "26", "27", "28", "29", "3", "30", "31", "32", "33", "34", "4", 
    "5", "6", "7", "8", "9"), class = "factor"), Time = c(0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0), SkinTemp = c(27.781, 27.78, 27.779, 27.779, 
    27.778, 27.777, 27.776, 27.775, 27.775, 27.774), HeartRate = c(70, 
    70, 70, 70, 70, 70, 70, 70, 70, 70), RespirationRate = c(10, 
    10, 10, 10, 10, 10, 10, 10, 10, 10), HeartRateZero = c(39.764, 
    39.764, 39.764, 39.764, 39.764, 39.764, 39.764, 39.764, 39.764, 
    39.764), HeartRateZeroNorm = c(0.273998277347115, 0.273998277347115, 
    0.273998277347115, 0.273998277347115, 0.273998277347115, 0.273998277347115, 
    0.273998277347115, 0.273998277347115, 0.273998277347115, 0.273998277347115
    ), RespirationRateZero = c(6.404, 6.404, 6.404, 6.404, 6.404, 
    6.404, 6.404, 6.404, 6.404, 6.404), RespirationRateZeroNorm = c(0.158766362554542, 
    0.158766362554542, 0.158766362554542, 0.158766362554542, 0.158766362554542, 
    0.158766362554542, 0.158766362554542, 0.158766362554542, 0.158766362554542, 
    0.158766362554542), SkinTempZero = c(0.43, 0.429000000000002, 
    0.428000000000001, 0.428000000000001, 0.427, 0.426000000000002, 
    0.425000000000001, 0.423999999999999, 0.423999999999999, 0.423000000000002
    ), SkinTempZeroNorm = c(0.0600307133882451, 0.0598911070780402, 
    0.0597515007678348, 0.0597515007678348, 0.0596118944576294, 0.0594722881474245, 
    0.0593326818372191, 0.0591930755270137, 0.0591930755270137, 0.0590534692168088
    ), TimeZero = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0), TimeZeroNorm = c(0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0), Segment = c(1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L, 1L, 1L), TimeLower = c(-Inf, -Inf, -Inf, -Inf, -Inf, 
    -Inf, -Inf, -Inf, -Inf, -Inf), TimeUpper = c(0, 0, 0, 0, 0, 0, 
    0, 0, 0, 0)), .Names = c("id", "Time", "SkinTemp", "HeartRate", 
    "RespirationRate", "HeartRateZero", "HeartRateZeroNorm", "RespirationRateZero", 
    "RespirationRateZeroNorm", "SkinTempZero", "SkinTempZeroNorm", 
    "TimeZero", "TimeZeroNorm", "Segment", "TimeLower", "TimeUpper"
    ), row.names = c(NA, 10L), class = "data.frame")
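In the `dput` output above, `id` is a factor whose levels are the text labels `"1"`, `"10"`, `"11"` and so on, not numbers. `%in%` matches a factor by its labels, so a test such as `lp2$id %in% c(15, 16, 29:35)` still works, but `as.numeric(lp2$id)` returns the internal level codes rather than the participant numbers. A minimal sketch with a small hypothetical factor (not the real data):

```r
# Hypothetical factor of participant IDs, stored the same way as lp2$id:
# the levels are sorted as text, giving "1", "10", "15", "2", "29"
id <- factor(c("1", "2", "10", "15", "29"))

levels(id)

# as.numeric() on a factor returns the level codes, not the labels
as.numeric(id)                 # 1 4 2 3 5

# Converting via character recovers the participant numbers
as.numeric(as.character(id))   # 1 2 10 15 29

# %in% compares factors by label, so numeric control IDs still match
id %in% c(15, 16, 29:35)       # FALSE FALSE FALSE TRUE TRUE
```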

I get an error `Error in structure(list(id = structure(c(1L, 1L, 1L, 1L), .Label = c("1", : object 'Segment' not found` — akrun, Feb 07 '17 at 13:08
Hmmm, I'm not sure why that is. Let me see if I can make some data. In any case it would look like: ID, SkinTemp, Control (Y/N) — HCAI, Feb 07 '17 at 13:26
Ok, looks like you have dput a `tbl_df`. Convert it to a `data.frame` and then do the dput — akrun, Feb 07 '17 at 13:30
Just use something like `ifelse(df$id %in% c(15, 16, 29:35), 'C', 'N')` — Sotos, Feb 07 '17 at 13:34
We can use `c("N", "C")[((lp2$id %in% c(15, 16, 29:35))+1)]` — akrun, Feb 07 '17 at 13:36
possible duplicate. [**Link-1**](http://stackoverflow.com/questions/16570302/how-to-add-a-factor-column-to-dataframe-based-on-a-conditional-statement-from-an) [**Link2**](http://stackoverflow.com/questions/14202008/add-column-values-based-on-other-columns-in-data-frame-using-for-and-if) — user5249203, Feb 07 '17 at 14:12
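The two one-liners from Sotos and akrun above produce the same labels. A small base R check, assuming the IDs 1 to 35 from the question in place of the real `lp2` data:

```r
# Hypothetical participant IDs 1-35 and the control IDs from the question
ids <- 1:35
is_control <- ids %in% c(15, 16, 29:35)

# ifelse() picks "C" where is_control is TRUE and "N" otherwise
labels_ifelse <- ifelse(is_control, "C", "N")

# Indexing trick: TRUE + 1 is 2 and FALSE + 1 is 1, which select from c("N", "C")
labels_index <- c("N", "C")[is_control + 1]

identical(labels_ifelse, labels_index)  # TRUE
table(labels_ifelse)                    # C: 9, N: 26
```

`ifelse()` is arguably easier to read; the indexing version avoids a function call but relies on logical-to-integer coercion.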

Joe · Accepted Answer · 2017-02-07T13:57:15.053


Turning the comment of @Sotos into an answer with a reproducible one-column data frame.

    # Reproducible example: one row per participant ID
    df <- data.frame(id = 1:35)
    # "C" for the control IDs, "N" for everyone else
    df$Group <- ifelse(df$id %in% c(15:16, 29:35), "C", "N")

Or, with `dplyr::mutate`:

    library(dplyr)
    # Assign the result back; otherwise mutate() only prints a modified copy
    df <- df %>% mutate(Group = if_else(id %in% c(15:16, 29:35), "C", "N"))

Just replace `df` with your own data frame (e.g. `lp2`).
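Since the goal in the question is to look at and plot the two groups separately, here is a minimal base R sketch of adding the column to an existing data frame and then splitting by it. `lp2_example` is a hypothetical six-row stand-in with a factor `id` column like the one in the `dput` output; the names `controls`, `non_controls` and `by_group` are illustrative only.

```r
# Hypothetical stand-in for lp2: a factor id column plus one measurement
lp2_example <- data.frame(
  id = factor(c("1", "1", "15", "15", "29", "2")),
  SkinTemp = c(27.78, 27.77, 28.10, 28.05, 27.90, 27.60)
)

# Add the group label to the existing data frame (nothing is rebuilt)
lp2_example$Group <- ifelse(lp2_example$id %in% c(15:16, 29:35), "C", "N")

# Option 1: one data frame per group
controls <- subset(lp2_example, Group == "C")
non_controls <- subset(lp2_example, Group == "N")

# Option 2: a named list of data frames, one element per group
by_group <- split(lp2_example, lp2_example$Group)
names(by_group)  # "C" "N"
```

`subset()` gives separate data frames, while `split()` returns a list that can be looped over with `lapply()`. To compare the groups side by side without splitting, the new column can be used directly as a grouping variable, e.g. `boxplot(SkinTemp ~ Group, data = lp2_example)`.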

edited Feb 07 '17 at 13:57

answered Feb 07 '17 at 13:43

Joe


Won't this overwrite my data, though, if I do `lp2 <- data.frame(id = c(1:35))`? – HCAI Feb 07 '17 at 13:50
OK, I have no problem with you doing that, but at least do it with base R. There's no need to load packages for such elementary procedures. – Sotos Feb 07 '17 at 13:50
@HCAI: use your own data instead of `df`. @Sotos point taken and I didn't initially realise I'd stolen your answer, sorry. – Joe Feb 07 '17 at 13:59
No worries @Joe. That's fine – Sotos Feb 07 '17 at 14:05
Oh I see. So I'd only use the second part, after `library(dplyr)`? – HCAI Feb 07 '17 at 14:56
@Joe Great stuff, thank you. dplyr looks really interesting! Just one quick question: What happens if the controls weren't sequential? e.g. I've realised P22 is also a control. – HCAI Feb 07 '17 at 16:02
Glad it's solved. It doesn't matter at all what order your data are in, because for each row the expression checks whether `id` is contained in the vector `c(15,16,29,30,31,32,33,34,35)` and returns `C` or `N` accordingly. – Joe Feb 07 '17 at 16:19
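For non-sequential controls such as P22, one approach is to keep all control IDs in a single vector and reuse it. The vector below is the question's list plus 22; the example IDs are hypothetical.

```r
# Control IDs from the question plus P22; gaps and order do not matter to %in%
control_ids <- c(15, 16, 22, 29:35)

# Hypothetical IDs in arbitrary order
ids <- c(22, 3, 35, 16, 21)
ifelse(ids %in% control_ids, "C", "N")  # "C" "N" "C" "C" "N"
```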
