
I have a data frame containing safety data for 100 patients. For each patient there are several safety factors, each recorded together with the size of that factor.

       v1_d0_urt_redness  v1_d0_urt_redness_size  v1_d1_urt_redness  v1_d1_urt_redness_size  ...
P1     1                  20
P2     1                  NA
P3     0                  NA
...

Here redness = 1 means there was redness and redness = 0 means there was none, in which case redness_size was not reported.
To find what proportion of the data is truly missing, I need to recode the size columns as follows: if redness == 1 and redness_size is NA, then redness_size stays NA; else if redness == 0, then redness_size <- 0. This has to be done for d0, d1, ..., and then repeated for the other variables such as hardness, swelling, etc. Any ideas how to implement this in R?
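For a single indicator/size pair, the rule above can be sketched in base R with `ifelse()`. The toy data frame is hypothetical and only mirrors the three rows shown in the table. Using `%in% 0` instead of `== 0` is a choice made here so that a missing indicator is left alone rather than producing an NA in the test.

```r
# Hypothetical toy data mirroring rows P1-P3 of the table above
df <- data.frame(
  v1_d0_urt_redness      = c(1, 1, 0),
  v1_d0_urt_redness_size = c(20, NA, NA),
  row.names = c("P1", "P2", "P3")
)

# redness == 0                -> nothing to measure, so size becomes 0
# redness == 1 and size is NA -> genuinely missing, so size stays NA
df$v1_d0_urt_redness_size <- ifelse(df$v1_d0_urt_redness %in% 0,
                                    0,
                                    df$v1_d0_urt_redness_size)

# Proportion of size values that are truly missing after recoding
prop_missing <- mean(is.na(df$v1_d0_urt_redness_size))
prop_missing
```

After recoding, `is.na()` only counts sizes that should have been reported but were not (here P2).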

1 Answer


If I understand correctly what you are trying to do, and assuming your data frame is called df, you can change the values of the redness_size columns like this:

red   <- endsWith(colnames(df), "_redness")        # indicator columns, e.g. v1_d0_urt_redness
size  <- endsWith(colnames(df), "_redness_size")   # size columns, assumed to be in the same order
cells <- !is.na(df[, red]) & df[, red] == 0        # one TRUE/FALSE per cell, not per row
df[, size][cells] <- 0                             # redness = 1 with a missing size stays NA
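To repeat this for every day and every factor (redness, hardness, swelling, ...), one option is to pair each indicator column with its size column by name, so the result does not depend on column order. The helper `recode_sizes()` is hypothetical, and it assumes every size column is named `<indicator column>_size`, as in the column names listed in the comments; the hardness and swelling column names are likewise assumed to follow that pattern.

```r
# Hypothetical helper: for every indicator column ending in "_<factor>",
# set the matching "<column>_size" to 0 wherever the indicator is 0.
recode_sizes <- function(df, factors = c("redness", "hardness", "swelling")) {
  pattern <- paste0("_(", paste(factors, collapse = "|"), ")$")
  for (col in grep(pattern, colnames(df), value = TRUE)) {
    size_col <- paste0(col, "_size")
    if (!size_col %in% colnames(df)) next   # skip indicators without a size column
    no_finding <- df[[col]] %in% 0          # FALSE for indicator 1 and for NA
    df[[size_col]][no_finding] <- 0         # indicator 1 + missing size stays NA
  }
  df
}

# Usage on a small hypothetical data frame
df <- data.frame(
  v1_d0_urt_redness      = c(1, 1, 0),
  v1_d0_urt_redness_size = c(20, NA, NA),
  v1_d1_lt_swelling      = c(0, NA, 1),
  v1_d1_lt_swelling_size = c(NA, NA, 5)
)
df <- recode_sizes(df)

# Share of truly missing values in each size column
missing_share <- colMeans(is.na(df[grep("_size$", colnames(df))]))
missing_share
```

Matching by name rather than by position means an extra or reordered column only affects its own pair.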
dc37
  • Thank you, but as there are 5 days for each factor, is there any way to use patterns, e.g. if the column name contains 'redness', apply this procedure, so that it runs for the variables of all 5 days? – Ecatrina Smith Nov 26 '19 at 18:19
  • I am using ```df[df[,endsWith(colnames(df),"redness")]==0 ,endsWith(colnames(df),"redness_size")]<-0``` , but I get the error ```Error in `[<-.data.frame`(`*tmp*`, df[, endsWith(colnames(df), "redness")] == : non-existent rows not allowed ```, any ideas what goes wrong here? – Ecatrina Smith Nov 27 '19 at 15:11
  • My column names are: ```"v1_d0_urt_redness" "v1_d0_lt_redness" "v1_d1_urt_redness" "v1_d1_lt_redness" "v1_d2_urt_redness" "v1_d2_lt_redness" , "v1_d0_urt_redness_size" "v1_d0_lt_redness_size" "v1_d1_urt_redness_size" "v1_d1_lt_redness_size" "v1_d2_urt_redness_size"... ```. When I use ```df[,endsWith(colnames(df),"redness")]``` I get the right columns, but the earlier code does not work. – Ecatrina Smith Nov 27 '19 at 15:22
  • I edited the question with the updated variable names and checked the summary(df). – Ecatrina Smith Nov 27 '19 at 15:43
  • OK, I modified my answer accordingly. Do you still have the same issue? Are both lines not working? Have you checked that the `redness` columns are numeric? You can also run only `df[df[,endsWith(colnames(df),"_redness")] == 1 & is.na(df[,endsWith(colnames(df),"redness_size")]),endsWith(colnames(df),"redness_size")]` to see which rows are returned – dc37 Nov 27 '19 at 15:45
  • I am getting the same error! What could be the reason? The output of your last code is quite weird: there are 6 rows in it and the row names are ```NA 50 NA.1 NA.2 NA.3 NA.4 ``` – Ecatrina Smith Nov 27 '19 at 15:58
  • Can you edit your question and provide the output of `dput(df[1:10,])`. Without a reproducible example of your data, it is hard for me to troubleshoot this error (see https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) – dc37 Nov 27 '19 at 16:00
  • I think I will rather do it one by one than using pattern, thanks for your help :) – Ecatrina Smith Nov 27 '19 at 16:15
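A plausible explanation of the `non-existent rows not allowed` error (not confirmed in the thread): `df[, cols] == 0` returns a logical matrix with one column per matched column. Used as the row index of a data frame, it is read as one long logical vector of length `nrow(df) * number of columns`, so TRUE entries past the last row point at rows that do not exist; the `NA`, `NA.1`, ... row names seen when reading with the same index fit this. The sketch below reproduces the failure on hypothetical data and then does the recoding column by column, as the asker settled on, where the row index has exactly one flag per row.

```r
# Hypothetical data: two redness/size pairs for three patients (no NAs in the indicators)
df <- data.frame(
  v1_d0_urt_redness      = c(1, 0, 0),
  v1_d1_urt_redness      = c(0, 1, 1),
  v1_d0_urt_redness_size = c(20, NA, NA),
  v1_d1_urt_redness_size = c(NA, 15, 8)
)
red  <- endsWith(colnames(df), "_redness")
size <- endsWith(colnames(df), "_redness_size")

idx <- df[, red] == 0   # a 3 x 2 logical matrix, not one flag per row
length(idx)             # 6 entries, while df has only 3 rows

# Used as a row index, the matrix is read as a length-6 logical vector,
# so TRUE entries beyond row 3 refer to rows that do not exist
msg <- tryCatch({ df[idx, size] <- 0; "no error" },
                error = function(e) conditionMessage(e))
print(msg)

# Column by column, the row index is a logical vector of length nrow(df)
df$v1_d0_urt_redness_size[df$v1_d0_urt_redness == 0] <- 0
df$v1_d1_urt_redness_size[df$v1_d1_urt_redness == 0] <- 0
```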