R data.table tidying - condense rows based on multiple criteria

Question

***To ask this question, I removed a Time column from the example below. In my actual data that column holds the actual time, whose seconds differ even when the nominal time is the same, so my data ended up with one score per row. Once I rounded these times, dcast solved my problem :)

I am currently trying to tidy up my data and I'm running into some roadblocks (I'm a beginner to R and do most of my learning from this site). I want to transform my data to have symptoms shown in columns based both on matching the person and the nominal time of the symptom. Through this data tidying I will be reducing my data set of 64,000 observations to about 8,000. My data currently looks like this:

Person  Nominal.Time  Symptom.Name   Symptom.Score
1       +30           A              6
1       +30           B              9
1       +30           C              3
2       +90           A              1
2       +90           B              5
2       +90           C              2

I was able to transform my data into the following:

library(reshape2)
WideSymptomData <- dcast(SymptomData,Person+Nominal.Time~Symptom.Name, value.var="Symptom.Score")

Person  Nominal.Time  A   B   C
1       +30           6                
1       +30               9           
1       +30                   3       
2       +90           1              
2       +90               5          
2       +90                   2

But unfortunately I'm stumped at that point. I've been researching and can't seem to find out how to do the next step to ultimately reach this:

Person  Nominal.Time  A   B   C
1       +30           6   9   3                
2       +90           1   5   2

I think this question might be similar to mine although I wasn't able to successfully apply the answers for it. Any guidance is much appreciated, thanks!

`reshape(data, dir = 'wide', idvar = c('Person','Nominal.Time'), timevar = 'Symptom.Name')` — rawr, Apr 29 '16 at 16:05
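For reference, the base R approach suggested in the comment above can be sketched as follows. This is a minimal sketch that assumes the column names used in the `dcast` call (`Symptom.Name`, `Symptom.Score`) and rebuilds the six sample rows by hand; it spells out `direction` rather than relying on the partial match `dir`.

```r
# Sample rows from the question, rebuilt by hand (column names assumed
# to match the dcast call: Symptom.Name and Symptom.Score).
SymptomData <- data.frame(
  Person        = c(1, 1, 1, 2, 2, 2),
  Nominal.Time  = c("+30", "+30", "+30", "+90", "+90", "+90"),
  Symptom.Name  = c("A", "B", "C", "A", "B", "C"),
  Symptom.Score = c(6, 9, 3, 1, 5, 2),
  stringsAsFactors = FALSE
)

# One output row per Person/Nominal.Time pair; each Symptom.Name value
# becomes its own Symptom.Score.<name> column.
wide <- reshape(
  SymptomData,
  direction = "wide",
  idvar     = c("Person", "Nominal.Time"),
  timevar   = "Symptom.Name"
)

print(wide)
```

Note that `reshape` prefixes the new columns with the value column's name (`Symptom.Score.A` and so on), whereas `dcast` names them after the symptom alone.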

MrFlick · Accepted Answer · 2016-04-29T16:28:40.623

Using the following data

library(data.table)  # needed for data.table()

SymptomData <- data.table(read.table(text="Person  Nominal.Time  Symptom.Name   Symptom.Score
1       +30           A              6
1       +30           B              9
1       +30           C              3
2       +90           A              1
2       +90           B              5
2       +90           C              2", header=TRUE, colClasses=c("numeric","character","character","numeric")))

This works just fine

dcast(SymptomData, Person+Nominal.Time~Symptom.Name, value.var="Symptom.Score")
#   Person Nominal.Time A B C
# 1      1          +30 6 9 3
# 2      2          +90 1 5 2

Tested with reshape2_1.4.1 and data.table_1.9.6. Make sure your example is representative of your real data.
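To illustrate why a representative example matters, here is a hedged sketch of how the "one score per row" output from the question can arise. It assumes the real data carries an extra column whose value differs on every row; the `Actual.Time` values and the `Rounded.Time` helper column below are hypothetical, and truncating to the minute with `substr` stands in for whatever rounding suits the real timestamps.

```r
library(data.table)

SymptomData <- data.table(
  Person        = c(1, 1, 1, 2, 2, 2),
  Nominal.Time  = c("+30", "+30", "+30", "+90", "+90", "+90"),
  # Hypothetical timestamps: same minute within a visit, different seconds.
  Actual.Time   = c("10:30:01", "10:30:07", "10:30:12",
                    "11:30:02", "11:30:05", "11:30:09"),
  Symptom.Name  = c("A", "B", "C", "A", "B", "C"),
  Symptom.Score = c(6, 9, 3, 1, 5, 2)
)

# With Actual.Time on the left-hand side, every input row has a unique
# key, so each score lands on its own output row and the rest are NA.
staircase <- dcast(SymptomData,
                   Person + Nominal.Time + Actual.Time ~ Symptom.Name,
                   value.var = "Symptom.Score")

# Truncate to HH:MM so rows from the same visit share one key again.
SymptomData[, Rounded.Time := substr(Actual.Time, 1, 5)]

condensed <- dcast(SymptomData,
                   Person + Nominal.Time + Rounded.Time ~ Symptom.Name,
                   value.var = "Symptom.Score")

print(staircase)
print(condensed)
```

The first call returns six rows with mostly `NA` cells; the second returns the two condensed rows.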

edited Apr 29 '16 at 16:28

answered Apr 29 '16 at 15:16


When I use that code, I receive this message "Using Nominal.Time as value column: use value.var to override." and the symptom A, B, C columns are populated with Nominal.Time values and I am still faced with many rows containing blanks (NAs) as I outlined in my initial post. – willz Apr 29 '16 at 15:24
You might want to try adding value.var parameter: dcast(SymptomData, Person+Nominal.Time~Symptom.Name, value.var="Symptom.Score") – Alan E Apr 29 '16 at 15:42
@willz can you add the version numbers of the packages you are using? And did you test with data above? Or did you have more columns or something? – MrFlick Apr 29 '16 at 15:43
@MrFlick I just tested with the data above and yes it did work. The difference between that and the data I actually have is some additional columns including Date, Actual Time, etc. That's why I'm a little confused where the error is coming from. I'm a little hesitant to subset my data, do the calculations, and then merge it back because I want to maintain those other columns and want to avoid error. – willz Apr 29 '16 at 17:10
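One possible way to keep the extra columns mentioned in the last comment, offered as a sketch rather than a tested fix for the real data: a column that is constant within each Person/Nominal.Time group can simply join the left-hand side of the formula, while a column that varies per row has to be summarised to one value per group first. The `Date` and `Actual.Time` values and the `First.Actual.Time` name are hypothetical.

```r
library(data.table)

SymptomData <- data.table(
  Person        = c(1, 1, 1, 2, 2, 2),
  Nominal.Time  = c("+30", "+30", "+30", "+90", "+90", "+90"),
  # Hypothetical: Date is the same for every row of a visit.
  Date          = c("2016-04-01", "2016-04-01", "2016-04-01",
                    "2016-04-02", "2016-04-02", "2016-04-02"),
  # Hypothetical: Actual.Time differs on every row.
  Actual.Time   = c("10:30:01", "10:30:07", "10:30:12",
                    "11:30:02", "11:30:05", "11:30:09"),
  Symptom.Name  = c("A", "B", "C", "A", "B", "C"),
  Symptom.Score = c(6, 9, 3, 1, 5, 2)
)

# Date is constant per group, so it can sit on the left-hand side
# without splitting the rows.
wide <- dcast(SymptomData,
              Person + Nominal.Time + Date ~ Symptom.Name,
              value.var = "Symptom.Score")

# Actual.Time varies per row, so keep one value per group (the earliest
# here; "HH:MM:SS" strings sort chronologically).
first_time <- SymptomData[, .(First.Actual.Time = min(Actual.Time)),
                          by = .(Person, Nominal.Time)]

# Join on the shared keys; both tables have exactly one row per group.
result <- merge(wide, first_time, by = c("Person", "Nominal.Time"))

print(result)
```

Because both tables have one row per Person/Nominal.Time pair, the merge cannot duplicate or drop rows, which limits the risk the comment is worried about.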
