Organizing data by the value in one column

Question

If I have data (a data frame) like

Type   Value     Date
A       1.1      1/1/2018
B       1.0      1/1/2018
C       9.9      1/1/2018
A       0.9      3/3/2018
B       1.0      3/3/2018
C       9.9      3/3/2018

How do I put the data into the form

Date        A      B      C
1/1/2018    1.1    1.0    9.9
3/3/2018    0.9    1.0    9.9

I want this layout because, for each Date, I want to compute the values B - A and C - A. If there's a way to do that more directly, that would be great too.

Thank you.

Edit to add minimal example:

Type = c("A","B","C","A","B","C") 
Value = c(1.1, 1.0, 9.9, 0.9, 1.0, 9.9) 
Date = c("1/1/2018", "1/1/2018", "1/1/2018", "3/3/2018","3/3/2018", "3/3/2018") 
df = data.frame(Type, Value, Date)
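A base-R sketch of the same pivot, assuming only the minimal example's data: `reshape()` from the `stats` package (unrelated to the `reshape` package discussed below) turns the long table into one row per Date, after which B - A and C - A are ordinary column arithmetic. The `B_minus_A` and `C_minus_A` column names are illustrative choices, not anything from the thread.

```r
# Minimal example data from the question (character dates, not factors)
df <- data.frame(
  Type  = c("A", "B", "C", "A", "B", "C"),
  Value = c(1.1, 1.0, 9.9, 0.9, 1.0, 9.9),
  Date  = c("1/1/2018", "1/1/2018", "1/1/2018", "3/3/2018", "3/3/2018", "3/3/2018"),
  stringsAsFactors = FALSE
)

# Long -> wide: one row per Date, one "Value.<Type>" column per Type
wide <- reshape(df, idvar = "Date", timevar = "Type", direction = "wide")

# Strip the "Value." prefix so the columns are simply A, B, C
names(wide) <- sub("^Value\\.", "", names(wide))
rownames(wide) <- NULL

# Per-Date differences relative to A (illustrative column names)
wide$B_minus_A <- wide$B - wide$A
wide$C_minus_A <- wide$C - wide$A
print(wide)
```

Because each (Date, Type) pair occurs only once here, no aggregation function is needed; the comments below discuss what changes when pairs repeat.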

You are looking for the `cast` command. Try `cast(your_df, Date ~ Type, mean, value = 'Value')` — MDK, Sep 05 '18 at 01:22
@MDK thanks, what is `mean` doing there? I don't understand what it does in this context (since I don't want the mean of anything). — Ben S., Sep 05 '18 at 01:24
@Ben If the formula you give corresponds to more than one row in your original data frame, `cast` needs to know how to combine them. If your data has no repeats, then the `mean` does nothing. — MDK, Sep 05 '18 at 01:35
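To make the aggregation point concrete, here is a base-R analogue using `tapply()` (a stand-in for `cast`, not `cast` itself) on hypothetical data with two `A` readings on the same date. The function passed as the third argument plays the role of `cast`'s aggregation function: `mean` averages the duplicates, while `length` just counts rows per cell.

```r
# Hypothetical data: a second "A" reading on 1/1/2018 creates a duplicate key
df_dup <- data.frame(
  Type  = c("A", "A", "B", "C"),
  Value = c(1.1, 1.3, 1.0, 9.9),
  Date  = rep("1/1/2018", 4),
  stringsAsFactors = FALSE
)

# Rows = Date, columns = Type; values sharing a cell are combined by FUN
by_mean  <- tapply(df_dup$Value, list(df_dup$Date, df_dup$Type), mean)
by_count <- tapply(df_dup$Value, list(df_dup$Date, df_dup$Type), length)
print(by_mean)
print(by_count)
```

With no duplicates, every cell holds a single value and `mean` returns it unchanged, which is why it "does nothing" on the question's data.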

score 1 · Answer 1 · answered Sep 05 '18 at 01:25


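Try `cast(your_df, Date ~ Type, mean, value = 'Value')` (from the `reshape` package). With `reshape2`, the equivalent call is `dcast(your_df, Date ~ Type, mean, value.var = 'Value')`.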


MDK


So, that should work, but I'm running into two issues (my data is in a data frame). The first is trivial: I don't have `cast()`, only `acast()` and `dcast()`. I believe this is a difference between `reshape` and `reshape2`. The second is that I get all NAs for my values, with the warning `In mean.default(.value[i], ...) : argument is not numeric or logical: returning NA` in each case. The column of the data frame in question certainly is numeric, as `str(df)` confirms, so I'm a little confused by this error. – Ben S. Sep 05 '18 at 14:23
Here's a minimal example: `Type = c("A","B","C","A","B","C")` `Value = c(1.1, 1.0, 9.9, 0.9, 1.0, 9.9)` `Date = c("1/1/2018", "1/1/2018", "1/1/2018", "3/3/2018","3/3/2018", "3/3/2018")` `df = data.frame(Type, Value, Date)` Then `dcast(df, Date ~ Type, mean, value = 'Value')` gives the right layout, but everything except the `Date` is `NA` with the warnings given above. – Ben S. Sep 05 '18 at 14:47
OK, it looks like `dcast(df, Date ~ Type, value.var = 'Value')` works. I don't know why, but it does. – Ben S. Sep 05 '18 at 15:03
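A likely explanation for why `value = 'Value'` failed with `dcast` (an inference from R's argument-matching rules and reshape2's documented defaults, not something confirmed in this thread): in `dcast`'s signature `value.var` comes after `...`, and arguments after `...` are never partially matched. So `value` falls into `...`, is forwarded to `mean`, and is ignored there, while `value.var` keeps its default guess. Per the reshape2 documentation, that guess is a column named `value` (or `(all)`) if present and otherwise the last column, which here is `Date`. Averaging a non-numeric column produces exactly the NA warning reported above. The sketch reproduces both steps in base R; `f`, `g`, and `guess_value_column` are hypothetical helpers, not reshape2 functions.

```r
# Hypothetical stand-ins: f puts value.var AFTER `...` (as dcast does),
# g has no `...` at all.
f <- function(x, ..., value.var = "guessed") value.var
g <- function(x, value.var = "guessed") value.var

from_f <- f(1, value = "Value")  # `value` is absorbed by `...`; default kept
from_g <- g(1, value = "Value")  # `value` partially matches `value.var`

# Hypothetical re-implementation, for illustration only, of the documented
# reshape2 default: "value" or "(all)" if present, else the last column.
guess_value_column <- function(data) {
  if ("value" %in% names(data)) return("value")
  if ("(all)" %in% names(data)) return("(all)")
  names(data)[ncol(data)]
}

df <- data.frame(
  Type  = c("A", "B", "C", "A", "B", "C"),
  Value = c(1.1, 1.0, 9.9, 0.9, 1.0, 9.9),
  Date  = c("1/1/2018", "1/1/2018", "1/1/2018", "3/3/2018", "3/3/2018", "3/3/2018"),
  stringsAsFactors = FALSE
)

# Name matching is case-sensitive, so "Value" is not "value": the last column wins
guessed <- guess_value_column(df)

# mean() on a character column warns and returns NA, as in the reported error
res <- suppressWarnings(mean(df[[guessed]]))
print(c(from_f = from_f, from_g = from_g, guessed = guessed))
print(res)
```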
`dcast` works the same way. If you had more than one row with the same Type and Date, `dcast` and `cast` would both default to `length` and return output very different from what you wanted. In that case, you would need to add `fun.aggregate = mean` or uniquely identify your rows. This issue of how to deal with multiple rows matching the key values comes up in `cast` (reshape), `dcast` (reshape2), and `spread` (tidyr). AFAIK, `spread` cannot handle duplicates at all and always throws an error if you have them. Perhaps that's what you want. – MDK Sep 05 '18 at 18:16
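The "uniquely identify your rows" option can be sketched in base R on hypothetical duplicated data: `anyDuplicated()` flags repeated (Date, Type) keys, the situation described above as making `spread` fail, and `ave(..., FUN = seq_along)` numbers the repeated readings so that (Date, rep) becomes a unique key and the pivot needs no aggregation. The `rep` column name is an illustrative choice; combinations with no reading come out as `NA`.

```r
# Hypothetical data with two "A" readings on 1/1/2018
df_dup <- data.frame(
  Type  = c("A", "A", "B", "C"),
  Value = c(1.1, 1.3, 1.0, 9.9),
  Date  = rep("1/1/2018", 4),
  stringsAsFactors = FALSE
)

# TRUE when some (Date, Type) pair occurs more than once
has_dups <- anyDuplicated(df_dup[c("Date", "Type")]) > 0

# Number repeated readings within each (Date, Type) pair: 1, 2, ...
df_dup$rep <- ave(seq_len(nrow(df_dup)), df_dup$Date, df_dup$Type, FUN = seq_along)

# (Date, rep) now identifies each output row, so nothing has to be combined
wide <- reshape(df_dup, idvar = c("Date", "rep"), timevar = "Type", direction = "wide")
rownames(wide) <- NULL
print(wide)
```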
OK, thanks. If you edit your answer so it works with the minimal example, I will accept it. I think you just need to change `value` to `value.var`. At least that is what does it for `dcast()`. – Ben S. Sep 06 '18 at 19:26
