reshape and aggregate datatable

Question

I asked a very similar question and because I haven't quite gotten a handle on tidyr or reshape I have to ask another question. I have a datatable containing repeat id values (see below):

id  Product NI  
1   Direct Auto 15  
2   Direct Auto 15  
3   Direct Auto 15  
4   Direct Auto 15  
5   Direct Auto 15  
6   Direct Auto 15  
6   Mortgage    50  
9   Direct Auto 15  
10  Direct Auto 15  
11  Direct Auto 15  
12  Direct Auto 15  
13  Direct Auto 15  
14  Direct Auto 15  
15  Direct Auto 15  
16  Direct Auto 15  
1   Mortgage    50  
5   Personal    110  
19  Direct Auto 15  
20  Direct Auto 15  
1   Direct Auto 15

I would like the id aggregated to one row, the Product column to be 'spread' so that the values become variables, another variable containing the aggregated count of each Product by id, and the NI to be summed for each of the product groups by ID. So see example below:

id  DirectAuto  DA_NI   Mortgage    Mortgage_NI Personal    P_NI  
1   2   30  1   50  NA  NA  
2   1   15  NA  NA  NA  NA  
3   1   15  NA  NA  NA  NA  
4   1   15  NA  NA  NA  NA  
5   1   15  NA  NA  1   110  
6   1   15  1   50  NA  NA  
9   1   15  NA  NA  NA  NA  
11  1   15  NA  NA  NA  NA  
12  1   15  NA  NA  NA  NA  
13  1   15  NA  NA  NA  NA  
14  1   15  NA  NA  NA  NA  
15  1   15  NA  NA  NA  NA  
16  1   15  NA  NA  NA  NA  
19  1   15  NA  NA  NA  NA  
20  1   15  NA  NA  NA  NA

For example, id 1 has 2 Direct Auto, so his DA_NI is 30 and he has 1 Mortgage so his NI is Mortgage_NI = 50.

So, basically make a 'wider' datatable. I'm still reading and practicing tidyr and reshape, but in the mean-time maybe someone can help.
Here is some of my starting code:

df[, .(tot = .N, NI = sum(NI)), by = c("id","Product")]

Afterwards, using some tidyr & reshape commands I can't seem to get the final output I want.

Try formatting your code http://stackoverflow.com/help/formatting Fyi, the syntax in your last line of code only works for the data.table package, which has its own tag. — Frank, Jul 23 '15 at 17:07
It would really help if you would include code to create your sample data so people can try solutions without having to labor over that part. — ulfelder, Jul 23 '15 at 17:08
ulfelder...I just wrote the sample data in Excel and copied it in. Please let me know what is the preferred method. tu — user3067851, Jul 23 '15 at 17:11
The preferred method uses R's `dput` command. You'll have to move your data from excel into R before starting, so you might as well do that first. Here's a guide for `dput` and asking R questions generally: http://stackoverflow.com/a/28481250/1191259 — Frank, Jul 23 '15 at 17:35
my data is in R. I just put a sample in Excel to post on the board. TU I'll take a look at this. — user3067851, Jul 23 '15 at 17:39
FYI, the package (and tag) is `data.table`, not `datatable`. — Arun, Jul 23 '15 at 19:40

score 1 · Accepted Answer · answered Jul 23 '15 at 19:38

1

data.table v1.9.5 has more nicer features for melting and casting. Using dcast from the devel version:

require(data.table) # v1.9.5
dcast(dt, id ~ Product, fun.agg = list(sum, length), value.var="NI", fill=NA)

I think this is what you're looking for. You can checkout the new HTML vignettes here.

Rename the columns to your liking.

answered Jul 23 '15 at 19:38

Arun

116,683
26
284
387

I followed the steps and downloaded the eval version but received this error: Error in FUN(X[[7L]], ...) : could not find function ".fun"......also, will this give me the sum of NI and count of Product by ID, I'm trying to visualize this in your code – user3067851 Jul 23 '15 at 20:16
You're running `reshape2::dcast`, whereas this requires `data.table::dcast`. – Arun Jul 23 '15 at 20:19
1

that was great. Much less writing to accomplish my results...appreciate the mulitple value.r and value.var functionality that has been added. – user3067851 Jul 23 '15 at 20:49

Nick Kennedy · Answer 2 · 2015-07-23T19:04:27.877

0

It's a little tricky to do this. It can be done using tidyr and dplyr though it goes against Hadley Wickgam's tidy data principles.

dat %>%
  group_by(id, Product) %>%
  summarise(NI = sum(NI), n = n()) %>%
  gather(variable, value, n, NI) %>%
  mutate(
    col_name = ifelse(variable == "n",
      as.character(Product),
      paste(Product, variable, sep = "_"))
  ) %>%
  select(-c(Product, variable)) %>%
  spread(col_name, value)

edited Jul 23 '15 at 19:04

answered Jul 23 '15 at 17:40

Nick Kennedy

12,510
2
30
52

I get this error message: Error in n() : This function should not be called directly. Also, I think first(NI) should be sum, no? – user3067851 Jul 23 '15 at 18:36
I suspect you have both `plyr` and `dplyr` loaded. Try replacing `summarise` with `dplyr::summarise` or executing `detach("package:plyr")` first. – Nick Kennedy Jul 23 '15 at 19:02
And yes, it should have been `sum`. I misread what you wanted. – Nick Kennedy Jul 23 '15 at 19:07

reshape and aggregate datatable

2 Answers2