R, reshape2: dcast fills in 0 for missing combinations by default -- normal behavior?

Question

I have the following long table:

> long.table
Source: local data frame [846,996 x 3]

   index         case   amp
   (int)        (chr) (dbl)
1      1 TCGA-AR-A1AH     1
2      2 TCGA-AN-A0G0     1
3      2 TCGA-AR-A1AH     1
4      3 TCGA-AR-A1AH     1
5      4 TCGA-E9-A1R7     1
6      5 TCGA-AN-A0FL     1
7      6 TCGA-A7-A26G     1
8      6 TCGA-AN-A0FL     1
9      7 TCGA-A7-A26G     1
10     7 TCGA-AN-A0FL     1
..   ...          ...   ...

from which I created a short table (tbl.test.2) by taking the first five rows:

> tbl.test.2
Source: local data frame [5 x 3]

  index         case   amp
  (int)        (chr) (dbl)
1     1 TCGA-AR-A1AH     1
2     2 TCGA-AN-A0G0     1
3     2 TCGA-AR-A1AH     1
4     3 TCGA-AR-A1AH     1
5     4 TCGA-E9-A1R7     1

If I use dcast(table, case ~ index) for each of the tables, I get different behavior: In the long case, I get integer-valued columns for the index values and the missing combinations are filled in with zeros. However, in the short case, I get numeric-valued columns for the index values and the missing combinations are filled in with NAs.

Question: Does default behavior change for very long tables?

I strongly suspect that it tells you what it's doing, something like "Aggregation function missing: defaulting to length". You will see a zero because the vector of values associated with that case ~ index combo has no observations. It needs an aggregation function because some other case ~ index combos in the table have > 1 observations (and so must be collapsed to a single value somehow). — Frank, Oct 14 '16 at 21:09
Thanks, Frank -- you're right: I had missed the warning, and I thought I'd removed all potential duplicate rows, but hadn't. — David Romano, Oct 16 '16 at 21:14
Also, in terms of labeling the question as a duplicate, I agree that the answer is in the linked question. But since I didn't think dcast wasn't working and I didn't realize there was a connection to fun.aggregate, I wouldn't have found (and didn't find) that question when I was looking for an answer to my own. Does the 'duplicate' status mean the question will be removed? — David Romano, Oct 16 '16 at 21:31
Nope, it won't be removed. A duplicate is meant to serve as a "signpost", so others who have your problem and come here can be directed to the linked Q&A. — Frank, Oct 16 '16 at 22:36
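
The behavior described in the comments can be reproduced with a small sketch. It assumes a plain data.frame instead of the original dplyr table, and the duplicated row is invented for illustration only; the variable names (short, dup, wide_short, wide_dup, wide_dedup) are hypothetical. With unique case/index pairs, dcast needs no aggregation and fills missing combinations with NA. Once any pair occurs more than once, dcast falls back to length as the aggregation function, so empty combinations become the integer 0.

```r
library(reshape2)

# First five rows of the table from the question, as a plain data.frame
short <- data.frame(
  index = c(1L, 2L, 2L, 3L, 4L),
  case  = c("TCGA-AR-A1AH", "TCGA-AN-A0G0", "TCGA-AR-A1AH",
            "TCGA-AR-A1AH", "TCGA-E9-A1R7"),
  amp   = c(1, 1, 1, 1, 1),
  stringsAsFactors = FALSE
)

# Every case/index pair is unique: no aggregation, missing cells are NA,
# and the value columns keep the numeric type of amp
wide_short <- dcast(short, case ~ index, value.var = "amp")

# Hypothetical duplicate row, standing in for the duplicates in the long table
dup <- rbind(short, short[1, ])

# A pair now occurs twice, so dcast reports that it is defaulting to length;
# cells hold integer counts and empty combinations are 0
wide_dup <- dcast(dup, case ~ index, value.var = "amp")

# Dropping exact duplicate rows first restores the NA-filled result
wide_dedup <- dcast(unique(dup), case ~ index, value.var = "amp")
```

If the duplicates are meaningful rather than accidental, passing an explicit fun.aggregate (and, if needed, fill) to dcast is the other way to make the result predictable.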
