8

I'll apologise in advance - I know this has likely been answered elsewhere, but I don't seem to be able to find the answer I need, and can't manage to adapt other code I have found to my needs.

I have a data frame:

FILE | TECHNIQUE | COUNT
------------------------
A    | ONE       | 10
A    | TWO       | 25
B    | ONE       |  5
B    | TWO       | 30
C    | ONE       | 30
C    | TWO       | 50

I would like to produce a data frame of the difference of the COUNT values between ONE and TWO, with a row for each FILE, i.e.

FILE | DIFFERENCE
-----------------
A    |   15
B    |   25
C    |   20

I'm convinced I should be able to do this fairly easily with base R or Plyr, but am a bit stuck. Could anyone suggest a good way to do this, and perhaps good tutorials on Plyr that might help me with similar problems in the future?

Thanks

obfuscation
  • 1,023
  • 3
  • 16
  • 23

1 Answers1

11

Using aggregate in base:

> aggregate(.~FILE, data= DF[, -2], FUN=diff)
  FILE COUNT
1    A    15
2    B    25
3    C    20

Using ddply in plyr

> ddply(DF[,-2], .(FILE), summarize, DIFFERENCE=diff(COUNT))
  FILE DIFFERENCE
1    A         15
2    B         25
3    C         20

with data.table

> # library(data.table)
> DT <- data.table(DF)
> DT[, diff(COUNT), by=FILE]
   FILE V1
1:    A 15
2:    B 25
3:    C 20

with by

> with(DF, by(COUNT, FILE, diff))
FILE: A
[1] 15
----------------------------------------------------------------------------- 
FILE: B
[1] 25
----------------------------------------------------------------------------- 
FILE: C
[1] 20

with tapply

> tapply(DF$COUNT, DF$FILE, diff)
 A  B  C 
15 25 20 

with summaryBy from doBy package

> # library(doBy)
> summaryBy(COUNT~FILE, FUN=diff, data=DF)
  FILE COUNT.diff
1    A         15
2    B         25
3    C         20

Update As percentage:

> aggregate(.~FILE, data= DF[, -2], function(x) (x[1]/x[2])*100)
  FILE    COUNT
1    A 40.00000
2    B 16.66667
3    C 60.00000
Jilber Urbina
  • 58,147
  • 10
  • 114
  • 138
  • Firstly, thanks for the fantastic answers and thanks for the variety - you're seriously on a roll (and I will accept your answer in a moment). One final addition - how could I calculate it as a percentage of the original? (e.g. A would become 40%). I understand this would involve using something other than diff, right? – obfuscation Sep 03 '13 at 13:40
  • where that 40% comes from? – Jilber Urbina Sep 03 '13 at 13:44
  • Considering the 2 sample rows for FILE==A, COUNT for ONE = 10 and COUNT for TWO = 25, substitution gives (10/25)*100 = 40%. If I use 'FUN=function(x){x}', I can get three columns in the result, from which I can calculate this, but this doesn't exactly seem the right way to do it. – obfuscation Sep 03 '13 at 13:49
  • I had literally just written the same code as your update (having used FUN=function(x){print(x); x} to see what x actually was. Thanks for the help - already accepted the answer, sorry I can't do much more! – obfuscation Sep 03 '13 at 13:58
  • `x` is each value belonging to each gruop: `A, B` and `C`. – Jilber Urbina Sep 03 '13 at 14:02
  • 1
    What are you doing when you run out of options, offer solutions in different languages?! – PascalVKooten Sep 03 '13 at 14:12