
I have a data frame that looks like this:

age  share
...
 19   0.02
 20   0.01
 21   0.03
 22   0.04
...

I want to merge each age group into larger cohorts like <20, 20-24, 25-29, 30-34, >=35 (and sum the shares).

Of course this could easily be done by hand, but I can hardly believe there is no dedicated function for it. However, I have not been able to find one. Can you help me?

Roman Luštrik
speendo

1 Answer


What you want to use is ?cut. For example:

> myData <- read.table(text="age  share
+  19   0.02
+  20   0.01
+  21   0.03
+  22   0.04", header=TRUE)
> 
> myData$ageRange <- cut(myData$age, breaks=c(0, 20, 24, 29, 34, 35, 100))
> myData
  age share ageRange
1  19  0.02   (0,20]
2  20  0.01   (0,20]
3  21  0.03  (20,24]
4  22  0.04  (20,24]

Note that the breakpoints must extend below the smallest age and above the largest one; otherwise the outermost values fall outside every interval and become NA. The breakpoints are also exact boundaries (e.g. 20 itself, not <=20 and >=21), so there is no 'gap' between 20 and 21 in which a value such as 20.5 would be left out. By default each interval is closed on the right, which is why age 20 lands in (0,20] here rather than in a 20-24 group.
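The cohorts in the question (<20, 20-24, 25-29, 30-34, >=35) include their lower bound, whereas cut() closes intervals on the right unless told otherwise. Here is a minimal sketch, assuming integer ages, that uses right = FALSE together with custom labels. The column name cohort and the label strings are illustrative choices, not part of the original answer:

```r
# Sample data copied from the answer above
myData <- read.table(text = "age  share
 19   0.02
 20   0.01
 21   0.03
 22   0.04", header = TRUE)

# right = FALSE makes the intervals left-closed:
# [0,20), [20,25), [25,30), [30,35), [35,Inf)
myData$cohort <- cut(myData$age,
                     breaks = c(0, 20, 25, 30, 35, Inf),
                     right  = FALSE,
                     labels = c("<20", "20-24", "25-29", "30-34", ">=35"))
myData
```

With these breaks, age 19 is labelled "<20" and ages 20 to 22 are labelled "20-24". Using Inf as the last breakpoint avoids having to guess an upper age limit.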

From there, if you want the shares in rows categorized under the same ageRange to be summed, you can create a new data frame:

> newData <- aggregate(share~ageRange, myData, sum)
> newData
  ageRange share
1   (0,20]  0.03
2  (20,24]  0.07
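aggregate() only returns the age ranges that actually occur in the data. If every cohort should appear in the result, including empty ones, one possible alternative is xtabs(), which keeps unused factor levels and reports a total of 0 for them. This is a sketch reusing the sample data and breaks from above; the names totals and allCohorts are illustrative:

```r
# Sample data and breaks as used in the answer above
myData <- read.table(text = "age  share
 19   0.02
 20   0.01
 21   0.03
 22   0.04", header = TRUE)
myData$ageRange <- cut(myData$age, breaks = c(0, 20, 24, 29, 34, 35, 100))

# xtabs() sums share within each level of ageRange;
# levels without any rows are kept with a sum of 0
totals <- xtabs(share ~ ageRange, data = myData)

# Convert the table back to a data frame with a "share" column
allCohorts <- as.data.frame(totals, responseName = "share")
allCohorts
```

The result has one row per level of ageRange, so it is convenient when the cohort table must have the same shape across several data sets.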
gung - Reinstate Monica
  • ok, that works. However, how is the actual merging done? so that rows 1 & 2 and also 3 & 4 are merged to one row? Hope this is no stupid question... – speendo Nov 26 '13 at 17:29
  • I'm sure it isn't a stupid question; unfortunately, I don't understand what you mean. Can you update your question to show what you want the output to look like? – gung - Reinstate Monica Nov 26 '13 at 17:33
  • I think I got it: `aggregate(share ~ ageRange, myData, sum)` - would you add this to your answer? – speendo Nov 26 '13 at 17:33