
I'm quite new to R and I'm looking for a way to summarise data by its frequency. I have a data frame like this:

    immagine    media
1          1 60.65391
2          2 58.89603
3          3 60.45428
4          4 55.91487
5          5 56.11804
6          6 56.05239
7          7 61.12667
8          8 60.46287
9          9 57.96718
10        10 58.75914
11        11 60.39214
12        12 57.64966
13        13 57.14457
14        14 56.85810
15        15 56.97652
16        16 58.45831
17        17 57.37774
18        18 57.31794
19        19 60.89813
20        20 61.57055
21        21 59.62459
22        22 56.86678
23        23 56.46254
24        24 54.72302
25        25 56.04285
26        26 55.88004
27        27 56.64764

I would like a table with the variable "media" split into groups such as 50-55, 55-60, ... together with the corresponding "immagine" values for each group. I really don't know how to proceed. Thank you in advance to anyone.

I was also looking to divide the data by its confidence interval. Is it possible to use the cut function to get groups divided by the 95% CI? Thanks in advance.

Nicola

Spigonico
  • 3
    It's not clear what you want, but `cut()` is going to feature in the answer. – Andrie Aug 14 '12 at 14:14
  • Have you checked out [this question](http://stackoverflow.com/q/11261325/1086688)? – nograpes Aug 14 '12 at 14:16
  • Assuming your dataset is named Dat, you can do something like this: `newdat <- transform(Dat, mediacut = cut(Dat$media, c(50, 55, max(Dat$media)), include.lowest = TRUE))` – dickoa Aug 14 '12 at 14:23
  • @dickoa Thank you so much. That's close to what I'm looking for. Do you think it is possible to split the media data by its quantiles instead of fixed values? Thank you again. – Spigonico Aug 14 '12 at 14:59

1 Answer


`cut` gives you a factor whose levels are the groupings you've specified.

`table` takes a vector (here, that factor) and tells you how many elements fall in each level.

Combine the two, and you should be able to do what you want:

> media <- rnorm(10,2.5)+57
> media
 [1] 60.13145 58.78920 61.01821 60.35878 59.20806 57.75657 61.12825 59.67605
 [9] 59.29902 58.70735
> ct <- cut( media, seq(50,65,2.5), include.lowest=TRUE ) 
> ct
 [1] (60,62.5] (57.5,60] (60,62.5] (60,62.5] (57.5,60] (57.5,60] (60,62.5]
 [8] (57.5,60] (57.5,60] (57.5,60]
Levels: [50,52.5] (52.5,55] (55,57.5] (57.5,60] (60,62.5] (62.5,65]
> table(ct)
ct
[50,52.5] (52.5,55] (55,57.5] (57.5,60] (60,62.5] (62.5,65] 
        0         0         0         6         4         0 

Because `table` returns a named array of counts that supports ordinary arithmetic, you can have your output in percents if you want:

> table(ct)/length(ct)*100
ct
[50,52.5] (52.5,55] (55,57.5] (57.5,60] (60,62.5] (62.5,65] 
        0         0         0        60        40         0
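
An equivalent way to get the percentages is `prop.table()`, which divides each count by the total. The sketch below uses a fixed vector (the random sample above, rounded to one decimal) so the result is reproducible:

```r
# Values rounded from the random sample shown above.
media <- c(60.1, 58.8, 61.0, 60.4, 59.2, 57.8, 61.1, 59.7, 59.3, 58.7)

# Same breaks as in the answer: 2.5-wide bins from 50 to 65.
ct <- cut(media, seq(50, 65, 2.5), include.lowest = TRUE)

# prop.table() turns counts into proportions; multiply by 100 for percents.
pct <- prop.table(table(ct)) * 100
print(pct)
```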

Since `cut` preserves the order of its input, you can add the groupings back to your data.frame by storing the result as a new column. If your data.frame is called `dat` and `ct` was computed from `dat$media`, then:

dat$group <- ct

should do it.
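
Applied to the data in the question, the idea can be sketched as follows. This assumes the data frame is named `dat` (the question does not name it), rebuilds only its first six rows for brevity, and keeps the 2.5-wide breaks purely as an illustrative choice. `split()` then lists the `immagine` numbers that fall into each group.

```r
# First six rows of the data from the question; the name `dat` is an assumption.
dat <- data.frame(
  immagine = 1:6,
  media = c(60.65391, 58.89603, 60.45428, 55.91487, 56.11804, 56.05239)
)

# Compute the groups from dat$media itself so rows and groups line up.
dat$group <- cut(dat$media, breaks = seq(50, 65, 2.5), include.lowest = TRUE)

# Number of rows per group.
counts <- table(dat$group)
print(counts)

# immagine numbers per group, with empty groups dropped.
by_group <- split(dat$immagine, droplevels(dat$group))
print(by_group)
```

With these six rows, images 4, 5 and 6 land in `(55,57.5]`, image 2 in `(57.5,60]`, and images 1 and 3 in `(60,62.5]`.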

Ari B. Friedman
  • Thanks, Friedman. I would also like to associate the correct "immagine" numbers with the groups. That's what I'm really looking for. Thank you again. – Spigonico Aug 14 '12 at 14:30
  • 2
    Revised. Please note that a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) would go a long way in getting better help in the future. – Ari B. Friedman Aug 14 '12 at 15:06
  • Thanks, this is what I'm looking for. I need to dig deeper into the `cut` function. Nik – Spigonico Aug 14 '12 at 15:14
  • I was also looking to divide the data by its confidence interval. Is it possible to use the `cut` function to get groups divided by the 95% CI? Thanks in advance. Nicola – Spigonico Aug 16 '12 at 14:07
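
For the two follow-up questions in the comments (quantile-based groups and groups based on the 95% CI), here is a minimal sketch under stated assumptions. Quantile groups can be had by passing the output of `quantile()` as the breaks. For the CI, the sketch reads "divided by the 95% CI" as "below, within, or above a normal-approximation 95% confidence interval of the mean"; that reading is an interpretation, not something the thread confirms. The data frame name `dat` and the use of only the first eight rows are illustrative.

```r
# First eight rows of the data from the question; `dat` is an assumed name.
dat <- data.frame(
  immagine = 1:8,
  media = c(60.65391, 58.89603, 60.45428, 55.91487,
            56.11804, 56.05239, 61.12667, 60.46287)
)

# Quartile groups: breaks at the 0%, 25%, 50%, 75% and 100% quantiles.
q_breaks <- quantile(dat$media, probs = seq(0, 1, 0.25))
dat$quartile <- cut(dat$media, breaks = q_breaks, include.lowest = TRUE)

# 95% CI of the mean via a normal approximation (assumed interpretation).
m  <- mean(dat$media)
se <- sd(dat$media) / sqrt(nrow(dat))
ci <- m + c(-1, 1) * qnorm(0.975) * se

# Three groups: below the lower bound, inside the interval, above the upper bound.
dat$ci_group <- cut(dat$media, breaks = c(-Inf, ci, Inf),
                    labels = c("below CI", "within CI", "above CI"))

print(table(dat$quartile))
print(split(dat$immagine, dat$ci_group))
```

Note that `cut()` fails if the breaks are not unique, which can happen with quantiles of heavily tied data; wrapping the breaks in `unique()` is one way around that, at the cost of fewer groups.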