I have a data frame with two columns, NAME and RANGE.

RANGE has values from 50000 to 70000. I want to divide RANGE into groups of width 2000 (for example, 50000 to 52000), so that every value falls into the group covering it, and then find the standard deviation of each group.

I was using the following code

tally(group_by(df, RANGE = cut(RANGE, breaks = seq(50000, 70000, by = 2000)))) %>%
  ungroup() %>%
  spread(RANGE, n, fill = 0)

but I am not able to calculate the standard deviation from this.

I want my output as follows:

RANGE   FREQ S.D
50K-52K 10   1.2
52K-54K 5    0.8
....
...
68K-70K 4    2
Ronak Shah
xyz

2 Answers


You could cut RANGE into groups and then take the sd of each group, along with the number of rows in it.

library(dplyr)

df %>%
  group_by(group = cut(RANGE, breaks = seq(50000,70000,by=2000))) %>%
  summarise(sd = sd(RANGE), 
            Freq = n())

Or similarly, using base R aggregate:

df$groups <- cut(df$RANGE,breaks = seq(50000,70000,by=2000))
aggregate(RANGE~groups, df, function(x) c(sd(x), length(x)))
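
One detail worth knowing about the aggregate approach: when the function returns a vector, the RANGE column of the result is a two-column matrix, which is awkward to work with. Below is a minimal sketch of one way to flatten it. The sample data and the column names SD and FREQ are made up for illustration and are not from the question.

```r
# Hypothetical sample data standing in for the asker's data frame
set.seed(1)
df <- data.frame(NAME = paste0("n", 1:200),
                 RANGE = runif(200, 50000, 70000))

# Bin RANGE into intervals of width 2000, as in the answer above
df$groups <- cut(df$RANGE, breaks = seq(50000, 70000, by = 2000))

# Naming the returned vector gives the matrix columns usable names
res <- aggregate(RANGE ~ groups, df,
                 function(x) c(SD = sd(x), FREQ = length(x)))

# Expand the matrix column into ordinary columns RANGE.SD and RANGE.FREQ
res <- do.call(data.frame, res)
names(res)
```

Note that cut uses right-closed intervals by default, so a value of exactly 50000 would fall outside the first group and become NA; passing include.lowest = TRUE to cut should cover that case.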
Ronak Shah
  • And what about the frequency that falls in each group? – xyz Sep 14 '19 at 07:38
  • @Ronak Shah How can I remove the scientific notation from the group column? – xyz Sep 16 '19 at 11:04
  • @xyz Run `options(scipen = 999)` in the console. – Ronak Shah Sep 16 '19 at 12:31
  • @Ronak Shah I have already tried `options(scipen = 999)`, but it is not working. – xyz Sep 17 '19 at 08:33
  • I am not able to reproduce the issue: after running `options(scipen = 999)` I do not see any scientific notation, as mentioned in this post. https://stackoverflow.com/questions/5352099/how-to-disable-scientific-notation – Ronak Shah Sep 17 '19 at 08:43
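
A possible explanation for the scientific notation in the group column, offered as a sketch rather than a confirmed diagnosis: the interval labels are built by cut itself, which formats the break points with only dig.lab = 3 significant digits by default, and as far as I can tell that formatting does not consult the scipen option. Raising dig.lab, or supplying custom labels, may help. The values in x are hypothetical.

```r
breaks <- seq(50000, 70000, by = 2000)
x <- c(50500, 53000, 69999)  # hypothetical values, not from the question

# Default dig.lab = 3: labels can come out in scientific notation
levels(cut(x, breaks = breaks))[1]

# dig.lab = 5 gives cut enough digits to print the breaks in full
full <- levels(cut(x, breaks = breaks, dig.lab = 5))
full[1]

# Custom labels in the "50K-52K" style from the desired output
labs <- paste0(head(breaks, -1) / 1000, "K-", tail(breaks, -1) / 1000, "K")
custom <- levels(cut(x, breaks = breaks, labels = labs))
custom[1]
```

Either form of the cut call can be dropped into the group_by, aggregate, or data.table versions in place of the original one.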

We can use data.table

library(data.table)
setDT(df)[, .(sd = sd(RANGE), Freq = .N), 
        .(group =  cut(RANGE, breaks = seq(50000,70000,by=2000)))]
akrun