-1

Hi I have a with three columns: a name column, a numerical column A and a numerical column B. What I am trying to do is bin the rows by their values in column A, and then find the standard deviation of each binned group's values in column B.

I have currently figured out how to set up the bins into equal length intervals

sapply(df, function(x) {

bins <- with(x, cut(x$a, breaks = seq( floor(min(x$a)/10)*10, ceiling(max(x$a)/10)*10, 
        by= 10), include.lowest = T, labels = labels))})

but I'm stuck on how to apply the bins to the dataframe and calculate the standard deviation of each binned group's set of values from column B.

Does anyone have any suggestions? Thank you...

massisenergy
  • 1,764
  • 3
  • 14
  • 25
hamhung
  • 53
  • 8
  • It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. – MrFlick Feb 20 '20 at 21:49

2 Answers2

1
library(tidyverse)

set.seed(1)

df <- 
  tibble(
    type = rep(c(1, 2), each = 5),
    value = sample.int(20, size = 10)
  )

df %>% 
  group_by(type) %>% 
  summarize(sdev = sd(value))

# A tibble: 2 x 2
   type  sdev
  <dbl> <dbl>
1     1  4.83
2     2  6.26
cardinal40
  • 1,245
  • 1
  • 9
  • 11
  • Thanks, but how do I actually bin the column type into categories of equal length? Say I want to bin type values 1 and 2 into bins 1 and 2 – hamhung Feb 20 '20 at 22:05
  • I understand now... looks like someone below has gotten there for you. – cardinal40 Feb 20 '20 at 22:15
  • I figured out how to get bins with cut() but don't know how to get the cut() result into the df – hamhung Feb 20 '20 at 22:15
0

Here is a example to calculate sd of b based on binned a, where ave was applied, i.e.,

df <- within(df, bins <- cut(a, breaks = seq( floor(min(a)/10)*10, ceiling(max(a)/10)*10, by= 10), include.lowest = T))
df <- within(df, b.sd <- ave(b,bins,  FUN = sd))

such that

> df
           a           b     bins      b.sd
1  26.550866 -0.05612874  (20,30] 0.4936856
2  37.212390 -0.15579551  (30,40] 0.5699030
3  57.285336 -1.47075238  (50,60] 1.5255800
4  90.820779 -0.47815006 (90,100] 0.6159115
5  20.168193  0.41794156  (20,30] 0.4936856
6  89.838968  1.35867955  (80,90] 1.0255606
7  94.467527 -0.10278773 (90,100] 0.6159115
8  66.079779  0.38767161  (60,70] 1.3737593
9  62.911404 -0.05380504  (60,70] 1.3737593
10  6.178627 -1.37705956   [0,10] 0.4163140
11 20.597457 -0.41499456  (20,30] 0.4936856
12 17.655675 -0.39428995  (10,20] 0.5249513
13 68.702285 -0.05931340  (60,70] 1.3737593
14 38.410372  1.10002537  (30,40] 0.5699030
15 76.984142  0.76317575  (70,80] 0.7825274
16 49.769924 -0.16452360  (40,50] 1.2127445
17 71.761851 -0.25336168  (70,80] 0.7825274
18 99.190609  0.69696338 (90,100] 0.6159115
19 38.003518  0.55666320  (30,40] 0.5699030
20 77.744522 -0.68875569  (70,80] 0.7825274
21 93.470523 -0.70749516 (90,100] 0.6159115
22 21.214252  0.36458196  (20,30] 0.4936856
23 65.167377  0.76853292  (60,70] 1.3737593
24 12.555510 -0.11234621  (10,20] 0.5249513
25 26.722067  0.88110773  (20,30] 0.4936856
26 38.611409  0.39810588  (30,40] 0.5699030
27  1.339033 -0.61202639   [0,10] 0.4163140
28 38.238796  0.34111969  (30,40] 0.5699030
29 86.969085 -1.12936310  (80,90] 1.0255606
30 34.034900  1.43302370  (30,40] 0.5699030
31 48.208012  1.98039990  (40,50] 1.2127445
32 59.956583 -0.36722148  (50,60] 1.5255800
33 49.354131 -1.04413463  (40,50] 1.2127445
34 18.621760  0.56971963  (10,20] 0.5249513
35 82.737332 -0.13505460  (80,90] 1.0255606
36 66.846674  2.40161776  (60,70] 1.3737593
37 79.423986 -0.03924000  (70,80] 0.7825274
38 10.794363  0.68973936  (10,20] 0.5249513
39 72.371095  0.02800216  (70,80] 0.7825274
40 41.127443 -0.74327321  (40,50] 1.2127445
41 82.094629  0.18879230  (80,90] 1.0255606
42 64.706019 -1.80495863  (60,70] 1.3737593
43 78.293276  1.46555486  (70,80] 0.7825274
44 55.303631  0.15325334  (50,60] 1.5255800
45 52.971958  2.17261167  (50,60] 1.5255800
46 78.935623  0.47550953  (70,80] 0.7825274
47  2.333120 -0.70994643   [0,10] 0.4163140
48 47.723007  0.61072635  (40,50] 1.2127445
49 73.231374 -0.93409763  (70,80] 0.7825274
50 69.273156 -1.25363340  (60,70] 1.3737593

DATA

set.seed(1)
df <- data.frame(a = runif(50,0,100),
                 b = rnorm(50))
ThomasIsCoding
  • 96,636
  • 9
  • 24
  • 81
  • Great, is there also a way to get the binned cuts onto the dataframe itself so I can later graph both the binned categories and each bin category's SD? Thank you! – hamhung Feb 20 '20 at 22:17
  • @hamhung Great! If you think my answer is helpful, please feel free to upvote/accept it, thanks! – ThomasIsCoding Feb 20 '20 at 22:42