I have this data:

12.1 12.5 12.6 12.7 12.8 13.0 13.2 13.2 13.2 13.3 13.3 13.3 
13.4 13.4 13.5 13.5 13.7 13.7 13.7 13.8 13.9 14.1 14.1 14.2 
14.3 14.3 14.3 14.4 14.4 14.5 14.6 14.6 14.6 14.8 14.8 14.9 
14.9 14.9 15.2 15.2 15.3 15.3 15.5 15.6 15.6 15.7 15.8 15.9 
16.1 16.1 16.3 16.4 16.4 16.5 16.7 16.9 17.0

and I'd like to put it into these bins:

12.1 12.5 12.6 12.7 12.8 13.0 13.2 13.3 13.4 13.5 13.7 13.8 
13.9 14.1 14.2 14.3 14.4 14.5 14.6 14.8 14.9 15.2 15.3 15.5 
15.6 15.7 15.8 15.9 16.1 16.3 16.4 16.5 16.7 16.9 17.0

So for example, the 13.2 and 13.3 bins would have 3 items, etc.

I should mention that the dataset has other columns, and I want those to follow this numeric data into the bins.

I'm new to R and trying to figure out binning.

Here is code to set up my data and get the unique values:

test <- function() {
    data <- c(12.1, 12.5, 12.6, 12.7, 12.8, 13.0, 13.2, 13.2, 13.2, 13.3, 13.3, 13.3,
              13.4, 13.4, 13.5, 13.5, 13.7, 13.7, 13.7, 13.8, 13.9, 14.1, 14.1, 14.2,
              14.3, 14.3, 14.3, 14.4, 14.4, 14.5, 14.6, 14.6, 14.6, 14.8, 14.8, 14.9,
              14.9, 14.9, 15.2, 15.2, 15.3, 15.3, 15.5, 15.6, 15.6, 15.7, 15.8, 15.9,
              16.1, 16.1, 16.3, 16.4, 16.4, 16.5, 16.7, 16.9, 17.0)

    unique_data <- unique(data)

    print(unique_data)
}
Greg Lafrance

2 Answers

Assuming "x" is your input vector and "y" is your vector of break points, you should just use cut:

cut(x, c(-Inf, y, Inf))

Here's an example of what the bin counts look like:

table(cut(x, c(-Inf, y, Inf)))
# 
# (-Inf,12.1] (12.1,12.5] (12.5,12.6] (12.6,12.7] (12.7,12.8]   (12.8,13] 
#           1           1           1           1           1           1 
#   (13,13.2] (13.2,13.3] (13.3,13.4] (13.4,13.5] (13.5,13.7] (13.7,13.8] 
#           3           3           2           2           3           1 
# (13.8,13.9] (13.9,14.1] (14.1,14.2] (14.2,14.3] (14.3,14.4] (14.4,14.5] 
#           1           2           1           3           2           1 
# (14.5,14.6] (14.6,14.8] (14.8,14.9] (14.9,15.2] (15.2,15.3] (15.3,15.5] 
#           3           2           3           2           2           1 
# (15.5,15.6] (15.6,15.7] (15.7,15.8] (15.8,15.9] (15.9,16.1] (16.1,16.3] 
#           2           1           1           1           2           1 
# (16.3,16.4] (16.4,16.5] (16.5,16.7] (16.7,16.9]   (16.9,17]   (17, Inf] 
#           2           1           1           1           1           0

You may have to tweak some of the arguments (such as right or include.lowest) to get the values to fall in the bins you expect, but cut is generally the function for this, with findInterval as a close relative.
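To connect this with the data frame case raised in the question, here is a minimal sketch. The data frame `df`, its `id` column, and the shortened set of values are made up for illustration; the idea is only that the bin is stored as an extra column, so the other columns travel with it.

```r
# Hypothetical two-column data frame; `id` stands in for the asker's other column.
df <- data.frame(
  id    = c("a", "b", "c", "d", "e", "f"),
  value = c(12.1, 13.2, 13.2, 13.3, 13.2, 12.5)
)

# Break points are the sorted unique values, as in the question.
y <- sort(unique(df$value))

# Attach the bin label to each row; the existing columns are left untouched.
df$bin <- cut(df$value, c(-Inf, y, Inf))

# findInterval() returns an integer bin index instead of a factor label.
# left.open = TRUE mirrors cut()'s default right-closed intervals.
df$bin_index <- findInterval(df$value, c(-Inf, y, Inf), left.open = TRUE)

# Split the rows by bin, dropping bins that received no rows.
rows_by_bin <- split(df, df$bin, drop = TRUE)
```

Each element of `rows_by_bin` is a data frame holding all rows that share a bin, with every original column intact.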

A5C1D2H2I1M1N2O1R2T1
  • The data is actually a two-column data frame. I need to bin the rows of the data frame based on the unique values of the 2nd column, but in the result I still need both columns of the data frame intact. Binning the 2nd column just helps me identify which rows have the same 2nd column value. – Greg Lafrance May 03 '14 at 07:16
  • @GregLafrance, please try to make a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example), preferably one that is also (1) small, and (2) that includes an example of your desired output. – A5C1D2H2I1M1N2O1R2T1 May 03 '14 at 07:18
  • Added a function to create the data and get unique values. Now I need to find out how to bin the data by the unique values. – Greg Lafrance May 03 '14 at 07:49
  • @GregLafrance, what I meant was a small example that actually shows what your `data.frame` might look like. You mention you have a two-column `data.frame`, but that you want to do this exercise *by row* (?). – A5C1D2H2I1M1N2O1R2T1 May 03 '14 at 07:52

The dplyr package contains some handy tools for doing this sort of thing.

Assuming you have a data frame df where the values you've mentioned are in a column value, you can bin and count unique values using syntax like:

binned <- df %>% group_by(value) %>% summarise(count = n())

binned will have columns value and count.

summarise lets you add other summary statistics. If you wanted to add the mean of some column other_val, you could do that like:

binned <- df %>% group_by(value) %>% summarise(count = n(), mean_other_val = mean(other_val))

Now, binned will have columns value, count, and mean_other_val.
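As a self-contained sketch of the above, assuming a current dplyr release (where chaining is written with `%>%`): the data frame and the numbers in `other_val` are invented for illustration. The second half shows one possible way to keep every original row while still tagging it with its group size, which may be closer to what the question asks for.

```r
library(dplyr)

# Hypothetical data frame; `other_val` is a made-up second column.
df <- data.frame(
  value     = c(13.2, 13.2, 13.2, 13.3, 13.3, 13.4),
  other_val = c(1, 2, 3, 10, 20, 5)
)

# One row per distinct value, with a count and a per-group mean.
binned <- df %>%
  group_by(value) %>%
  summarise(count = n(), mean_other_val = mean(other_val))

# To keep every original row and only tag it with its group size,
# use mutate() instead of summarise().
tagged <- df %>%
  group_by(value) %>%
  mutate(count = n()) %>%
  ungroup()
```

Here `binned` collapses the data to one row per value, while `tagged` has the same number of rows as `df` plus a `count` column.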

Tim Smith