
I have two columns of paired values in a data frame. I want to bin the data in one column using the cut2 function from the Hmisc package so that there are at least, say, 25 data points in each bin. However, I also need the corresponding values from the other column. Is there a convenient way to do that in R? The column I have to bin is B.

A           B
-10.834510  1.680173
11.012966  1.866603
-16.491415  1.868667
-14.485036  1.900002
2.629104  1.960929
-3.597291  2.005348
.........
Christopher Bottoms
WoA

1 Answer


It's not clear what you mean by wanting the "corresponding values of the other column". The first part is easy to accomplish using the g (# of groups) argument:

dfrm$Agrp <- cut2(dfrm$A, g=trunc(length(dfrm$A)/25) )

You can aggregate means or medians of B within the Agrp groups using tapply, ave, or one of the Hmisc summary functions. There are several worked examples in one of today's questions, "How to get Summary statistics by group", as well as many other examples that use those functions, aggregate, or the plyr package functions.
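As a rough sketch of those summaries, assuming Hmisc is installed and using only the six rows shown in the question (so g = 2 here is an illustrative choice, not the 25-per-bin rule):

```r
library(Hmisc)  # assumed installed; provides cut2()

# The six rows visible in the question, typed in by hand for illustration
dat <- data.frame(
  A = c(-10.834510, 11.012966, -16.491415, -14.485036, 2.629104, -3.597291),
  B = c(1.680173, 1.866603, 1.868667, 1.900002, 1.960929, 2.005348)
)

# Bin A into 2 quantile groups (illustrative value for this tiny sample)
dat$Agrp <- cut2(dat$A, g = 2)

# Mean of B within each A bin, returned as a named vector
grp_means <- tapply(dat$B, dat$Agrp, mean)

# Median of B within each A bin, returned as a data frame
grp_medians <- aggregate(B ~ Agrp, data = dat, FUN = median)

# Per-row group mean of B (ave() uses mean by default), same length as dat
dat$B_mean <- ave(dat$B, dat$Agrp)
```

tapply gives one value per bin, while ave returns a value for every row, which is handy when the summary should sit next to the original data.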

Given that the number of B values will not necessarily be constant across groups, the only way I can think of to deliver the individual values for each binned group of A is split. I added an extra row to illustrate that an uneven split may need to return a list rather than a more "rectangular" object:

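library(Hmisc)  # provides cut2()

# The six rows from the question plus one extra row (3.5943, 3.796)
dat <- read.table(text="A           B
 -10.834510  1.680173
 11.012966  1.866603
 -16.491415  1.868667
 -14.485036  1.900002
 2.629104  1.960929
 -3.597291  2.005348
 3.5943 3.796", header=TRUE)
 # g is the number of quantile groups; 7 rows / 3 truncates to 2 groups
 dat$Agrp <- cut2(dat$A, g=trunc(length(dat$A)/3) )
 # list of B values, one element per A bin
 split(dat$B, dat$Agrp)
 #-----    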
$`[-16.49, 2.63)`
[1] 1.680173 1.868667 1.900002 2.005348

$`[  2.63,11.01]`
[1] 1.866603 1.960929 3.796000
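If the binning is meant to be on B, as the question states, the same pattern can be reversed. The following is a minimal sketch under that assumption, using only the six rows shown in the question; the names Bgrp, a_by_bin and rows_by_bin are made up for illustration, and g = 2 stands in for something like trunc(nrow(dat)/25) on the full data:

```r
library(Hmisc)  # assumed installed; provides cut2()

# The six rows visible in the question, typed in by hand for illustration
dat <- data.frame(
  A = c(-10.834510, 11.012966, -16.491415, -14.485036, 2.629104, -3.597291),
  B = c(1.680173, 1.866603, 1.868667, 1.900002, 1.960929, 2.005348)
)

# Bin on B this time; with the full data, g = trunc(nrow(dat) / 25)
# would follow the approach in the answer
dat$Bgrp <- cut2(dat$B, g = 2)

# List with one numeric vector of A values per B bin
a_by_bin <- split(dat$A, dat$Bgrp)

# A values that fall in the first B bin, as a plain vector
a_by_bin[[1]]

# Alternative: keep the A/B pairs together, one data frame per bin
rows_by_bin <- split(dat, dat$Bgrp)
```

Because A and B live in the same rows of the data frame, splitting A by the factor built from B keeps the pairing intact without any extra bookkeeping.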

If you want the vector of values on which the splits were done then that can be accomplished by using regex on levels(dat$Agrp).
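One way the regex idea might look, sketched with the two level labels copied from the output above so that it runs in base R without Hmisc; with real data, lv would be levels(dat$Agrp). The pattern assumes plain decimal labels and would need adjusting for scientific notation:

```r
# Level labels copied from the split() output above;
# in practice: lv <- levels(dat$Agrp)
lv <- c("[-16.49, 2.63)", "[  2.63,11.01]")

# Pull every (optionally negative) decimal number out of each label
nums <- regmatches(lv, gregexpr("-?[0-9]+(\\.[0-9]+)?", lv))

# Flatten, convert to numeric, and drop the repeated interior boundary
cuts <- sort(unique(as.numeric(unlist(nums))))
cuts  # -16.49, 2.63, 11.01
```

Note that the recovered values are the rounded boundaries printed in the labels, not the exact data values.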

IRTFM
  • I've no problem in binning the data (the first part). For the second part, as an example say if the first bin contains the first four values of column B, I need to get the four corresponding values of column A, preferably in a vector – WoA Mar 24 '12 at 22:57
  • Language can be somewhat imprecise. I was demonstrating binning the A values. Just reverse A and B if you were thinking of binning on the B values. Maybe if you make reference to the concrete example above I will understand what sort of "correspondence" you are thinking about. – IRTFM Mar 24 '12 at 23:11