I have a data frame like the one below (with 60 columns of values) and want to extract values according to the following rules.

  1. Compare a1, b1, c1, d1 and find the highest of the 4 values.
  2. If a1 is the highest, get the value of a6. Similarly, if b1 is the highest, get the value of b6; if c1 is the highest, get the value of c6; if d1 is the highest, get the value of d6. (This is the value we need to extract at this stage.)
  3. Compare a2, b2, c2, d2 and find the highest of the 4 values.
  4. If a2 is the highest, get the value of a7. Similarly, if b2 is the highest, get the value of b7; if c2 is the highest, get the value of c7; if d2 is the highest, get the value of d7.
  5. Repeat the above steps until we compare a5, b5, c5, d5 and find the highest of the 4 values, then take its corresponding cell: a10, b10, c10, or d10.

Thank you very much!

a1    1          
a2    3
a3    4
a4    2
a5    3
a6    9
a7    2
a8    3
a9    4
a10   7 
b1    2
b2    4
b3    5
b4    8
b5    6
b6    5
b7    3
b8    2
b9    1
b10   8
c1    5
c2    11
c3    21
c4    14
c5    2
c6    0
c7    1
c8    16
c9    12
c10   16
d1    21
d2    22
d3    31
d4    33
d5    30
d6    24
d7    23
d8    25
d9    26
d10   27
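
For reference, the five rules can be translated almost literally into R. The following is only a sketch for the single column shown above; the data frame name `dat` and the column names `V1` (labels) and `V2` (values) are assumptions, chosen to match the names used in the answer further down.

```r
# Rebuild the single example column as a two-column data frame.
# The names dat, V1 (labels) and V2 (values) are assumptions.
dat <- data.frame(
  V1 = paste0(rep(c("a", "b", "c", "d"), each = 10), 1:10),
  V2 = c(1, 3, 4, 2, 3, 9, 2, 3, 4, 7,
         2, 4, 5, 8, 6, 5, 3, 2, 1, 8,
         5, 11, 21, 14, 2, 0, 1, 16, 12, 16,
         21, 22, 31, 33, 30, 24, 23, 25, 26, 27),
  stringsAsFactors = FALSE
)

# Named vector so values can be looked up by label, e.g. vals["a1"].
vals <- setNames(dat$V2, dat$V1)
groups <- c("a", "b", "c", "d")

# For k = 1..5: find which of a_k, b_k, c_k, d_k is largest, then
# read the value with the same letter at position k + 5.
result <- sapply(1:5, function(k) {
  winner <- groups[which.max(vals[paste0(groups, k)])]
  unname(vals[paste0(winner, k + 5)])
})
result
```

In this column the d value is the largest in every comparison, so the sketch picks d6 through d10. Note that `which.max` returns the first maximum, so ties would resolve to the earliest letter.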
  • Pure code-writing requests are considered off-topic on Stack Overflow. Questions here should relate to **specific** programming problems. Please provide a [**minimal reproducible example**](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/5963610#5963610) where you show us [**the code you have tried**](http://whathaveyoutried.com), where you are stuck, and the expected output. This will also help us answer your question better. – Henrik Jul 23 '14 at 17:56
  • I am not sure how to write the code. Sorry, I have a lot to learn about R and have no idea where to start. –  Jul 23 '14 at 18:09
  • Is this your raw data, or did you end up here after some manipulation? Perhaps an earlier step would allow a cleaner solution than anything that can be attempted here. – alexis_laz Jul 23 '14 at 18:14
  • This is my raw data. Actually I have 60 columns of values and need to extract the 5 values for each column. I only present 1 column of data here. Thank you very much for your help! –  Jul 23 '14 at 18:30

1 Answer

Using dat as the dataset shown in the example (there may be easier ways):

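indx <- as.numeric(gsub("[[:alpha:]]", "", dat$V1))  # numeric suffix of each label; V1 is the first column
l <- split(dat, indx)  # one 4-row data frame (a, b, c, d) per suffix 1..10
l1 <- l[1:5]   # groups that are compared
l2 <- l[6:10]  # groups the values are extracted from
indx2 <- sapply(l1, function(x) which.max(x$V2))  # row of the maximum in each group; V2 is the second column
sapply(seq_along(indx2), function(i) l2[[i]]$V2[indx2[i]])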
#[1] 24 23 25 26 27
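
As one possible shorter route, the values can be reshaped into a 10 x 4 matrix and indexed directly. This is only a sketch and not part of the original answer: it assumes the rows are ordered a1..a10, b1..b10, c1..c10, d1..d10 exactly as in the question, and it rebuilds `dat` with the assumed column names `V1` and `V2` so that it runs on its own.

```r
# Example data from the question; dat, V1 and V2 are assumed names.
dat <- data.frame(
  V1 = paste0(rep(c("a", "b", "c", "d"), each = 10), 1:10),
  V2 = c(1, 3, 4, 2, 3, 9, 2, 3, 4, 7,
         2, 4, 5, 8, 6, 5, 3, 2, 1, 8,
         5, 11, 21, 14, 2, 0, 1, 16, 12, 16,
         21, 22, 31, 33, 30, 24, 23, 25, 26, 27),
  stringsAsFactors = FALSE
)

# Column-wise fill: column 1 holds a1..a10, column 2 holds b1..b10, etc.
# Row k therefore holds a_k, b_k, c_k, d_k.
m <- matrix(dat$V2, nrow = 10,
            dimnames = list(NULL, c("a", "b", "c", "d")))

# Column index of the row maximum for rows 1..5 (first one wins on ties).
win <- max.col(m[1:5, ], ties.method = "first")

# Matrix indexing with (row, column) pairs: rows 6..10, winning columns.
res <- m[cbind(6:10, win)]
res
```

The design relies on the fixed row order rather than on parsing the labels, so it would need a sort step first if the real data are ordered differently.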

Update

If you have more columns

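dat1 <- dat
set.seed(42)
dat1$V3 <- sample(1:25, 40, replace = TRUE)  # second value column; the exact draws depend on the R version (sample() changed in R 3.6.0)
indx <- as.numeric(gsub("[[:alpha:]]", "", dat1$V1))
l <- split(dat1[, -1], indx)  # drop the label column, split by numeric suffix
l1 <- l[1:5]
l2 <- l[6:10]

# logical matrix per group: TRUE where a column attains its maximum
indx2 <- lapply(l1, function(x) apply(x, 2, function(y) y == max(y)))

# pick the matching cells from groups 6..10
# assumes a unique maximum per column and distinct extracted values (unique() drops repeats)
t(sapply(seq_along(l2), function(i) {x1 <- l2[[i]]; x2 <- indx2[[i]]; unique(x1[x2])}))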
 #        [,1] [,2]
 #   [1,]   24   13
 #   [2,]   23   19
 #   [3,]   25   23
 #   [4,]   26   12
 #   [5,]   27   18
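
For many value columns (such as the 60 mentioned in the question), a per-column helper may be easier to reason about. The sketch below is an alternative and not the code above: the helper name `extract_one` is hypothetical, the second column `V3` is made-up illustrative data, and the row order a1..a10, ..., d1..d10 is assumed. Ties resolve to the first maximum.

```r
# Example data: V2 is the column from the question, V3 is invented
# purely for illustration. All names here are assumptions.
dat1 <- data.frame(
  V1 = paste0(rep(c("a", "b", "c", "d"), each = 10), 1:10),
  V2 = c(1, 3, 4, 2, 3, 9, 2, 3, 4, 7,
         2, 4, 5, 8, 6, 5, 3, 2, 1, 8,
         5, 11, 21, 14, 2, 0, 1, 16, 12, 16,
         21, 22, 31, 33, 30, 24, 23, 25, 26, 27),
  V3 = 40:1,
  stringsAsFactors = FALSE
)

# Hypothetical helper: apply the five rules to one value column.
extract_one <- function(v) {
  m <- matrix(v, nrow = 10)  # row k holds a_k, b_k, c_k, d_k
  win <- max.col(m[1:5, , drop = FALSE], ties.method = "first")
  m[cbind(6:10, win)]        # rows 6..10 at the winning columns
}

# One result column per value column; the label column is dropped.
res <- sapply(dat1[-1], extract_one)
res
```

With 60 value columns the same call would return a 5 x 60 matrix, one column of extracted values per input column.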
akrun
  • Thank you so much! I am just wondering how this works for 60 columns of data. Should I use the same code? –  Jul 23 '14 at 19:10
  • @user3869846. No problem. Yes, you could use the same code. Please let me know if you encounter any errors. BTW, do you have missing values? – akrun Jul 23 '14 at 19:12
  • Thanks! There are no missing values. –  Jul 23 '14 at 19:48