
This is a follow-up to this question: Sum object in a column between an interval defined by another column

What I would like to know is how to adjust the answer so that it sums the values in B for rows where (A[i+1]-A[i]==0) or (A[i+1]-A[i]==1) or (A[i]-A[i-1]==0) or (A[i]-A[i-1]==1), where i is the row index. In other words, I want to sum the B values of rows whose A values are equal or differ by 1, without summing the same row twice.

I tried building a loop function, but I get stuck when using row indices with data frames. Example: given the following data frame

df     
      A B
[1,]  1 4
[2,]  1 3
[3,]  3 5
[4,]  3 7
[5,]  4 3
[6,]  5 2

What I want to obtain is the following data frame:

df
      A B
[1,]  1 7
[2,]  3 15
[3,]  5 2

Moreover, what if I have a larger data frame like this:

df
chr     start           stop            m       n       s
chr1    71533361        71533362        23      1       -
chr1    71533361        71533362        24      26      -
chr1    71533361        71533362        25      1       -

and I want my result to look like this (I chose the row for which the value in column m is max):

df
chr1    71533361        71533362        24      28      -
Nanami
  • Can you please be more clear about how the second df emerges from the first one? If you want a moving window sum with A +-1, it is easy, but what do you mean by "not sum the same row twice"? I have difficulty seeing what you need. – Maxim.K May 02 '13 at 08:34
  • So for A=1 I just sum the values in B for rows 1 and 2 of the first df; for A=3 and A=4 I sum rows 3, 4 and 5 of the first df. As row 5 has already been added, row 6 remains the same. – Nanami May 02 '13 at 09:01

2 Answers


Try the following, assuming your original dataframe is df:

df2 <- df # create a duplicate df to destroy
z <- data.frame(A=numeric(0), B=numeric(0)) # empty output dataframe, filled row by row
j <- 1 # output indexing variable
u <- unique(df$A) # unique vals of A
i <- u[1]
s <- TRUE # just for the while() loop
while(s){
    z[j,] <- c(i,sum(df2[df2$A %in% c(i-1,i,i+1),2]))
    df2 <- df2[!df2$A %in% c(i-1,i,i+1),]
    j <- j + 1 # index the output
    u <- u[!u %in% c(i-1,i,i+1)] # cleanup the u vector
    if(length(u)==0) # conditionally exit the loop
        s <- FALSE
    else
        i <- min(u) # reset value to sum by
}

I know that's kind of messy code, but it's a sort of tough problem given all of the different indices.
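As a quick check, the loop above can be wrapped in a function and run on the example data from the question. This is only a sketch: the function name `sum_adjacent` is made up here, it assumes column A holds integer values and column B the numbers to sum, and it always starts from the smallest remaining A (the loop above starts from the first unique value, which is the same thing for sorted data).

```r
# Hypothetical wrapper around the loop above; assumes integer-valued A
# and numeric B. The name sum_adjacent is illustrative only.
sum_adjacent <- function(df) {
  out <- data.frame(A = numeric(0), B = numeric(0))
  remaining <- df
  while (nrow(remaining) > 0) {
    i <- min(remaining$A)                        # smallest A not yet consumed
    hit <- remaining$A %in% c(i - 1, i, i + 1)   # rows within +/- 1 of i
    out[nrow(out) + 1, ] <- c(i, sum(remaining$B[hit]))
    remaining <- remaining[!hit, , drop = FALSE] # never reuse a summed row
  }
  out
}

# Example data frame from the question
df <- data.frame(A = c(1, 1, 3, 3, 4, 5), B = c(4, 3, 5, 7, 3, 2))
result <- sum_adjacent(df)
print(result)
```

On the example data this should give A = 1, 3, 5 with B = 7, 15, 2, matching the result asked for in the question.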

Thomas
  • This works fine for a data frame with only two columns, but what if df has 4 columns and I want to keep that information in the new df, z? I will add an example to the question. – Nanami May 02 '13 at 10:49
  • Do you want to sum each of the columns separately? Then you would basically just add columns to `z` and change this line `z[j,] <- c(i,sum(df2[df2$A %in% c(i-1,i,i+1),2]))` to have additional elements referring to each column of the original dataframe, like: `z[j,] <- c(i,sum(df2[df2$A %in% c(i-1,i,i+1),2]), sum(df2[df2$A %in% c(i-1,i,i+1),3]))` to get the sums for columns 2 and 3, respectively. – Thomas May 02 '13 at 10:53

I would create a for loop that tests whether A[i] - A[i-1] meets your criteria.

If it does, add B[i] to a running sum and continue through the rows.

Because i is just iterating through A[] it shouldn't count anything from B[] twice.
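A minimal sketch of such a loop, assuming the rows are sorted by A. It deviates from the description in one respect: it compares A[i] with the first A value of the current group rather than with A[i-1], because comparing consecutive rows would chain 3, 4 and 5 into a single group, whereas the question expects 5 to stay separate. The names `anchor` and `out` are illustrative.

```r
# Example data frame from the question
df <- data.frame(A = c(1, 1, 3, 3, 4, 5), B = c(4, 3, 5, 7, 3, 2))
df <- df[order(df$A), ]          # the loop assumes A is sorted

out <- data.frame(A = df$A[1], B = df$B[1])  # first row opens the first group
anchor <- df$A[1]                # A value that started the current group

for (i in seq_len(nrow(df))[-1]) {
  if (df$A[i] - anchor <= 1) {
    # same value as the group's start, or start + 1: add to the running sum
    out$B[nrow(out)] <- out$B[nrow(out)] + df$B[i]
  } else {
    # too far from the group's start: open a new group
    anchor <- df$A[i]
    out[nrow(out) + 1, ] <- c(anchor, df$B[i])
  }
}
print(out)
```

Since each row is visited exactly once, no value of B can be counted twice.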

Keith