I have a large data frame of 1129 rows and 4662 columns. I want to sum the row values across every block of 3 consecutive columns, and then return 1 for each block if its row sum is >0, or 0 if the sum is <1. I have added a small reproducible example below. I would like to sum the row values of column 1 to column 3, and then the row values of column 4 to column 6 (and so on in my real data).

df <- read.table(text ="     2005-09-23_2005-09-26  2005-09-27_2005-10-30  2005-10-07_2005-10-08  2005-10-09_2005-10-10  2005-10-11_2005-10-12  2005-10-13_2005-10-14
1  1       0     1     1     1     1           
2  1       1     0     0     0     0     
3  NA      NA    NA     NA     NA     0", header = TRUE)

The result I am after would be this:

result <- read.table(text ="     2005-09-23_2005-10-08  2005-10-09_2005-10-14
1  1       1           
2  1       0     
3  NA      0", header = TRUE)

I looked for similar questions, and it seems that rollapply or rowsum could work (R: summing over an interval of rows), but I can't find a way to sum across columns instead of rows, nor how to do it in a repeating sequence. Would someone be so kind as to help me with some code for doing this? Thank you very much!

AnnK
  • I need more clarity - (a) *"sum the row values.. at intervals of every 7 columns"* do you mean sum columns 1 to 7, columns 8 to 14, etc? Or do you mean sum column 1 + 8 + 15 + ..., columns 2 + 9 + 16 +...? (b) *"and then return 1 for each of these sums if the row sum every 3 columns was >0 or 0 if this sum<1."* Where does `3` come from? Is this a typo and is supposed to be `7`? If the sum is `0.5`, it is both >0 and <1, so what should the answer be? (c) Your example data has less than 7 columns, not sure how to proceed... – Gregor Thomas Jun 16 '20 at 14:56
  • Hi Gregor, sorry for not being clear, I have edited my question. So I would like to sum the row values from columns 1 to 3, then row values from columns 4 to 6, and then row values from columns 7 to 9, etc. Pretty much I need to have my data clumped or aggregated every 3 columns, which in my real data represent days... that is, aggregate the data as explained above in intervals of three days. I hope this helped clarify. – AnnK Jun 16 '20 at 15:40

1 Answer

This works only if the number of columns is divisible by the interval.

+(sapply(split.default(df, unlist(lapply(1:(ncol(df) / 3), rep, 3))), rowSums) > 0)
   1  2
1  1  1
2  1  0
3 NA NA

Maybe someone else can find a more elegant way of creating the split index than:
unlist(lapply(1:(ncol(df)/3),rep,3))
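One possible way to build the grouping index more directly is `ceiling(seq_along(df) / 3)`, which also tolerates a column count that is not a multiple of 3 (the last group is simply shorter). The sketch below is only an illustration under assumptions: the column names `d1`..`d6`, the variables `width` and `grp`, and the helper `flag_any_positive` are made up here. The helper also assumes the NA rule implied by the question's expected result, namely NA only when every value in a block is NA, and otherwise NAs are ignored.

```r
# Toy data mirroring the question's example (column names are hypothetical).
df <- data.frame(
  d1 = c(1, 1, NA), d2 = c(0, 1, NA), d3 = c(1, 0, NA),
  d4 = c(1, 0, NA), d5 = c(1, 0, NA), d6 = c(1, 0, 0)
)

width <- 3
# Group index 1,1,1,2,2,2,...; a shorter trailing group is allowed.
grp <- ceiling(seq_along(df) / width)

# Hypothetical helper: 1 if the block's row sum is > 0, 0 otherwise,
# and NA only when every value of the row in that block is NA (assumed rule).
flag_any_positive <- function(block) {
  s <- rowSums(block, na.rm = TRUE)
  all_na <- rowSums(!is.na(block)) == 0
  out <- as.integer(s > 0)
  out[all_na] <- NA_integer_
  out
}

# split.default() splits the data frame column-wise by the group index.
res <- sapply(split.default(df, grp), flag_any_positive)
res
```

If plain `rowSums` semantics are preferred (any NA in a block gives NA, as in the output above), `flag_any_positive` can be swapped for `function(block) +(rowSums(block) > 0)`.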

Daniel O
  • Rather than `split`ting, you could use `apply` with most any of [these answers](https://stackoverflow.com/q/15265512/903061)... not sure that it would be any more elegant in the end. Something like `apply(df, 1, function(x) colSums(matrix(x, nrow = 3))) > 0`, but then you'd need to manipulate the output to get it in the right format... – Gregor Thomas Jun 16 '20 at 15:45
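The `apply` idea from the comment above can be sketched as follows. This is a rough illustration, not a tested solution for the real data: the toy column names are hypothetical, and it assumes the number of columns is a multiple of 3 and that there are at least two blocks (with a single block, `apply` returns a plain vector and the `t()` step would need adjusting). The "manipulation" needed is a transpose, because `apply` over rows returns one column per input row.

```r
# Toy data mirroring the question's example (column names are hypothetical).
df <- data.frame(
  d1 = c(1, 1, NA), d2 = c(0, 1, NA), d3 = c(1, 0, NA),
  d4 = c(1, 0, NA), d5 = c(1, 0, NA), d6 = c(1, 0, 0)
)

m <- as.matrix(df)

# For each row, fold the values into a 3-row matrix (one column per block)
# and take column sums; apply() returns blocks x rows, so transpose back.
sums <- t(apply(m, 1, function(x) colSums(matrix(x, nrow = 3))))

# Unary plus turns the logical matrix into 0/1 integers (NA stays NA).
res <- +(sums > 0)
res
```

As with the `split.default` version, any NA inside a block propagates to NA here, since `colSums` is called without `na.rm = TRUE`.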