Sum number of 1s and 0s in a dataset into five bins in r - vector length varies

Question

Forgive me if this is obvious, but I am very new to R.

What I need to do is divide a dataset consisting of a series of 0s and 1s into five chunks, summing up the 1s in each chunk.

So,

1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1

should result in:

2,1,1,0,3

The thing that makes this slightly tricky is that there is variation in the exact number of characters per vector, so instead of 25 ones and zeros like in the example, some might be 21, some 26, some 23, etc.

Regardless of the varying length of the vectors, I would need the resulting sums in five bins.

The reason for doing this is that I work in linguistics and digital humanities with medieval and early modern texts. I am testing whether abbreviations are more likely to occur towards the end of the line in manuscripts and early printed books. What I want to find out is whether the number in the fifth column ends up being larger than the rest, and then run a chi-square test to determine whether the results are statistically significant.

Thank you very much in advance!

EDIT: Thanks for linking to the previous thread, Cath. My question differs from it, because I need to sum up the bins (so, not by much, I guess...)

What should the bins look like when the number of characters is not divisible by 5? — LAP, Oct 19 '17 at 10:55
LAP: the function you posted adds uneven number of 0s or 1s first to the first bin. This is acceptable for me. — Alpo Honkapohja, Oct 19 '17 at 16:09
zx8754: if the length is 10, I would still want 5 chunks. However, I am leaving out lines of very irregular length. I will still end up with a few thousand lines of data. — Alpo Honkapohja, Oct 19 '17 at 16:10

LAP · Answer 1 · 2017-10-19T11:14:21.993

A possible solution to divide a vector into five chunks would be:

test <- rep(c(0,1,0), 7)
## stolen from here: https://stackoverflow.com/questions/3318333/split-a-vector-into-chunks-in-r
chunk2 <- function(x,n) split(x, cut(seq_along(x), n, labels = FALSE))

> test
 [1] 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0

Use the chunk2 function on your vector, choose 5 bins:

chunks <- chunk2(test, 5)
> chunks
$`1`
[1] 0 1 0 0 1

$`2`
[1] 0 0 1 0

$`3`
[1] 0 1 0 0

$`4`
[1] 1 0 0 1

$`5`
[1] 0 0 1 0
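
To see how this split treats line lengths that are not divisible by 5, a quick check along these lines may help. It is only a sketch: it reuses chunk2 from above and tries the lengths mentioned in the question (21, 23, 25, 26), printing how many elements land in each bin.

```r
# Same helper as above: cut() slices the index range into n equal-width intervals
chunk2 <- function(x, n) split(x, cut(seq_along(x), n, labels = FALSE))

# Print the number of elements per bin for a few line lengths
for (len in c(21, 23, 25, 26)) {
  sizes <- lengths(chunk2(seq_len(len), 5))
  cat("length", len, "-> bin sizes:", sizes, "\n")
}
```

Because cut() works on equal-width intervals of the index, the leftover elements are not necessarily all placed in one bin, so it is worth checking the sizes for the lengths that actually occur in your data.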

Then just lapply sum over the list:

> lapply(chunks, sum)
$`1`
[1] 2

$`2`
[1] 1

$`3`
[1] 1

$`4`
[1] 2

$`5`
[1] 1
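
If a plain vector is more convenient than a list, and the sums need to be pooled over many lines before the chi-square test, something like the following might work. This is a sketch under assumptions, not part of the original answer: bin_sums, bin_sizes and the lines list are hypothetical names, the two lines are just the example vectors from this thread, and weighting the expected counts by bin size is one possible choice of null hypothesis.

```r
chunk2 <- function(x, n) split(x, cut(seq_along(x), n, labels = FALSE))

# Sum of 1s per bin, returned as a plain numeric vector (hypothetical helper)
bin_sums <- function(x, n = 5) unname(vapply(chunk2(x, n), sum, numeric(1)))

# Number of characters per bin (hypothetical helper)
bin_sizes <- function(x, n = 5) unname(lengths(chunk2(x, n)))

# Toy input: the example from the question and the test vector from this answer
lines <- list(
  c(1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1),
  rep(c(0, 1, 0), 7)
)

# One row per line, one column per bin
per_line <- t(vapply(lines, bin_sums, numeric(5)))

# Pool the counts of 1s and the number of characters per bin over all lines
totals <- colSums(per_line)
sizes  <- colSums(t(vapply(lines, bin_sizes, integer(5))))

# Goodness-of-fit test: expected counts proportional to characters per bin.
# suppressWarnings() is used only because the toy counts are too small for
# the chi-squared approximation; drop it when running on real data.
result <- suppressWarnings(chisq.test(totals, p = sizes / sum(sizes)))
print(totals)
print(result$p.value)
```

With the two toy lines, totals comes out as 4 2 2 2 4. Passing p matters because the bins do not always hold the same number of characters; without it, chisq.test assumes all five bins are equally likely.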

Thank you kindly, this seems to do what I need it to do! — Alpo Honkapohja, Oct 19 '17 at 16:06
