I'm fairly new here and also fairly new to R so apologies if anything is unclear.

Basically, I have a CSV table with one number per person per week, covering 38 weeks.

For example, Anthony has the number 6 in week 1, 12 in week 2 and so on. The numbers are fairly random and range from 1 to 20.

I have taken the numbers from the table and saved them into a string, so Anthony's string when printed would look like

"6 12 18 7 17 4 16 11 20 15 3 5 19 10 8 9 1 14 13 19 11 16 18 4 17 7 6 12 14 1 10 13 20 15 3 5 8 9"

What I'm trying to do is count the number of times numbers between 1 and 10 occur consecutively in groups of 3, then in groups of 4, and possibly 5.

For example, in this string 8, 9 and 1 occur consecutively, and later 3, 5, 8 and 9 occur consecutively, so the number of occurrences is 2.

I've tried `str_count` from the stringr package, as well as a few of the functions from the question "Count the number of overlapping substrings within a string".

I can't seem to find a method/function to get this to output what I want (a simple count of the number of occurrences).

If anyone could provide any insight/help it would be greatly appreciated.

Emmett
  • Are you looking at strings, at numbers, or strings of numbers? A [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) would be very useful here, at a minimum `dput(head(x,n=20))` and what code you've already tried (even if it errors or gives incorrect results). – r2evans Mar 22 '17 at 20:14
  • `length(regmatches(s, gregexpr("(?=\\b\\d(?:\\s\\d){2}\\b)", s, perl=TRUE))[[1]])` – Wiktor Stribiżew Mar 22 '17 at 20:18

1 Answer

It would be easier to keep these as numbers. Here I use scan() to turn your string into a numeric vector, compare each value against 10 to get a logical vector indicating whether it is less than 10, and then call rle() on that to calculate the run lengths:

x <- "6 12 18 7 17 4 16 11 20 15 3 5 19 10 8 9 1 14 13 19 11 16 18 4 17 7 6 12 14 1 10 13 20 15 3 5 8 9"
rr <- rle(scan(text=x)<10)

Now I can turn this into a data.frame and see which runs were longer than 2:

# values is already logical, so no comparison against T (which can be reassigned) is needed
subset(as.data.frame(unclass(rr)), values & lengths > 2)
#    lengths values
# 9        3   TRUE
# 17       4   TRUE

So we can see that we had a run of 3 and a run of 4.
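If all that is needed is the simple count asked for in the question, it can be read straight off the rle result without building a data.frame. The following is a minimal sketch under the same assumption as above (a qualifying number is one below 10); `n_runs` is just an illustrative variable name, and `quiet = TRUE` only suppresses the "Read 38 items" message from scan().

```r
x <- "6 12 18 7 17 4 16 11 20 15 3 5 19 10 8 9 1 14 13 19 11 16 18 4 17 7 6 12 14 1 10 13 20 15 3 5 8 9"

# Logical vector (TRUE where the value is below 10), run-length encoded
rr <- rle(scan(text = x, quiet = TRUE) < 10)

# Count the runs of TRUE that are at least 3 long
n_runs <- sum(rr$values & rr$lengths >= 3)
n_runs
# [1] 2
```

Note that a run of 4 is counted once here, not as two overlapping groups of 3.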

I could clean this up by defining a function that turns the rle into a data.frame more easily and tracks the starting index of each run:

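# S3 method: convert an rle object to a data.frame and record where each run starts
as.data.frame.rle <- function(x, ...) {
    # use the argument's own lengths (not the global rr) so this works for any rle
    data.frame(unclass(x), start = head(cumsum(c(0, x$lengths)) + 1, -1))
}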

I can then run

subset(as.data.frame(rle(scan(text=x)<10)), values & lengths > 2)
#    lengths values start
# 9        3   TRUE    15
# 17       4   TRUE    35

So we can see that those runs start at positions 15 and 35.
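To get counts for groups of 3, 4 and 5 in one go, the same idea can be wrapped in a small function. This is only a sketch: `count_runs()`, `min_len` and `below` are hypothetical names introduced here, and it assumes "groups of k" means runs of at least k qualifying numbers, so the run of 4 also counts toward the total for 3.

```r
# Hypothetical helper (not part of base R): count runs of values below
# `below` that are at least `min_len` long in a space-separated string
count_runs <- function(s, min_len = 3, below = 10) {
  r <- rle(scan(text = s, quiet = TRUE) < below)
  sum(r$values & r$lengths >= min_len)
}

x <- "6 12 18 7 17 4 16 11 20 15 3 5 19 10 8 9 1 14 13 19 11 16 18 4 17 7 6 12 14 1 10 13 20 15 3 5 8 9"

# Counts for runs of at least 3, 4 and 5
counts <- sapply(3:5, function(k) count_runs(x, min_len = k))
counts
# [1] 2 1 0
```

If runs of exactly k are wanted instead, replacing `>=` with `==` inside the function should give that.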

MrFlick
  • And 9 and 17 are row names from the `data.frame`. You could interpret those rows as being from the 9th and 17th "group" but that doesn't really matter. To find the total length you'd want to save some intermediate step. You could take the `length()` of the result from `scan()` or you could sum the `lengths` values in the `data.frame`. Just save the value to a variable before calling `subset()`. – MrFlick Mar 22 '17 at 21:36