
I have a problem finding patterns in this data. My data looks like this:

    df <- read.table(text = "16000   1
                             16000   2
                             16000   3
                             16000   5
                             10000   6
                             10000   9
                             10000  12
                             10000  12
                             13000   2
                             14000   4",
                     header = FALSE, stringsAsFactors = FALSE,
                     col.names = c("Amount", "Month"))

In this case, I would like to find patterns in the payments, i.e. the same amount in consecutive months:

 16000     1
 16000     2
 16000     3

This is the kind of pattern I am looking for. The payment in month 4 has a different amount, and the payment in month 5 does not belong to the pattern because there was no 16000 payment in month 4. (A pattern needs at least 3 consecutive months with equal amounts.)
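A minimal vectorized sketch of the consecutive-month rule, assuming a pattern means at least three payments of the same amount in months m, m + 1, m + 2, ... Instead of comparing each row with its neighbours one at a time, it sorts once, marks where a new run starts (the amount changes or the month is not the previous month + 1), numbers the runs with `cumsum()` and keeps runs of length `min_len` or more. `flag_consecutive()` and `min_len` are hypothetical names, not from the question.

```r
# Hypothetical helper (not from the question): TRUE for rows that belong to a
# run of at least `min_len` payments of the same amount in consecutive months.
flag_consecutive <- function(df, min_len = 3) {
  ord <- order(df$Amount, df$Month)   # sort once: by amount, then month
  amt <- df$Amount[ord]
  mon <- df$Month[ord]
  n <- length(ord)
  if (n == 0) return(logical(0))
  # A new run starts when the amount changes or the month is not previous + 1
  new_run <- c(TRUE, amt[-1] != amt[-n] | mon[-1] != mon[-n] + 1)
  run_id <- cumsum(new_run)                     # number the runs 1, 2, 3, ...
  run_len <- ave(run_id, run_id, FUN = length)  # size of the run each row is in
  flag <- logical(n)
  flag[ord] <- run_len >= min_len               # map back to original row order
  flag
}

df <- read.table(text = "16000 1
16000 2
16000 3
16000 5
10000 6
10000 9
10000 12
10000 12
13000 2
14000 4", header = FALSE, col.names = c("Amount", "Month"))

df$pattern <- flag_consecutive(df)
print(df)
```

Only the first three rows (16000 in months 1 to 3) are flagged. A repeated month, such as a second 16000 payment in month 3, starts a new run, so only its first copy can be part of a pattern.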

Another possible pattern is a payment every three months. In the posted data:

10000     6
10000     9
10000     12
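The every-three-months case can be covered by the same idea with any fixed step. The sketch below (hypothetical name `flag_even_steps()`) sorts by amount and month, takes the gaps between neighbouring months with `diff()`, sets gaps that cross from one amount to another to zero, and uses `rle()` to find at least two equal, positive gaps in a row, i.e. at least three evenly spaced payments. It only compares neighbouring payments of the same amount, so a progression that skips over another payment of that amount (months 1, 4, 7 when month 2 was also paid) is not found; that limitation is an assumption of this sketch, not part of the question.

```r
# Hypothetical helper (not from the question): TRUE for rows in a run of at
# least `min_len` payments of the same amount with evenly spaced months
# (step 1 for 1, 2, 3; step 3 for 6, 9, 12). Only neighbouring months of the
# same amount, after sorting, are compared.
flag_even_steps <- function(df, min_len = 3) {
  ord <- order(df$Amount, df$Month)
  amt <- df$Amount[ord]
  mon <- df$Month[ord]
  n <- length(ord)
  flag <- logical(n)
  if (n < min_len) return(flag)
  # gap[i] is the month step between sorted rows i and i + 1
  gap <- diff(mon)
  gap[amt[-1] != amt[-n]] <- 0        # a gap across two amounts never counts
  r <- rle(gap)                       # runs of identical gaps
  ends <- cumsum(r$lengths)           # last gap index of each run
  starts <- ends - r$lengths + 1      # first gap index of each run
  # k equal positive gaps cover k + 1 payments; a zero gap is a repeated month
  for (j in which(r$values > 0 & r$lengths >= min_len - 1)) {
    flag[starts[j]:(ends[j] + 1)] <- TRUE
  }
  out <- logical(n)
  out[ord] <- flag                    # map back to original row order
  out
}

df <- read.table(text = "16000 1
16000 2
16000 3
16000 5
10000 6
10000 9
10000 12
10000 12
13000 2
14000 4", header = FALSE, col.names = c("Amount", "Month"))

df$pattern <- flag_even_steps(df)
print(df)
```

Here 16000 in months 1, 2, 3 and the first 10000 in months 6, 9, 12 are flagged; the second 10000 in month 12 produces a zero gap and is left out.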

EDIT: I wrote a really bad solution and am looking for a better one:

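    # Sort by amount, then month, so equal amounts are adjacent and months ascend
    df <- df[order(df$Amount, df$Month), ]
    df$pattern <- FALSE
    # Compare each row with both neighbours, so skip the first and last row
    # (only step-1 runs, i.e. consecutive months, are detected)
    for (i in 2:(nrow(df) - 1)) {
      same_amount <- df$Amount[i] == df$Amount[i - 1] && df$Amount[i] == df$Amount[i + 1]
      consecutive <- df$Month[i] == df$Month[i - 1] + 1 && df$Month[i] == df$Month[i + 1] - 1
      if (same_amount && consecutive) {
        df$pattern[(i - 1):(i + 1)] <- TRUE
      }
    }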

This solution is bad because it is very slow, and it has a small mistake. It also has problems finding the pattern for the last row of the data. In addition, it does not find the pattern in the situation below, because once a value appears a third time the ordering no longer works:

    df <- read.table(text = "16000   1
                             16000   2
                             16000   3
                             16000   3
                             16000   5
                             10000   6
                             10000   9
                             10000  12
                             10000  12
                             13000   2
                             14000   4",
                     header = FALSE, stringsAsFactors = FALSE,
                     col.names = c("Amount", "Month"))

@Moody_Mudskipper The expected output can be TRUE/FALSE or, even better, a distinct number for each pattern. If a row appears more than once, it is not part of the pattern. I want a marker/column for each pattern.
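For a distinct number per pattern, one option is to return one line per (row, pattern) pair, because a row can sit at the junction of two patterns (month 3 in 1, 2, 3, 6, 9). This sketch assumes that "a row appears more than once" means an exact repeat of Amount and Month and that only the repeated copy is excluded; that reading still keeps 10000 in months 6, 9, 12 as a pattern in the first data set, where month 12 is repeated. Patterns are found by comparing neighbouring months of the same amount after sorting, so a progression that skips over another payment of that amount is not detected. `label_patterns()`, `min_len` and the `step` column are hypothetical names.

```r
# Hypothetical helper (not from the question): one line per (row, pattern)
# pair. A pattern is at least `min_len` payments of the same amount with evenly
# spaced months; exact repeats of (Amount, Month) are dropped first, so only
# the first copy of a repeated row can be part of a pattern.
label_patterns <- function(df, min_len = 3) {
  empty <- data.frame(row = integer(0), pattern = integer(0), step = numeric(0))
  rows <- which(!duplicated(df[c("Amount", "Month")]))   # drop repeated copies
  rows <- rows[order(df$Amount[rows], df$Month[rows])]   # sort by amount, month
  n <- length(rows)
  if (n < min_len) return(empty)
  amt <- df$Amount[rows]
  gap <- diff(df$Month[rows])
  gap[amt[-1] != amt[-n]] <- 0        # a gap across two amounts never counts
  r <- rle(gap)                       # runs of identical gaps
  ends <- cumsum(r$lengths)
  starts <- ends - r$lengths + 1
  keep <- which(r$values > 0 & r$lengths >= min_len - 1)
  if (length(keep) == 0) return(empty)
  # Number the qualifying runs 1, 2, ... and list the rows each one covers
  pieces <- lapply(seq_along(keep), function(k) {
    j <- keep[k]
    idx <- starts[j]:(ends[j] + 1)
    data.frame(row = rows[idx], pattern = k, step = r$values[j])
  })
  do.call(rbind, pieces)
}

df <- read.table(text = "16000 1
16000 2
16000 3
16000 3
16000 5
10000 6
10000 9
10000 12
10000 12
13000 2
14000 4", header = FALSE, col.names = c("Amount", "Month"))

res <- label_patterns(df)
print(res)

# Optional per-row marker; a row shared by two patterns keeps the later number
df$pattern <- NA_integer_
df$pattern[res$row] <- res$pattern
print(df)
```

With this data, pattern 1 is 10000 in months 6, 9, 12 (step 3) and pattern 2 is 16000 in months 1, 2, 3 (step 1); the repeated month-3 and month-12 rows get `NA`. The long `res` table is the safer output if overlapping patterns matter, since the per-row column can hold only one number.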

  • This seems more like an algorithmic question than a programming one, which suggests it might get a better response on [CrossValidated](http://stats.stackexchange.com/). What code and/or heuristics have you tried so far? – r2evans Mar 04 '18 at 23:05
  • The problem is that the data is not very small, and I can only do it in a very slow way: building a sorted list of values and then looking at i + 1. Other than this I could not find anything. I am not sure it is a statistical problem, because I know the pattern but am not able to code it. – Grigorij Abramov Mar 04 '18 at 23:12
  • @GrigorijAbramov, so what you call a pattern is the same payment and months of the form n, n+a, n+2a, ..., n+ma, where m>=2, right? Or is there some other option? – Julius Vainora Mar 04 '18 at 23:30
  • What's the pattern then? You gave a couple of examples, but the issue is still ambiguously defined as far as I can tell: does a pattern start at 3 or 2, what if a row is included in several patterns, what if a row is duplicated... And, of course, what is the expected output :)? – moodymudskipper Mar 04 '18 at 23:36
  • @Julius Yes, you are right; I will update my post so you guys can see it better. – Grigorij Abramov Mar 04 '18 at 23:39
  • If you do `dplyr::mutate(df, d1 = c(diff(Month), NA), d2 = dplyr::lag(d1))`, you can visually see the grouping based on the common differences in the two columns. I suggest that as a starting point in place of doing cell-wise comparisons. (In general, try to do things vectorized, not cell-wise; you'll get *huge* improvements in performance.) – r2evans Mar 05 '18 at 00:05
