25

I'm looking for a way to check if a string is periodic or not using JavaScript.

Sample string to match can be 11223331122333. Whereas, 10101 should not match.

Coming from python, I used the RegEx

/(.+?)\1+$/

But it is quite slow. Are there any string methods that can do the trick?

Bhargav Rao
  • 50,140
  • 28
  • 121
  • 140
  • 1
    Would `112233311223331122333` also match? And I'm guessing `112233311223331` wouldn't? – James Thorpe Dec 31 '15 at 13:25
  • @JamesThorpe Yep. Correct. The first matches but the second doesn't – Bhargav Rao Dec 31 '15 at 13:25
  • 1
    You need `^` in the beginning of the regex, or it will match: `"11010"`. – Andreas Louv Dec 31 '15 at 13:32
  • 1
    Have you tried making `.+` greedy - ie drop the `?` - it [seems to work ok](https://regex101.com/r/mX3sB6/1), not sure how it will compare speed wise, but may reduce backtracking a lot? (or may not, I'm not sure, could easily make it worse...) – James Thorpe Dec 31 '15 at 13:33
  • @BhargavRao i thought this was about regular expressions so i added the regex tag but then i read a little closer and figured that it wasn't really about regular expressions (i think the argument for the tag is still relevant but personally i didn't feel as strongly) hence the rollback. My bad for the edit in the first place – d0nut Dec 31 '15 at 19:51
  • Haha No problem @iismathwizard. Now people can revert the downvotes also!! :D – Bhargav Rao Dec 31 '15 at 19:52
  • How do you define "quite slow"? I tried this on a 500-character string with a repeating length of 6, and it took 57 microseconds. Greedy/non-greedy made no difference. –  Jan 01 '16 at 17:39
  • @torazaburo, my data is around 4kb. I dunno how to measure the no of chars. I'll reply back when I'm at home again. Thanks. – Bhargav Rao Jan 01 '16 at 18:01
  • A 7K string composed of the "1122333" segment took 70 microseconds to validate on my sort-of-fast desktop machine in Chrome. Non-greedy was about 30% slower. –  Jan 01 '16 at 18:09
  • @torazaburo, that's interesting. When I tested it on my system, the browser hung. I've tried it many times but it did hang. – Bhargav Rao Jan 01 '16 at 18:12
  • Firefox took 820µs. Non-greedy was twice as fast. Go figure. –  Jan 01 '16 at 18:17
  • @torazaburo I've been using Firefox. I don't have Chrome on my system as it is Ubuntu (non admin access). I'll try it out again on Chrome. Thanks for the info. – Bhargav Rao Jan 01 '16 at 18:24
  • @BhargavRao maybe you were using the regex without the ^ when your browser hung? Apart from being wrong, as pointed out by dev-null, it is O(n^3) instead of the O(n^2) of the correct regex(es). – Walter Tross Jan 01 '16 at 19:44
  • @WalterTross Yep, I was using it without the `^`. I am not much into regexes. I will test it out with the `^` and report the results. – Bhargav Rao Jan 01 '16 at 19:46
  • I have done http://jsperf.com/periodic-strings-1/2 . This time the test strings are random, of random length. Currently the length is evenly distributed in [4, 1000], and the characters are taken from "ATCG" (DNA...). Results are as expected: regexes are about 10 times slower than the function, with the greedy regex slightly slower than the lazy one. If feasible, you should adjust the test parameters to match your working conditions. – Walter Tross Jan 01 '16 at 23:26
  • Thank you @BhargavRao ! It would be useful if you could better define test conditions, though. The jsperf tests I have set up have 3 main parameters: min string length, max string length, and size of the alphabet. This may or may not be close to your working conditions, but if you tell us these 3 values, at least this kind of test is well defined. – Walter Tross Jan 10 '16 at 10:36

5 Answers5

26

The idea of the code below is to consider substrings of all lengths the original string can be divided into evenly, and to check whether they repeat across the original string. A simple method is to check all divisors of the length from 1 to the square root of the length. They are divisors if the division yields an integer, which is also a complementary divisor. E.g., for a string of length 100 the divisors are 1, 2, 4, 5, 10, and the complementary divisors are 100 (not useful as substring length because the substring would appear only once), 50, 25, 20 (and 10, which we already found).

function substr_repeats(str, sublen, subcount)
{
   for (var c = 0; c < sublen; c++) {
      var chr = str.charAt(c);
      for (var s = 1; s < subcount; s++) {
         if (chr != str.charAt(sublen * s + c)) {
            return false;
         }
      }
   }
   return true;
}

function is_periodic(str)
{
   var len = str.length;
   if (len < 2) {
      return false;
   }
   if (substr_repeats(str, 1, len)) {
      return true;
   }
   var sqrt_len = Math.sqrt(len);
   for (var n = 2; n <= sqrt_len; n++) { // n: candidate divisor
      var m = len / n; // m: candidate complementary divisor
      if (Math.floor(m) == m) {
         if (substr_repeats(str, m, n) || n != m && substr_repeats(str, n, m)) {
            return true;
         }
      }
   }
   return false;
}

Unfortunately there is no String method for comparing to a substring of another string in place (e.g., in C language that would be strncmp(str1, str2 + offset, length)).


Say your string has a length of 120, and consists of a substring of length 6 repeated 20 times. You can look at it also as consisting of a sublength (length of substring) 12 repeated 10 times, sublength 24 repeated 5 times, sublength 30 repeated 4 times, or sublength 60 repeated 2 times (the sublengths are given by the prime factors of 20 (2*2*5) applied in different combinations to 6). Now, if you check whether your string contains a sublength of 60 repeated 2 times, and the check fails, you can also be sure that it won't contain any sublength which is a divisor (i.e., a combination of prime factors) of 60, including 6. In other words, many checks made by the above code are redundant. E.g., in the case of length 120, the above code checks (luckily failing quickly most of the time) the following sublengths: 1, 2, 3, 4, 5, 6, 8, 10, 12, 15, 20, 24, 30, 40, 60 (in this order: 1, 60, 2, 40, 3, 30, 4, 24, 5, 20, 6, 15, 8, 12, 10). Of these, only the following are necessary: 24, 40, 60. These are 2*2*2*3, 2*2*2*5, 2*2*3*5, i.e., the combinations of primes of 120 (2*2*2*3*5) with one of each (nonrepeating) prime taken out, or, if you prefer, 120/5, 120/3, 120/2. So, forgetting for a moment that efficient prime factorization is not a simple task, we can restrict our checks of repeating substrings to p substrings of sublength length/p, where p is a prime factor of length. The following is the simplest nontrivial implementation:

function substr_repeats(str, sublen, subcount) { see above }

function distinct_primes(n)
{
   var primes = n % 2 ? [] : [2];
   while (n % 2 == 0) {
      n /= 2;
   }
   for (var p = 3; p * p <= n; p += 2) {
      if (n % p == 0) {
         primes.push(p);
         n /= p;
         while (n % p == 0) {
            n /= p;
         }
      }
   }
   if (n > 1) {
      primes.push(n);
   }
   return primes;
}

function is_periodic(str)
{
   var len = str.length;
   var primes = distinct_primes(len);
   for (var i = primes.length - 1; i >= 0; i--) {
      var sublen = len / primes[i];
      if (substr_repeats(str, sublen, len / sublen)) {
         return true;
      }
   }
   return false;
}

Trying out this code on my Linux PC I had a surprise: on Firefox it was much faster than the first version, but on Chromium it was slower, becoming faster only for strings with lengths in the thousands. At last I found out that the problem was related to the array that distinct_primes() creates and passes to is_periodic(). The solution was to get rid of the array by merging these two functions. The code is below and the test results are on http://jsperf.com/periodic-strings-1/5

function substr_repeats(str, sublen, subcount) { see at top }

function is_periodic(str)
{
   var len = str.length;
   var n = len;
   if (n % 2 == 0) {
      n /= 2;
      if (substr_repeats(str, n, 2)) {
         return true;
      }
      while (n % 2 == 0) {
         n /= 2;
      }
   }
   for (var p = 3; p * p <= n; p += 2) {
      if (n % p == 0) {
         if (substr_repeats(str, len / p, p)) {
            return true;
         }
         n /= p;
         while (n % p == 0) {
            n /= p;
         }
      }
   }
   if (n > 1) {
      if (substr_repeats(str, len / n, n)) {
         return true;
      }
   }
   return false;
}

Please remember that the timings collected by jsperf.org are absolute, and that different experimenters with different machines will contribute to different combinations of channels. You need to edit a new private version of the experiment if you want to reliably compare two JavaScript engines.

Walter Tross
  • 12,237
  • 2
  • 40
  • 64
  • Yep, also thanks for the explanation - does make it easier to follow (+1 from me too now!). @BhargavRao - just wondering how you're testing the speed of these? Is it something you could make public on jsperf.com or similar? Would be interesting to see some results of your original method compared to the alternatives. – James Thorpe Dec 31 '15 at 15:47
  • @JamesThorpe Nope. I have a few input files. I am just running the codes on them. And if possible (cc Walter), can you add a CW containing the different timings? (In [python] we do like that for a few questions). I don't know how to measure the times else I would have done it. – Bhargav Rao Dec 31 '15 at 15:50
  • [Here's the perf test](http://jsperf.com/periodicstrings/2) (cc @BhargavRao) - looks like an anchored lazy regex and the above functions far out perform the greedy one. The original unanchored lazy regex is also fast, but I'd be [dubious about it's results](https://regex101.com/r/mX3sB6/2). – James Thorpe Dec 31 '15 at 16:04
  • @JamesThorpe Thanks for that. I used to do an additional check, so I got good results on the original one. Anyway can you please post a CW answer? We all can edit it once we get other answers. – Bhargav Rao Dec 31 '15 at 16:06
  • @PAEz Thanks for that - though I'm on chrome 47 here, and the results are a bit different! – James Thorpe Dec 31 '15 at 18:42
  • thank you @PAEz, I built http://jsperf.com/periodicstrings/4 on yours. See my comment to the CW answer – Walter Tross Jan 01 '16 at 00:19
  • @WalterTross I've started a bounty, just to get more views. I'll award it to you if I don't get a better answer. – Bhargav Rao Jan 02 '16 at 13:46
  • Thank you, @BhargavRao. In the meantime I have found a much faster solution. Check out http://jsperf.com/periodic-strings-1/5 . It would be good, though, if you could provide some info on your input. How long are strings on average? What do they contain? How rare do you expect "periodic strings" to be? Etc. – Walter Tross Jan 03 '16 at 01:19
  • Hey that new version is actually good. Answers to questions. 1. (I am not damn sure as I don't have the server code) I just wrote a program to get the length, it is around 100000 characters. The browser did hang when I used the regex, so it might be a bit longer. 2. They contain alphanumeric characters. A-Z,0-9 (no lowercase) 3. I did not understand that question, The strings can be periodic or not at random intervals. – Bhargav Rao Jan 03 '16 at 08:44
  • @BhargavRao I've made http://jsperf.com/periodic-strings-1/6 which is the same as version 5 but with strings of the type you say, in the length range [19000,21000] (100 random strings per test). Above these lengths jsperf becomes totally unreliable. I've collected data on a Macbook Pro, this time. Regarding my question 3, I just wanted to know whether characters are totally random or not (and if not, in which way they are not). – Walter Tross Jan 03 '16 at 13:02
  • @WalterTross The characters are random. But I can tell one thing, the periodicity is usually around 100 to 120... I think this final version might gimme the correct output. I will try it out tomorrow (In about 6 to 8 hrs) and tell you a final time. (As it is working quite well with on my home system). I will accept and award you the bounty if it works. – Bhargav Rao Jan 03 '16 at 18:12
  • @WalterTross The third version worked in 23 out of 25 cases. (The other 2 which I suspect are faults from the server side and not with the code) Accepting this as an answer. But I will award the bounty on the last day. Also I found this in python [How can I tell if a string repeats itself in Python?](http://stackoverflow.com/q/29481088). Do check if it can be replicated in JS. Thanks a lot for your answer and time. – Bhargav Rao Jan 05 '16 at 16:34
12

One option is to continue using a regex, but to make it greedy by dropping the ?:

/^(.+)\1+$/

Depending on the exact input strings, it may reduce the amount of backtracking required and speed up the matching.

James Thorpe
  • 31,411
  • 5
  • 72
  • 93
  • @BhargavRao No worries - this potentially isn't a great answer as I think it will depend somewhat on the input strings being matched. There may still turn out to be a way that works better in the general case. – James Thorpe Dec 31 '15 at 13:39
  • I may be wrong, but I think that when there is no match, only the order of the attempts changes between the greedy and lazy version. When there is a match, I fear that on average it is found earlier by the lazy version. The real speedup here comes from the ^, which avoids all tests that are not anchored at the start of the string. – Walter Tross Dec 31 '15 at 15:58
  • @WalterTross Yes, I'm not sure that greedy matching is any better here either to be honest. Just putting together a jsperf test now - first time I've done one so may not be perfect... – James Thorpe Dec 31 '15 at 16:00
  • thanks for jsperf test. What really baffles me is that there seems to be no gain in anchoring. There must be an optimization going on. Sorry for not enhancing the set of test strings, to be honest I should be doing something else right now :-( – Walter Tross Dec 31 '15 at 16:44
  • 1
    @FengyangWang no, not true. Anyways, that missing gain was an artifact, possibly because of the order of tests. Now PAEz has added random test order in http://jsperf.com/periodicstrings/3, and the unanchored match has gone back to its place. In http://jsperf.com/periodicstrings/4 I have improved the test further by adding some test cases, and by removing the /.../gm modifiers that were unnecessary and probably detrimental for performance (if they were necessary, the OP would have used them). The only thing that's really missing is a realistic test set, but only the OP knows what that is. – Walter Tross Dec 31 '15 at 23:08
5

If a string is periodic:

  • The last element would be the last element of the period as well
  • The period length would divide the string length

So we can make a super greedy algorithm that takes the last element and checks for occurrences up until half of the length. When we find one, we check if the substring's length divides the main length and only after that we would test against the string.

function periodic(str){
    for(var i=0; i<=str.length/2; i++){
        if(str[i] === str[str.length-1] && str.length%(i+1) === 0){
            if (str.substr(0,i+1).repeat(str.length/(i+1)) === str){
        return true;
            }
        }
    }
    return false;
}
4

A direct approach is to divide the string into equal-sized chunks and test whether each chuck is the same as the first chunk. Here is an algorithm does that by increasing the chunk size from 1 to length/2, skipping chunk sizes that do not cleanly divide the length.

function StringUnderTest (str) {
    this.str = str;
    this.halfLength = str.length / 2;
    this.period = 0;
    this.divideIntoLargerChunksUntilPeriodicityDecided = function () {
        this.period += 1;
        if (this.period > this.halfLength)
            return false;
        if (this.str.length % this.period === 0)
            if (this.currentPeriodOk())
                return true;
        return this.divideIntoLargerChunksUntilPeriodicityDecided();
    };
    this.currentPeriodOk = function () {
        var patternIx;
        var chunkIx;
        for (chunkIx=this.period; chunkIx<this.str.length; chunkIx+=this.period)
            for (patternIx=0; patternIx<this.period; ++patternIx)
                if (this.str.charAt(patternIx) != this.str.charAt(chunkIx+patternIx))
                    return false;
        return true;
    };
}

function isPeriodic (str) {
    var s = new StringUnderTest(str);
    return s.divideIntoLargerChunksUntilPeriodicityDecided();
}

I have not tested the speed, though...

jpsecher
  • 4,461
  • 2
  • 33
  • 42
4

There is an answer which deserves mentioning for its sheer beauty. It's not mine, I have only adapted it from the Python version, which is here: How can I tell if a string repeats itself in Python?

function is_periodic(s)
{
   return (s + s.substring(0, s.length >> 1)).indexOf(s, 1) > 0;
}

Unfortunately, the speed is not on par with the beauty (and also the beauty has suffered a bit in the adaptation from Python, since indexOf() has a start parameter, but not a stop parameter). A comparison with the regex solution(s) and with the functions of my other answer is here. Even with strings of random length in [4, 400] based on a small 4 character alphabet the functions of my other answer perform better. This solution is faster than the regex solution(s), though.

This solution might be called the “phaseshift solution”. The string is treated like a wave which is identical to itself when shifting its phase.

The advantage of this solution over the ones of my other answer is that it can be easily adapted to return the shortest repeating substring, like this:

function repeating_substr(s)
{
    period = (s + s.substring(0, s.length >> 1)).indexOf(s, 1);
    return period > 0 ? s.substr(0, period) : null;
}
Community
  • 1
  • 1
Walter Tross
  • 12,237
  • 2
  • 40
  • 64
  • A small confession, I have already used your other code and pushed for the changes. So unfortunately I cannot test this code (Nor the other 2 [new](http://stackoverflow.com/a/34624246/4099593) [answers](http://stackoverflow.com/a/34642101/4099593)) and I feel very sad for that. However seeing the perf measurements I feel a bit happy that your other answer did perform better (The *grapes* were indeed *sour*). Hopefully it will be of help to others. – Bhargav Rao Jan 07 '16 at 09:19
  • Hi, for you reference http://jsperf.com/periodic-strings-1/10 (I'm using jsperf for the first time for this problem) – Bhargav Rao Jan 07 '16 at 21:18
  • @BhargavRao your test setup was wrong, I fixed it: http://jsperf.com/periodic-strings-1/11 – Walter Tross Jan 07 '16 at 22:02