4

Take for example the following strings

0.714285714285714285714285714285714285714285
0.111111111111111111111111111111111111111111
0.166666666666666666666666666666666666666666

I want to find the sub string that is repeating repetition for each.

714285
1
6

How can I do this in python. Using regex is okay, I tried the following:

import re

testString = "0.714285714285714285714285714285714285714285"
print(re.search(r"(.+)\1", testString).group(1)) 

This gives me the (wrong) output:

714285714285714285

It should be 7814285

How do I fix this? Is there way to improve my regex or is regex the wrong tool for this job? Maybe python has an awesome built in for this? Is there anyway to do use this with or without regex?

EDIT Before posting an answer check with the test case

0.0022271714922048997772828507795100222717149220489977728285077951002227171492204899777282850779510022

It should return 00222717149220489977728285077951

  • 3
    Try this [`(.+?)\1+`](https://regex101.com/r/nB0lX0/1) – Tushar May 31 '16 at 15:10
  • 3
    And if the string is: `1122112211221122` what must be the result? – Casimir et Hippolyte May 31 '16 at 15:12
  • Possible duplicate of [Find longest repetitive sequence in a string](http://stackoverflow.com/questions/11090289/find-longest-repetitive-sequence-in-a-string) – leovp May 31 '16 at 15:14
  • @CasimiretHippolyte I think `1122`. I am trying to find what is the repeating part in a decimal – user89239213892389 May 31 '16 at 15:14
  • What about `0.714285714285714285714285714285714285714286`? I. e. what if the last repetition is incomplete or rounded? – Tim Pietzcker May 31 '16 at 15:14
  • @TimPietzcker That one should still match `7142857` I won't have a situation where it's rounded but it is very likely one to have parts cut off – user89239213892389 May 31 '16 at 15:16
  • OK, then what about `0.01234567901234567` (=1/81)? Or can you be sure that the repeating part occurs at least twice in its completeness? – Tim Pietzcker May 31 '16 at 15:18
  • @TimPietzcker Actually it should match even if repeating part does not occour completely twice – user89239213892389 May 31 '16 at 15:19
  • 2
    Possible duplicate of [How can I tell if a string repeats itself in Python?](http://stackoverflow.com/q/29481088)? – Bhargav Rao May 31 '16 at 15:28
  • Out of curiosity, what is this routine being developed for? What are you ultimately trying to make? – amphetamachine May 31 '16 at 15:32
  • @amphetamachine https://projecteuler.net/problem=26 – user89239213892389 May 31 '16 at 15:34
  • I don't think it's possible if you don't even have one proper repetition. How should you be able to tell from `0.121` whether it's going to continue as `0.121212` or as `0.12345`? – Tim Pietzcker May 31 '16 at 15:34
  • Can you please explain that what's the logic behind your desire outputs. You can give a complete explanation with examples in comments. – Mazdak May 31 '16 at 15:35
  • In the projecteuler's problem they show when the number will repeat using parentheses – Washington Guedes May 31 '16 at 15:36
  • This question is getting unclear. After your last edit your last string is contradictory with the firsts since the last has a greedy match but you want to get the non greedy for the first cases. How do you define what is greedy and what is non greedy? For instance, for `0.1666` you want `6` but for `002227` you don't them but want the most greedy part? – Federico Piazza May 31 '16 at 15:36
  • @BhargavRao This question is not completely about a string that repeats itself. OP wants to find the repeating part in decimal part which can also be contain different strings – Mazdak May 31 '16 at 15:36
  • @Kasravand Thanks,I was unsure and hence I did not Hammer. Perhaps the OP can take a cue or two from the linked post. – Bhargav Rao May 31 '16 at 15:38
  • @FedericoPiazza I apologize if the questions has gotten convoluted, it's become much more complicated than I imagined. I'm just trying to get the repeating part of a decimal. – user89239213892389 May 31 '16 at 15:38
  • @TimPietzcker I think you are right, lets assume the string always has atleast 2 repetitions. – user89239213892389 May 31 '16 at 15:39
  • @user89239213892389 As I told you, before your question gets close, update your question with a complete explanation. – Mazdak May 31 '16 at 15:41
  • You've discarded several answers answering your question. I assume that's because you've failed to give all the criteria for your "search". `(.+?)\1` actually works, only it finds the first repeating part, which in, for example `1122112211221122` *is* `11`, even if there is a longer repeat later. In your other example, you want the "short" repeat `714285714` even though there's a longer (which you've already discovered) - `714285714714285714714285714` - since there are more that six+ repetitions of `714285714`. – SamWhan May 31 '16 at 15:41
  • 3
    @BhargavRao Yeah, but I thinks we should vote to closing the question as unclear :-). – Mazdak May 31 '16 at 15:41
  • So based on your last example, the first example should return `714285714285714285`? – Mazdak May 31 '16 at 15:51
  • Please check my answer. – Pedro Lobito May 31 '16 at 15:52
  • @Kasravand I don't understand what you mean by that, I don't get where the contradiction is? It should always return the repeating part of the decimal. – user89239213892389 May 31 '16 at 16:07
  • There are a lot of forms of repetition and you didn't clarified what kind of it is your mean. – Mazdak May 31 '16 at 16:15
  • @Kasravand I'm using the same definition of repetition as project euler 26. a `recurring cycle in its decimal fraction part` – user89239213892389 May 31 '16 at 16:16
  • Give a look in this [post.](http://codegolf.stackexchange.com/a/78938/52956) – Washington Guedes May 31 '16 at 17:28

2 Answers2

1

You can give a try to this pattern:

(?=(\d+)\1+(.*))(\d+?)\3+\2$

demo

or to obtain the substring as whole match (group 0):

(?=(\d+)\1+(.*))(\d+?)(?=\3+\2$)

What does exactly the pattern?

It returns, for a position in the string, the smallest repeated substring that spans the larger part of the string.

How does it work?

In a lookahead is described the largest repeated substring with a greedy quantifier (i.e. (\d+)), followed by its repetitions \1+, followed by the end of the string captured in group 2.

Then, once the lookahead closed, (\d+?)\3+ searches this time the smallest repeated substring with a non-greedy quantifier but with a condition: after the repetitions, the end of the string must be the same than the one captured in the lookahead.

This ensures that the substring in group 3 can't be sliced into a smaller repeated substring.

Results

The searched substring is in the group 3.

If you use the pattern as it (i.e. non-anchored), the first repeated substring on the left is returned.

Obviously, if you only want a result that starts after the dot you need to anchor the pattern with it:

\.(?=(\d+)\1+(.*))(\d+?)\3+\2$ # immediately after the dot

or

\..*?(?=(\d+)\1+(.*))(\d+?)\3+\2$ # the first after the dot

If you want to research repeated substrings for each positions in the string (for example to find the largest whatever the starting position), you need to enclose all the second part in a lookahead too and to use re.findall:

(?=(\d+)\1+(.*))(?=(\d+?)\3+\2$)

(Then feel free to sort the result list, if you want to obtain the largest string whatever the starting position)

Casimir et Hippolyte
  • 88,009
  • 5
  • 94
  • 125
  • fails when all same, I expected 1 gave the length. – kelalaka Oct 07 '18 at 00:07
  • @kelalaka: I don't understand your comment. Can you give an example string and explain clearly what you think is wrong or unexpected (and with which pattern)? – Casimir et Hippolyte Oct 07 '18 at 12:12
  • for example, if my string all 0 of any length >1, then the period is expected 1. I didn't get the result. – kelalaka Oct 07 '18 at 13:56
  • @kelalaka: The patterns (as asked in the question) don't return a length (or a "period") but a match (the repeating substring), and the patterns return the expected result: 0 (in capturing group 3 for the [first pattern](https://regex101.com/r/UXz2hE/2), the whole match for the [second pattern](https://regex101.com/r/UXz2hE/1)). – Casimir et Hippolyte Oct 07 '18 at 14:10
  • yes, I tried to use it to determine the period of my sequences with rho structure. – kelalaka Oct 07 '18 at 14:20
0

To match your last update:

(\d\d)(\d+?)(?=\1)

http://ideone.com/OpqJ9c


Note:
The answer below was valid before your last update. You may have to consider using something more than a regex, which isn't a programming language.


import re
testString = "0.714285714285714285714285714285714285714285"
print(re.search(r"(\d)(\d+?)(?=\1)", testString).group(0)) 
#714285

Demo

(\d)(\d+?)(?=\1)

Match the regex below and capture its match into backreference number 1 «(\d)»
   Match a single character that is a “digit” (ASCII 0–9 only) «\d»
Match the regex below and capture its match into backreference number 2 «(\d+?)»
   Match a single character that is a “digit” (ASCII 0–9 only) «\d+?»
      Between one and unlimited times, as few times as possible, expanding as needed (lazy) «+?»
Assert that the regex below can be matched, starting at this position (positive lookahead) «(?=\1)»
   Match the same text that was most recently matched by capturing group number 1 (case insensitive for A-Z; fail if the group did not participate in the match so far) «\1»
Pedro Lobito
  • 94,083
  • 31
  • 258
  • 268