
I would like to create a regex that matches a 24-hour range of Unix timestamps, say from 01/01/2015 00:00:00 **(1420066800)** to 01/01/2015 23:59:59 **(1420153199)**, a difference of 86399 seconds.

I'm using the range_regex Python library, but it's buggy for ranges this large. `range_to_pattern(1420066800, 1420153199)` produces the regex `1420[0-1][5-6][3-6][1-8]\d{2}`. That pattern matches the two bounds themselves. However, it checks each digit position independently, so it rejects many values in between. For example, 1420100000 is inside the range, yet its sixth digit from the left (0) is not in the second group ([5-6]).
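A quick way to see the problem is to compare the per-digit pattern with plain integer comparison. The sketch below uses illustrative helper names that are not part of range_regex. It shows that the pattern rejects values inside the day and also accepts values outside it.

```python
import re

lo, hi = 1420066800, 1420153199
# Pattern returned by range_to_pattern(lo, hi), as quoted in the question.
per_digit = re.compile(r"1420[0-1][5-6][3-6][1-8]\d{2}")


def in_day(n):
    """Ground truth: plain integer comparison."""
    return lo <= n <= hi


def pattern_says(n):
    """True if the per-digit pattern accepts the whole number."""
    return per_digit.fullmatch(str(n)) is not None


# Both bounds are accepted...
print(pattern_says(lo), pattern_says(hi))  # True True

# ...but the first rejected in-range value comes only 100 seconds after the start.
missed = next(n for n in range(lo, hi + 1) if not pattern_says(n))
print(missed, in_day(missed))  # 1420066900 True

# It also accepts values before the start of the day.
extra = next(n for n in range(lo - 1000, lo) if pattern_says(n))
print(extra, in_day(extra))  # 1420065800 False
```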

Can someone recommend a better Python 3 library, or a workaround for creating a regex that covers the 86400 seconds of a day?
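One library-free workaround is to generate the alternation yourself. The idea is to split the range into blocks that share a fixed prefix, followed by one digit from a small class and a run of free digits.

The function below is a minimal sketch of that idea. It is not the range_regex implementation, and its name `int_range_regex` is hypothetical. It assumes both bounds are positive and have the same number of digits, which holds for all 10-digit Unix timestamps (roughly September 2001 to November 2286).

```python
import re


def int_range_regex(lo: int, hi: int) -> str:
    """Build a regex alternation that matches exactly the integers lo..hi.

    Assumes 0 < lo <= hi and that lo and hi have the same number of digits.
    """
    if not (0 < lo <= hi and len(str(lo)) == len(str(hi))):
        raise ValueError("expected 0 < lo <= hi with equal digit counts")
    parts = []
    while lo <= hi:
        # Largest k such that lo is a multiple of 10**k and the whole
        # block lo .. lo + 10**k - 1 still fits below hi.
        k = 0
        while lo % 10 ** (k + 1) == 0 and lo + 10 ** (k + 1) - 1 <= hi:
            k += 1
        size = 10 ** k
        digit = (lo // size) % 10
        # Merge consecutive blocks while they fit and the digit stays <= 9.
        count = min((hi - lo + 1) // size, 10 - digit)
        prefix = lo // (size * 10)  # digits above the varying one; 0 means none
        part = f"{prefix or ''}[{digit}-{digit + count - 1}]"
        if k:
            part += rf"\d{{{k}}}"
        parts.append(part)
        lo += size * count
    return "|".join(parts)


pattern = int_range_regex(1420066800, 1420153199)
print(pattern)
# Group the alternatives and use fullmatch so no alternative can match
# only part of a longer number.
day_re = re.compile(f"(?:{pattern})")
print(bool(day_re.fullmatch("1420100000")), bool(day_re.fullmatch("1420153200")))  # True False
```

Because adjacent blocks are merged into one character class, the result can be shorter than a one-block-per-alternative split. Either form matches the same set of numbers.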

Ralph Lo
  • Why a regex? Just use integer comparison. – Phylogenesis Jul 26 '17 at 12:31
  • Possible duplicate of [Using regular expressions to validate a numeric range](https://stackoverflow.com/questions/22130429/using-regular-expressions-to-validate-a-numeric-range) – Sebastian Simon Jul 26 '17 at 12:32
  • It is probably not a regex problem. – not_python Jul 26 '17 at 12:32
  • This kind of regex is unreadable. FYI, the first element of it is: `1420(?:066[89][0-9]{2}|06[7-9][0-9]{3}|0[7-9][0-9]{4}` – Toto Jul 26 '17 at 12:35
  • @Phylogenesis: It is a regex problem. I have a thousand files whose filenames contain a Unix timestamp. I want an efficient way to collect all the files within a 24-hour window and put them into an archive. The fastest way I can think of to find these files is a regex. – Ralph Lo Jul 26 '17 at 12:41
  • @Toto I know it is unreadable. Since I'd rather not reinvent the wheel, do you know a Python library that creates the regex you mentioned for me? – Ralph Lo Jul 26 '17 at 12:43
  • @RalphHeerich [this library](https://github.com/dimka665/range-regex) claims to do what you want, but I haven't tested it at all. – Phylogenesis Jul 26 '17 at 12:43
  • @Phylogenesis: It is the library I'm already using. As I said, it's buggy for ranges like mine. But anyway, thanks for the suggestion. – Ralph Lo Jul 26 '17 at 12:46
  • Sorry, I don't know of another library. I'd write a script that reads the directory, converts each timestamp to a date, and then compares it with a reference date. – Toto Jul 26 '17 at 12:49
  • You can use a regex to identify the pattern. By capturing the timestamp as a group, you can easily extract it and use datetime to compare the extracted value, as @Toto suggested. Besides likely being faster (the regex is simpler), this makes your intent much clearer. – K. Nielson Jul 26 '17 at 12:59
  • Looking at the library, you should be using `range_to_regex()` rather than `range_to_pattern()`. – Phylogenesis Jul 26 '17 at 13:06
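The approach suggested in the comments can be sketched as follows: a simple regex pulls the timestamp out of the filename, then an integer comparison selects the day. All specifics here are assumptions for illustration:

- The timestamp is the only run of exactly 10 digits in each filename.
- Matching files go into a zip archive.
- The day runs midnight to midnight at UTC+1, the offset at which 1420066800 is 01/01/2015 00:00:00.

A fixed offset is used, so every day is exactly 86400 seconds long.

```python
import re
import zipfile
from datetime import datetime, timedelta, timezone
from pathlib import Path

# Assumption: the timestamp is the only run of exactly 10 digits in a filename.
TIMESTAMP_RE = re.compile(r"(?<!\d)(\d{10})(?!\d)")


def day_bounds(year, month, day, utc_offset_hours=1):
    """First and last Unix timestamp of a calendar day at a fixed UTC offset."""
    tz = timezone(timedelta(hours=utc_offset_hours))
    start = int(datetime(year, month, day, tzinfo=tz).timestamp())
    return start, start + 86400 - 1


def files_in_range(directory, start, end):
    """Yield files whose embedded timestamp lies in [start, end] (integer comparison)."""
    for path in sorted(Path(directory).iterdir()):
        m = TIMESTAMP_RE.search(path.name)
        if path.is_file() and m and start <= int(m.group(1)) <= end:
            yield path


def archive_range(directory, archive_path, start, end):
    """Add the matching files to a new zip archive and return how many were added."""
    archive_path = Path(archive_path).resolve()
    added = 0
    with zipfile.ZipFile(archive_path, "w") as zf:
        for path in files_in_range(directory, start, end):
            if path.resolve() == archive_path:
                continue  # never add the archive to itself
            zf.write(path, arcname=path.name)
            added += 1
    return added


start, end = day_bounds(2015, 1, 1)
print(start, end)  # 1420066800 1420153199
# Hypothetical usage; "incoming" is a placeholder directory name:
# archive_range("incoming", "2015-01-01.zip", start, end)
```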

2 Answers


As per my comment above, you are using the wrong function from that library.

You should use the following:

```python
range_to_regex(1420066800, 1420153199)
```

This returns the correct regex:

```
142006680\d|14200668[1-9]\d|14200669\d{2}|142006[7-9]\d{3}|14200[7-9]\d{4}|14201[0-4]\d{4}|142015[0-2]\d{3}|1420153[0-1]\d{2}
```
Phylogenesis
  • I just saw that the newest source code on GitHub is more recent than the version I installed with pip. I might just clone the GitHub repository and work with that. Thank you very much! – Ralph Lo Jul 26 '17 at 13:13
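One caution when using the returned alternation to search filenames: it is a bare `A|B|...` with no anchors, so `re.search` also finds it inside a longer run of digits. The following minimal sketch wraps it in a non-capturing group with digit look-arounds. The variable names are illustrative, and the pattern string is copied from the answer.

```python
import re

# Alternation returned by range_to_regex(1420066800, 1420153199), copied from the answer.
alternation = (r"142006680\d|14200668[1-9]\d|14200669\d{2}|142006[7-9]\d{3}"
               r"|14200[7-9]\d{4}|14201[0-4]\d{4}|142015[0-2]\d{3}|1420153[0-1]\d{2}")

# Bare alternation: it matches the first ten digits of an 11-digit number.
print(bool(re.search(alternation, "14200668001")))  # True

# Grouped, with no digit allowed directly before or after the timestamp.
day_re = re.compile(rf"(?<!\d)(?:{alternation})(?!\d)")
print(bool(day_re.search("14200668001")))         # False
print(bool(day_re.search("log_1420100000.txt")))  # True
print(bool(day_re.search("log_1420153200.txt")))  # False
```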
```python
import re

# Matches every Unix timestamp from 1420066800 to 1420153199 inclusive.
# \b at both ends stops it from matching inside a longer run of digits.
regex = r"\b1420([0]([6]([6]([8][0-9]{2}|[9][0-9]{2})|[7-9][0-9]{3})|[7-9][0-9]{4})|[1]([5]([3]([1]([9][0-9]|[0-8][0-9])|[0][0-9]{2})|[0-2][0-9]{3})|[0-4][0-9]{4}))\b"

test_str = ("01/01/2015 00:00:00 (1420066800) to 01/01/2015 23:59:59 (1420153199)\n\n"
    "1420016799     -no\n"
    "1420066799     -no\n"
    "1420066800     -yes\n"
    "1420066801     -yes\n"
    "1420066850     -yes\n"
    "1420067820     -yes\n"
    "1420067920     -yes\n"
    "1420073199     -yes\n"
    "1420103199     -yes\n"
    "1420152191     -yes\n"
    "1420153181     -yes\n"
    "1420153199     -yes\n"
    "1420153200     -no\n"
    "1420163199     -no")

for match_num, match in enumerate(re.finditer(regex, test_str), start=1):
    print("Match {} was found at {}-{}: {}".format(match_num, match.start(), match.end(), match.group()))

    # Print each capturing group; groups on branches that did not take part are None (position -1).
    for group_num in range(1, len(match.groups()) + 1):
        print("Group {} found at {}-{}: {}".format(group_num, match.start(group_num), match.end(group_num), match.group(group_num)))
```

Online: https://regex101.com/r/blnST4/1