
I have an input string consisting of a sequence of real numbers separated by a single space. It is also acceptable for the string to contain only one real number (no spaces). My goal is to check whether the string structure matches the following (in this order):

  • optional (0/1): minus (-)
  • one or more digits
  • optional (0/1): a period followed by one or more digits
  • optional (0+): a group consisting of a space followed by the first group (the first three bullet points)

The pattern must describe the whole string. If it does not, the function should print an error message and exit.
My current regular expression is `^(-?\d+(\.?\d)*)( \1)*$`, which I thought would be fine, but even the first group doesn't match each real number on its own. I also need the check to cover the string from beginning to end, including the spaces.

My code for this function looks like this:

import re
def structure_check(string):
    structure = r"^(-?\d+(\.?\d)*)( \1)*$"
    if re.match(structure,string):
        return("OK")
    else:
        print("Input error")
        exit()

It should accept strings like `15 35 -45 8 -2.3 4564.18 56`, but no matter how I change the input, it doesn't match at all. It shouldn't match if there are too many spaces, a misplaced `.` or `-`, or any characters other than digits, periods, dashes (-) and spaces.

I could also do this with just the first group, iterating over a list created by splitting the input string on spaces, but I would prefer to check it according to my main goal. That way I wouldn't have to split the input in the validation function, and I could save a few lines by checking the whole input at once (e.g. for excess spaces or unsupported characters, which I would otherwise have to check separately).
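For comparison, the split-based alternative might look roughly like the sketch below. It assumes the per-number pattern `-?\d+(?:\.\d+)?` (an optional period followed by one or more digits, as in the bullet list), and the function name `structure_check_split` is hypothetical; it returns True/False instead of printing and exiting.

```python
import re

# One number: optional minus, one or more digits, optional "." + digits.
# (Assumed per-number pattern; not the question's original first group.)
NUMBER_TOKEN = re.compile(r"-?\d+(?:\.\d+)?")

def structure_check_split(string):
    """Hypothetical split-based check: every space-separated token must be a number."""
    # split(" ") (unlike split()) keeps empty strings for doubled,
    # leading or trailing spaces, so those inputs fail the token check.
    tokens = string.split(" ")
    return all(NUMBER_TOKEN.fullmatch(token) for token in tokens)
```

Using `split(" ")` rather than `split()` is what lets this version still catch excess spaces without a separate check.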

Sorry if I missed an existing answer; I couldn't find one that fits my problem in Python. If you know of any, please feel free to link them. And thank you; I am a beginner and started learning regex for a project only yesterday.

Ad.H.Jes.
  • Define **too many spaces**? – dawg Feb 02 '21 at 14:30
  • @dawg Meaning two or more consecutive spaces (I need this to be consistent for the later split). The broader regex would probably also reject redundant space(s) at the end of the string. – Ad.H.Jes. Feb 02 '21 at 14:34
  • @Ad.H.Jes. Was your intent to reuse a part of the regex pattern? – The fourth bird Feb 02 '21 at 18:15
  • @the-fourth-bird Well, yes, as a possibility, but not necessarily. Since my pattern wasn't that long, I had no problem repeating it; I just thought reuse would be a more elegant solution. The main goal was to check the strict structure of the whole input string. – Ad.H.Jes. Feb 03 '21 at 01:32

4 Answers


You can use:

^((?:[+-]?\d+(?:[.]\d+)?)(?:[ \t]|$))*$

Demo and explanation

I added `+` as an allowed optional sign. If you only want to allow no sign or `-`, just remove the `+` from the character class.
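A quick Python check of this pattern might look like the sketch below (`is_valid_dawg` is a hypothetical name). Because each number may be followed by a space or tab, or the end of the string, and the outer group is repeated with `*`, the pattern also accepts a trailing space, a tab as separator, and the empty string.

```python
import re

# dawg's pattern: each number is followed by a space/tab or the end of the string,
# and the whole group repeats zero or more times.
DAWG = re.compile(r"^((?:[+-]?\d+(?:[.]\d+)?)(?:[ \t]|$))*$")

def is_valid_dawg(string):
    return DAWG.match(string) is not None

print(is_valid_dawg("15 35 -45 8 -2.3 4564.18 56"))  # True
print(is_valid_dawg("15  35"))                        # False: double space
print(is_valid_dawg("15 "))                           # True: trailing space allowed
```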

dawg

You could also use an unrolled version to prevent matching a space at the end.

^-?\d+(?:\.\d+)?(?: -?\d+(?:\.\d+)?)*$

Regex demo
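In Python, one way to apply the unrolled pattern is `re.fullmatch`, which requires the match to span the whole string, so the `^`/`$` anchors can be dropped. This is a sketch; `is_valid_unrolled` is a hypothetical name.

```python
import re

# Unrolled pattern: one number, then zero or more " number" pairs.
# ^ and $ are omitted because fullmatch() already anchors both ends.
UNROLLED = re.compile(r"-?\d+(?:\.\d+)?(?: -?\d+(?:\.\d+)?)*")

def is_valid_unrolled(string):
    return UNROLLED.fullmatch(string) is not None

print(is_valid_unrolled("15 35 -45 8 -2.3 4564.18 56"))  # True
print(is_valid_unrolled("15 "))                           # False: trailing space
print(is_valid_unrolled("15  35"))                        # False: double space
```

One practical difference: with `re.match` and a trailing `$`, an input like `"15\n"` is accepted, because `$` may match just before a final newline; `fullmatch` rejects it.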


The backreference `\1` matches exactly the text that group 1 captured, so your pattern only matches repeats of the same number, for example `123 123 123`.
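A small check with the question's original pattern shows this behaviour: repeated copies of the same number match, while different numbers do not.

```python
import re

# The question's original pattern: ( \1)* repeats the *text* captured by group 1,
# not the sub-pattern that produced it.
ORIGINAL = r"^(-?\d+(\.?\d)*)( \1)*$"

print(bool(re.match(ORIGINAL, "123 123 123")))  # True: same number repeated
print(bool(re.match(ORIGINAL, "15 35")))        # False: 35 is not a copy of 15
```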

If you want to reuse the group's pattern rather than its matched text, you can recurse into the first group with `(?1)` using the PyPI `regex` module:

^(-?\d+(?:\.\d+)?)(?: (?1))*$

See a Python example
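The standard library `re` module does not support `(?1)`. If you prefer to stay with `re`, a common workaround (a different technique from recursion) is to write the number pattern once as a Python string and build the full pattern from it. A sketch, with hypothetical names:

```python
import re

# Write the single-number sub-pattern once...
NUMBER = r"-?\d+(?:\.\d+)?"
# ...and reuse it by string formatting instead of a (?1) subroutine call.
NUMBER_LIST = re.compile(rf"{NUMBER}(?: {NUMBER})*")

def is_valid_composed(string):
    # fullmatch() requires the whole string to match
    return NUMBER_LIST.fullmatch(string) is not None

print(NUMBER_LIST.pattern)  # -?\d+(?:\.\d+)?(?: -?\d+(?:\.\d+)?)*
```

The resulting pattern is identical to the unrolled version, but the number part lives in one place.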

The fourth bird

In JavaScript you can use the regex's `.test` method. The same regex should also work in Python.

let ok = /^(([+\-]?\d+(\.\d+)?)( |$))+$/.test("15 35 -45 8 -2.3 4564.18 56");

console.log(ok);

Explanation: the fractional part `(\.\d+)?` must be optional as a whole group. Each number can be followed by either a space or the end of the string, `( |$)`. This repeats throughout the string, so I wrapped the entire expression in a group. Anchoring the regex with `^` at the beginning and `$` at the end forces it to check the string completely.
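A Python version of the same check might look like the sketch below; `re.search` returns a match object or `None`, so comparing with `None` plays the role of `.test`. Like the JavaScript version, it accepts a trailing space, since `( |$)` lets the last number be followed by a space.

```python
import re

# Same regex as the JavaScript example; [+\-] allows an optional + or - sign.
andre_pattern = re.compile(r"^(([+\-]?\d+(\.\d+)?)( |$))+$")

ok = andre_pattern.search("15 35 -45 8 -2.3 4564.18 56") is not None
print(ok)  # True
```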

Andre Marasca

The problem is in the `( \1)*` part of your regexp. It means: a space followed by the exact string that group 1 matched, repeated zero or more times. Thus your regexp will only match strings such as:
15 15 15
-5.3 -5.3 -5.3 -5.3

And so on.

To fix the regexp, I would replace the group reference with the actual group, like so:
^(-?\d+(\.?\d)*)( -?\d+(\.?\d)*)*$

I would also point out that this regexp allows numbers with multiple decimal dots (e.g. `1.2.3` passes); I'm not sure whether that's intended.
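A short check of the replacement pattern confirms both points: the sample input now matches, and `1.2.3` slips through because `(\.?\d)*` can consume `.2` and then `.3`.

```python
import re

# Backreference \1 replaced by a copy of the group-1 sub-pattern.
FIXED = r"^(-?\d+(\.?\d)*)( -?\d+(\.?\d)*)*$"

print(bool(re.match(FIXED, "15 35 -45 8 -2.3 4564.18 56")))  # True
print(bool(re.match(FIXED, "1.2.3")))                        # True: multiple dots accepted
```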

Kryštof Vosyka
  • Thanks, I didn't notice the multiple decimal dots problem. I suppose that replacing the `*`s with `?`s after the groups containing the dot should fix this: `^(-?\d+(\.?\d)?)( -?\d+(\.?\d)?)*$` – Ad.H.Jes. Feb 02 '21 at 14:55
  • Your answer has helped me a lot, since I'm a beginner. I also found another mistake: the regex would only match numbers with a single digit after the decimal point. I've added `+` after the `\d` where needed (2x): `^(-?\d+(\.?\d+)?)( -?\d+(\.?\d+)?)*$` – Ad.H.Jes. Feb 02 '21 at 17:12
  • Last edit: I got rid of the 0/1 rule (`?`) for the decimal dots. `^(-?\d+(\.\d+)?)( -?\d+(\.\d+)?)*$` – Ad.H.Jes. Feb 04 '21 at 23:29
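Plugged into the question's function, the final pattern from the last comment might look like the sketch below. Two assumptions: `re.fullmatch` is used so a trailing newline is not accepted, and the question's `exit()` is swapped for `sys.exit(1)`, since `exit()` is a helper the `site` module adds for the interactive interpreter; the non-zero status is a choice to signal failure.

```python
import re
import sys

# Final pattern from the comments: a number, then zero or more " number" groups,
# where a number is an optional minus, digits, and an optional "." + digits.
STRUCTURE = re.compile(r"^(-?\d+(\.\d+)?)( -?\d+(\.\d+)?)*$")

def structure_check(string):
    """Return "OK" if the whole string matches; otherwise print an error and exit."""
    if STRUCTURE.fullmatch(string):
        return "OK"
    print("Input error")
    sys.exit(1)
```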