-2

I need some help with regular expressions. I need a regex that matches characters when they come after a semicolon AND are on the same line as a preceding word.

Let me explain that:

(image not available)

I need something like this. I have to write a function that does not allow characters to be entered after a semicolon on the same line, and I think I could do it with this sort of regex.

Thank you.

Carlos Pérez
  • Have you tried anything before asking? There are lots of tutorials on regular expressions out there. – Miguel Feb 26 '21 at 13:26

3 Answers

2

I am not sure I understood your question, but would something like this help? This regular expression

2

Well, you've got two ways to do it:

  • A: Create a regular expression to validate correct input.

  • B: Create a regular expression to find incorrect input.

I would use option A, but it depends on what you need to do.

A: Regex to validate correct lines

In this case, we'll use the m modifier to set the regex engine to search by line (m = multiline). This means that ^ matches the beginning of a line and $ matches the end of a line.

Then we want to match some characters which are not the semicolon itself. To do this we use the [^ ] construct (a negated character class), meaning "anything which is not in the provided list of characters". So to say any character except the semicolon we'll have to use [^;].

Now, this character will rarely be alone, as there will probably be many of them. To express that we can use either the * or + quantifier, which respectively mean "0 or more times" and "1 or more times". If the data before the semicolon is mandatory then we'll use the + quantifier. This leads to [^;]+ to say any character which is not a semicolon, 1 or more times.
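To make the difference between the two quantifiers concrete, here is a small sketch in Python (my choice of language for illustration; the variable names are made up). It checks a line that consists of a lone semicolon against both variants.

```python
import re

# "+" requires at least one non-semicolon character before the semicolon.
plus_on_lone_semicolon = re.fullmatch(r"[^;]+;", ";") is not None

# "*" also accepts zero characters before the semicolon.
star_on_lone_semicolon = re.fullmatch(r"[^;]*;", ";") is not None

# Both variants accept a line that has some text before the semicolon.
plus_on_text = re.fullmatch(r"[^;]+;", "abc;") is not None

print(plus_on_lone_semicolon, star_on_lone_semicolon, plus_on_text)
```

So if an empty statement such as a bare ";" should count as valid, * is the one to pick; otherwise + is the stricter choice.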

Then we'll capture this with parentheses (a capturing group). This gives us direct access to the value without having to take the whole line and strip the semicolon ourselves.

After this capture group comes the semicolon, then optionally some whitespace, and then the end of the line. Whether to allow the trailing whitespace is up to you. It would be \s* to match any whitespace (space, tab or other blank character) 0 or more times.

At the end we get this regex: ^([^;]+);\s*$ with the m and g flags

m for multiline and g for global, which means don't stop at the first match but look for all of them.

Test it here: https://regex101.com/r/sT59eu/1/
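As a rough illustration of option A outside regex101, this is how it might look in Python. I am assuming Python here; re.MULTILINE stands in for the m flag and findall stands in for the g flag. The sample text and the names VALID_LINE and sample are invented for the example.

```python
import re

# Option A: some text, one semicolon, then only optional whitespace up to the line end.
VALID_LINE = re.compile(r"^([^;]+);\s*$", re.MULTILINE)

sample = "int a = 1;\nint b = 2; oops\nint c = 3;   \n"

# findall returns the captured group of every match, not just the first one.
valid_parts = VALID_LINE.findall(sample)
print(valid_parts)  # ['int a = 1', 'int c = 3']
```

One caveat worth knowing: [^;] and \s both match newline characters, so on multi-line input a line with no semicolon can be absorbed into the match of the following line. If that matters, checking each line separately (for example after str.splitlines()) sidesteps it.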

B: Regex to find invalid lines

Well, this could be rather easy too: ;.+$

. means any character (except a line break). So here we'll find the lines with something after the semicolon.

Test it here: https://regex101.com/r/ocDofm/1/

But you will NOT find lines with missing semicolons!
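Here is a minimal sketch of option B in Python, assuming you want the numbers of the offending lines. The helper lines_with_trailing_text and the sample text are hypothetical, just for illustration.

```python
import re

# Option B: a semicolon followed by at least one more character on the same line.
INVALID_TAIL = re.compile(r";.+$")

def lines_with_trailing_text(text):
    """Return the 1-based numbers of lines that have something after a semicolon."""
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if INVALID_TAIL.search(line)
    ]

sample = "int a = 1;\nint b = 2; oops\nint c = 3\n"
flagged = lines_with_trailing_text(sample)
print(flagged)  # [2]
```

Line 3 has no semicolon at all and is not reported, which is exactly the limitation mentioned above. Also note that . matches a space, so a line ending in "; " would be flagged too.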

Patrick Janser
1

If I understand correctly, (?<=;)[A-Za-z]+ might do the job. The Python documentation is helpful: https://docs.python.org/3/library/re.html
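A quick sketch of how that lookbehind behaves with Python's re module (the sample strings and variable names are made up):

```python
import re

# (?<=;) asserts that the character immediately before the match is a semicolon.
AFTER_SEMICOLON = re.compile(r"(?<=;)[A-Za-z]+")

directly_after = AFTER_SEMICOLON.findall("x = 1;oops")
after_a_space = AFTER_SEMICOLON.findall("x = 1; oops")

print(directly_after)  # ['oops']
print(after_a_space)   # []
```

As the second call shows, this only catches letters that touch the semicolon. If there may be spaces in between, or the trailing text can contain digits or punctuation, the pattern would need to be adapted.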