-2

I've got a document that looks something like this:

# Document ID 8934
# Last updated 2018-05-06
52 84 12 70 23 2 7 20 1 5
4 2 7 81 32 98 2 0 77 6
(..and so on..)

In other words, it starts off with a few comment lines, then the rest of the document is just a bunch of numbers separated by spaces.

I'm trying to write a regex that gets all digits on all lines that don't start with #, but I can't seem to get it.

I've read over answers such as

  1. Regular Expressions: Is there an AND operator?

  2. Regex: Find a character anywhere in a document but only on lines that begin with a specific word

    and pawed through sites such as http://regular-expressions.info, but I still can't get an expression that works (the best I can get is a lengthy version of ^[^#].*

So how can I match digits (or text, or whatever) in a string, but only on lines that don't start with a certain character?

Community
  • 1
  • 1
Grayda
  • 1,870
  • 2
  • 25
  • 43
  • @rkta That just gets all lines that start with a hash. I want to combine that with something like `\d+` to get all digits on lines that don't start with a hash, as my question stated – Grayda May 06 '18 at 07:51
  • which is the context of this? Probably, it would be a better idea to first check if the line begins with `#`, and then, use `\d+` on each line that doesn't begin with `#` – alsjflalasjf.dev May 06 '18 at 08:06
  • Question to ask yourself is : why use regex when the problem can be solved writing two lines of code? – Yassin Hajaj May 06 '18 at 08:28
  • @YassinHajaj part of this is because it's a challenge that'd test my regex knowledge. I've worked around it by using two lines, but now I'm curious about a one-regex answer. – Grayda May 06 '18 at 08:33
  • @Grayda Oh alright that's fine – Yassin Hajaj May 06 '18 at 08:35
  • As easy as `(?<!^#.*?)\d+`, use with PyPi Python `regex` module, .NET regex, or in the latest versions of Chrome (JavaScript compatible with ECMAScript 2018 standard). – Wiktor Stribiżew May 06 '18 at 09:41

2 Answers2

1

Your regex ^[^#].* uses a negated character class which matches not a # from the start of the string ^ and after that matches any character zero or more times. This would for example also match t test

What you might do is use an alternation to match a whole line ^#.*$ that starts with a # or capture in a group one or more digits (\d+)

Your digits are captured group 1. You could change the (\d+) to for example a character class ([\w+.]+) to match more than only digits.

(?:^#.*$|(\d+))

Details

  • (?: Non capturing group
    • ^#.*$ Match from the start of the line ^ a # followed by any character zero or more times .* until the end of the string $
    • | Or
    • (\d+) capture one or more digits in a group
  • ) Close non capturing group
The fourth bird
  • 154,723
  • 16
  • 55
  • 70
0

I think a way simpler method would be to replace the lines with "" first with this regex:

^#.*

And then you can just match all the numbers with this:

-?\d+ (-? is for negative)
Sweeper
  • 213,210
  • 22
  • 193
  • 313
  • My code eventually used two lines as you suggested (though slightly different, using `^[^#]+` to catch all non-hashed lines then simply `\d+` for the digits), though I was interested to see if there was a "one liner", which there was via @the fourth bird's answer. – Grayda May 06 '18 at 12:37