Can't extract substring from string using regexp

Question

My first post here and it "obviously" has to be about regexp (the nightmare of all beginner devs)

I have a string: s = "Shadowborn Apostle \r\nCreature — Human Cleric \r\nA deck can have any number Of \r\ncards named Shadowborn Apostle. \r\ne, Sacrifice six creatures named \r\nShadowborn Apostle: Search your \r\nlibrary for a Demon creature card \r\nand put it onto the battlefield. Then \r\nshuffle your library. \r\n"

I would like to extract only this part: Shadowborn Apostle(space)

I use .match to get the substring I want: s.match(/^[^\\]+/)

Unfortunately, MatchData = the whole string. And I'm not sure why. Any help would be appreciated.

Thanks!

The fourth bird · Accepted Answer · 2018-05-26T09:22:19.920


Your regex ^[^\\]+ matches from the start of the string up to the first backslash. The negated character class [^\\] matches any character that is not a backslash, whitespace included, one or more times.
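
To see why the match runs to the end, here is a minimal sketch that assumes a shortened copy of the string from the question. In a double-quoted Ruby string, "\r" is a single carriage-return character, so the string contains no backslash for the class to stop at.

```ruby
# Shortened copy of the string from the question (an assumption for brevity).
s = "Shadowborn Apostle \r\nA deck can have any number Of \r\n"

# "\r" in double quotes is one character with code 13, not a backslash plus "r".
cr_code = "\r".ord                 #=> 13
has_backslash = s.include?("\\")   #=> false

# Nothing stops [^\\]+, so it consumes every character, line breaks included.
whole = s[/^[^\\]+/]
p whole == s                       #=> true
```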

You could instead match any character one or more times, non-greedily, with .+? and end the match with a positive lookahead: ^.+?(?= \\r)


If you want to match Shadowborn Apostle followed by a whitespace in the text you could also use a word boundary \b at the start to make sure it is not part of a longer match and use a positive lookahead at the end (?= ) to assert what follows is a whitespace.

\bShadowborn Apostle(?= )
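
A rough Ruby sketch of both patterns, assuming a shortened copy of the question's string. In a Ruby regex literal, \\r means a backslash followed by the letter r, while \r means a carriage return, so only the latter finds anything here. The lookahead does not consume the space, so the results carry no trailing space.

```ruby
# Shortened copy of the string from the question (an assumption for brevity).
s = "Shadowborn Apostle \r\nA deck can have any number Of \r\n" \
    "cards named Shadowborn Apostle. \r\n"

# \\r looks for a literal backslash and "r"; s contains no such pair.
literal = s[/^.+?(?= \\r)/]   #=> nil

# \r looks for a carriage return, so the lookahead succeeds.
name = s[/^.+?(?= \r)/]       #=> "Shadowborn Apostle"

# Fixed-text variant: only the first occurrence is followed by a space;
# the second one is followed by a period.
hits = s.scan(/\bShadowborn Apostle(?= )/)   #=> ["Shadowborn Apostle"]
```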




Thank you so much! This solves my problem. All I had to do, in my case, is to properly use `.scan` method instead of match/gsub etc. (which made me ask the question - I wasn't sure which method I should use) Also, your regex is obviously much cleaner. Even if I used mine I'd still have to remove the trailing white space. – May 26 '18 at 10:28
Unfortunately, I have a problem - an empty array when I try to run .scan on the string: Demo [http://tpcg.io/mcb7Ry] – May 26 '18 at 10:55

The answer: https://stackoverflow.com/questions/40027321/ruby-scan-method-returns-empty-using-regex Your regex used \\r which was incorrectly parsed by Ruby's method for some reason. Thanks again! – May 26 '18 at 11:07
The OP wants to return a string that ends with a space character if the space immediately precedes the carriage return character, `\r`. That will not be done if the lookahead `(?= \r)` (not `(?= \\r)`) is used. Also you should not refer to "whitespace" when you mean "space", as they are not the same. – Cary Swoveland May 26 '18 at 18:12
@CarySwoveland Thank you for your comment. I have removed my previous comment and will keep that in mind for the future. – The fourth bird May 26 '18 at 19:04

Cary Swoveland · Answer 2 · 2018-05-27T04:47:08.463

Your regular expression /^[^\\]+/ attempts to match one or more characters at the beginning of a line that are not backslashes. The backslash character (ASCII 92) is written 92.chr #=> "\\", whereas the carriage return character (ASCII 13) is written 13.chr #=> "\r".¹

You therefore want /\A[^\r]+/.

Note that I've used the beginning of string anchor, \A, rather than the beginning of line anchor, ^. Consider the following.

"\r\ndog \r".match(/\A[^\r]+/) #=> nil
"\r\ndog \r".match(/^[^\r]+/)  #=> #<MatchData "dog ">

Whether to use \A or ^ depends on what you want to achieve. Henceforth I will assume it is \A that you want. (You should make that clear, however, by editing the question. As written, the desired substring need not start at the beginning of the string or a line.)

Continuing,

r = /\A[^\r]+/
m = s.match(r) #=> #<MatchData "Shadowborn Apostle ">
m[0] #=> "Shadowborn Apostle "

or (in place of m[0]):

$&   #=> "Shadowborn Apostle "

or simply:

s[r] #=> "Shadowborn Apostle "

See MatchData#[] and String#[].

If the ending space is optional this is fine. If, however, the string must end with a space, we must make a slight adjustment to the regex:

r = /\A[^\r]+ /
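
A small sketch comparing the two regexes on hypothetical inputs (the test strings and variable names are made up for illustration). Note that the adjusted regex only requires the match to end in a space; it does not require that space to sit directly before the "\r". The lookahead variant at the end is one possible way to insist on that, offered as an assumption rather than part of the answer above.

```ruby
r_optional = /\A[^\r]+/
r_required = /\A[^\r]+ /
r_strict   = /\A[^\r]+ (?=\r)/   # hypothetical stricter variant

with_space    = "Shadowborn Apostle \r\nCreature"
without_space = "Shadowborn Apostle\r\nCreature"

a = with_space[r_optional]     #=> "Shadowborn Apostle "
b = with_space[r_required]     #=> "Shadowborn Apostle "
c = without_space[r_optional]  #=> "Shadowborn Apostle"
d = without_space[r_required]  #=> "Shadowborn " (backtracks to the inner space)
e = without_space[r_strict]    #=> nil
f = with_space[r_strict]       #=> "Shadowborn Apostle "
```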

Lastly, here is another way to obtain the desired substring that does not use a regular expression:

 idx = s.index(" \r")
   #=> 18
 idx.nil? ? nil : s[0, idx+1]
   #=> "Shadowborn Apostle "

 idx = "How now, brown cow".index(" \r")
   #=> nil
 idx.nil? ? nil : s[0,idx+1]
   #=> nil

See String#index.
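
If this lookup is needed in more than one place, the same idea could be wrapped in a small helper. This is only a sketch; the method name is hypothetical.

```ruby
# Returns the prefix of str up to and including the first space that is
# immediately followed by a carriage return, or nil if there is none.
def prefix_through_space_before_cr(str)
  idx = str.index(" \r")
  idx && str[0..idx]
end

found   = prefix_through_space_before_cr("Shadowborn Apostle \r\nCreature \r\n")
missing = prefix_through_space_before_cr("How now, brown cow")
p found    #=> "Shadowborn Apostle "
p missing  #=> nil
```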

¹ Why not a single backslash (/^[^\]+/)? Because Ruby would open the character class ("["), read the negation ("^"), then an escaped right bracket "\]" (interpreted as the literal character "]"), and then "+". As the next character, "/", terminates the regular expression, Ruby would conclude that the character class was never closed and raise an exception (SyntaxError).

Thank you for taking the time to thoroughly explain the problem and solution. Have a great Sunday! – May 27 '18 at 12:06
