How to understand gsub(/^.*\//, '') or the regex

Question

I'm breaking down the code below to check my understanding of the regex and gsub:

str = "abc/def/ghi.rb"
str = str.gsub(/^.*\//, '')
#str = ghi.rb

^ : beginning of the string

\/ : escape character for /

^.*\/ : everything from beginning to the last occurrence of / in the string

Is my understanding of the expression right?

How does .* work exactly?

Actually, `^` is the anchor for the beginning of a line. The beginning of the string is `\A`. For single-line strings, both work the same. — undur_gongor, Dec 21 '15 at 12:20
Nothing wrong with your regex, but `File.basename(str)` might be more appropriate. — Stefan, Dec 21 '15 at 13:47
This is well documented around the internet and in the Regexp documentation. http://meta.stackoverflow.com/questions/261592/how-much-research-effort-is-expected-of-stack-overflow-users — the Tin Man, Dec 21 '15 at 21:01

ndnenkov · Accepted Answer · 2015-12-21T12:49:45.230

4

Your general understanding is correct. The entire regex will match abc/def/ and String#gsub will replace it with an empty string.

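However, note that String#gsub doesn't change the string in place; it returns a new string. Your code works because it assigns the result back to str. Without that assignment, str would still contain the original value ("abc/def/ghi.rb") after the substitution. To change the string in place, you can use String#gsub!.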
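A minimal sketch of the difference between the two methods. The variable names are made up for illustration, and `.dup` is used only so the example also runs where string literals are frozen:

```ruby
original = "abc/def/ghi.rb"

# gsub returns a new string; the receiver is left untouched.
copy = original.gsub(/^.*\//, '')
copy      # => "ghi.rb"
original  # => "abc/def/ghi.rb"

# gsub! modifies the receiver in place.
mutable = "abc/def/ghi.rb".dup
mutable.gsub!(/^.*\//, '')
mutable   # => "ghi.rb"

# gsub! returns nil when no substitution was made,
# so its return value should not be chained blindly.
no_slash = "ghi.rb".dup
result = no_slash.gsub!(/^.*\//, '')
result    # => nil
```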

As to how .* works - the algorithm the regex engine uses is called backtracking. Since .* is greedy (will try to match as many characters as possible), you can think that something like this will happen:

Step 1: .* matches the entire string abc/def/ghi.rb. Afterwards \/ tries to match a forward slash, but fails (nothing is left to match). .* has to backtrack.
Step 2: .* matches the entire string except the last character - abc/def/ghi.r. Afterwards \/ tries to match a forward slash, but fails (/ != b). .* has to backtrack.
Step 3: .* matches the entire string except the last two characters - abc/def/ghi.. Afterwards \/ tries to match a forward slash, but fails (/ != r). .* has to backtrack.
...
Step n: .* matches abc/def. Afterwards \/ tries to match a forward slash and succeeds. The matching ends here.
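The steps above can be imitated with a small loop. This is only an illustration of the idea, not how Ruby's regex engine is actually implemented (real engines apply optimizations), and `greedy_prefix_before_slash` is a hypothetical helper name:

```ruby
# Hypothetical helper: mimics greedy .* followed by \/ by letting
# .* take everything first and then giving back one character at a time.
def greedy_prefix_before_slash(str)
  str.length.downto(0) do |len|
    # str[0, len] plays the role of what .* currently holds;
    # the character right after it must be a forward slash.
    return str[0, len] if str[len] == "/"
  end
  nil # no slash anywhere, so the overall match fails
end

str = "abc/def/ghi.rb"
greedy_prefix_before_slash(str)  # => "abc/def"

# The real regex agrees: the capture group holds what .* ended up matching.
str[/^(.*)\//, 1]                # => "abc/def"
str[/^.*\//]                     # => "abc/def/"
```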

edited Dec 21 '15 at 12:49

answered Dec 21 '15 at 12:08

ndnenkov


If the steps in quoted formatting are from another site or person, it's important to give proper attribution to them. If they're not quoted from another site or person, then quoting is the wrong format to use. – the Tin Man Dec 21 '15 at 20:56
@theTinMan, I see quoting often times used for other things (like long lines of logs). If you have a suggestion on how the entire block of text can look as if it is in different highlighting while at the same time leaves me with the option to **bold** and `code` inside I will use that. – ndnenkov Dec 21 '15 at 21:00
Even long logs shouldn't be quoted. People use that because it forces wrapping, rather than extract the bare minimum necessary to demonstrate the problem, so don't emulate that thinking. Is it necessary to change the background color? Quoting does that to call out it's from another source. If it's your content then allow it to be the normal background and use the existing formatting for numbered lists. That way, *when* they change the CSS for the site, the information fits the look and feel, not what you thought it should have been a particular day. – the Tin Man Dec 21 '15 at 21:04
@theTinMan, so you offer no alternative. You have to understand that people will use quoting for practical purposes given no other option. For example, this answer would be a lot less readable if it was all in white background. And sometimes people can't extract the essential information in their log and that is the point of their question. Should logs never be used in questions? – ndnenkov Dec 21 '15 at 21:09
There are alternates, it's just people don't want to believe those work. The vast majority of the site uses very simple formatting and it works nicely. The guidelines specify to reduce input information to the minimum necessary, and almost every log can be reduced greatly to provide the necessary information. That people don't reduce is more an example that they didn't want to expend the effort to figure out what is essential. But that's a different topic that's often discussed on [meta]. – the Tin Man Dec 21 '15 at 22:05

sawa · Answer 2 · 2015-12-21T12:00:08.267

No, not quite.

^: beginning of a line
\/: escaped slash (escape character is \ alone)
^.*\/ : everything from the beginning of a line to the last occurrence of / on that line (without the m option)

.* depends on the mode of the regex. In singleline mode (i.e., without m option), it means the longest possible sequence of zero or more non-newline characters. In multiline mode (i.e., with m option), it means the longest possible sequence of zero or more characters.
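A short sketch of how the anchor and the m option change the result on a multi-line string. The sample text is made up for illustration:

```ruby
text = "abc/def\nghi/jkl.rb"

# ^ matches at the start of every line and . stops at the newline,
# so each line is stripped up to its own last slash.
per_line = text.gsub(/^.*\//, '')       # => "def\njkl.rb"

# \A matches only at the very start of the string,
# so only the first line is affected.
string_start = text.gsub(/\A.*\//, '')  # => "def\nghi/jkl.rb"

# With the m option, . also matches the newline, so .* runs
# to the last slash in the whole string.
multiline = text.gsub(/\A.*\//m, '')    # => "jkl.rb"
```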

score 1 · Answer 3 · edited Dec 21 '15 at 20:57


Your understanding is correct, but you should also note that the last statement is true because:

Repetition is greedy by default: as many occurrences as possible 
are matched while still allowing the overall match to succeed.

Quoted from the Regexp documentation.
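To see the effect of greediness, the greedy quantifier can be compared with its lazy counterpart `.*?`, which matches as few characters as possible. A minimal sketch using the string from the question:

```ruby
str = "abc/def/ghi.rb"

# Greedy: .* runs to the last slash.
greedy = str[/^.*\//]             # => "abc/def/"

# Lazy: .*? stops at the first slash.
lazy = str[/^.*?\//]              # => "abc/"

# The lazy version therefore removes only the first path component.
stripped = str.sub(/^.*?\//, '')  # => "def/ghi.rb"
```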

edited Dec 21 '15 at 20:57

the Tin Man


answered Dec 21 '15 at 11:55

Ivaylo Strandjev



Rather than say "here" for anchor text, provide something a bit more useful. See http://www.w3.org/QA/Tips/noClickHere and http://www.w3.org/TR/WCAG10-HTML-TECHS/#link-text for explanations why. – the Tin Man Dec 21 '15 at 20:59
@theTinMan thank you for the advice. I believe it is useful and I will try to follow it in my future posts. – Ivaylo Strandjev Dec 22 '15 at 07:06

score 0 · Answer 4 · edited Dec 21 '15 at 20:59


Yes. In short, it matches any number of any characters (.*) ending with a literal / (\/).

gsub replaces the match with the second argument (empty string '').

edited Dec 21 '15 at 20:59

the Tin Man


answered Dec 21 '15 at 11:55

Piotr Kruczek


the Tin Man · Answer 5 · 2015-12-22T18:14:26.147

Nothing wrong with your regex, but File.basename(str) might be more appropriate.

To expound on what @Stefan said: it really looks like you're dealing with a file path, which makes your question an XY problem, where you're asking about Y when you should ask about X. Rather than how to use and understand a regex, the question should be which tool to use to manage paths.

Instead of rolling your own code, use code already written that comes with the language:

str = "abc/def/ghi.rb"
File.basename(str) # => "ghi.rb"
File.dirname(str) # => "abc/def"
File.split(str) # => ["abc/def", "ghi.rb"]

The reason to take advantage of File's built-in code is that it accounts for the difference between directory delimiters on *nix-style OSes and Windows. Ruby exposes the primary delimiter as File::SEPARATOR, which is "/" on every platform; on Windows the alternate delimiter "\" is available as File::ALT_SEPARATOR (nil elsewhere), and the File methods handle both:

File::SEPARATOR # => "/"

If your code moves from one system to another it will continue working if you use the built-in methods, whereas a hard-coded regex can break when the paths it receives use a different delimiter.
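A brief sketch of a few more cases where the built-in methods and the regex differ. The trailing-slash path is made up for illustration:

```ruby
path = "abc/def/ghi.rb"

# Strip a known extension, or ask for the extension itself.
name_only = File.basename(path, ".rb")  # => "ghi"
extension = File.extname(path)          # => ".rb"

# A trailing slash shows a behavioral difference:
# the regex removes everything, while File.basename
# still returns the last path component.
dir_path = "abc/def/"
via_regex    = dir_path.gsub(/^.*\//, '')  # => ""
via_basename = File.basename(dir_path)     # => "def"
```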
