
I'm looking for a way to perform a regex match on a string in Ruby, get the first matching sub-string, and assign it to a variable. I have checked different solutions here on Stack Overflow but couldn't find a proper solution so far.

This is my string

/usr/share/filebeat/reports/ui/local/20200904_151507/API/API_Test_suite/20200904_151508/20200904_151508.csv

I need to get the first sub-string, 20200904_151507. The file path can change from time to time, and so can the sub-string, but the pattern is always date_time. In the regex below, I tried to match the first eight (8) digits, an underscore, and the last six (6) digits. Here are the solutions I tried:

report_path[/^[0-9]{8}[_][0-9]{6}$/,1]
report_path.scan(/^[0-9]{8}[_][0-9]{6}$/).first

Above, the report_path variable holds the full file path mentioned earlier. What did I do wrong here?

  • 1
    You misunderstood anchors. Remove them, and use `report_path[/[0-9]{8}_[0-9]{6}/]`, or `report_path[/(?<![0-9])[0-9]{8}_[0-9]{6}(?![0-9])/]` – Wiktor Stribiżew Sep 17 '20 at 11:12
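
To illustrate the comment above, here is a minimal sketch of why the anchored pattern fails. In Ruby, ^ and $ match at the start and end of a line, so the anchored regex only succeeds when the whole line is the timestamp. The variable names and the noisy string are made up for illustration and are not from the question.

```ruby
report_path = '/usr/share/filebeat/reports/ui/local/20200904_151507/API/API_Test_suite/20200904_151508/20200904_151508.csv'

# Anchored: the line starts with "/" and ends with ".csv", so nothing matches.
anchored = report_path[/^[0-9]{8}_[0-9]{6}$/]
# => nil

# Unanchored: the first occurrence anywhere in the string is returned.
unanchored = report_path[/[0-9]{8}_[0-9]{6}/]
# => "20200904_151507"

# Lookarounds: same match, but it must not be part of a longer run of digits.
bounded = report_path[/(?<![0-9])[0-9]{8}_[0-9]{6}(?![0-9])/]
# => "20200904_151507"

# Hypothetical input with longer digit runs, to show where the two differ.
noisy = 'x/1234567890_1234567/20200904_151507/y'
loose = noisy[/[0-9]{8}_[0-9]{6}/]
# => "34567890_123456"
strict = noisy[/(?<![0-9])[0-9]{8}_[0-9]{6}(?![0-9])/]
# => "20200904_151507"
```

The lookaround version is only worth the extra noise if the path can contain longer digit sequences; for the path in the question both unanchored forms return the same value.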

1 Answer


scan returns all substrings that match the pattern. You can use match, scan or [] to achieve your goal:

report_path = '/usr/share/filebeat/reports/ui/local/20200904_151507/API/API_Test_suite/20200904_151508/20200904_151508.csv'

report_path.match(/\d{8}_\d{6}/)[0]
# => "20200904_151507"

report_path.scan(/\d{8}_\d{6}/)[0]
# => "20200904_151507"

# String#[] supports regex
report_path[/\d{8}_\d{6}/]
# => "20200904_151507"

Note that match returns a MatchData object, which may contain several captures (if we use capture groups). scan returns an Array containing all matches.

Here we're calling [0] on the MatchData to get the full match, and [0] on the Array returned by scan to get its first element.
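
One difference worth keeping in mind is what happens when nothing matches. The sketch below is an illustration rather than part of the original answer; the input string and variable names are made up, and the safe navigation operator assumes Ruby 2.3 or later.

```ruby
pattern = /\d{8}_\d{6}/
report_path = '/usr/share/filebeat/reports/ui/local/20200904_151507/API/API_Test_suite/20200904_151508/20200904_151508.csv'
no_timestamp = '/usr/share/filebeat/reports/readme.txt' # hypothetical path without a timestamp

# match returns nil when nothing matches, so chaining [0] raises NoMethodError.
error_class = begin
  no_timestamp.match(pattern)[0]
rescue NoMethodError => e
  e.class
end

# Safe navigation skips the [] call when match returned nil.
safe = no_timestamp.match(pattern)&.[](0)
# => nil

# String#[] and scan(...).first both return nil on their own when nothing matches.
via_brackets = no_timestamp[pattern]
via_scan = no_timestamp.scan(pattern).first

# scan collects every occurrence, not just the first one.
all_matches = report_path.scan(pattern)
# => ["20200904_151507", "20200904_151508", "20200904_151508"]
```

If only the first match is needed, String#[] is probably the shortest option, since it avoids both the nil check and building an Array of all matches.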


Capture groups:

A regex lets us capture multiple substrings with one pattern. We can use () to create capture groups; (?'some_name'<pattern>) lets us create named capture groups.

report_path = '/usr/share/filebeat/reports/ui/local/20200904_151507/API/API_Test_suite/20200904_151508/20200904_151508.csv'

matches = report_path.match(/(\d{8})_(\d{6})/)
matches[0]       #=> "20200904_151507"
matches[1]       #=> "20200904"
matches[2]       #=> "151507"


matches = report_path.match(/(?'date'\d{8})_(?'id'\d{6})/)
matches[0]       #=> "20200904_151507"
matches["date"]  #=> "20200904"
matches["id"]    #=> "151507"
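
As a further sketch (not from the original answer), Ruby also accepts the (?<name>...) spelling for named groups, MatchData#[] accepts a Symbol, and named_captures (Ruby 2.4+) returns all named groups as a Hash. The group name time is my own choice here. Note that scan behaves differently once the pattern contains groups: it returns one Array of captures per match instead of the full matched strings.

```ruby
report_path = '/usr/share/filebeat/reports/ui/local/20200904_151507/API/API_Test_suite/20200904_151508/20200904_151508.csv'

# Same idea as above, written with the (?<name>...) syntax.
m = report_path.match(/(?<date>\d{8})_(?<time>\d{6})/)
date = m[:date]             # "20200904"
time = m[:time]             # "151507"
captures = m.named_captures # Hash with the keys "date" and "time"

# With capture groups, scan yields the groups of each match, not the whole match.
pairs = report_path.scan(/(\d{8})_(\d{6})/)
first_pair = pairs.first    # ["20200904", "151507"]
```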

We can even use (named) capture groups with []

From String#[] documentation:

If a Regexp is supplied, the matching portion of the string is returned. If a capture follows the regular expression, which may be a capture group index or name, follows the regular expression that component of the MatchData is returned instead.

report_path = '/usr/share/filebeat/reports/ui/local/20200904_151507/API/API_Test_suite/20200904_151508/20200904_151508.csv'

# returns the full match if no second parameter is passed
report_path[/(\d{8})_(\d{6})/]
# => "20200904_151507"

# returns capture group number 2
report_path[/(\d{8})_(\d{6})/, 2]
# => "151507"

# returns the capture group called "date"
report_path[/(?'date'\d{8})_(?'id'\d{6})/, 'date']
# => "20200904"
Sumak