Regex for First Line (Only) that Contains a String

Question

I have a bunch of phone numbers with one per line:

[Home] (202) 121-7777 
C (202) 456-1111
[mobile] 55 55 5 55555 
[Work] (404) 555-1234 
[Cell] (505) 555-1234
W 303-555-5555
M 777-555-5555
c 12346567s

I want to grab the first one that contains the letter "c" upper or lower case.

So far, I have this /^.*[C].*$/i and that matches C (202) 456-1111, [Cell] (505) 555-1234 and c 12346567s. How do I return only the first? In other words, the match should only be C (202) 456-1111.

I have been blindly putting question marks everywhere without success.

I am using Ruby if it makes a difference http://www.rubular.com/r/h6ReB9IN8t

Edit: Here is another question that Hrishi pointed to but I cannot figure out how to adapt it to match the whole line.

maybe you should look at this question:http://stackoverflow.com/questions/519572/return-first-match-of-ruby-regex — Hrishi, Aug 21 '13 at 10:57
Thanks. I saw that one too but I must be missing something obvious. I will add it to the references, but it did not solve my problem. — JHo, Aug 21 '13 at 11:06
Is `[Cell] (505) 555-1234` a candidate match? I mean, it does not start with C. — Sahil Dhankhar, Aug 21 '13 at 11:41

user2631151 · Answer 1 · 2013-08-21T15:31:42.247

2

Try the match method. Here is an example:

list = <<EOF
[Home] (202) 121-7777 
C (202) 456-1111
[mobile] 55 55 5 55555 
[Work] (404) 555-1234 
[Cell] (505) 555-1234
W 303-555-5555
M 777-555-5555
c 12346567s
EOF

Update

# Match the first line that contains a "c", even when it is part of a word
puts list.match(/^.*C.*$/i)

# Match the first line whose leading "c" stands alone, not as part of a word
puts list.match(/^\W*C\W.*$/i)
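
The disagreement with Rubular in the comments under this answer seems to come down to which method is called: Rubular highlights every match, much like String#scan, while String#match stops at the first one. A minimal sketch, assuming a shortened copy of the list (the variable names `numbers`, `all_matches` and `first_match` are made up for illustration):

```ruby
# Shortened, hypothetical copy of the list from the question.
numbers = <<~EOF
  [Home] (202) 121-7777
  C (202) 456-1111
  [Cell] (505) 555-1234
  c 12346567s
EOF

pattern = /^.*C.*$/i

# String#scan collects every matching line, similar to what Rubular highlights.
all_matches = numbers.scan(pattern)

# String#match returns a MatchData for the first match only (or nil).
first_match = numbers.match(pattern)

p all_matches     # => ["C (202) 456-1111", "[Cell] (505) 555-1234", "c 12346567s"]
p first_match[0]  # => "C (202) 456-1111"
```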

edited Aug 21 '13 at 15:31

answered Aug 21 '13 at 11:20

user2631151


this does not even work :) please have a look at : http://www.rubular.com/r/h6ReB9IN8t – Sahil Dhankhar Aug 21 '13 at 11:29
I tried it in my Ruby and it returns only the first match. The regexp does match 3 instances, but the match method returns only the first. – user2631151 Aug 21 '13 at 11:34
this also matches [Cell] (505) 555-1234 , i am not very sure if the question is considering this as a valid input. If yes your solution is correct and i will upvote you :) – Sahil Dhankhar Aug 21 '13 at 11:44
If you want to match only "c" letters that are not part of a word, you can try this regexp: http://www.rubular.com/r/O7YdvZhCXR – user2631151 Aug 21 '13 at 12:06
`/^\W*C\W.*$/i` will be better than my previous regexp. – user2631151 Aug 21 '13 at 15:22

score 1 · Answer 2 · edited Aug 21 '13 at 13:18


Split the string on newline characters, select the lines that match your requirements, and grab the first one:

str = '[Home] (202) 121-7777 
C (202) 456-1111
[mobile] 55 55 5 55555 
[Work] (404) 555-1234 
[Cell] (505) 555-1234
W 303-555-5555
M 777-555-5555
c 12346567s'

p str.split(/\n/).select{|el| el =~ /^.*[C].*$/i}[0]

or use match:

p str.match(/^.*[C].*$/i)[0]

EDITED:

Or, in case you want to find the first line that starts with exactly an upper-case C, try this:

p str.match(/^C.*$/)[0]
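
The two patterns only return the same line when no earlier line contains a "c" elsewhere. A small sketch, assuming a hypothetical reordering of the list in which the [Cell] line comes first (the names `reordered`, `anywhere` and `at_start` are illustrative only):

```ruby
# Hypothetical reordering of the list: the [Cell] line now comes first.
reordered = "[Cell] (505) 555-1234\n[Home] (202) 121-7777\nC (202) 456-1111\nc 12346567s"

anywhere = reordered.match(/^.*[C].*$/i)[0]  # first line containing c or C anywhere
at_start = reordered.match(/^C.*$/)[0]       # first line starting with an upper-case C

p anywhere  # => "[Cell] (505) 555-1234"
p at_start  # => "C (202) 456-1111"
```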

edited Aug 21 '13 at 13:18

the Tin Man


answered Aug 21 '13 at 11:21

Yevgeniy Anfilofyev


this also matches `[Cell] (505) 555-1234` , i am not very sure if the question is considering this as a valid input. If yes your solution is correct and i will upvote you :) – Sahil Dhankhar Aug 21 '13 at 11:44
FYI, `[0]` is not preferred ruby syntax. You should be using `.first` – Dan Grahn Aug 21 '13 at 11:44
str.match(/^.*[C].*$/i)[0] works. I was hoping for a purely regular expression answer but this gets the job done in one step, without an additional loop. – JHo Aug 21 '13 at 11:49
@sahildhankhar , did you run the code? Both parts return `C (202) 456-1111` ;) – Yevgeniy Anfilofyev Aug 21 '13 at 12:00
@screenmutt , could you provide reference to `preferred` syntax, please? – Yevgeniy Anfilofyev Aug 21 '13 at 12:01
@YevgeniyAnfilofyev [Top Ruby Style Guide](https://github.com/bbatsov/ruby-style-guide) and [Ruby convention for accessing first and last array elements](http://stackoverflow.com/questions/18212240/ruby-convention-for-accessing-first-last-element-in-array). – Dan Grahn Aug 21 '13 at 12:03

@screenmutt , there are just opinions and styles. I didn't see any reasonable arguments. I could have my own style as long as I don't violate syntax http://ruby-doc.org/core-2.0/Array.html#method-i-5B-5D – Yevgeniy Anfilofyev Aug 21 '13 at 12:14
@YevgeniyAnfilofyev It's about readability. `first` is preferred because it is easier to read than `[0]`. When posting code to SO, you should follow language standards. See [this answer](http://stackoverflow.com/a/18212600/1669208). – Dan Grahn Aug 21 '13 at 12:17
@YevgeniyAnfilofyev it depends if the input is str = '[Cell] (505) 555-1234[Home] (202) 121-7777 C (202) 456-1111 [mobile] 55 55 5 55555 [Work] (404) 555-1234 W 303-555-5555 M 777-555-5555 c 12346567s' the output will be different and yes i did run the code :) – Sahil Dhankhar Aug 21 '13 at 12:20
I can only quote the reason why this topic was closed: `Many good questions generate some degree of opinion based on expert experience, but answers to this question will tend to be almost entirely based on opinions, rather than facts, references, or specific expertise.` – Yevgeniy Anfilofyev Aug 21 '13 at 12:23
@sahildhankhar , I understood this question as `find the first occurrence with C, c or [C in any case`... – Yevgeniy Anfilofyev Aug 21 '13 at 12:25
@YevgeniyAnfilofyev yes make sense, i guess this is what the asker wants. you have my upvote now :) – Sahil Dhankhar Aug 21 '13 at 12:57
@sahildhankhar , I'm not so sure. Here could be a scenario where record has a lot of phones and I need to find first cell phone. And each cell phone could be coded in different way: C , c or [Cell]. So... – Yevgeniy Anfilofyev Aug 21 '13 at 13:03

Dan Grahn · Answer 3 · 2013-08-21T11:49:26.430

EDIT Added two more ways of handling this. The last one is preferable.

This will do what you want. It will search for matches of your regex, and then get the first one. Please note that this will produce an error if string does not have any matches.

string = "[Home] (202) 121-7777 
C (202) 456-1111
[mobile] 55 55 5 55555 
[Work] (404) 555-1234 
[Cell] (505) 555-1234
W 303-555-5555
M 777-555-5555
c 12346567s"

puts string.match(/^(.*[C].*)$/i).captures.first
puts string.match(/^(.*[C].*)$/i)
puts string[/^(.*[C].*)$/i]

See the Ruby docs for String#match.
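
The error mentioned above comes from calling .captures on nil, since String#match returns nil when nothing matches. One possible way to guard against it, sketched with a hypothetical input that contains no "c" at all (`no_c`, `line` and `line_via_index` are made-up names):

```ruby
# Hypothetical input without any "c", to exercise the no-match case.
no_c = "[Home] (202) 121-7777\nW 303-555-5555"

# String#match returns nil here, so calling .captures directly would raise
# NoMethodError. Checking the result first avoids that.
result = no_c.match(/^(.*[C].*)$/i)
line = result ? result.captures.first : nil

# String#[] with a regex returns nil on no match instead of raising.
line_via_index = no_c[/^.*C.*$/i]

p line            # => nil
p line_via_index  # => nil
```

The String#[] form needs no guard, which may be part of why the answer calls the last variant preferable.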

the Tin Man · Accepted Answer · 2013-08-21T14:48:23.660

I'd go about this a bit differently. I prefer to reduce regular expressions to very simple patterns:

str = <<EOT
[Home] (202) 121-7777
C (202) 456-1111
[mobile] 55 55 5 55555
[Work] (404) 555-1234
[Cell] (505) 555-1234
W 303-555-5555
M 777-555-5555
c 12346567s
EOT

Finding the right line to work with is easily done using either select or find:

str.split("\n").select{ |s| s[/c/i] }.first # => "C (202) 456-1111"
str.split("\n").find{ |s| s[/c/i] } # => "C (202) 456-1111"

I'd recommend find because it only returns the first occurrence.
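
A rough way to see the difference is to count how many lines each method tests. This is only a sketch; the counters `select_checks` and `find_checks` are added purely for illustration:

```ruby
lines = [
  "[Home] (202) 121-7777",
  "C (202) 456-1111",
  "[Cell] (505) 555-1234",
  "c 12346567s"
]

select_checks = 0
find_checks = 0

# select tests every line, then .first discards all but one result.
via_select = lines.select { |s| select_checks += 1; s[/c/i] }.first

# find stops as soon as the block returns a truthy value.
via_find = lines.find { |s| find_checks += 1; s[/c/i] }

p via_select     # => "C (202) 456-1111"
p via_find       # => "C (202) 456-1111"
p select_checks  # => 4
p find_checks    # => 2
```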

Once the desired string is found, use scan to grab the numbers:

str.split("\n").find{ |s| s[/c/i] }.scan(/\d+/) # => ["202", "456", "1111"]

Then join them. When you have phone numbers stored in a database you don't really want them to be formatted, you just want the numbers. Formatting occurs later when you're outputting them again.

phone_number = str.split("\n").find{ |s| s[/c/i] }.scan(/\d+/).join # => "2024561111"

When you need to output the number, break it into the right grouping based on the regional phone-number representation. You should have some idea where the person is located, because you've usually also got their country code. Based on that you know how many digits you should have, plus the groups:

area_code, prefix, number = phone_number[0 .. 2], phone_number[3 .. 5], phone_number[6 .. 9] # => ["202", "456", "1111"]

Then output them so they're displayed correctly:

"(%s) %s-%s" % [area_code, prefix, number] # => "(202) 456-1111"

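The steps above could be collected into two small helpers. This is a sketch under assumptions, not part of the original answer: the method names `first_cell_digits` and `format_us_number` are hypothetical, and the formatter assumes a 10-digit North American number.

```ruby
# Hypothetical helper: digits of the first line containing "c" or "C", or nil.
def first_cell_digits(text)
  line = text.each_line.find { |s| s[/c/i] }
  line && line.scan(/\d+/).join
end

# Hypothetical helper: assumes a 10-digit North American number.
def format_us_number(digits)
  return nil unless digits && digits.length == 10
  "(%s) %s-%s" % [digits[0..2], digits[3..5], digits[6..9]]
end

list = "[Home] (202) 121-7777\nC (202) 456-1111\nc 12346567s\n"
digits = first_cell_digits(list)

p digits                                                 # => "2024561111"
p format_us_number(digits)                               # => "(202) 456-1111"
p format_us_number(first_cell_digits("W 303-555-5555"))  # => nil
```

Returning nil for missing or oddly sized input keeps the caller in charge of what to do with numbers that do not fit the assumed format.
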
As for your original pattern /^.*[C].*$/i, there are some things wrong with your understanding of regex:

^.* says "start at the beginning of the line and find zero or more characters", which is no more effective at locating the letter than saying /[C]/.
Using [C] creates an unnecessary character set, which means "find one of the letters in the set 'C'". It does nothing useful, so just use C, as in /C/.
.*$ artificially finds the end of the line as well, but since you're not capturing it, nothing is accomplished, so don't bother with it. The regex is now /C/.
Since you want to match upper and lower case, use /C/i or /c/i. (Or you could use /[cC]/ but why?)
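
A short sketch of the point being made: when the pattern is only used as a yes/no test on a single line, the long and the short form pick the same line, and the only difference is the text each one matches. The variable names here are illustrative:

```ruby
lines = ["[Home] (202) 121-7777", "C (202) 456-1111", "[Cell] (505) 555-1234"]

verbose = /^.*[C].*$/i
simple  = /c/i

# Used as a test inside find, both patterns select the same line.
picked_by_verbose = lines.find { |s| s =~ verbose }
picked_by_simple  = lines.find { |s| s =~ simple }

# The matched text differs: the whole line versus the single letter.
text_from_verbose = "C (202) 456-1111"[verbose]
text_from_simple  = "C (202) 456-1111"[simple]

p picked_by_verbose  # => "C (202) 456-1111"
p picked_by_simple   # => "C (202) 456-1111"
p text_from_verbose  # => "C (202) 456-1111"
p text_from_simple   # => "C"
```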

Instead:

To find a "c" or "C" anywhere in the string, just use /c/i. That's all that's needed. http://rubular.com/r/uPyxACOWls
To find "c", "C", "cell" or "Cell", you can use /c(?:ell)?/i. http://rubular.com/r/TkSRPWG2y6
To find "c", "C", "cell" or "Cell" as a separate word, use word-boundary markers like /\bc(?:ell)?\b/i. http://rubular.com/r/Smo0bFs9w8
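
To compare the loosest and the strictest of these patterns, here is a sketch that assumes the case-insensitive i flag on both (needed for the upper-case forms) and adds one hypothetical "[Office]" line, which is not in the original list, so that a "c" buried inside a word shows up:

```ruby
lines = [
  "[Home] (202) 121-7777",
  "[Office] (404) 555-1234",  # hypothetical line, not in the original list
  "C (202) 456-1111",
  "[Cell] (505) 555-1234",
  "c 12346567s"
]

# Any "c" or "C", including the one inside "Office".
anywhere = lines.select { |s| s[/c/i] }

# Only "c" or "cell" standing alone as a word, in either case.
word_only = lines.select { |s| s[/\bc(?:ell)?\b/i] }

p anywhere.first  # => "[Office] (404) 555-1234"
p word_only       # => ["C (202) 456-1111", "[Cell] (505) 555-1234", "c 12346567s"]
```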

You can get a whole lot more complicated, but if you're not accomplishing anything with the additional pattern information, you're just wasting the regex-engine's CPU-time, and slowing your code. A confused regex-engine can waste a LOT of CPU-time, so be efficient and aware of what you're asking it to do.

This provides the answer and instruction on improving the code. Thanks for the effort. — JHo, Aug 22 '13 at 12:30
