
This is one of my interview questions. I didn't come up with a good enough solution and got rejected.

The question was:

What single regex matches all URLs in the following list that contain `job` (case-insensitive) in the relative path, not in the domain?

    - http://www.glassdoor.com/job/ABC
    - https://glassdoor.com/job/
    - HTTPs://job.com/test
    - Www.glassdoor.com/foo/bar/joBs
    - http://192.168.1.1/ABC/job
    - http://bankers.jobs/ABC/job

My solution used a lookbehind and a lookahead: `/(?<!\.)job(?!\.)/i`. This works for the list above. However, it wrongly matches a URL like `HTTPs://jobs.com/test`, where job appears only in the domain.
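One way to see the problem is to run the asker's pattern over the list in Python (the language, the `URLS` list and the hypothetical `matches` helper are choices for this sketch, not part of the question). It accepts every URL with job in the path and rejects `HTTPs://job.com/test`, but it also accepts `HTTPs://jobs.com/test`, because the negative lookahead only rules out a dot *directly* after `job`.

```python
import re

# The asker's pattern: "job" not directly preceded or followed by a dot.
ASKER_PATTERN = re.compile(r"(?<!\.)job(?!\.)", re.IGNORECASE)

# The list from the question, plus the URL that exposes the flaw.
URLS = [
    "http://www.glassdoor.com/job/ABC",
    "https://glassdoor.com/job/",
    "HTTPs://job.com/test",
    "Www.glassdoor.com/foo/bar/joBs",
    "http://192.168.1.1/ABC/job",
    "http://bankers.jobs/ABC/job",
    "HTTPs://jobs.com/test",
]


def matches(url: str) -> bool:
    """Hypothetical helper: True if the pattern finds "job" anywhere in the URL."""
    return ASKER_PATTERN.search(url) is not None


for url in URLS:
    print(matches(url), url)
```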

I am wondering what the correct answer to this question is. Thanks in advance for any suggestions!

mitchelllc
  • For a full URL: http://stackoverflow.com/questions/161738/what-is-the-best-regular-expression-to-check-if-a-string-is-a-valid-url – Jess Jan 28 '14 at 16:15
  • Wow, I never thought there were people who would test you on your regex skills... Anyway, it depends on the language you're using. If you're using PCRE, you might use `(?im)^(?://|[^/])++/(?=.*job).*$`, [see demo](http://regex101.com/r/aM3kQ8). In Ruby, `^(?:\/\/|[^\/])++\/(?=.*job).*$`, [see demo](http://rubular.com/r/fWhbEYsG9Z) – HamZa Jan 28 '14 at 16:16
  • Another full URL example: http://stackoverflow.com/a/15640775/1804678 – Jess Jan 28 '14 at 16:22
  • HamZa: Why would you not test regex skills? Regexes are a programming language in their own right, and if the job requires frequent use of regexes, it's perfectly reasonable to test a candidate's proficiency. – Andy Lester Jan 28 '14 at 16:57
  • @HamZa Could you please explain the meaning of `++`? I understand that a single `+` means one or more times, so what does `++` mean? – mitchelllc Jan 28 '14 at 17:00
  • @AndyLester I would love to have a job full of regexes :) – HamZa Jan 28 '14 at 17:08
  • @mitchelllc It's a possessive quantifier: it doesn't backtrack. Check out [this](http://stackoverflow.com/q/5319840) and [this](http://stackoverflow.com/q/1117467). – HamZa Jan 28 '14 at 17:08
  • @Andy Lester - The problem is not the regex skill test, it's the tester's skill. –  Jan 28 '14 at 18:54
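To make the `++` explanation concrete, the sketch below compares HamZa's pattern with and without the possessive quantifier on `HTTPs://job.com/test`. It is written in Python, whose `re` module only accepts `++` from Python 3.11, so the possessive version is emulated with the common `(?=(...))\1` idiom: the lookahead captures the longest prefix once, and the backreference consumes exactly that, leaving nothing to backtrack into. The variable names are illustrative.

```python
import re

# Plain greedy "+": the prefix group may give characters back on failure.
BACKTRACKING = re.compile(r"^(?://|[^/])+/(?=.*job).*$", re.IGNORECASE)

# Emulation of "(?://|[^/])++": lookaheads are atomic in Python's re, so the
# prefix captured in group 1 is fixed and \1 cannot be shortened later.
POSSESSIVE = re.compile(r"^(?=((?://|[^/])+))\1/(?=.*job).*$", re.IGNORECASE)

url = "HTTPs://job.com/test"

# The greedy group backs off to "HTTPs:", so the first slash of "//" is taken
# as the path separator and the lookahead then finds "job" in the domain.
print(bool(BACKTRACKING.search(url)))

# The possessive prefix is locked to "HTTPs://job.com"; the remaining path is
# "/test", which contains no "job", so there is no match.
print(bool(POSSESSIVE.search(url)))
```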

4 Answers


Try this regex:

    /\b(?:https?:\/\/)?[^\/:]+\/.*?job/gmi

Online Demo: http://regex101.com/r/rV3oP8
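A small Python sketch can check this pattern against the question's URLs one at a time. Python has no `g` flag, and `m` makes no difference for single-line strings, so only `re.IGNORECASE` is kept; that translation and the `URLS` list are assumptions of the sketch, not part of the answer.

```python
import re

# anubhava's pattern: optional scheme, a host with no "/" or ":", then a "/"
# and, lazily, the first "job" somewhere after it. In a Python raw string the
# "/" characters need no escaping.
PATTERN = re.compile(r"\b(?:https?://)?[^/:]+/.*?job", re.IGNORECASE)

URLS = [
    "http://www.glassdoor.com/job/ABC",
    "https://glassdoor.com/job/",
    "HTTPs://job.com/test",
    "Www.glassdoor.com/foo/bar/joBs",
    "http://192.168.1.1/ABC/job",
    "http://bankers.jobs/ABC/job",
    "HTTPs://jobs.com/test",
]

for url in URLS:
    print(PATTERN.search(url) is not None, url)
```

Because `[^/:]+` cannot cross the `:` or the `//` of the scheme, a `job` that appears only in the host has no `/` in front of it that the pattern can use.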

anubhava

If you don't need to validate the URL, just focus on `job`:

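    #  /(?i)(?<=\/)job(?=\/|[^\S\r\n]*$)/

    (?i)                      # case-insensitive
    (?<= / )                  # preceded by a slash
    job
    (?= / | [^\S\r\n]* $ )    # followed by a slash, or by optional whitespace (not CR/LF) and then $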
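Here is a sketch of the same pattern in Python, checked against the question's URLs one at a time (the list and loop are added for illustration). Python's `$` without `re.MULTILINE` matches at the end of the string, which is what is wanted when each URL is tested on its own. Because the lookahead requires a slash or the end right after `job`, this pattern treats job as a complete path segment: it skips `job.com` and `jobs.com`, but as written it also skips `/joBs` in the fourth URL.

```python
import re

# (?i) case-insensitive; "job" must follow a "/" and be followed either by
# another "/" or by optional whitespace (other than CR/LF) and the end.
PATTERN = re.compile(r"(?i)(?<=/)job(?=/|[^\S\r\n]*$)")

URLS = [
    "http://www.glassdoor.com/job/ABC",
    "https://glassdoor.com/job/",
    "HTTPs://job.com/test",
    "Www.glassdoor.com/foo/bar/joBs",
    "http://192.168.1.1/ABC/job",
    "http://bankers.jobs/ABC/job",
    "HTTPs://jobs.com/test",
]

for url in URLS:
    print(PATTERN.search(url) is not None, url)
```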
  • This answer might be more efficient than the others because it does not have to parse a valid URL first. – Jess Jan 28 '14 at 19:23
  • @Jess - Yep, there is a URL regex of about 20K bytes that does that, I guess. –  Jan 29 '14 at 02:00

Here is one that I came up with:

    ^(?:.*://)?(?:[wW]{3}\.)?([^:/])*/.*job.*

It matches all of your examples that have job in the path, and does not match the ones with job.com or jobs.com, where job appears only in the domain.

I tested this in Sublime Text, which is handy because matches are highlighted as you type.
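The pattern carries no case-insensitive flag, while the question asks for one: `[wW]` only covers the `www.` prefix, so `joBs` in the fourth URL is matched only when the search itself ignores case. A Python sketch (the language and the flag handling are assumptions, not part of the answer) compiles the pattern both ways to show the difference.

```python
import re

# Jess's pattern, unchanged.
SOURCE = r"^(?:.*://)?(?:[wW]{3}\.)?([^:/])*/.*job.*"

CASE_SENSITIVE = re.compile(SOURCE)
CASE_INSENSITIVE = re.compile(SOURCE, re.IGNORECASE)

URLS = [
    "http://www.glassdoor.com/job/ABC",
    "https://glassdoor.com/job/",
    "HTTPs://job.com/test",
    "Www.glassdoor.com/foo/bar/joBs",
    "http://192.168.1.1/ABC/job",
    "http://bankers.jobs/ABC/job",
    "HTTPs://jobs.com/test",
]

# Columns: case-insensitive result, case-sensitive result, URL.
for url in URLS:
    print(bool(CASE_INSENSITIVE.search(url)), bool(CASE_SENSITIVE.search(url)), url)
```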

Jess

I was also asked this question in an interview, and here is my solution:

    /.*/+job/?.*/i

It works well on Rubular.com.

Zzz...