I need to get all URLs from a text file using a regex, but not every URL: only URLs that start with a certain template. For example, I have this text:

{"Field_Name1":"http://google.ru","FieldName2":
"["some text", "http://example.com/view/...&id..&.."]",
"FieldName3": "http://example.com/edit/&id..."}someText"
["some text", "http://example.com/view/...&id..&.."]",
"FieldName3": "http://example.com/view/&id..."}someText2{..}someText.({})

I need to extract all URLs like http://example.com/view/..... I tried this regex, but it doesn't work. Maybe I have a mistake in it:

 ^(http|https|ftp)\://example\.com\/view\/+[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(:[a-zA-Z0-9]*)?/?[^\.\,\)\(\s]$

I don't need a general URL validator; I need a pattern that extracts URLs starting with a given template.
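A quick check of the attempt above, sketched with Python's `re` module (an assumption; the question does not name a language), shows that it finds nothing in the sample. The `^` and `$` anchors tie the match to the start and end of the whole input (or of each line with `re.MULTILINE`), and every line of the sample starts with `{`, `"` or `[`. Even with the anchors dropped, the part after `/view/` expects a domain-like `name.tld` segment (`[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}`), which paths such as `...&id..&..` do not contain. The names `SAMPLE` and `ATTEMPT` are illustrative.

```python
import re

# The question's sample text, copied verbatim (it is not valid JSON).
SAMPLE = r'''{"Field_Name1":"http://google.ru","FieldName2":
"["some text", "http://example.com/view/...&id..&.."]",
"FieldName3": "http://example.com/edit/&id..."}someText"
["some text", "http://example.com/view/...&id..&.."]",
"FieldName3": "http://example.com/view/&id..."}someText2{..}someText.({})'''

# The pattern from the question, unchanged.
ATTEMPT = r'^(http|https|ftp)\://example\.com\/view\/+[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(:[a-zA-Z0-9]*)?/?[^\.\,\)\(\s]$'

# Anchored to the whole input: the input starts with "{", so no match.
print(re.findall(ATTEMPT, SAMPLE))  # prints []

# Anchored to each line: no line starts with "http", so still no match.
print(re.findall(ATTEMPT, SAMPLE, re.MULTILINE))  # prints []

# Without ^ and $, the "name.tld" part after /view/ still cannot match.
print(re.search(ATTEMPT.strip("^$"), SAMPLE))  # prints None
```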

handless
  • See [*JavaScript Regex to match a URL in a field of text*](http://stackoverflow.com/questions/8188645/javascript-regex-to-match-a-url-in-a-field-of-text). – Wiktor Stribiżew Oct 28 '15 at 09:00
  • Possible duplicate of [regex for URL including query string](http://stackoverflow.com/questions/2343177/regex-for-url-including-query-string) – Christoph Brückmann Oct 28 '15 at 09:01
  • @stribizhev But what about "example.com/VIEW/...."? I think that is the part of my regex that has the problem. – handless Oct 28 '15 at 09:08
  • Wouldn't it be better to parse the JSON? – Yaron Oct 28 '15 at 11:43
  • @Yaron But it isn't valid JSON: I have something like "{json valid}Some text", and the trailing "Some text" makes it invalid. Also, "Some text" has a different length in each iteration. – handless Oct 28 '15 at 12:10

3 Answers

What about this?

((ftp|http[s]?):\/\/example.com\/view\/.*?)\"

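The part up to "/view/" should be self-explanatory. The rest, ".*?)\"", means: give me everything (as few characters as possible) before the next double quote. The closing parenthesis ends the capture group, so the quote itself is not part of the captured URL.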
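As a sketch of how this pattern behaves on the question's sample, here it is run with Python's `re` module (the language is an assumption; the answer does not name one). Group 1 holds the URL without the quote. Because `.` does not match a newline by default, the lazy `.*?` stops at the first quote on the same line, and the `/edit/` URL is skipped. The names `SAMPLE`, `PATTERN` and `view_urls` are illustrative.

```python
import re

# The question's sample text, copied verbatim.
SAMPLE = r'''{"Field_Name1":"http://google.ru","FieldName2":
"["some text", "http://example.com/view/...&id..&.."]",
"FieldName3": "http://example.com/edit/&id..."}someText"
["some text", "http://example.com/view/...&id..&.."]",
"FieldName3": "http://example.com/view/&id..."}someText2{..}someText.({})'''

# The answer's pattern, unchanged. Group 1 is the URL without the closing quote.
# Note: the "." in "example.com" is unescaped, so it matches any character.
PATTERN = re.compile(r'((ftp|http[s]?):\/\/example.com\/view\/.*?)\"')

def view_urls(text):
    """Return every /view/ URL that is followed by a double quote."""
    return [m.group(1) for m in PATTERN.finditer(text)]

for url in view_urls(SAMPLE):
    print(url)
```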

Buxmaniak

I think this will work! I tried it on regexr.com and it seemed to select just the URL part, given that the text string doesn't actually have multiple periods in a row.

(?!")h.+.+[a-z]*

EDIT: I made a better one, or at least I think I did. Basically, the expression says: look for a quotation mark, and if the next character is an h, include it in the match and make it the starting point; then include any characters after that leading up to a single period, followed by any number of lowercase letters (there could be a million of them). As long as there was a period before them, you're good, and it won't select beyond that unless there's another period after the string.
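To inspect what this pattern actually matches on the question's sample, the sketch below runs it with Python's `re` module (an assumption; the constructs it uses, a negative lookahead, greedy `.+` and a character class, behave the same way in JavaScript). Note that `(?!")` is a negative lookahead: it only asserts that the next character is not a quote, which is always true when that character is `h`. On this input each match starts at the first `h` on a line and, because the greedy `.+` consumes everything up to the newline, runs to the end of that line; the pattern does not filter on `/view/`. `SAMPLE` and `PATTERN` are illustrative names.

```python
import re

# The question's sample text, copied verbatim.
SAMPLE = r'''{"Field_Name1":"http://google.ru","FieldName2":
"["some text", "http://example.com/view/...&id..&.."]",
"FieldName3": "http://example.com/edit/&id..."}someText"
["some text", "http://example.com/view/...&id..&.."]",
"FieldName3": "http://example.com/view/&id..."}someText2{..}someText.({})'''

# The answer's pattern, unchanged. (?!") asserts "next char is not a quote";
# it consumes nothing, so the match itself starts at the "h".
PATTERN = re.compile(r'(?!")h.+.+[a-z]*')

# "." does not match a newline, so each greedy match ends at the end of a line.
matches = PATTERN.findall(SAMPLE)
for m in matches:
    print(m)
```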

shaunxer

Universal:

/(ftp|http|https)\:\/\/([\d\w\W]*?)(?=\")/igm 

Template:

/(ftp|http|https)\:\/\/example\.com\/view\/([\d\w\W]*?)(?=\")/igm 
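A sketch of both patterns translated to Python's `re` module, assuming the JavaScript flags map as `i` to `re.IGNORECASE`, `m` to `re.MULTILINE`, and `g` to iterating with `finditer`. `[\d\w\W]` matches any character, including a newline, and the lazy `*?` stops at the first position followed by a double quote (the lookahead `(?=\")` keeps the quote out of the match). The `i` flag also lets the template match an upper-case `/VIEW/`, the case raised in the question's comments. `SAMPLE`, `UNIVERSAL`, `TEMPLATE` and `full_matches` are illustrative names.

```python
import re

# The question's sample text, copied verbatim.
SAMPLE = r'''{"Field_Name1":"http://google.ru","FieldName2":
"["some text", "http://example.com/view/...&id..&.."]",
"FieldName3": "http://example.com/edit/&id..."}someText"
["some text", "http://example.com/view/...&id..&.."]",
"FieldName3": "http://example.com/view/&id..."}someText2{..}someText.({})'''

# Assumed mapping of the JavaScript /igm flags; "g" becomes finditer below.
FLAGS = re.IGNORECASE | re.MULTILINE

UNIVERSAL = re.compile(r'(ftp|http|https)\:\/\/([\d\w\W]*?)(?=\")', FLAGS)
TEMPLATE = re.compile(
    r'(ftp|http|https)\:\/\/example\.com\/view\/([\d\w\W]*?)(?=\")', FLAGS)

def full_matches(pattern, text):
    # group(0) is the whole URL; re.findall would return the groups instead.
    return [m.group(0) for m in pattern.finditer(text)]

print(full_matches(UNIVERSAL, SAMPLE))  # every quoted URL
print(full_matches(TEMPLATE, SAMPLE))   # only example.com/view/ URLs
print(full_matches(TEMPLATE, '"HTTP://EXAMPLE.COM/VIEW/x"'))  # case-insensitive
```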
HovyTech
  • While this regex may answer the question, providing additional context regarding why and/or how this code answers the question improves its long-term value. – JAL Oct 28 '15 at 14:02