
I'm trying to search a field in a database to extract URLs. Sometimes a field contains more than one URL, and I would like to extract each one into a separate variable (or an array element).

I know my regex isn't going to cover all possibilities. As long as it catches anything that starts with http and ends with a space, I'm OK.

The problem I'm having is that my attempts either get only one URL per record or get only the last letter of each URL. I've tried a couple of different techniques based on solutions others have posted, but I haven't found one that works for me.

Sample input line: Testing http://marko.co http://tester.net Just about anything else you'd like.

Output goal:

$var[0] = http://marko.co
$var[1] = http://tester.net

First try:

if ( $status =~ m/http:(\S)+/g ) { print "$&\n"; }

Output: http://marko.co
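The first try stops after one URL because the `if` condition puts the match in scalar context, where `m//g` finds only the next match. A minimal sketch of collecting every match by looping with `while` instead, assuming the sample line from the question is in `$status` (the `@urls` name is just for illustration):

```perl
use strict;
use warnings;

my $status = "Testing http://marko.co http://tester.net Just about anything else you'd like.";
my @urls;

# In scalar context, m//g resumes from pos($status) on each call,
# so the while loop visits every match in turn.
while ( $status =~ m/(http:\S+)/g ) {
    push @urls, $1;    # $1 holds the whole URL captured by the group
}

print "$_\n" for @urls;
```

This should print `http://marko.co` and `http://tester.net` on separate lines.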

Second try:

@statusurls = ($status =~ m/http:(\S)+/g);
print "@statusurls\n";

Output: o t
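The "o t" output comes from where the `+` sits. In `(\S)+` the group captures a single character and the quantifier repeats the group, so each repetition overwrites the capture and only the last character of each URL is kept. In list context `m//g` then returns that one capture per match. A small sketch reproducing this, assuming the same sample line in `$status`:

```perl
use strict;
use warnings;

my $status = "Testing http://marko.co http://tester.net Just about anything else you'd like.";

# The + sits outside the group, so (\S) is repeated and each repetition
# overwrites the capture; only the final character of each URL survives.
my @statusurls = ( $status =~ m/http:(\S)+/g );

print "@statusurls\n";    # o t

# A single match on a short string shows the same effect in $1.
if ( "abc" =~ /(\S)+/ ) {
    print "$1\n";         # c
}
```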

I'm new to regex, but since I'm using the same regex for each attempt, I don't understand why it's returning such different results.

Thanks for any help you can offer.

I've looked at these posts and either didn't find what I was looking for or didn't understand how to implement it:

This one seemed the most promising (it's where I got the second attempt from), but it didn't return the whole URL, just the last letter: How can I store regex captures in an array in Perl?

This has some great stuff in it. I'm curious whether I need to treat the URL as a word, since it's bookended by spaces: Regex Group in Perl: how to capture elements into array from regex group that matches unknown number of/multiple/variable occurrences from a string?

This one offers suggestions similar to the first two: How can I store captures from a Perl regular expression into separate variables?

Solution:

@statusurls = ($status =~ m/(http:\S+)/g);
print "@statusurls\n";
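With the whole URL inside the group, list-context `m//g` returns one full URL per match, which gives the `$var[0]`/`$var[1]` layout from the output goal. A sketch assuming the sample line in `$status`; the `@var`, `$first` and `$second` names are illustrative only:

```perl
use strict;
use warnings;

my $status = "Testing http://marko.co http://tester.net Just about anything else you'd like.";

# In list context, m//g returns the capture from every match, so the
# whole array fills in one step.
my @var = ( $status =~ m/(http:\S+)/g );

print "\$var[0] = $var[0]\n";    # $var[0] = http://marko.co
print "\$var[1] = $var[1]\n";    # $var[1] = http://tester.net

# The same list can be assigned to separate scalars instead; any
# surplus variables stay undef if the field holds fewer URLs.
my ( $first, $second ) = $status =~ m/(http:\S+)/g;
```

If a field can hold an unknown number of URLs, the array form is probably the safer choice, since it grows to fit.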

Thanks!

Asked by McLuvin
1 Answer

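I think you need to capture more than just one character. In (\S)+ the + applies to the group, so the group is repeated and each repetition overwrites the capture, leaving only the last character. Try this regex instead: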

m/http:(\S+)/g
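Because the group in this regex starts after `http:`, the captures leave out the scheme, which is why moving the `(` to the left of `http` was needed. Another option, shown here as an alternative rather than what was posted, is to drop the group entirely: with no capture groups, list-context `m//g` returns each whole match. A sketch assuming the sample line in `$status`:

```perl
use strict;
use warnings;

my $status = "Testing http://marko.co http://tester.net Just about anything else you'd like.";

# With the group after "http:", only the rest of each URL is returned.
my @without_scheme = ( $status =~ m/http:(\S+)/g );

# With no capture groups at all, m//g in list context returns each
# whole match, so the "http:" prefix is kept.
my @whole = ( $status =~ m/http:\S+/g );

print "@without_scheme\n";    # //marko.co //tester.net
print "@whole\n";             # http://marko.co http://tester.net
```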
Answered by gpojd
Comment: Your answer was the push I needed in the right direction. I used what you posted and moved the ( to the left of the http because I needed that in the result. Thanks! – McLuvin Aug 04 '11 at 13:34