
I have a URL that contains a filename. I would like to create a function that uses a regular expression to isolate the filename and then save it as a variable. Setting up the function and saving the string as a variable are fairly straightforward; what I am struggling with is the regular expression that isolates the string.

Below is an example of a url that I am working with.

http://some-website.s3.amazonaws.com/lovecraft-05.epub?AWSAccessKeyId=KJHFHGFDSXF&Expires=3568732&Signature=%3JHF%3KUHF%2Bnuvnu%5LHF%3D

I would like to grab the filename located between the final "/" and the "?".

So the value I am looking for is "lovecraft-05.epub"

  • What programming language do you use? Regex has different flavors. – MaxZoom Jun 29 '15 at 22:38
  • possible duplicate of [Regex Match all characters between two strings](http://stackoverflow.com/questions/6109882/regex-match-all-characters-between-two-strings) – l'L'l Jun 29 '15 at 22:40
  • I'm not sure. I am using WordPress. In the past, I have searched until I found something, then simply copied, pasted, and tweaked it until it worked. I have never declared a flavor, at least not that I know of. I hope that doesn't sound too incredibly stupid. – George Teichmann Jun 29 '15 at 22:46
  • @George please accept an answer when applicable – MaxZoom Jul 03 '15 at 12:53

4 Answers


Text

http://some-website.s3.amazonaws.com/lovecraft-05.epub?AWSAccessKeyId=KJHFHGFDSXF&Expires=3568732&Signature=%3JHF%3KUHF%2Bnuvnu%5LHF%3D

Regex (with Perl):

\.com\/(.*)\?

Output

Match 1:    .com/lovecraft-05.epub?    (offset 32, length 23)
Group 1:    lovecraft-05.epub          (offset 37, length 17)
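Since the asker is on WordPress, which runs on PHP, here is a minimal sketch of the same pattern with PHP's preg_match() (PCRE, which is close to the Perl flavor). The helper name extract_after_com() is hypothetical, and the pattern uses ~ as its delimiter so the / does not need escaping.

```php
<?php
// Hypothetical helper: applies the answer's pattern \.com\/(.*)\? via preg_match().
function extract_after_com(string $url): ?string
{
    // $m[0] is the whole match (".com/lovecraft-05.epub?");
    // $m[1] is capture group 1, the part between ".com/" and "?".
    if (preg_match('~\.com/(.*)\?~', $url, $m)) {
        return $m[1];
    }
    return null; // the URL has no ".com/...?" part
}

$url = 'http://some-website.s3.amazonaws.com/lovecraft-05.epub?AWSAccessKeyId=KJHFHGFDSXF&Expires=3568732&Signature=%3JHF%3KUHF%2Bnuvnu%5LHF%3D';
echo extract_after_com($url), "\n"; // lovecraft-05.epub
```

Because (.*) is greedy, it runs to the last ? in the URL. That is harmless for the sample URL, which has only one ?, but a lazy (.*?) or [^?]* would stop at the first one.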
SerCrAsH

This regex selects the substring after amazonaws.com and before the ? character:

amazonaws.com\/([^\?]+)

When coding, you need to read the group 1 match rather than the whole match.
See DEMO for an explanation.
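In PHP (which WordPress uses), reading the group 1 match means reading $m[1] after preg_match(); $m[0] would be the whole match, including "amazonaws.com/". A minimal sketch follows. The name get_s3_filename() is hypothetical, and the dot is escaped here as amazonaws\.com (a small change from the answer) so that it matches only a literal period.

```php
<?php
// Hypothetical helper using the answer's pattern, with the "." escaped.
function get_s3_filename(string $url): ?string
{
    // [^?]+ consumes characters up to (not including) the first "?",
    // or to the end of the string if there is no query part.
    if (preg_match('~amazonaws\.com/([^?]+)~', $url, $m)) {
        return $m[1]; // capture group 1: the filename
    }
    return null; // not an amazonaws.com URL with a path
}

$url = 'http://some-website.s3.amazonaws.com/lovecraft-05.epub?AWSAccessKeyId=KJHFHGFDSXF&Expires=3568732&Signature=%3JHF%3KUHF%2Bnuvnu%5LHF%3D';
echo get_s3_filename($url), "\n";
```

Because [^?] also matches /, an object stored under a folder would come back as folder/file.epub rather than just the filename.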

MaxZoom

You can use /\/([^\/?]+)\?/:

The Perl one-liner

echo "http://some-website.s3.amazonaws.com/lovecraft-05.epub?AWSAccessKeyId=KJHFHGFDSXF&Expires=3568732&Signature=%3JHF%3KUHF%2Bnuvnu%5LHF%3D" \
| perl -ne 'print $1 if m=/([^/?]+)\?='

prints lovecraft-05.epub.
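The same pattern carries over to PHP's preg_match() essentially unchanged; only the delimiter differs (~ here, = in the one-liner). A sketch, with filename_before_query() as a hypothetical name:

```php
<?php
// Hypothetical helper: PHP port of the one-liner's m=/([^/?]+)\?=
function filename_before_query(string $url): ?string
{
    // Group 1 is a run of characters that are neither "/" nor "?",
    // preceded by "/" and immediately followed by "?".
    return preg_match('~/([^/?]+)\?~', $url, $m) ? $m[1] : null;
}

$url = 'http://some-website.s3.amazonaws.com/lovecraft-05.epub?AWSAccessKeyId=KJHFHGFDSXF&Expires=3568732&Signature=%3JHF%3KUHF%2Bnuvnu%5LHF%3D';
echo filename_before_query($url), "\n";
```

Unlike the .com and amazonaws.com patterns in the other answers, this one does not depend on the host name, but it does require a ? after the filename, so a URL without a query string returns null.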

sebnukem

I see two ways to do that:

function get_filename_from_url($url) {
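    // (string) avoids a PHP 8.1+ deprecation notice when parse_url() returns null (URL without a path)
    return ltrim(strrchr((string) parse_url($url, PHP_URL_PATH), '/'), '/');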
}

or with preg_match:

function get_filename_from_url($url) {
    return preg_match('~(?<!:/)/\K[^/]*?(?=[?#]|$)~', $url, $m) ? $m[0] : '';
}

where the pattern means:

~           # pattern delimiter
(?<!:/)     # not preceded by :/
/           # literal slash
\K          # drop what has been matched so far (the slash) from the match result
[^/]*?      # zero or more non-slash characters, as few as possible (lazy)
(?=[?#]|$)  # followed by a ? or a # or the end of the string
~

Note that I have chosen to return the empty string by default when the URL isn't well formed; you can, of course, choose a different behaviour.

In the regex version, testing for # or the end of the string in addition to the question mark is needed because the query part of a URL is optional. If there is no query part, the filename is followed by the fragment part or by the end of the string.
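To check that the two versions agree, here is a small sketch that runs both on a URL with a query string, one with a fragment, and one with neither. Both functions are renamed so they can live in one script (filename_via_parse_url() and filename_via_regex() are hypothetical names), and the parse_url() version casts the path to string, an assumption of this sketch that avoids a PHP 8.1+ deprecation notice when a URL has no path.

```php
<?php
// The answer's two approaches, renamed so they can coexist in one script.
function filename_via_parse_url(string $url): string
{
    // parse_url() returns null when there is no path; (string) turns that into "".
    return ltrim(strrchr((string) parse_url($url, PHP_URL_PATH), '/'), '/');
}

function filename_via_regex(string $url): string
{
    // Same pattern as the answer: last path segment, ended by ?, # or end of string.
    return preg_match('~(?<!:/)/\K[^/]*?(?=[?#]|$)~', $url, $m) ? $m[0] : '';
}

$urls = [
    'http://some-website.s3.amazonaws.com/lovecraft-05.epub?AWSAccessKeyId=KJHFHGFDSXF',
    'http://example.com/books/lovecraft-05.epub#chapter-2',
    'http://example.com/books/lovecraft-05.epub',
];

foreach ($urls as $url) {
    printf("%s | %s\n", filename_via_parse_url($url), filename_via_regex($url));
}
```

Each line should print lovecraft-05.epub twice. For a URL with no path at all, such as http://example.com, both functions return the empty string.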

Casimir et Hippolyte