PHP - Extract data from string with regex

Question

I need help with this. I have a string like this:

```html
<!doctype html> <html> <head> <meta charset="utf-8"> <title>Formatting the report</title><meta http-equiv="refresh" content="5;url=/file/xslt/download/?fileName=somename.pdf"> </head>
```

I need to extract the value of the fileName parameter. How can I do this?

I think it's possible with a regex, but I don't know regex well.

Thanks!

Mandatory link to read (twice): http://stackoverflow.com/a/1732454/393701 — SirDarius, Feb 13 '14 at 10:36
@SirDarius Did you read it (twice)? And did you read the question? Do you think he wants to write an HTML parser, or does he have a clearly defined problem that a quick regex can easily solve? It's tiring to see this link thrown in over and over again where it doesn't fit at all. – Jonny 5, Feb 13 '14 at 11:11
@Jonny5 This link has obvious value, if only for its humorous stance. The problem I have with this specific question lies in its title: Extract data from string **with regex**. The question can be solved with a regular expression, but the title clearly assumes that this is the best way to do it, which rules out considering any other solution. The input string here is HTML, so it is probably better to properly locate the `content` attribute first, and then use a regexp on the attribute value only. – SirDarius, Feb 13 '14 at 11:30
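SirDarius's suggestion (locate the `content` attribute first, then run a regex on its value only) can be sketched with PHP's built-in DOM extension (`DOMDocument` and `DOMXPath`). This is a minimal sketch, assuming the dom extension is enabled and that the page contains one refresh `<meta>` tag; the function name `extractFileName` is hypothetical.

```php
<?php
// Hypothetical helper: find the refresh <meta> with the DOM extension,
// then apply a small regex to its content attribute only.
function extractFileName(string $html): ?string
{
    $doc = new DOMDocument();
    // Collect libxml parse warnings internally instead of printing them
    // (the fragment has no <body> and no closing </html>).
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    // Select the content attribute of <meta http-equiv="refresh">
    $nodes = $xpath->query('//meta[@http-equiv="refresh"]/@content');
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }
    // e.g. "5;url=/file/xslt/download/?fileName=somename.pdf"
    $content = $nodes->item(0)->nodeValue;

    // Regex on the attribute value only: capture up to the next "&" or the end
    if (preg_match('/[?&]fileName=([^&]+)/', $content, $m)) {
        return urldecode($m[1]);
    }
    return null;
}

$html = '<!doctype html> <html> <head> <meta charset="utf-8"> <title>Formatting the report</title><meta http-equiv="refresh" content="5;url=/file/xslt/download/?fileName=somename.pdf"> </head>';
echo extractFileName($html), PHP_EOL;
```

Because the regex only ever sees the attribute value, the tag's closing `">` cannot end up in the capture.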

user3064914 · Accepted Answer · score 1 · 2014-02-13T11:53:03.493

Try this. The following pattern captures the file name:

/fileName=(.+?)\"/

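```php
<?php
// Single-quoted, so the double quotes inside the HTML need no escaping
$subject = '<!doctype html> <html> <head> <meta charset="utf-8"> <title>Formatting the report</title><meta http-equiv="refresh" content="5;url=/file/xslt/download/?fileName=somename.pdf"> </head>';
// Lazy (.+?) stops at the first double quote after "fileName="
$pattern = '/fileName=(.+?)"/';
if (preg_match($pattern, $subject, $matches)) {
    // $matches[0] is the whole match (fileName=somename.pdf"), $matches[1] is the file name
    print_r($matches);
}
?>
```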

`$1` (in PHP, `$matches[1]`) contains the file name.

demo

edited Feb 13 '14 at 11:53

answered Feb 13 '14 at 10:29


This works, but the output includes the end of the tag (`">`). How can I remove it? This is the output: `somename.pdf">` – carlo9987 Feb 13 '14 at 10:58
`$1` will contain `somename.pdf`; see the demo. – user3064914 Feb 13 '14 at 11:02
With `$1` I still have that problem: the file name followed by `">`. – carlo9987 Feb 13 '14 at 11:08

I extracted the file name without the extension (I don't need the extension) with this: '/fileName=(.+).pdf/'. Thank you very much! – carlo9987 Feb 13 '14 at 11:31
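In `'/fileName=(.+).pdf/'` the `.` before `pdf` is unescaped, so it matches any character, and the greedy `(.+)` stretches to the last place where any character followed by `pdf` occurs in the input. A sketch of two tighter variants, assuming the file name is always followed by a quote or an `&`:

```php
<?php
$subject = '<meta http-equiv="refresh" content="5;url=/file/xslt/download/?fileName=somename.pdf">';

// 1) Escape the dot so it matches a literal ".", and use a negated class
//    so the capture can never run past the closing quote or an "&".
$withoutExt = null;
if (preg_match('/fileName=([^"&]+)\.pdf/', $subject, $m)) {
    $withoutExt = $m[1];
}

// 2) Capture the whole name, then drop whatever extension it has.
$viaPathinfo = null;
if (preg_match('/fileName=([^"&]+)/', $subject, $m)) {
    $viaPathinfo = pathinfo($m[1], PATHINFO_FILENAME);
}

echo $withoutExt, PHP_EOL;   // somename
echo $viaPathinfo, PHP_EOL;  // somename
```

The second variant also works if the server later returns a file type other than PDF.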
By default, [quantifiers](http://www.regular-expressions.info/repeat.html) are [greedy](http://www.rexegg.com/regex-greed.html). To make them ungreedy (lazy), add a `?` after the quantifier, e.g. `(.*?)` or `(.+?)`, so they consume as little as possible before reaching `"`. Alternatively, you could use the [U (PCRE_UNGREEDY)](http://php.net/manual/en/reference.pcre.pattern.modifiers.php) [modifier](http://www.regular-expressions.info/modifiers.html). – Jonny 5 Feb 13 '14 at 11:48
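A small demonstration of the greedy, lazy and `U` variants described above. The input is made up (a second quoted attribute follows the URL) so that the greedy version visibly overshoots:

```php
<?php
// Hypothetical input: another quoted attribute follows the file name
$subject = '<meta content="url=/download/?fileName=somename.pdf" data-x="y">';

preg_match('/fileName=(.+)"/', $subject, $greedy);    // greedy: runs to the LAST quote
preg_match('/fileName=(.+?)"/', $subject, $lazy);     // lazy: stops at the FIRST quote
preg_match('/fileName=(.+)"/U', $subject, $ungreedy); // U flips the default to lazy

echo $greedy[1], PHP_EOL;   // somename.pdf" data-x="y
echo $lazy[1], PHP_EOL;     // somename.pdf
echo $ungreedy[1], PHP_EOL; // somename.pdf
```

Note that with `U` a trailing `?` does the opposite and makes a quantifier greedy again, so mixing the two in one pattern is easy to misread.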

score 0 · Answer 2 · answered Feb 13 '14 at 10:22

Try something along the lines of:

```php
$str = '<!doctype html> <html> <head> <meta charset="utf-8"> <title>Formatting the report</title><meta http-equiv="refresh" content="5;url=/file/xslt/download/?fileName=somename.pdf"> </head>';

preg_match('@fileName=(.*)"@', $str, $matches);

print_r($matches);
```

score 0 · Answer 3 · answered Feb 13 '14 at 10:25


PHP Simple HTML DOM Parser is a clean, convenient way to traverse HTML and find elements using selectors, much like jQuery selectors.
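Whichever parser finds the `<meta>` element (Simple HTML DOM as suggested here, or PHP's built-in DOM extension), the `content` value still has to be picked apart. A sketch that does this with PHP's built-in `parse_url` and `parse_str` instead of a regex, assuming the attribute value has already been read into `$content`:

```php
<?php
// Assumption: the content attribute value was already read with a DOM parser
$content = '5;url=/file/xslt/download/?fileName=somename.pdf';

// A refresh value looks like "<seconds>;url=<target>"; keep what follows "url="
$pos = stripos($content, 'url=');
$url = $pos === false ? '' : substr($content, $pos + 4);

// Isolate the query string and decode it into an associative array
$query = parse_url($url, PHP_URL_QUERY);
parse_str((string) $query, $params);

$fileName = $params['fileName'] ?? null;
echo $fileName, PHP_EOL; // somename.pdf
```

`parse_str` also handles URL-encoded values and extra parameters (`&foo=bar`), which a hand-written pattern would have to account for explicitly.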

Mahmoud.Eskandari
