
My text is:

u0026itag=22\u0026url=http%3A%2F%2Fr5---sn-q4f7dnel.c.youtube.com\u0026sig

And I want to extract the value of the url parameter (the part after u0026url=). My PHP code:

preg_match('/u0026itag=22\u0026url=(.*?)/', $part, $A);
echo $A[1];

But it doesn't give me any result.

Kieran
  • What's with the encoding? Do you have the original string? – MDEV Aug 21 '13 at 18:42
  • What a URL is is actually defined. Your example strings certainly do not contain a single URL. – hakre Aug 21 '13 at 18:42
  • The original string is too long – Ulker Ibrahimova Aug 21 '13 at 18:44
  • It looks like you simply have an encoded query string. Have you tried `urldecode()` and then something like `parse_str()` to get this value? I don't think there is any need for regex here. – Mike Brant Aug 21 '13 at 18:47
  • No, I didn't decode the URL. I only used parse_str() – Ulker Ibrahimova Aug 21 '13 at 18:48
  • Double check whether the original string is JSON or perhaps a JavaScript string literal. My guess is that if you run `json_decode` first you will get much further. parse_str should then work again; at least what I tried with the fragment looks promising. Also see my answer below, where I left some more explanations. – hakre Aug 21 '13 at 19:01

2 Answers


First a tip: if you enable error reporting, PHP tells you what is going wrong. When I first copied your code verbatim I got the following warning:

Warning: preg_match(): Compilation failed: PCRE does not support \L, \l, \N{name}, \U, or \u at offset 14

So this is already a hint that a sequence you put in there verbatim (the \u) is being interpreted as regex syntax and therefore requires escaping or quoting. I preferred quoting here, that is, wrapping the literal part in \Q at the beginning and \E at the end (QuotE):

$result  = preg_match('/^\Qu0026itag=22\u0026url=\E(.*?)/', $subject, $matches);
                         ##                      ##

This gets rid of the error, and $result becomes 1, which means the pattern matched. However, you still don't get anything useful in group 1; it is an empty match:

  [0] => string(22) "u0026itag=22\u0026url="
  [1] => string(0) ""

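That is because the ? after the star turned the repetition from greedy to lazy. Nothing follows the group in the pattern, so the shortest possible match, the empty string, already satisfies it. Let's make it possessive instead: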

  /^\Qu0026itag=22\u0026url=\E(.*+)/
                                 # plus means possessive

  [1] => string(52) "http%3A%2F%2Fr5---sn-q4f7dnel.c.youtube.com\u0026sig"

Okay this looks better and it contains the "URL".
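
To make the difference between the quantifiers concrete, here is a minimal sketch that runs the same quoted prefix with a lazy, a greedy and a possessive group. The variable names ($prefix, $lazy, $greedy, $possessive) are only illustrative, and the subject is the fragment from the question in single quotes so that \u0026 stays literal.

```php
<?php
// Fragment from the question; single quotes keep the backslashes as-is.
$subject = 'u0026itag=22\u0026url=http%3A%2F%2Fr5---sn-q4f7dnel.c.youtube.com\u0026sig';

// Literal prefix wrapped in \Q...\E so that \u is not parsed as an escape.
$prefix = '^\Qu0026itag=22\u0026url=\E';

// Lazy: nothing follows the group, so the empty match is enough.
preg_match('/' . $prefix . '(.*?)/', $subject, $lazy);

// Greedy and possessive: both consume the rest of the line.
preg_match('/' . $prefix . '(.*)/', $subject, $greedy);
preg_match('/' . $prefix . '(.*+)/', $subject, $possessive);

var_dump($lazy[1], $greedy[1], $possessive[1]);
```

With this subject the plain greedy (.*) and the possessive (.*+) capture the same text; they only differ when something after the group forces backtracking.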

Which brings me to the point: even though I have explained some of the regex (hopefully this gave you some pointers), it is probably not the right tool to reach for first.

What you have here looks like some JSON string that indeed contains something that is URL encoded.

I therefore suggest that you first decode the JSON string with json_decode, then parse the URL with parse_url, and finally obtain the url query parameter with parse_str.
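
A rough sketch of that route, applied to the short fragment from the question. It rests on the assumption that the fragment was cut out of a JSON string, so wrapping it in double quotes makes it a valid JSON string literal again. The fragment is already shaped like a query string, so the parse_url step is skipped here; with the full original string it may well be needed. The names $fragment, $decoded and $params are illustrative.

```php
<?php
// Fragment from the question; single quotes keep \u0026 literal.
$fragment = 'u0026itag=22\u0026url=http%3A%2F%2Fr5---sn-q4f7dnel.c.youtube.com\u0026sig';

// Assumption: the text came from inside a JSON string, so re-adding the
// surrounding double quotes lets json_decode() turn each \u0026 into "&".
$decoded = json_decode('"' . $fragment . '"');

// parse_str() splits on "&" and URL-decodes each value.
parse_str($decoded, $params);

echo $params['url'];
```

The leading u0026itag ends up as an odd key name because the fragment starts mid-escape; that does not affect the url entry.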

hakre

Have you tried greedy matching (http://regex101.com/r/lT1wK5)? Remember to escape the \u as well; otherwise you'll get:

Compilation failed: PCRE does not support \L, \l, \N{name}, \U, or \u

Example:

$var = "u0026itag=22\u0026url=http%3A%2F%2Fr5---sn-q4f7dnel.c.youtube.com\u0026sig";

preg_match('/u0026itag=22\\\u0026url=(.*)/', $var, $A);
echo $A[1];
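
Note that the greedy group also swallows the trailing \u0026sig. If only the URL is wanted, one possible variation (a sketch, not the only way) is to stop at the next \u0026 and URL-decode the capture. The names $m and $url are illustrative.

```php
<?php
// Single quotes keep \u0026 literal in the subject.
$var = 'u0026itag=22\u0026url=http%3A%2F%2Fr5---sn-q4f7dnel.c.youtube.com\u0026sig';

// Four backslashes in a single-quoted PHP string reach PCRE as two,
// which match one literal backslash in the subject.
// The lazy group stops at the next \u0026 or at the end of the string.
$url = null;
if (preg_match('/\\\\u0026url=(.*?)(?:\\\\u0026|$)/', $var, $m)) {
    $url = urldecode($m[1]);
    echo $url;
}
```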
Kieran