
I am trying to work out a regular expression that extracts the interior URL from this Google Alerts redirect:

http://www.google.com/url?sa=X&q=http://weheartit.com/entry/29409069&ct=ga&cad=CAcQARgAIAEoATAAOABAo5aK_gRIAlgBYgVlbi1VUw&cd=jRWL16jvo8k&usg=AFQjCNGbJMqWtbCxpcJdu4PGD6RToU6NTw

I need to remove the first part, which is:

http://www.google.com/url?sa=X&q=

I also need to remove the trailing string, which is:

&ct=ga&cad=CAcQARgAIAEoATAAOABAo5aK_gRIAlgBYgVlbi1VUw&cd=jRWL16jvo8k&usg=AFQjCNGbJMqWtbCxpcJdu4PGD6RToU6NTw

The URL I want to end up with is:

http://weheartit.com/entry/29409069

Update: thanks for the help. This turned out to be an issue in the Link module for Drupal, and it has been fixed.

naeluh
  • We have some duplicates on this. The regex is super simple. And the alternative with `parse_url` and `parse_str` even more so. – mario May 31 '12 at 01:36
  • Well, it might be super simple to you, but I am having a lot of trouble getting anything to work. The best I could come up with is field_web_screenhot['und'][0]['url']; $url = preg_replace('.*?(http)(.)(.)((?:\\/[\\w\\.\\-]+)+)', '', $url); $node->field_web_screenhot['und'][0]['url'] =$url; return ; } ?> – naeluh May 31 '12 at 01:36
  • possible duplicate of [Extract URL from string](http://stackoverflow.com/questions/4390556/extract-url-from-string) – mario May 31 '12 at 01:38
  • possible duplicate of [Extract URL parameters with regex - repeating a capture group](http://stackoverflow.com/questions/7762626/extract-url-parameters-with-regex-repeating-a-capture-group) – mario May 31 '12 at 01:39
  • Use the [edit link](http://stackoverflow.com/posts/10826259/edit), not comments, for additions to your question. – mario May 31 '12 at 01:40
  • Not exactly a duplicate: the redirect contains two URLs, so just detecting a URL is not going to help me. I need to remove one URL and the trailing string. Thanks for those other links, I appreciate them, but I am still having trouble. – naeluh May 31 '12 at 01:40

2 Answers

1

It is still unclear what you are trying to accomplish, but whether you want to extract the inner URL or remove the surrounding parts, neither is really difficult:

preg_match('#q=(http://[^&]+)#', $source, $result);
print $result[1];

Or otherwise:

$result = preg_replace('#^.+q=([^&]+).+$#', '$1', $source);

Either would work.

And again, the alternative lies in parse_url() and parse_str().
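As a minimal sketch of the extraction variant, the match can be wrapped in a small function that checks whether anything was found before using the result. The function name `extract_q_with_regex` is hypothetical, and the pattern is slightly tightened compared with the one above (an assumption on my part): `[?&]` ties the match to a query parameter literally named `q`, and `https?` also accepts HTTPS targets.

```php
<?php
// Hypothetical helper, not part of the original answer.
// Returns the URL embedded in the q parameter, or null when none is found.
function extract_q_with_regex($source) {
    // [?&] makes sure "q=" starts a query parameter and is not the tail
    // of a longer name; [^&]+ stops at the next parameter separator.
    if (preg_match('#[?&]q=(https?://[^&]+)#', $source, $result)) {
        return $result[1];
    }
    return null;
}

$source = 'http://www.google.com/url?sa=X&q=http://weheartit.com/entry/29409069&ct=ga&cad=CAcQARgAIAEoATAAOABAo5aK_gRIAlgBYgVlbi1VUw&cd=jRWL16jvo8k&usg=AFQjCNGbJMqWtbCxpcJdu4PGD6RToU6NTw';

$inner = extract_q_with_regex($source);
print ($inner === null ? "no embedded URL found" : $inner) . "\n";
```

Returning null on a failed match avoids reading `$result[1]` when the input is not a redirect URL at all.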

mario
  • I have a Drupal 7 install with a Link module field that is populated by an RSS feed from Google Alerts. The URL saved in the field is the Google redirect URL, because that is what Google returns in the feed. Another module formats that URL into a URL for a screenshot server, which does not support redirects properly, so I need the main URL rather than the redirect URL for the screenshot server to work. Sorry, I should have said so in the first place. Thanks for your help, Mario. Does that make more sense? – naeluh May 31 '12 at 01:54
  • What I am doing is writing a module that works with that field: I have to change the URL and save the new one. – naeluh May 31 '12 at 01:57
1

If you really want to strip the URL to pieces manually, you can...

$ cat parseurl.php 
#!/usr/local/bin/php
<?php

$url="http://www.google.com/url?sa=X&q=http://weheartit.com/entry/29409069&ct=ga&cad=CAcQARgAIAEoATAAOABAo5aK_gRIAlgBYgVlbi1VUw&cd=jRWL16jvo8k&usg=AFQjCNGbJMqWtbCxpcJdu4PGD6RToU6NTw";

# Parts of this section could be replaced with parse_url()
$junk = explode("?", $url);
$parts = explode("&", $junk[1]);
$gvar = array();
foreach ($parts as $thisone) {
  # Limit the split to 2 pieces so that a value containing "=" stays intact
  $junk = explode("=", $thisone, 2);
  $gvar[$junk[0]]=$junk[1];
}

print_r($gvar);

printf("Embedded URL: %s\n", $gvar["q"]);

$ ./parseurl.php 
Array
(
    [sa] => X
    [q] => http://weheartit.com/entry/29409069
    [ct] => ga
    [cad] => CAcQARgAIAEoATAAOABAo5aK_gRIAlgBYgVlbi1VUw
    [cd] => jRWL16jvo8k
    [usg] => AFQjCNGbJMqWtbCxpcJdu4PGD6RToU6NTw
)
Embedded URL: http://weheartit.com/entry/29409069
$ 

To do this with parse_url() and parse_str(), you might use something like:

<?php

$url="http://www.google.com/url?sa=X&q=http://weheartit.com/entry/29409069&ct=ga&cad=CAcQARgAIAEoATAAOABAo5aK_gRIAlgBYgVlbi1VUw&cd=jRWL16jvo8k&usg=AFQjCNGbJMqWtbCxpcJdu4PGD6RToU6NTw";

parse_str( parse_url($url, PHP_URL_QUERY), $gvar );
printf("Embedded URL: %s\n", $gvar['q']);

This definitely seems like the easier way to go, but I'll leave the first version so you can see what is (likely) happening "under the hood". :-)
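One practical difference between the two versions is worth a short sketch: parse_str() decodes percent-encoded values, while the manual explode() loop returns them exactly as they appear in the URL. The redirect below is a hypothetical variant of the one in the question, with the `q` value percent-encoded. Whether Google Alerts actually emits links in that form is an assumption made only for illustration.

```php
<?php
// Hypothetical variant of the redirect with a percent-encoded q value.
$url = 'http://www.google.com/url?sa=X&q=http%3A%2F%2Fweheartit.com%2Fentry%2F29409069&ct=ga';

$query = parse_url($url, PHP_URL_QUERY);

// Manual split: each value comes back exactly as written in the URL.
$manual = array();
foreach (explode('&', $query) as $pair) {
    $kv = explode('=', $pair, 2);
    $manual[$kv[0]] = isset($kv[1]) ? $kv[1] : '';
}

// parse_str() fills $parsed and decodes percent-encoding along the way.
parse_str($query, $parsed);

printf("Manual:    %s\n", $manual['q']);
printf("parse_str: %s\n", $parsed['q']);
// The manual result needs an explicit urldecode() to match.
printf("Decoded:   %s\n", urldecode($manual['q']));
```

If the manual approach is kept, passing each value through urldecode() brings it in line with what parse_str() returns.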

Graham
  • Don't forget that parse_str() takes a second argument, an array into which parsed variables will be inserted. This eliminates your down side. – ghoti May 31 '12 at 02:34
  • DOH! Right you are, ghoti. I didn't see that before. Thanks, I'll update my answer. – Graham May 31 '12 at 02:34
  • Wow, thanks very much for your help. I am working out how to use this with Drupal right now. – naeluh May 31 '12 at 02:43
  • What I thought might work does not -- field_web_screenhot['und'][0]['url']; $url = parse_str( parse_url($url, PHP_URL_QUERY), $gvar ); $node->field_web_screenhot['und'][0]['url'] =$url; return; } – naeluh May 31 '12 at 02:45
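A likely reason the attempt in the last comment fails is that parse_str() does not return the parsed data. It writes it into the array passed as the second argument, so assigning its return value to `$url` discards the URL. The sketch below shows one way to write the assignment instead. The helper name `unwrap_google_redirect` is hypothetical, and the `stdClass` object is only a stand-in for the Drupal node so that the example runs on its own. Where this code belongs inside the module is not shown in the thread and is left open here.

```php
<?php
// Hypothetical helper: returns the URL embedded in the q parameter,
// or the input unchanged when there is no q parameter to unwrap.
function unwrap_google_redirect($url) {
    $query = parse_url($url, PHP_URL_QUERY);
    if (!is_string($query)) {
        return $url; // no query string, or a URL parse_url() rejects
    }
    // parse_str() fills $gvar; its return value is not the parsed data.
    parse_str($query, $gvar);
    return (isset($gvar['q']) && is_string($gvar['q'])) ? $gvar['q'] : $url;
}

// Stand-in for the Drupal node object (assumption, for a runnable sketch).
$node = new stdClass();
$node->field_web_screenhot = array('und' => array(array(
    'url' => 'http://www.google.com/url?sa=X&q=http://weheartit.com/entry/29409069&ct=ga&cad=CAcQARgAIAEoATAAOABAo5aK_gRIAlgBYgVlbi1VUw&cd=jRWL16jvo8k&usg=AFQjCNGbJMqWtbCxpcJdu4PGD6RToU6NTw',
)));

// Read the stored redirect, unwrap it, and write the result back.
$url = $node->field_web_screenhot['und'][0]['url'];
$node->field_web_screenhot['und'][0]['url'] = unwrap_google_redirect($url);

print $node->field_web_screenhot['und'][0]['url'] . "\n";
```

Falling back to the original URL when `q` is missing means that links which are not Google redirects pass through untouched.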