11

I have many strings (Twitter tweets) from which I would like to remove the links when I echo them.

I have no control over the strings, and even though all the links start with http, they may or may not end with a "/" or a ";", and may or may not be followed by a space. Also, sometimes there is no space between the link and the word just before it.

One example of such a string:

The Third Culture: The Frontline of Global Thinkinghttp://is.gd/qFioda;via @edge

I have tried playing around with preg_replace, but couldn't come up with a solution that fits all the exceptions:

<?php echo preg_replace("/\http[^)]+\;/","",$feed->itemTitle); ?>

Any idea how I should proceed?

Edit: I have tried

<?php echo preg_replace('@(https?://([-\w\.]+[-\w])+(:\d+)?(/([\w/_\.#-]*(\?\S+)?[^\.\s])?)‌​?)@', ' ', $feed->itemTitle); ?>

but still no success.

Edit 2: I found this one:

<?php echo preg_replace('^(ht|f)tp(s?)\:\/\/[0-9a-zA-Z]([-.\w]*[0-9a-zA-Z])*(:(0-9)*)*(\/?)([a-zA-Z0-9\-‌​\.\?\,\'\/\\\+&amp;%\$#_]*)?$^',' ', $feed->itemTitle); ?>

which removes the link as expected, but it also deletes the entire string when there is no space between the link and the word that precedes it.

MagTun
  • Related: [What is the best regular expression to check if a string is a valid URL?](http://stackoverflow.com/q/161738/1937994) – gronostaj Jul 05 '14 at 16:38
  • @DavidThomas Sorry: a typo! Thanks Theftprevention! – MagTun Jul 05 '14 at 16:48
  • @gronostaj, Thanks for the link. My knowledge of PHP is very limited and I am trying to find my way through the most upvoted answer. – MagTun Jul 05 '14 at 16:48
  • @Arone you don't need that PHP code, just the regex to match URLs. – gronostaj Jul 05 '14 at 16:50
  • This is the most common regex I've seen, and it may fit your case too: `$feed->itemTitle = preg_replace('@(https?://([-\w\.]+[-\w])+(:\d+)?(/([\w/_\.#-]*(\?\S+)?[^\.\s])?)?)@', ' ', $feed->itemTitle);` – Burak Jul 05 '14 at 16:51
  • Are you retrieving the strings from the Twitter API? – theftprevention Jul 05 '14 at 16:57
  • No, I use the RSS strings from http://twitrss.me/ and then an RSS reader (a Joomla module) that I am trying to edit – MagTun Jul 05 '14 at 17:01
  • @Burak! Thanks for this! I have tried your suggestion but I can't make it work. Maybe I am making a basic mistake but I cannot figure it out. Please have a look at the edit in my question to double check my regex. Thanks a lot! – MagTun Jul 05 '14 at 17:07
  • @Arone I don't know why. Yours and the one I wrote look the same, but when I try the one I wrote above it removes the links, and when I try yours it doesn't. Try re-copying the pattern. – Burak Jul 05 '14 at 17:22
  • @gronostaj. One of the regexes at your link does the job, but it also removes the entire string when there is no space between the link and the word that precedes it. Please have a look at Edit 2 in my question – MagTun Jul 05 '14 at 17:22
  • @Burak! I could make it work by removing `?)` at the end of your regex. The links get removed but they are replaced by `amp;#xA0;`. I could remove them by adding 9 dots, but I don't know if that's the recommended way to do it. – MagTun Jul 05 '14 at 17:34
  • `grep -c http /usr/share/dict/words` is 0 for me, so (at least for English text) starting at ` ?http` shouldn't cause too many false positives. – Mike Samuel Jul 05 '14 at 18:22

3 Answers

22

If you want to remove the link and everything after it (such as the "via @edge" part in your example), the following may help:

$string = "The Third Culture: The Frontline of Global Thinkinghttp://is.gd/qFioda;via @edge";
$regex = "@(https?://([-\w\.]+[-\w])+(:\d+)?(/([\w/_\.#-]*(\?\S+)?[^\.\s])?).*$)@";
echo preg_replace($regex, ' ', $string);

If you want to keep the text that follows the link:

$string = "The Third Culture: The Frontline of Global Thinkinghttp://is.gd/qFioda;via @edge";
$regex = "@(https?://([-\w\.]+[-\w])+(:\d+)?(/([\w/_\.#-]*(\?\S+)?[^\.\s])?)?)@";
echo preg_replace($regex, ' ', $string);
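
As a rough sketch of how the second pattern might be wrapped for reuse: the helper name `strip_links` and the whitespace clean-up step are assumptions added for illustration, not part of the code above. The pattern itself is unchanged, only written in single quotes so PHP passes the backslashes through untouched.

```php
<?php
// Hypothetical helper (name and whitespace clean-up are illustrative assumptions).
function strip_links(string $text): string
{
    // Same pattern as the "keep the rest" variant above.
    $regex = '@(https?://([-\w\.]+[-\w])+(:\d+)?(/([\w/_\.#-]*(\?\S+)?[^\.\s])?)?)@';

    // Replace each link with a single space so neighbouring words do not fuse.
    $withoutLinks = preg_replace($regex, ' ', $text) ?? $text;

    // Collapse runs of whitespace left behind and trim the ends.
    return trim(preg_replace('/\s+/', ' ', $withoutLinks) ?? $withoutLinks);
}

echo strip_links('The Third Culture: The Frontline of Global Thinkinghttp://is.gd/qFioda;via @edge');
```

Because the pattern starts at `http` without requiring a preceding space, it should also catch links glued to the previous word, as in the example string.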
Burak
3

I would do something like this:

$input = "The Third Culture: The Frontline of Global Thinkinghttp://is.gd/qFioda;via @edge";
$replace = '"(https?://.*)(?=;)"';

$output = preg_replace($replace, '', $input);
print_r($output);

It works for multiple occurrences too:

$output = preg_replace($replace, '', $input."\n".$input);
print_r($output);
jamb
  • thanks @jamb for your answer, however, sometimes the link doesn't end with ";" so I need to find a more global regex. – MagTun Jul 05 '14 at 18:17
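
One possible way to loosen the pattern for links that do not end with ";" is to stop at the first whitespace or ";" instead of looking ahead for a ";". This is only a sketch: the helper name `remove_links_loose` is made up, and it assumes the URLs never contain whitespace or a ";" themselves.

```php
<?php
// Hypothetical helper; assumes a URL never contains whitespace or ';'.
function remove_links_loose(string $text): string
{
    // http:// or https://, then everything up to the next whitespace or ';',
    // plus one optional trailing ';'.
    $pattern = '~https?://[^\s;]+;?~';

    // A single space is used as the replacement so adjacent words stay separated.
    return preg_replace($pattern, ' ', $text) ?? $text;
}

echo remove_links_loose('The Third Culture: The Frontline of Global Thinkinghttp://is.gd/qFioda;via @edge');
```

Unlike the lookahead version, this variant also consumes the trailing ";" when there is one.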
0

If your URLs may begin with just www and no protocol, a pattern like this filters those as well:

$string = preg_replace('/\b((https?|ftp|file):\/\/|www\.)[-A-Z0-9+&@#\/%?=~_|$!:,.;]*[A-Z0-9+&@#\/%=~_|$]/i', ' ', $string);

Credits: https://gist.github.com/madeinnordeste/e071857148084da94891
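
A small sketch of how that pattern behaves on a few inputs may help when deciding whether it fits. The wrapper name `strip_urls_and_www` and the whitespace clean-up are assumptions for illustration. Two things are worth checking against your own data: the leading `\b` needs a word boundary, so a link glued directly to the previous word is not matched, and ";" is inside the allowed character set, so text that follows a ";" without a space is treated as part of the link.

```php
<?php
// Hypothetical wrapper around the pattern above (name is illustrative).
function strip_urls_and_www(string $text): string
{
    $pattern = '/\b((https?|ftp|file):\/\/|www\.)[-A-Z0-9+&@#\/%?=~_|$!:,.;]*[A-Z0-9+&@#\/%=~_|$]/i';
    $stripped = preg_replace($pattern, ' ', $text) ?? $text;

    // Collapse leftover whitespace so the result reads cleanly.
    return trim(preg_replace('/\s+/', ' ', $stripped) ?? $stripped);
}

$samples = [
    'Visit www.example.com today',                 // www link without protocol
    'Thinking http://is.gd/qFioda;via @edge',      // ";via" counts as part of the link
    'Global Thinkinghttp://is.gd/qFioda;via @edge' // no word boundary before "http"
];

foreach ($samples as $sample) {
    echo strip_urls_and_www($sample), "\n";
}
```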

Avatar