I have a string with some text and some URLs in it. My goal is to remove the following from the string:

$removeThis = array('http://', 'https://', 'www.', '.com', '.net');

BUT ONLY IF the word to be removed doesn't start with: http://good.com, http://www.good.com, https://good.com, or https://www.good.com.

In other words, I want to remove the http://, https://, www., .com, and .net parts from the string (but only if they don't belong to the good.com domain).


INPUT:

$string='Hello world, this is spamming: www.spam.com, spam.net, https://spam.com, https://spam.com/tester. And this is not spam so do not touch it: http://www.good.com/okay, http://good.com, and also https://good.com/well';

RESULT SHOULD BE:

Hello world, this is spamming: spam, spam, spam, spam/tester. And this is not spam so do not touch it: http://www.good.com/okay, http://good.com, and also https://good.com/well

I think preg_replace is needed here.

NonCoder
  • Yes, you could use preg_replace here. Where have you gotten hung up with it? – chris85 Apr 27 '15 at 01:48
  • I don't understand how preg_replace works. I tried str_replace(array('http','https','www.'), '', $string), but it removes everything, and I want to keep URLs on the good.com domain. – NonCoder Apr 27 '15 at 01:50
  • Yes, your str_replace would remove everything. You already know that you need to use preg_replace, so why not just research it instead of wasting someone else's time to give you the answer? If you run into specific issues with preg_replace, THEN ask a question about it, with the code you have tried so far. The beauty of PHP is that *most* of your questions can be answered by doing some reading and googling. –  Apr 27 '15 at 01:54
  • I just cannot understand regex, and I think preg_replace uses regex. Yes, I researched it, but it's just too hard for me to grasp. – NonCoder Apr 27 '15 at 01:55
  • On a side note, instead of preg_replace, you can call `explode()` to split the string into a temporary array. From there, I would create two new arrays, `good` and `bad`, and in a foreach loop separate the good and bad domains into their own arrays (look into `strpos()`). Do your `str_replace` on the bad-domain array, then merge both arrays back together and `implode()` them back into a string if you like. –  Apr 27 '15 at 01:57
  • Yes, preg_replace uses regexes. There are plenty of documents online detailing how they work. – chris85 Apr 27 '15 at 02:00
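
The explode()/strpos() idea from the comments can be sketched roughly as follows. This is only an illustration, not code posted in the thread: the function name `stripSpamUrls` is hypothetical, the sketch uses the case-insensitive `stripos()` and `str_ireplace()` variants, and it rewrites each word in place rather than building separate `good` and `bad` arrays so that word order is preserved. It assumes words are separated by single spaces and treats any word starting with one of the four good.com prefixes as good, so something like http://good.community would also be left alone.

```php
<?php
// Hypothetical helper: strips URL parts from every word except good.com URLs.
function stripSpamUrls(string $text): string
{
    // Prefixes that mark a word as "good" (taken from the question).
    $goodPrefixes = [
        'http://good.com', 'http://www.good.com',
        'https://good.com', 'https://www.good.com',
    ];
    // Parts to strip from every other word (also from the question).
    $removeThis = ['https://', 'http://', 'www.', '.com', '.net'];

    $words = explode(' ', $text);
    foreach ($words as $i => $word) {
        $isGood = false;
        foreach ($goodPrefixes as $prefix) {
            // stripos() === 0 means the word starts with the prefix (any letter case).
            if (stripos($word, $prefix) === 0) {
                $isGood = true;
                break;
            }
        }
        if (!$isGood) {
            $words[$i] = str_ireplace($removeThis, '', $word);
        }
    }
    return implode(' ', $words);
}
```

Calling `stripSpamUrls($string)` on the question's input should give the requested result, since plain words contain none of the parts being removed.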

3 Answers

Try the code below:

$preg = '/(?:(http|https):\/\/)?(?:www\.)?\w+\.(com|net)/i';

$str = preg_replace_callback($preg, function ($matches) {
    // Leave whitelisted "good" URLs untouched.
    if (preg_match('/(http|https):\/\/(www\.)?good\.(com|net)/i', $matches[0])) {
        return $matches[0];
    }
    // Otherwise strip the scheme, "www." and the .com/.net suffix.
    return preg_replace('/((http|https):\/\/|www\.|\.com|\.net)/i', '', $matches[0]);
}, $string);
Bo Chen
  • Change '/\bgood\b\.(com|net)/' to '/(http|https):\/\/(www\.)?good\.(com|net)/'. It identifies the good URL more precisely, so that it is not replaced. – Bo Chen Apr 27 '15 at 03:10
  • And change $preg to $preg = '/(?:(http|https):\/\/)?(?:www\.)?\w+\.(com|net)/'; I hope it helps! – Bo Chen Apr 27 '15 at 03:12
  • I'm sorry! I wasn't paying attention. – Bo Chen Apr 27 '15 at 03:25
  • One more thing: what if the user enters WWW.TEST.coM (i.e. capital letters)? It seems the current code doesn't handle that; maybe you could update it again so that it works with both lowercase and uppercase letters. – NonCoder Apr 27 '15 at 03:48

This might help you:

$url = "www.good.net/tooooo.php";
$regex = array(
    '/https?:\/\//',                     // scheme
    '/^www\./',                          // leading "www."
    '/(\.com.|\.net.|\.co.)+([^\s]+)/',  // TLD plus everything after it
);
$url = preg_replace($regex, '', $url);
echo $url; // good
Caal Saal VI

You should use regular expressions, which are really powerful. Here are the steps to do it fairly easily:

  1. Match all URLs using preg_replace_callback
  2. In the callback function, detect whether the match belongs to the whitelisted domain or not (preg_match or strrpos)
  3. Still in the callback function: process the string accordingly and return it

Regex for URLs:

#^(https?|ftp):\/\/(-\.)?([^\s\/?\.#]+\.?)+(\/[^\s]*)?$#
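
The three steps above could be sketched as follows. This is a minimal illustration under stated assumptions, not the answerer's code: the function name `cleanUrls` is hypothetical, and because the URL regex above is anchored with ^ and $ (so it only matches a string that is nothing but a URL), the sketch swaps in a simpler unanchored pattern that only recognises .com and .net hosts. The whitelist check only accepts good.com with an explicit http:// or https:// scheme, as the question asks.

```php
<?php
// Hypothetical helper illustrating the three steps; names are not from the answer.
function cleanUrls(string $text): string
{
    // Step 1: match URL-like tokens anywhere in the text:
    // optional scheme, optional "www.", then a host label ending in .com or .net.
    $urlPattern = '~(?:https?://)?(?:www\.)?[\w-]+\.(?:com|net)\b~i';

    return preg_replace_callback($urlPattern, function (array $m): string {
        // Step 2: keep the match untouched if it is a whitelisted good.com URL.
        if (preg_match('~^https?://(?:www\.)?good\.com$~i', $m[0])) {
            return $m[0];
        }
        // Step 3: otherwise strip the scheme, "www." and the .com/.net suffix.
        return preg_replace('~https?://|www\.|\.com|\.net~i', '', $m[0]);
    }, $text);
}
```

Any path after the host (such as /tester or /okay) is not part of the match, so it is left in place either way.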