I'm using PHP and receive the following text from a textbox.

This is the string I have:

header1            header2             edit
b-1246431          12.01.13            1246431  |  blog.domain.net            1232,00 ‌‌
details
b-1312231          12.01.13            1246431  |  blog.domain.co.uk          12312,00
b-2344311          12.01.13            1246431  |  www.domain.com/             9129,00 ‌‌
b-2344322          12.01.13            1246431  |  http://abc.de              1332,00 ‌‌
b-2344322          13.01.13            1246431  |  www.cdf.de/                 21140,00             ‌‌edit
b-1233422          06.01.13            1246431  |  www.dto.de/site1      21110,00
b-1233542          06.01.13            1246431  |  www.ghj.ca/site2.html      28110,00             ‌‌             edit
b-1231242          06.01.13            1246431  |  www.another.de            2101,00             ‌‌
b-1231231          04.01.13            1246431  |  onlyme.info/  

I want this output:

blog.domain.net
blog.domain.co.uk
www.domain.com/
http://abc.de
www.cdf.de/
www.dto.de/site1
www.ghj.ca/site2.html
www.another.de
onlyme.info/  

The string will change, but I always need only the URLs extracted. The catch: some URLs start with www, some with http, and some have neither. They should all still be recognized as URLs.

I have already looked at these posts: "extracting one or more urls from a string in php" and http://daringfireball.net/2010/07/improved_regex_for_matching_urls

... but none of them worked for my text string.

ItsMeDom
  • Looks organized enough. Why not explode twice with `| ` and a space? – Dave Chen Feb 20 '14 at 04:44
  • Every string will be different. The next string might not have '|' ... – ItsMeDom Feb 20 '14 at 04:49
  • When you don't have the `|`, will it be replaced by another separator? If yes, you can split the string on spaces and take the 5th column of the result, since the URL will be in 5th place. Also, will the URLs always be host names and never IP addresses? That helps too, because everything before the URLs consists of digits. – Sutandiono Feb 20 '14 at 04:59
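
The column idea from the last comment can be sketched as follows. This is only a minimal sketch, assuming every data row has at least five whitespace-separated columns with the URL in the fifth, and that header or note lines have fewer. The function name `fifthColumn` is made up for illustration.

```php
<?php
// Hypothetical helper: collect the 5th whitespace-separated column of every row.
// Assumption: rows with fewer than five columns (headers, notes) carry no URL.
function fifthColumn($text)
{
    $urls = array();
    // \R matches any line ending, so \n and \r\n input both work.
    foreach (preg_split('/\R/', $text) as $line) {
        // Split on runs of whitespace and drop empty pieces.
        $cols = preg_split('/\s+/', $line, -1, PREG_SPLIT_NO_EMPTY);
        if (count($cols) >= 5) {
            $urls[] = $cols[4];
        }
    }
    return $urls;
}
```

This does not depend on the separator being `|`, only on it occupying a column of its own. If the separator is missing entirely, the URL moves to the fourth column and this sketch picks the wrong value.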

2 Answers

Try it with a regular expression:

<?php
$input = "header1            header2             edit
b-1246431          12.01.13            1246431  |  blog.domain.net            1232,00 ‌‌
details
b-1312231          12.01.13            1246431  |  blog.domain.co.uk          12312,00
b-2344311          12.01.13            1246431  |  www.domain.com/             9129,00 ‌‌
b-2344322          12.01.13            1246431  |  http://abc.de              1332,00 ‌‌
b-2344322          13.01.13            1246431  |  www.cdf.de/                 21140,00             ‌‌edit
b-1233422          06.01.13            1246431  |  www.dto.de/site1      21110,00
b-1233542          06.01.13            1246431  |  www.ghj.ca/site2.html      28110,00             ‌‌             edit
b-1231242          06.01.13            1246431  |  www.another.de            2101,00             ‌‌
b-1231231          04.01.13            1246431  |  onlyme.info/";

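// Match a host (optionally with a scheme), a dot, a 2-4 letter TLD, then an optional path.
// Note: TLDs longer than four letters (e.g. .museum) are not matched by [a-z]{2,4}.
preg_match_all('#[-a-zA-Z0-9@:%_\+.~\#?&//=]{2,256}\.[a-z]{2,4}\b(\/[-a-zA-Z0-9@:%_\+.~\#?&//=]*)?#si', $input, $result);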

foreach ($result[0] as $url)
{
    echo $url . "<br />\n";
}

Or see it running in my PHPFiddle.
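
A stricter variant of the same idea might look like the sketch below. It is not the regex from the answer above. It assumes every URL stands as its own whitespace-delimited token, accepts an optional `http://` or `https://` prefix, and allows TLDs of any length from two letters up. The function name `extractUrls` is hypothetical.

```php
<?php
// Hypothetical helper: return every whitespace-delimited token that looks like a URL.
function extractUrls($text)
{
    $pattern = '~(?<!\S)'      // token must start after whitespace or at the start
        . '(?:https?://)?'     // optional scheme
        . '(?:[a-z0-9-]+\.)+'  // one or more dot-terminated host labels
        . '[a-z]{2,}'          // TLD: letters only, so dates like 12.01.13 fail
        . '(?:/\S*)?'          // optional path
        . '(?!\S)~i';          // token must end before whitespace or at the end
    preg_match_all($pattern, $text, $matches);
    return $matches[0];
}
```

Because the TLD must consist of letters, the date and amount columns are skipped without relying on the `|` separator. Bare IP addresses and hosts with a port would not match this pattern.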

Ruben

Try this:

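// $s is assumed to hold the text submitted from the textbox.
$lines = explode("\n", $s);
foreach ($lines as $line) {
    // Only rows containing the "|" separator carry a URL.
    if (strpos($line, "|") !== false) {
        // Take the text after "|", then keep everything up to the first space.
        $url = trim(explode(" ", trim(explode('|', $line)[1]))[0]);
        echo $url."<BR>";
    }
}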

Works on PHP 5.4+, because it indexes the result of a function call directly, as in `explode(...)[1]`.

Maximus