-5

How do I write a regex for preg_match_all that matches strings starting with "http" (without quotes) and ending at a double quote ("), a single quote ('), or whitespace (tab, space, line break)?

I want preg_match_all to capture every part of the following text that starts with "http":

Wupload
http://www.wupload.com/file/CCCCCCC/NNIW-LiBRARY.part1.rar
http://www.wupload.com/file/VVVVVVVV/NNIW-LiBRARY.part2.rar
http://www.wupload.com/file/TTTTTTT/NNIW-LiBRARY.part3.rar

Fileserve
http://www.fileserve.com/file/WWWW/NNIW-LiBRARY.part1.rar
http://www.fileserve.com/file/TTTTT/NNIW-LiBRARY.part2.rar
http://www.fileserve.com/file/RRRRR/NNIW-LiBRARY.part3.rar

Uploaded.To
http://ul.to/AAAA/NNIW-LiBRARY.part1.rar
http://ul.to/BBBBB/NNIW-LiBRARY.part2.rar
http://ul.to/YYYYYY/NNIW-LiBRARY.part3.rar

The results must look like this:
http://www.wupload.com/file/CCCCCCC/NNIW-LiBRARY.part1.rar
http://www.wupload.com/file/VVVVVVVV/NNIW-LiBRARY.part2.rar
http://www.wupload.com/file/TTTTTTT/NNIW-LiBRARY.part3.rar
http://www.fileserve.com/file/WWWW/NNIW-LiBRARY.part1.rar
http://www.fileserve.com/file/TTTTT/NNIW-LiBRARY.part2.rar
http://www.fileserve.com/file/RRRRR/NNIW-LiBRARY.part3.rar
http://ul.to/AAAA/NNIW-LiBRARY.part1.rar
http://ul.to/BBBBB/NNIW-LiBRARY.part2.rar
http://ul.to/YYYYYY/NNIW-LiBRARY.part3.rar

rasputin
  • 1
    possible duplicate of [RegEx match open tags except XHTML self-contained tags](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags) – Wooble Jul 11 '11 at 12:40
  • I have tried $result = preg_split('/\bhttp:\/\/[\d.a-z-]+[]\d!"#$%&\'()*+,.\/:;<=>?@[\\\\_`a-z{|}~^-]*/i', $subject); – rasputin Jul 11 '11 at 12:44

3 Answers

2

I suggest you use parse_url to fetch the parts of URLs. Take a look at the documentation on php.net.

EDIT:

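$file = file_get_contents('YOUR_FILE_NAME'); // placeholder: path to your input file
$lines = preg_split('/\R/', $file); // split on any line-break style (\n, \r\n, \r)
foreach ($lines as $line) {
    $urlParts = parse_url(trim($line));
    // parse_url returns false on malformed input, and 'scheme' is absent for non-URL lines
    if (is_array($urlParts) && isset($urlParts['scheme']) && $urlParts['scheme'] === 'http') {
        // Do anything ...
    }
}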

CHANGE:

OK, I don't know what your code looks like. If you want to scrape HTML to find links, I suggest the following; it returns the href values of the a tags:

preg_match_all ( "/<[ ]{0,}a[ \n\r][^<>]{0,}(?<= |\n|\r)(?:href)[ \n\r]{0,}=[ \n\r]{0,}[\"|']{0,1}([^\"'>< ]{0,})[^<>]{0,}>((?:(?!<[ \n\r]*\/a[ \n\r]*>).)*)<[ \n\r]*\/a[ \n\r]*>/ is", $source, $regs );

for ( $x = 0; $x < count ( $regs [ 1 ] ); $x ++ ) {
    // Append each link; assigning to a fixed key would keep only the last one.
    $tmp_array [] = array ( "link_raw" => trim ( $regs [ 1 ] [ $x ] ) );
}

Then use parse_url to check those.
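
Checking already-extracted links with parse_url could look roughly like the sketch below. The $candidates list and the $httpLinks name are made up for illustration; they are not part of the asker's code.

```php
<?php
// Hypothetical list of candidate strings, e.g. href values collected earlier.
$candidates = array(
    'http://www.fileserve.com/file/WWWW/NNIW-LiBRARY.part1.rar',
    'ftp://example.com/file.rar',
    '/relative/path.html',
    'http://ul.to/AAAA/NNIW-LiBRARY.part1.rar',
);

$httpLinks = array();
foreach ($candidates as $candidate) {
    $parts = parse_url($candidate);
    // parse_url() returns false for seriously malformed URLs, and the
    // 'scheme' key is missing when the string has no scheme.
    if (is_array($parts) && isset($parts['scheme'], $parts['host'])
        && strtolower($parts['scheme']) === 'http') {
        $httpLinks[] = $candidate;
    }
}

print_r($httpLinks);
```

Note that this only filters strings that are already isolated; it does not find URLs embedded in running text.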

Hamid
  • parse_url is not the thing I want. It can be used after extracting the url but I want to extract all urls in a given text. – rasputin Jul 11 '11 at 16:38
  • 1
    It's not a problem! Take a look at my answer! – Hamid Jul 11 '11 at 16:43
  • Thanks for the reply, but there are no lines because the source is HTML code. So I need preg_match_all to match strings that start with (http) and end at (") or (') or whitespace (tab, space, line break). Do you have a suggestion for that? – rasputin Jul 11 '11 at 17:26
  • The problem is that I can't use just "href", because some links are given in normal text. So I have to find strings starting with "http". – rasputin Jul 11 '11 at 17:55
0

Do you mean you would like to remove the "Wupload", "Fileserve" and "Uploaded.To" titles and capture just the URLs in an array? If so, try the following:

preg_match_all('!^http://.*\n!m', $string, $matches);
echo "<pre>" . print_r($matches, 1) . "</pre>";
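
One thing to watch with the pattern above: `.*\n` keeps the trailing line break in each match and skips a final URL that has no line break after it. A hedged variant that stops at the first whitespace instead is sketched below; the sample $string is invented for the example and uses Windows line endings on purpose.

```php
<?php
// Hypothetical sample: headings mixed with URLs, Windows line endings,
// and no line break after the last URL.
$string = "Wupload\r\n"
        . "http://www.wupload.com/file/CCCCCCC/NNIW-LiBRARY.part1.rar\r\n"
        . "Uploaded.To\r\n"
        . "http://ul.to/AAAA/NNIW-LiBRARY.part1.rar";

// ^ with the m modifier anchors at the start of each line; \S+ stops at
// the first whitespace, so neither \r nor \n ends up in the match.
preg_match_all('!^http://\S+!m', $string, $matches);

print_r($matches[0]);
```

This still assumes every URL starts at the beginning of a line, which does not hold for URLs inside HTML attributes.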
pb149
  • Actually, this is HTML source code, but I can't put it right to the post. By the way, your code isn't working. You can check it at http://myregextester.com/index.php – rasputin Jul 11 '11 at 13:48
  • @rasputin: Why can't you put it "right to the post"? If you don't provide your actual input, then we can't give you the actual solution. – Lightness Races in Orbit Jul 11 '11 at 17:01
  • Because of Stack Overflow's post formatting. I tried all the options in the post menu (blockquote, pre, code), but nothing helped... – rasputin Jul 11 '11 at 17:58
0

This should do what you need:

<?php
$matches = array();
preg_match_all('@https?://([-\w\.]+)+(:\d+)?(/([-\w/_\.]*(\?\S+)?)?)?@', $string, $matches);
foreach ($matches[0] as $match) {
    // Do your processing here.
}
?>
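
For the rule exactly as stated in the question (start at http, stop at a double quote, a single quote, or any whitespace), a negated character class may be enough. This is a minimal sketch, not a full URL validator; the $html sample is made up and is not the asker's real source.

```php
<?php
// Hypothetical sample input: URLs inside an attribute, in plain text,
// and inside single quotes.
$html = "<a href=\"http://ul.to/AAAA/NNIW-LiBRARY.part1.rar\">part 1</a>\n"
      . "Mirror: http://www.wupload.com/file/CCCCCCC/NNIW-LiBRARY.part1.rar\tok\n"
      . "<img src='https://example.com/a.png'>";

// Match "http" or "https", then "://", then everything up to the next
// double quote, single quote, or whitespace (space, tab, line break).
preg_match_all('~https?://[^\s"\']+~i', $html, $matches);

// $matches[0] holds the full matches in document order.
print_r($matches[0]);
```

If a URL in plain text can be followed directly by a tag (for example `...rar</a>`), adding `<` and `>` to the character class would stop the match there as well.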
EdoDodo