How do I use preg_match_all to match strings that start with "http" and end with (") or (') or whitespace (tab, space, line break)?

Question

How do I write a regex for preg_match_all that matches text starting with "http" (without the quotes) and ending at (") or (') or whitespace (tab, space, line break)?

I want preg_match_all to return every part that starts with "http".

Wupload
http://www.wupload.com/file/CCCCCCC/NNIW-LiBRARY.part1.rar
http://www.wupload.com/file/VVVVVVVV/NNIW-LiBRARY.part2.rar
http://www.wupload.com/file/TTTTTTT/NNIW-LiBRARY.part3.rar

Fileserve
http://www.fileserve.com/file/WWWW/NNIW-LiBRARY.part1.rar
http://www.fileserve.com/file/TTTTT/NNIW-LiBRARY.part2.rar
http://www.fileserve.com/file/RRRRR/NNIW-LiBRARY.part3.rar

Uploaded.To
http://ul.to/AAAA/NNIW-LiBRARY.part1.rar
http://ul.to/BBBBB/NNIW-LiBRARY.part2.rar
http://ul.to/YYYYYY/NNIW-LiBRARY.part3.rar

The results must look like this:
http://www.wupload.com/file/CCCCCCC/NNIW-LiBRARY.part1.rar
http://www.wupload.com/file/VVVVVVVV/NNIW-LiBRARY.part2.rar
http://www.wupload.com/file/TTTTTTT/NNIW-LiBRARY.part3.rar
http://www.fileserve.com/file/WWWW/NNIW-LiBRARY.part1.rar
http://www.fileserve.com/file/TTTTT/NNIW-LiBRARY.part2.rar
http://www.fileserve.com/file/RRRRR/NNIW-LiBRARY.part3.rar
http://ul.to/AAAA/NNIW-LiBRARY.part1.rar
http://ul.to/BBBBB/NNIW-LiBRARY.part2.rar
http://ul.to/YYYYYY/NNIW-LiBRARY.part3.rar

possible duplicate of [RegEx match open tags except XHTML self-contained tags](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags) — Wooble, Jul 11 '11 at 12:40
I have tried $result = preg_split('/\bhttp:\/\/[\d.a-z-]+[]\d!"#$%&\'()*+,.\/:;<=>?@[\\\\_`a-z{|}~^-]*/i', $subject); — rasputin, Jul 11 '11 at 12:44

Hamid · Accepted Answer · 2011-07-11T17:35:34.307

2

I suggest you use parse_url to fetch the parts of the URLs. Take a look at the documentation on php.net.

EDIT :

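// 'YOUR FILE NAME' is a placeholder for the path to your own file.
$file = file_get_contents( 'YOUR FILE NAME' );
// Split on any line ending (\r\n, \n or \r), not only \r\n.
$lines = preg_split( '/\R/', $file );
foreach( $lines as $line ){
    $urlParts = parse_url( trim( $line ) );
    // parse_url() returns false for malformed input and omits 'scheme' for plain text.
    if( is_array( $urlParts ) && isset( $urlParts['scheme'] ) && $urlParts['scheme'] === 'http' ){
        // Do anything ...
    }
}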

CHANGE :

OK, I don't know what your code looks like. If you want to scrape HTML to find links, I suggest the following; it returns the href values of the a tags:

preg_match_all ( "/<[ ]{0,}a[ \n\r][^<>]{0,}(?<= |\n|\r)(?:href)[ \n\r]{0,}=[ \n\r]{0,}[\"|']{0,1}([^\"'>< ]{0,})[^<>]{0,}>((?:(?!<[ \n\r]*\/a[ \n\r]*>).)*)<[ \n\r]*\/a[ \n\r]*>/ is", $source, $regs );

$tmp_array = array();
for ( $x = 0; $x < count( $regs[1] ); $x++ ) {
    // Append each href; assigning to one fixed key would keep only the last link.
    $tmp_array[] = array( "link_raw" => trim( $regs[1][$x] ) );
}

Then use parse_url to check those.
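The parse_url check mentioned above could be sketched as follows. This is only an illustration: `$candidates` and `$httpUrls` are hypothetical names, and the candidate strings are made up (the first one is taken from the question's sample list).

```php
<?php
// Hypothetical candidates, e.g. the href values collected by a regex.
$candidates = array(
    'http://www.wupload.com/file/CCCCCCC/NNIW-LiBRARY.part1.rar',
    'ftp://example.com/file.rar',
    'Fileserve',
);

$httpUrls = array();
foreach ($candidates as $candidate) {
    // PHP_URL_SCHEME makes parse_url() return only the scheme,
    // or null when the string has none.
    $scheme = parse_url(trim($candidate), PHP_URL_SCHEME);
    if ($scheme === 'http') {
        $httpUrls[] = trim($candidate);
    }
}

print_r($httpUrls);
```

Comparing with `===` also covers the case where parse_url() returns false for a seriously malformed string.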

edited Jul 11 '11 at 17:35

answered Jul 11 '11 at 13:38


parse_url is not the thing I want. It can be used after extracting the url but I want to extract all urls in a given text. – rasputin Jul 11 '11 at 16:38
1

It's not a problem! Take a look at my answer! – Hamid Jul 11 '11 at 16:43
Thanks for the reply, but there are no lines because the source is HTML code. So I have to write it so that preg_match_all matches text that starts with (http) and ends with (") or (') or whitespace (tab, space, line break). Do you have a suggestion for that? – rasputin Jul 11 '11 at 17:26
The problem is that I can't just use "href" because some links are given in normal text. So I have to find strings starting with "http". – rasputin Jul 11 '11 at 17:55

score 0 · Answer 2 · answered Jul 11 '11 at 12:55

0

Do you mean you would like to remove the "Wupload", "Fileserve" and "Uploaded.To" titles and capture just the URLs in an array? If so, try the following:

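// \S+ stops at the first whitespace, so the line break is not captured
// and the last line does not have to end in "\n".
preg_match_all('!^http://\S+!m', $string, $matches);
echo "<pre>" . print_r($matches, true) . "</pre>";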
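To illustrate what a line-anchored pattern like this can and cannot find, here is a minimal sketch. It assumes a shortened copy of the question's list plus one made-up HTML line; the `\S+` tail is an assumption chosen for the sketch so that the line break stays out of the match.

```php
<?php
// Shortened copy of the sample list, plus one hypothetical HTML line
// in which the URL does not start the line.
$string = "Wupload\n"
        . "http://www.wupload.com/file/CCCCCCC/NNIW-LiBRARY.part1.rar\n"
        . 'see <a href="http://ul.to/BBBBB/NNIW-LiBRARY.part2.rar">part 2</a>' . "\n"
        . "http://ul.to/AAAA/NNIW-LiBRARY.part1.rar";

// With the m modifier, ^ matches at the start of every line, so only
// lines that begin with "http://" produce a match.
preg_match_all('!^http://\S+!m', $string, $matches);

print_r($matches[0]);
```

The URL inside the href attribute is skipped because it sits in the middle of a line, which is why a line-anchored pattern is a poor fit for HTML source.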

answered Jul 11 '11 at 12:55

pb149


Actually, this is a html source code but I can't put it right to the post. By the way your code isn't working. You can check it from http://myregextester.com/index.php – rasputin Jul 11 '11 at 13:48
@rasputin: Why can't you put it "right to the post"? If you don't provide your actual input, then we can't give you the actual solution. – Lightness Races in Orbit Jul 11 '11 at 17:01
Because of Stack Overflow's post formatting. I tried all the options in the post menu (blockquote, pre, code) but nothing helped... – rasputin Jul 11 '11 at 17:58

EdoDodo · Answer 3 · 2011-07-12T21:31:32.930

0

This should do what you need:

<?php
$matches = array();
preg_match_all('@https?://([-\w\.]+)+(:\d+)?(/([-\w/_\.]*(\?\S+)?)?)?@', $string, $matches);
foreach ($matches[0] as $match) {
    // Do your processing here.
}
?>
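For the rule exactly as the question states it (start at "http", stop at a double quote, a single quote, or whitespace), a negated character class is a simpler alternative. The following is a minimal sketch, not a full URL validator; `$html` and its contents are hypothetical sample data built from URLs in the question.

```php
<?php
// Hypothetical sample: one URL in a double-quoted attribute, one in a
// single-quoted attribute, and one in plain text followed by a line break.
$html = '<a href="http://ul.to/AAAA/NNIW-LiBRARY.part1.rar">part 1</a>' . "\n"
      . "<a href='http://ul.to/BBBBB/NNIW-LiBRARY.part2.rar'>part 2</a>\n"
      . "plain text http://ul.to/YYYYYY/NNIW-LiBRARY.part3.rar\nmore text";

// "http", then one or more characters that are not whitespace, " or '.
// Using ~ as the delimiter means the slashes in URLs never need escaping.
preg_match_all('~http[^\s"\']+~', $html, $matches);

$urls = $matches[0];
print_r($urls);
```

If a plain-text URL can be followed directly by a tag such as `</td>`, adding `<` and `>` to the excluded characters may be worth considering.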

edited Jul 12 '11 at 21:31

answered Jul 11 '11 at 12:57


gives me error - Unknown modifier '/' Can you correct it, please? – rasputin Jul 11 '11 at 18:10
