Regular Expression to find mp3 URLs without a specific word

Question

I'd like to extract mp3 URLs from a page source, but only those that do not contain a specific word.

Here is the regular expression that I am using to search for mp3 urls:

https?:\/\/.+\.mp3

It works okay. Now I want to exclude the URLs that contain a specific word, so that only URLs without that word are matched.

How can I exclude a word between http and .mp3?

I will use it in Qt with C++, but as long as it works with https://regex101.com/ it is fine.

Possible duplicate of [Regular expressions: Ensuring b doesn't come between a and c](https://stackoverflow.com/questions/37240408/regular-expressions-ensuring-b-doesnt-come-between-a-and-c) — CertainPerformance, Jan 26 '19 at 03:37
@CertainPerformance - No, that is different. If you read the description, it says `contains 123 somewhere in the middle`. However, I want the expression NOT to contain a word. — NESHOM, Jan 27 '19 at 02:56
It's exactly the same - see the last part of the question, `and there are no other instances of abc or xyz in the substring besides the start and the end.` - just like the top answer prevents `abc` from occurring in the middle of the match, you just need to apply the same logic to your pattern. — CertainPerformance, Jan 27 '19 at 03:48
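Applying that logic here means checking, at every character between `http(s)://` and `.mp3`, that the excluded word does not start at that position (a "tempered" token). Below is a minimal JavaScript sketch. It assumes that URLs in the page source end at whitespace, quotes, or angle brackets (the `[^\s"'<>]` class is an assumption about the markup, not part of the question). The sample `page` string and the word `Song` are made up for illustration.

```javascript
// Hypothetical page source; in practice this would be the downloaded HTML
const page = '<a href="http://www.example.com/I_like_to_sing.mp3">1</a> ' +
  '<a href="http://www.example.com/Another_song.mp3">2</a>';

// Word to exclude; it must not contain regex metacharacters unless escaped
const word = 'Song';

// Tempered token: each character between "://" and ".mp3" is consumed only
// if the excluded word does not start at that position. The lazy +? stops
// at the first ".mp3"; the 'i' flag makes the word check case-insensitive.
const regex = new RegExp(
  'https?://(?:(?!' + word + ')[^\\s"\'<>])+?\\.mp3',
  'gi'
);

const found = page.match(regex) || [];
console.log(found); // [ 'http://www.example.com/I_like_to_sing.mp3' ]
```

The pattern itself only uses a negative lookahead, so it should also be usable in PCRE-based tools such as regex101 with the case-insensitive flag set.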

Nick · Accepted Answer · 2019-02-01T03:22:01.833

If you want to "exclude those urls that do not have a specific word in them", you can use a positive lookahead for the word (with some number of characters before it) e.g.

(?=.*Sing)

In JavaScript:

const word = 'Sing';
const urls = ['http://I_like_to_sing.mp3', 'http://Another_song.mp3'];
// '\\.' in a string literal reaches the RegExp as \. (a literal dot)
const regex = new RegExp('https?://(?=.*' + word + ').+\\.mp3', 'i');
console.log(urls.filter(v => regex.test(v)));
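Because the word is concatenated straight into the pattern, any regex metacharacters in it (`.`, `(`, `+`, and so on) would be treated as syntax rather than literal text. A small sketch with a hypothetical `escapeRegExp` helper (not a built-in) that escapes the word first; the sample word `Sing.2` and the URLs are made up for illustration:

```javascript
// Hypothetical helper: backslash-escape every character that has a special
// meaning in a JavaScript regular expression.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

const word = 'Sing.2';
const urls = ['http://I_like_to_Sing.2.mp3', 'http://I_like_to_Sing22.mp3'];

// Unescaped, the "." in the word would also match the "2" in "Sing22"
const regex = new RegExp('https?://(?=.*' + escapeRegExp(word) + ').+\\.mp3', 'i');

console.log(urls.filter(v => regex.test(v))); // [ 'http://I_like_to_Sing.2.mp3' ]
```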

In PHP:

$word = 'Sing';
$urls = ['http://I_like_to_sing.mp3', 'http://Another_song.mp3'];
$regex = "/https?:\/\/(?=.*$word).+\.mp3/i";
print_r(array_filter($urls, function ($v) use ($regex) { return preg_match($regex, $v); }));

Output:

Array ( 
    [0] => http://I_like_to_sing.mp3 
)

Demo on 3v4l.org

Update

To exclude those URLs that do have a specific word in them, you can use a negative lookahead instead e.g.

(?![^.]*Sing)

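We use [^.] so the lookahead only scans up to the next dot, which normally stops it at the .mp3 extension instead of running on into the rest of the text. Note that it also stops at any earlier dot, such as the one after www in a domain name, so a word that appears only after such a dot is not checked. Here's a PHP demo: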

$word = 'Song';
$string = "some words http://I_like_to_sing.mp3 and then some other words http://Another_song.mp3 and some words at the end...";
$regex = "/(https?:\/\/(?![^.]*$word).+?\.mp3)/i";
preg_match_all($regex, $string, $matches);
print_r($matches[1]);

Output:

Array ( 
    [0] => http://I_like_to_sing.mp3
)

Demo on 3v4l.org
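For comparison, here is a JavaScript sketch of the same negative-lookahead scan using `String.prototype.matchAll`. The third URL in the text is made up to illustrate the `[^.]` limitation: the lookahead stops at the first dot after `://`, so it only inspects `www` and never sees `song`.

```javascript
const word = 'Song';
const text = 'some words http://I_like_to_sing.mp3 and then some other words ' +
  'http://Another_song.mp3 and a hypothetical http://www.example.com/Another_song.mp3 too';

// Same pattern as the PHP version; matchAll requires the g flag
const regex = new RegExp('https?://(?![^.]*' + word + ').+?\\.mp3', 'gi');

// m[0] is the full match for each hit
const found = [...text.matchAll(regex)].map(m => m[0]);
console.log(found);
// found: ['http://I_like_to_sing.mp3', 'http://www.example.com/Another_song.mp3']
// The second hit contains "song" because [^.]* stopped at "www" before reaching it.
```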

@NESHOM you shouldn't mark this accepted, it doesn't answer your actual question. I had been meaning to revisit the question though and I've made an edit which I think will solve your problem. — Nick, Feb 01 '19 at 02:44
You are right. It did help a bit, but it didn't answer the question directly. So, please post your updated answer. Thanks. — NESHOM, Feb 01 '19 at 03:21

score 0 · Answer 2 · answered Jan 26 '19 at 03:54


I hope this can be a useful answer.

This is a regular expression with a use case in Python 3. If you want to remove a "word" between http and .mp3, you can do this:

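import re

ref = "http://www.some_undesired_text_018/m102/1-225x338.mp3"

# Capture everything between "http"/"https" and ".mp3"
_del = re.findall(r'https?(.+)\.mp3', ref)[0]

# Remove the captured text from the URL
out = ref.replace(_del, "")

# _del holds the whole captured span ("://www.some_undesired_text_018/m102/1-225x338"),
# not just one word, so out is "http.mp3"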


Franco Gil


I am not using python. – NESHOM Jan 27 '19 at 02:57

score 0 · Answer 3 · edited Oct 28 '21 at 09:42

A minor edit to Nick's answer. You can exclude the word by negating the value returned from the match in the filter function like so:

urls.filter(v => !v.match(regex));

This works and is much simpler than the other solution further down, which gives an unexpected result.

const word = 'Sing';
const urls = ['http://I_like_to_sing.mp3', 'http://Another_song.mp3'];
// Double the backslash so the RegExp receives \. (a literal dot)
const regex = new RegExp('https?://(?=.*' + word + ').+\\.mp3', 'i');
console.log(urls.filter(v => !v.match(regex)));
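The same negation also works when the input is page source rather than an array of URLs: extract every candidate first, then drop the ones that contain the word. A sketch under two assumptions: URLs end at whitespace, quotes, or angle brackets, and the question's `.+` is made lazy (`+?`) so each match stops at the first `.mp3`. The `page` string is hypothetical, and the word check uses a plain case-insensitive `includes` instead of a second regex.

```javascript
const word = 'Sing';
const page = '<a href="http://I_like_to_sing.mp3">a</a><a href="http://Another_song.mp3">b</a>';

// Step 1: extract every mp3 URL (match returns null when nothing is found)
const allMp3 = page.match(/https?:\/\/[^\s"'<>]+?\.mp3/gi) || [];

// Step 2: keep only the URLs that do NOT contain the word, ignoring case
const withoutWord = allMp3.filter(u => !u.toLowerCase().includes(word.toLowerCase()));

console.log(withoutWord); // [ 'http://Another_song.mp3' ]
```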
