
I need to extract the file paths from HTML like this:

<object>
<p>https://bla-bla-bla/thing.flv</p>
</object>

<p>level/thing.mp3</p>
<ul>
<li>https://thing/otherthing/thing.srt</li></ul>

Note that the files can appear anywhere inside the HTML file.

I tried several approaches, but without success.

Any clue?

Thanks a lot!

I need to extract the file names together with their full paths and put them into an array:

myArray[0]='https://bla-bla-bla/othername.flv'
myArray[1]='/level/name.mp3'
myArray[2]='https://text/othertext/name.srt'

...and so on.
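As a rough sketch of that goal, the hypothetical extractPaths helper below collects every path ending in one of a given list of extensions, whether it is an absolute URL or a relative path such as level/thing.mp3. It assumes the paths never contain whitespace, quotes or angle brackets, and that the extensions are plain letters or digits; the html string simply repeats the sample above.

```javascript
// Hypothetical helper: return every path in `text` that ends in one of `extensions`.
// Assumption: paths contain no whitespace, quotes or angle brackets.
// Assumption: extensions are plain alphanumerics (they are not regex-escaped).
function extractPaths(text, extensions) {
  // Build an alternation such as "flv|mp3|srt".
  const ext = extensions.join("|");
  // A run of allowed path characters (this also covers "https://"),
  // followed by ".ext" that must end at a word boundary.
  const re = new RegExp(`[^\\s"'<>]+\\.(?:${ext})\\b`, "gi");
  // match() returns null when nothing matches, so fall back to an empty array.
  return text.match(re) || [];
}

const html = `<object>
<p>https://bla-bla-bla/thing.flv</p>
</object>

<p>level/thing.mp3</p>
<ul>
<li>https://thing/otherthing/thing.srt</li></ul>`;

console.log(extractPaths(html, ["flv", "mp3", "srt"]));
```

Because the tags' angle brackets are excluded, each match stops at the surrounding markup; a leading slash only appears in the result if it is present in the HTML itself.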

I'm very close to solving it with a regular expression. I tried:

var str = document.getElementById("content").innerHTML;

var res = str.match(/=http.*?.flv/gi);

With this I do get a match, but it includes a lot of the surrounding text. For example:

I need this:

'https://this/otherthing/thing.srt'

But I'm getting this:

'more https stuff from other url ...https://this/otherthing/thing.srt even more text...'

I need individual URLs, not one giant string that runs from the first "http" to the first ".srt". Each match has to be a valid path.
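To see the over-match concretely, here is the pattern from the question run against a short, made-up sample string. Matching begins at the leftmost "=http", and the lazy .*? only stops at the first spot that looks like ".flv", so a single match can swallow an earlier URL that ends in something else. The unescaped "." before "flv" matches any character, and the leading "=" ends up inside the result.

```javascript
// Hypothetical input: an .mp3 link followed by an .flv link on the same line.
const sample = "a href=http://a.example/x.mp3 and src=http://b.example/y.flv";

// The pattern from the question, unchanged.
const res = sample.match(/=http.*?.flv/gi);

// One match that spans both URLs and starts with "=".
console.log(res);
```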

edited by marc_s, asked by Webill
  • Try searching for regular expressions. If the different types are known, it shouldn't be too hard at all. – Siddharth Aug 08 '14 at 21:30
  • What? Be clearer about what you want to do. What exactly are you looking for in the HTML file, what have you already tried, and why didn't it work? – Inox Aug 08 '14 at 21:32
  • I just added more details. – Webill Aug 11 '14 at 13:50
  • I don't understand, so are the links available in text or are you trying to discover links? – Daniel Cheung Aug 11 '14 at 14:18
  • Actually, I need to get the file names, such as othername.flv, name.mp3 and name.srt, but these could be local files or URLs. So I need a generic approach. – Webill Aug 11 '14 at 14:28
  • All the url or local file names are in the html file. – Webill Aug 11 '14 at 14:45
  • Are they randomly scattered or? How are they placed? Text? Variables? Please explain more. – Daniel Cheung Aug 11 '14 at 14:47
  • They are not randomly inserted, they are inside – Webill Aug 11 '14 at 15:27
  • Trying to do something like this: var res = str.match(/(?<=http).flv/gi); I know there is something wrong... but what? – Webill Aug 11 '14 at 18:36

1 Answer


Since .* grabs as many matching characters as it can, you need to be more specific about what can and can't be in the middle.
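A minimal illustration of that greedy behaviour, using a made-up line with two .flv URLs: the greedy .* runs to the end of the line and backtracks only as far as the last ".flv", so both URLs come back as a single match.

```javascript
// Hypothetical input: two .flv URLs separated by a space.
const line = "http://a.example/1.flv http://b.example/2.flv";

// Greedy .* expands as far as possible, then gives back characters
// only until "\.flv" can match, which happens at the LAST ".flv".
const greedy = line.match(/http.*\.flv/gi);

console.log(greedy);
```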

Try:

var res = str.match(/https?:\/\/\S+\.flv/gi);

where \S grabs as many non-whitespace characters as it can.
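A quick check of the \S+ version on a made-up string: since \S+ cannot cross a space, each whitespace-separated URL is returned as its own match.

```javascript
// Hypothetical input: two URLs separated by ordinary spaces.
const text = "see http://a.example/1.flv and http://b.example/2.flv here";

// \S+ stops at whitespace, then backtracks just enough to end on ".flv".
const res = text.match(/https?:\/\/\S+\.flv/gi);

console.log(res);
```

This only helps when URLs are separated by whitespace; in innerHTML, two URLs joined by tags or punctuation with no space between them can still end up in a single match.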

To exclude other characters as well, use a negated character class [^...]. For example, this version also stops at semicolons:

var res = str.match(/https?:\/\/[^\s;]+\.flv/gi);
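A small comparison on a made-up string in which the URLs are separated by a semicolon instead of a space: the \S+ pattern runs straight through the semicolon, while the [^\s;]+ pattern cannot, so only the genuine .flv URL is returned.

```javascript
// Hypothetical input: an .mp3 URL and an .flv URL joined by ";".
const text = "http://a.example/1.mp3;http://b.example/2.flv";

// \S+ happily crosses the ";", so the match starts at the .mp3 URL.
const withS = text.match(/https?:\/\/\S+\.flv/gi);

// [^\s;]+ cannot cross ";", so the first URL fails to match and is skipped.
const withoutSemicolon = text.match(/https?:\/\/[^\s;]+\.flv/gi);

console.log(withS);
console.log(withoutSemicolon);
```

The same idea extends to other delimiters: when matching against innerHTML, adding characters such as < and " inside the brackets keeps a match from running across tags or attribute quotes.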

Alternatively, just make your .* lazy instead of greedy with a well-placed ?:

var res = str.match(/http.*?\.flv/gi);
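A short check of the lazy pattern on two made-up strings. It splits consecutive .flv URLs correctly, but laziness does not change where a match starts: it still begins at the leftmost "http", so if an earlier URL ends in something other than .flv, the match runs on to the next ".flv". That is the situation described in the question, and the character-class versions avoid it.

```javascript
// Hypothetical input 1: two .flv URLs; the lazy .*? stops at each first ".flv".
const text = "http://a.example/1.flv http://b.example/2.flv";
const lazy = text.match(/http.*?\.flv/gi);

// Hypothetical input 2: an .mp3 URL before the .flv URL. Matching still starts
// at the leftmost "http", so the single match spans both URLs.
const mixed = "http://a.example/1.mp3 http://b.example/2.flv";
const lazyMixed = mixed.match(/http.*?\.flv/gi);

console.log(lazy);
console.log(lazyMixed);
```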
edited by Community, answered by Blazemonger
  • OK. Very nice. I don't want ; (semicolon) in the middle. – Webill Aug 11 '14 at 19:56
  • WOW! Finally. After hours! Man, thank you so much for that. I'm afraid I can't upvote you yet because I'm still not allowed to. But you rocked today. Thanks again. – Webill Aug 11 '14 at 20:10
  • Don't forget to [accept your favorite answers to your questions](http://meta.stackexchange.com/questions/5234/how-does-accepting-an-answer-work), even if you can't vote on them. – Blazemonger Aug 11 '14 at 20:29