How do I grab specific links in a document using regex?

I have an HTML file that contains Google Drive links mixed in with a bunch of HTML code and other stuff. I am trying to grab the 50 links from the text by using a regex to search for the keywords they have in common: drive, google, and sharing.

Example: "https://drive.google.com/file/d/1wXbzf0nvddZ0vlz6-fdN7HV/view?usp=sharing"

I want to select the beginning and the end of the links and then be able to copy them all, paste them into another file or erase the other content and just keep those links inside the html document.

I have tried

`http\:\/\/www\.[a-zA-Z0-9\.\/\-]+` and `.*?(http\:\/\/www\.[a-zA-Z0-9\.\/\-]+)`

Searching for drive found nothing. Searching for http and www does return results, but they are other links in the file that I am not trying to hit. That at least shows something, unlike the specific keywords I listed.

I'm not sure if this is the proper way to go about this, or if I should be using another method such as JavaScript to achieve it.

I am using Sublime Text on Mac to try and figure this out. I am new to regular expressions.

2 Answers

The following should work:

.*drive.google.com.*sharing
  • . means any character

  • * means the character before it can appear zero or more times
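
To see what these two symbols do in practice, here is a minimal sketch using Python's re module (an assumption on my part, since the question is about Sublime Text's search box). The sample line is made up for illustration, and the dots in the host name are escaped so they only match a literal dot:

```python
import re

# Sample line invented for illustration; the link is the one from the question.
line = 'see <a href="https://drive.google.com/file/d/1wXbzf0nvddZ0vlz6-fdN7HV/view?usp=sharing">file</a>'

# "." matches any single character; "*" repeats the preceding item zero or more times.
# "\." matches only a literal dot.
pattern = re.compile(r".*drive\.google\.com.*sharing")

match = pattern.search(line)
matched_text = match.group(0) if match else None
print(matched_text)
```

Note that the leading .* also swallows everything on the line before the link, so the match starts at "see" rather than at "https". Anchoring the pattern on https instead of .* avoids that.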

revo
marcos
  • Pardon me if I am wrong. From what I understand, "." matches any character, not just specific ones, unless I specify those? And "*" means the match can run on, however long the string is, until it closes in on that last keyword? Thank you, it works on a small file, but in a bigger one it is grabbing other links or keywords as well. Example: `link: https://drive.google.com/file/d/oSmNg0pMUhTZl9zRGd2VUE/view?usp=sharing` – Giorgio Armani Nov 05 '18 at 14:07
  • Never mind, I see what you guys mean now by the characters. I was able to tighten the search by adding more, or more specific, keywords at the beginning, and vice versa. Thank you! – Giorgio Armani Nov 05 '18 at 14:13
  • "https.*drive.google.com.*sharing" should work. "." is like a placeholder: it can be any character. In your case you do not know how many characters are in between, which is why you need to append "*". It means the character before it can appear anywhere from zero to infinitely many times. – marcos Nov 05 '18 at 14:13
  • What if this happens, where both links get selected and the match also captures whatever sits between them? Example: `https://drive.google.com/file/d/0B3GNg0pNzNCWWdFSXNzd00/view?usp=sharing` `https://drive.google.com/file/oSmNg0pNzRjWEFyNDRzam8/view?usp=sharing` Do I then add different keywords to restrict it? Basically, what I am asking is how I would go about stopping the search at certain spots. Can the condition be to start at https and end with sharing, with nothing after that? – Giorgio Armani Nov 05 '18 at 14:18
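
On the question of stopping the match at the first "sharing": one common approach is a lazy quantifier, `.*?`, which matches as little as possible. Below is a rough sketch in Python's re module (an assumption, the thread does not use Python), and the sample text is invented by wrapping the two links from the comment in `<p>` tags:

```python
import re

# Two links on one line, separated by markup (sample text made up for illustration).
text = (
    "<p>https://drive.google.com/file/d/0B3GNg0pNzNCWWdFSXNzd00/view?usp=sharing</p>"
    "<p>https://drive.google.com/file/oSmNg0pNzRjWEFyNDRzam8/view?usp=sharing</p>"
)

# Greedy: ".*" runs to the LAST "sharing", so both links land in a single match.
greedy = re.findall(r"https.*drive\.google\.com.*sharing", text)

# Lazy: ".*?" stops at the FIRST "sharing" that follows each "https".
lazy = re.findall(r"https.*?drive\.google\.com.*?sharing", text)

print(len(greedy))
for link in lazy:
    print(link)
```

The lazy version can still start at an unrelated https link earlier in the file and stretch forward to the next Drive link, so a negated character class such as [^\s<>] in place of the dot is usually the tighter choice.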
It sounds like you are trying to do this in some editor on a Mac, but the question is tagged with "perl", so here is one way you can do this in Perl.

First, it helps to have a full example input and output to make sure we understand the desired behavior, so here is an example input test.doc:

<p>https://drive.google.com/file/d/0B3GNg0pNzNCWWdFSXNzd00/view?usp=sharing</p><br /><p>https://drive.google.com/sharing/oSmNg0pNzRjWEFyNDRzam8/view?usp=sharing<br /></p></div>
<p>http://drive.google.com/file/d/0B3GNg0pNzNCWWdFSXNzd00/view?usp=sharing</p><br/><p>https://drive.google.com/file/sharing/view?usp=sharing<br /></p></div>
https://drive.abc.com/file/d/efg/view?usp=sharing
https://drive.apple.com/file/d/abc/efg/view?usp=sharing
https://drive.google.com/file/d/xyz/skipme?usp=sharing https://drive.google.com/file/d/ef/view?usp=sharing 

I'll assume links are enclosed in whitespace or *ml tags <> here. Here is a Linux one-liner that will take the input test.doc and print the matching links. The [^\s<>]+ part matches one or more characters that are neither whitespace \s nor < or > (the leading [^ makes it a negated character class), which prevents it from running ahead and matching more than one link on the same line:

perl -ne '@m = $_ =~ m{(https?://drive\.google\.com/[^\s<>]+view\?usp=sharing)}g; print "$_\n" for @m;' test.doc

This would give the following output:

https://drive.google.com/file/d/0B3GNg0pNzNCWWdFSXNzd00/view?usp=sharing
https://drive.google.com/sharing/oSmNg0pNzRjWEFyNDRzam8/view?usp=sharing
http://drive.google.com/file/d/0B3GNg0pNzNCWWdFSXNzd00/view?usp=sharing
https://drive.google.com/file/sharing/view?usp=sharing
https://drive.google.com/file/d/ef/view?usp=sharing
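
If Perl is not at hand, roughly the same extraction can be sketched in Python. This is only an illustration under a few assumptions: the function name extract_drive_links is made up, the sample is a shortened version of test.doc, and the text is scanned as one string rather than line by line (which gives the same result here, because the pattern cannot cross whitespace).

```python
import re

# Same pattern as the Perl one-liner: scheme, host, then non-space/non-tag
# characters up to "view?usp=sharing".
DRIVE_LINK = re.compile(r"https?://drive\.google\.com/[^\s<>]+view\?usp=sharing")


def extract_drive_links(text):
    """Return every matching Google Drive link in text, in order of appearance."""
    return DRIVE_LINK.findall(text)


# Shortened sample based on the test.doc input shown earlier.
sample = (
    "<p>https://drive.google.com/file/d/0B3GNg0pNzNCWWdFSXNzd00/view?usp=sharing</p><br />\n"
    "https://drive.abc.com/file/d/efg/view?usp=sharing\n"
    "https://drive.google.com/file/d/xyz/skipme?usp=sharing "
    "https://drive.google.com/file/d/ef/view?usp=sharing \n"
)

links = extract_drive_links(sample)
for link in links:
    print(link)
```

To run it on a real file, read the file contents into a string and pass that to extract_drive_links.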

If the above doesn't exactly cover what you need, then please give a different input/output text fragment and someone can chime in on how you'd change the one-liner to match it.

Automaton