How to download multiple files found by a RegEx statement - Python

Asked Nov 26 '15 at 16:50

Active Nov 26 '15 at 16:51

Viewed 102 times

I have a RegEx statement that finds image file names in a website I'm scraping (a personal project of mine), and I would like to download the files that the regex matches.

Question: How could I do this using urllib or urllib2?

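import os
import re
import urllib2

def GetImage():
    with open('TestPage.txt') as f:
        for line in f:
            # Match bare image file names ending in .jpg, .bmp or .gif
            # (the dot before "gif" is now escaped like the other two).
            v = re.findall(r'\w+\.jpg|\w+\.bmp|\w+\.gif', line)
            if v:
                # Placeholder: this passes the literal text "v" to wget, not the matches.
                os.system("wget v") # Could I replace this with urllib.retrieve(v)?
                #Code to download files found in v should go here.
                print v

def main():
    url = "http://testpage/~drc/drx/index.html"
    webpage = urllib2.urlopen(url)

    content = webpage.read()
    # Save the page so GetImage() can scan it line by line.
    f = open('TestPage.txt', 'w')
    f.write(content)
    f.close()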

I know how to download one file from a specific URL, but downloading all of the images matched by the regex (the list v) is my problem.
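One possible way to turn the list of matches into downloads is sketched below. It is written for Python 3, where `urllib2.urlopen` and `urllib.urlretrieve` live in `urllib.request`, and it rests on an assumption the question does not confirm: that every matched file sits in the same directory as the scraped page, so each bare file name can be resolved against the page URL. The helper names `find_image_names` and `download_all` and the `fetch` parameter are hypothetical, introduced only for illustration.

```python
import re
from urllib.parse import urljoin
from urllib.request import urlretrieve

# Same pattern as in the question, with the dot before "gif" escaped.
IMAGE_PATTERN = re.compile(r'\w+\.jpg|\w+\.bmp|\w+\.gif')


def find_image_names(text):
    """Return every image file name the pattern finds in text."""
    return IMAGE_PATTERN.findall(text)


def download_all(names, base_url, fetch=urlretrieve):
    """Resolve each name against base_url and save it in the current directory.

    fetch is a hypothetical hook so the loop can be exercised without
    network access; by default it is urllib.request.urlretrieve.
    """
    saved = []
    for name in names:
        # Assumption: the file lives next to the page, so a relative join works.
        file_url = urljoin(base_url, name)
        fetch(file_url, name)  # urlretrieve(url, filename) writes the file to disk
        saved.append((file_url, name))
    return saved
```

With the page already read into `content`, a call such as `download_all(find_image_names(content), url)` would attempt one download per match. Because the pattern only captures the final `\w+` part of a path, images stored in subdirectories or on other hosts would resolve to the wrong address under this assumption.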

edited Nov 26 '15 at 16:51

K DawG


asked Nov 26 '15 at 16:50

Scott

Why not use requests? – K DawG Nov 26 '15 at 16:52
I wouldn't use regex to parse a url.... – R Nar Nov 26 '15 at 16:53
I'm kind of set on using RegEx statements - sorry! – Scott Nov 26 '15 at 16:59
You first have to ensure that your matches are http addresses, but other than that, [this question](http://stackoverflow.com/questions/19602931/basic-http-file-downloading-and-saving-to-disk-in-python) may help. Use a `for url in v:` loop to iterate through your matches – R Nar Nov 26 '15 at 17:07
How do I ensure that they are http matches? – Scott Nov 27 '15 at 13:11
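
One way to check whether a match is an http address, offered as a rough Python 3 sketch rather than a definitive answer, is to parse it with `urllib.parse.urlparse` and look at the scheme and host. A bare file name such as the ones the question's pattern produces has neither, so it would first need to be joined to the page URL. The function names `is_http_url` and `to_http_url` are hypothetical.

```python
from urllib.parse import urljoin, urlparse


def is_http_url(candidate):
    """True when candidate is an absolute http or https URL with a host."""
    parts = urlparse(candidate)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def to_http_url(match, page_url):
    """Return match as an absolute http(s) URL, or None if it cannot be one."""
    # Relative matches (for example a bare file name) are resolved against the page.
    resolved = match if is_http_url(match) else urljoin(page_url, match)
    return resolved if is_http_url(resolved) else None
```

Inside a `for url in v:` loop, any match for which `to_http_url` returns `None` could simply be skipped before attempting a download.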

0 Answers