How to get a list of file URLs using urllib.request?

Question

from urllib.request import urlopen
import re

urlpath =urlopen("http://blablabla.com/file")
string = urlpath.read().decode('utf-8')

pattern = re.compile('*.docx"')
onlyfiles = pattern.findall(string)

print(onlyfiles)

Target output

['http://blablabla.com/file/1.docx','http://blablabla.com/file/2.docx']

But I got this

[]

I get this error message when trying this.

re.error: nothing to repeat at position 0

score 2 · Accepted Answer · answered Mar 26 '20 at 01:05

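The problem is the leading star in this line:

pattern = re.compile('*.docx"')

This is not a Python bug. In a regular expression, the star is a quantifier meaning "zero or more of the preceding item", so a pattern that starts with a star has nothing to repeat. (In a shell glob, by contrast, a star matches anything on its own.)

See this related answer: regex error - nothing to repeat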
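The error can be reproduced without any network access. Here is a minimal sketch that compiles the pattern from the question, catches the exception, and then compiles a variant with an item in front of the star (the variable names are only for illustration):

```python
import re

try:
    # The star is the first character, so there is nothing for it to repeat.
    re.compile('*.docx"')
    message = None
except re.error as exc:
    message = str(exc)

print(message)

# With something in front of the star (here a dot, meaning "any character"),
# the pattern compiles and can be used to search.
fixed = re.compile(r'.*\.docx"')
found = bool(fixed.search('1.docx"'))
print(found)
```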

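Try putting a word class or a character range in front of the star. Use a raw string and escape the dot so that it matches a literal period; note that the closing double quote is part of each match:

pattern = re.compile(r'\w*\.docx"')
# or
pattern = re.compile(r'[a-zA-Z0-9]*\.docx"')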
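To get from matched file names to the full URLs shown in the target output, one option is to capture the href values and resolve them against the page address. This is only a sketch: it assumes the page lists the files as double-quoted href attributes, and find_docx_urls and sample_html are hypothetical names made up for the example.

```python
import re
from urllib.parse import urljoin


def find_docx_urls(html, base_url):
    """Return absolute URLs for every double-quoted href in html ending in .docx."""
    # Capture only the href value, so the surrounding quotes are not in the result.
    hrefs = re.findall(r'href="([^"]*\.docx)"', html)
    # urljoin resolves relative links against base_url and keeps absolute ones as they are.
    return [urljoin(base_url, href) for href in hrefs]


# Hypothetical sample of what the listing page might contain.
sample_html = (
    '<a href="1.docx">1.docx</a> '
    '<a href="2.docx">2.docx</a> '
    '<a href="notes.txt">notes.txt</a>'
)
docx_urls = find_docx_urls(sample_html, "http://blablabla.com/file/")
print(docx_urls)
```

With the real page, pass urlopen(...).read().decode('utf-8') as the first argument. The base URL needs its trailing slash; without it, urljoin replaces the last path segment ("file") instead of appending to it.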
