I have a list of URLs from which I am trying to extract just the ID numbers. I am trying to solve this with a combination of `urlparse` and regular expressions. Here is what my function looks like:
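    import re
    from urllib.parse import urlparse  # Python 3; on Python 2: from urlparse import urlparse

    def url_cleanup(url):
        parsed_url = urlparse(url)
        if parsed_url.query == "fref=ts":
            # Username-style link: keep the path, minus its slashes
            return 'https://www.facebook.com/' + re.sub('/', '', parsed_url.path)
        else:
            # ID-style link: pull the number out of the query string
            qry = parsed_url.query
            result = re.search('id=(.*)&fref=ts', qry)
            return 'https://www.facebook.com/' + result.group(1)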
However, the regular expression `result = re.search('id=(.*)&fref=ts', qry)` seems to fail on some of the URLs, as shown in the examples below:
    #1
    id=10001332443221607         # no match
    #2
    id=6383662222426&fref=ts     # matched
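A quick interpreter check suggests why #1 fails: the pattern requires the literal text `&fref=ts` after the ID, so a query string that ends right after the number cannot match. `re.search` then returns `None`, and calling `.group(1)` on it would raise `AttributeError`.

```python
import re

pattern = 'id=(.*)&fref=ts'

# Query string #1: no trailing "&fref=ts", so the whole pattern cannot match
print(re.search(pattern, 'id=10001332443221607'))  # None

# Query string #2: the literal "&fref=ts" is present, so group 1 is the ID
print(re.search(pattern, 'id=6383662222426&fref=ts').group(1))
```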
Following the suggestion in this answer, I rephrased my regular expression as `id=(.*).+?(?=&fref=ts)`, but it again matches #2 and not #1.
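A lookahead such as `(?=&fref=ts)` does not consume the text it checks, but it still has to succeed for the overall match to succeed, so #1 still fails. In this pattern the extra `.+?` must also consume at least one character before the lookahead, so on #2 the last digit of the ID lands outside the capture group. The sketch below compares it with one possible alternative, `id=([^&]+)` (an assumption on my part, not from the linked answer), which captures everything up to the next `&` or the end of the string, so a trailing `&fref=ts` is allowed but not required. The helper `first_group` is hypothetical, just for the comparison.

```python
import re

# Pattern from the question: the lookahead must succeed, and ".+?" eats
# at least one character before it.
lookahead = r'id=(.*).+?(?=&fref=ts)'

# Alternative: take the characters up to the next '&' or the end of string.
up_to_amp = r'id=([^&]+)'

def first_group(pattern, qs):
    """Return capture group 1, or None when the pattern does not match."""
    m = re.search(pattern, qs)
    return m.group(1) if m else None

for qs in ('id=10001332443221607', 'id=6383662222426&fref=ts'):
    print(qs, first_group(lookahead, qs), first_group(up_to_amp, qs))
```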
I am not sure what I am missing here. Any suggestion or hint would be much appreciated.
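Since `urlparse` is already in use, another option is to skip the regex for the query string entirely and let `urllib.parse.parse_qs` split it into parameters. This is a minimal sketch assuming Python 3 and, like the original function, that profile links carry either an `id` query parameter or a username in the path; the sample URLs below are made up for illustration.

```python
from urllib.parse import urlparse, parse_qs

def url_cleanup(url):
    """Sketch: build a canonical profile URL from a Facebook-style link."""
    parsed_url = urlparse(url)
    # parse_qs maps each parameter name to a list of values,
    # e.g. 'id=123&fref=ts' -> {'id': ['123'], 'fref': ['ts']}
    params = parse_qs(parsed_url.query)
    if 'id' in params:
        # The ID is found whether or not "&fref=ts" follows it
        return 'https://www.facebook.com/' + params['id'][0]
    # No id parameter: fall back to the path, stripping slashes as before
    return 'https://www.facebook.com/' + parsed_url.path.replace('/', '')

print(url_cleanup('https://www.facebook.com/profile.php?id=10001332443221607'))
print(url_cleanup('https://www.facebook.com/profile.php?id=6383662222426&fref=ts'))
print(url_cleanup('https://www.facebook.com/some.user?fref=ts'))
```

Unlike the original, this version decides by the presence of `id` rather than by comparing the whole query to `"fref=ts"`, so extra or reordered parameters do not break it.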