Let's take the link of this question, which is:
https://stackoverflow.com/questions/74119810/is-it-possible-in-python-to-capture-individual-parts-of-a-url-with-constant-stru
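Now look at the pattern: after the "https://" prefix (which we can ignore), the host, the "questions" segment, the question ID and the title slug are all separated by "/". So we can split the rest of the link on that character.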
In [1]: link = "https://stackoverflow.com/questions/74119810/is-it-possible-in-python-to-capture-individual-parts-of-a-url-with-constant-stru"
Let's remove the "https://" prefix first (it is 8 characters long):
In [3]: link[8:]
Out[3]: 'stackoverflow.com/questions/74119810/is-it-possible-in-python-to-capture-individual-parts-of-a-url-with-constant-stru'
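The slice link[8:] only works because "https://" is exactly 8 characters; an "http://" link would lose the first letter of the host. A minimal sketch using the standard library's urllib.parse.urlsplit separates the host from the path regardless of the scheme. The helper name host_and_path_parts is hypothetical, chosen for illustration.

```python
from __future__ import annotations

from urllib.parse import urlsplit


def host_and_path_parts(link: str) -> tuple[str, list[str]]:
    """Split a URL into its host and the non-empty segments of its path."""
    parts = urlsplit(link)
    # parts.path starts with "/", so drop the empty strings that split() produces
    segments = [s for s in parts.path.split("/") if s]
    return parts.netloc, segments
```

With this helper the host is returned separately, so in the path segments the question ID sits at index 1 rather than 2, for both "http://" and "https://" links.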
Now split it on "/":
In [4]: link[8:].split('/')
Out[4]:
['stackoverflow.com',
'questions',
'74119810',
'is-it-possible-in-python-to-capture-individual-parts-of-a-url-with-constant-stru']
The question ID is at index 2, so:
In [5]: link[8:].split('/')[2]
Out[5]: '74119810'
Let's wrap it into a function:
In [6]: def get_qid(link: str):
   ...:     return link[8:].split('/')[2]
Now test it on a different link:
In [7]: get_qid("https://stackoverflow.com/questions/74119795/how-to-create-session-in-graphql-in-fastapi-to-store-token-safely-after-generati")
Out[7]: '74119795'
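If you'd rather not depend on the fixed prefix length or the exact segment index, a regular expression can pick out the digits that follow "/questions/". This is an alternative sketch, not a replacement required by the approach above; the names QID_PATTERN and get_qid_regex are hypothetical, and accepting the short "/q/<id>" form used by share links is an assumption about which link shapes you will encounter.

```python
import re
from typing import Optional

# Numeric ID that follows "/questions/" or the short "/q/" form
QID_PATTERN = re.compile(r"/(?:questions|q)/(\d+)")


def get_qid_regex(link: str) -> Optional[str]:
    """Return the question ID from a Stack Overflow link, or None if absent."""
    match = QID_PATTERN.search(link)
    return match.group(1) if match else None
```

Unlike get_qid, this returns None instead of raising IndexError or returning the wrong segment when the link does not match.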
As for the question title, you need to do some web scraping or use an API. Even though you can extract something from the link, it won't be the complete title, because the slug in the URL is only a shortened, lowercased version of it.
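For the API route, here is a minimal sketch assuming the public Stack Exchange API v2.3 "/questions/{ids}" endpoint with a "site" parameter, which returns JSON whose "items" list holds objects with a "title" field. Treat those endpoint details as assumptions to check against the API documentation. The helper names are hypothetical. The parsing is split out from the network call so it can be tried offline; the response may be gzip-compressed and titles may contain HTML entities, so both are handled.

```python
import gzip
import html
import json
import urllib.parse
import urllib.request

# Assumed endpoint: Stack Exchange API v2.3, /questions/{ids}
API_ROOT = "https://api.stackexchange.com/2.3/questions/"


def build_api_url(qid: str, site: str = "stackoverflow") -> str:
    """Build the request URL for a single question ID."""
    return API_ROOT + urllib.parse.quote(qid) + "?" + urllib.parse.urlencode({"site": site})


def title_from_response(body: bytes) -> str:
    """Extract the HTML-unescaped title from a raw API response body."""
    # The API may gzip-compress its responses; detect the gzip magic bytes
    if body[:2] == b"\x1f\x8b":
        body = gzip.decompress(body)
    items = json.loads(body).get("items", [])
    if not items:
        raise ValueError("question not found")
    # Titles may come back HTML-encoded (e.g. &quot;), so decode them
    return html.unescape(items[0]["title"])


def fetch_title(qid: str) -> str:
    """Fetch a question's title over the network (needs internet access)."""
    with urllib.request.urlopen(build_api_url(qid), timeout=10) as resp:
        return title_from_response(resp.read())
```

Combined with the ID extraction, fetch_title(get_qid(link)) should return the full title, subject to the API's rate limits for unauthenticated requests.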
As you can see in this example:
In [10]: ' '.join(link[8:].split('/')[-1].split('-'))
Out[10]: 'is it possible in python to capture individual parts of a url with constant stru'
The last element of the split link is the title slug. We split it on '-' (which stands in for a space) and join the pieces back together with ' '.join.
The result is incomplete because the title was truncated when it was encoded in the link (note the trailing "stru").
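The one-liner from In [10] can be wrapped into a small helper that does not rely on the 8-character prefix and tolerates a trailing "/". This is a sketch assuming the /questions/<id>/<slug> layout; the name slug_to_title is hypothetical, and the output is still only an approximation of the real title.

```python
from urllib.parse import urlsplit


def slug_to_title(link: str) -> str:
    """Approximate a question title from the slug of a Stack Overflow URL.

    The slug is lowercased, stripped of punctuation and possibly truncated,
    so the real title cannot be fully recovered from it.
    """
    segments = [s for s in urlsplit(link).path.split("/") if s]
    # Assumes the layout /questions/<id>/<slug>; links without a slug give ""
    slug = segments[2] if len(segments) >= 3 else ""
    words = slug.replace("-", " ")
    # Upper-case only the first character; str.capitalize would lowercase the rest
    return words[:1].upper() + words[1:]
```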