For example, I have these 6 strings:
https://twitter.com/test1
http://twitter.com/test2
https://www.twitter.com/test3?
https://www.mobile.twitter.com/test4
https://www.twitter.com/test5?lang=en
https://www.instagram.com/test1insta
What I want to do is extract the Twitter 'username' from these links. In this case I would like to search each link with a regex to get the username after twitter.com/, and where a link has a ? for URL parameters, I would like to get only what comes before it.
For example, the output would look like this:
test1
test2
test3
test4
test5
I have used re.search to match the pattern, but I am struggling to get it to extract just the part I want. Here is what I have tried:
username = re.search(r'twitter.com\/(.*)\?', link)  # called once per link in stringsList; re.search takes a single string, not a list
This only matches the strings that have a question mark after the username, which I understand, so I get just test3 and test5.
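A minimal reproduction of that first attempt, assuming the six links above sit in a Python list called stringsList. Because re.search takes a single string, the pattern is applied to each link in a loop:

```python
import re

# The six example links from the question, assumed to be stored in a list.
stringsList = [
    "https://twitter.com/test1",
    "http://twitter.com/test2",
    "https://www.twitter.com/test3?",
    "https://www.mobile.twitter.com/test4",
    "https://www.twitter.com/test5?lang=en",
    "https://www.instagram.com/test1insta",
]

matches = []
for link in stringsList:
    # The trailing \? is mandatory, so links without a "?" never match.
    m = re.search(r'twitter.com\/(.*)\?', link)
    if m:
        matches.append(m.group(1))

print(matches)  # ['test3', 'test5']
```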
I thought I would try making the question mark optional by doing this:
username = re.search(r'twitter.com\/(.*)\??', link)  # called once per link in stringsList
but that returns every username along with the extra text I don't want, e.g.:
test1
test2
test3?
test4
test5?lang=en
But I want group 1 to contain just the username, even though the ? is optional.
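A small sketch of why making the question mark optional does not help, using the test5 link from above. The greedy (.*) consumes everything to the end of the string, and \?? is then satisfied by matching nothing. Switching to a lazy (.*?) fails the other way, because both parts are now allowed to match nothing:

```python
import re

link = "https://www.twitter.com/test5?lang=en"

# Greedy: (.*) runs to the end of the string, then \?? matches zero characters,
# so the query string ends up inside group 1.
greedy = re.search(r'twitter.com\/(.*)\??', link).group(1)

# Lazy: (.*?) first tries to match nothing, and \?? can also match nothing,
# so the pattern succeeds immediately with an empty group 1.
lazy = re.search(r'twitter.com\/(.*?)\??', link).group(1)

print(repr(greedy))  # 'test5?lang=en'
print(repr(lazy))    # ''
```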
What would my regex look like to do that? Or do I need to split this up, check whether the string has a question mark first, and use two different searches depending on whether it's present?
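One possible single-regex approach, offered as a sketch rather than a definitive answer: instead of .* followed by an optional \?, capture a negated character class that cannot contain ? at all. Also stopping at / is an added assumption (it keeps only the first path segment), and the dot in twitter\.com is escaped so it matches a literal dot:

```python
import re

stringsList = [
    "https://twitter.com/test1",
    "http://twitter.com/test2",
    "https://www.twitter.com/test3?",
    "https://www.mobile.twitter.com/test4",
    "https://www.twitter.com/test5?lang=en",
    "https://www.instagram.com/test1insta",
]

# [^/?]+ means "one or more characters that are not / or ?", so the capture
# stops where a query string would begin, whether or not a "?" is present.
pattern = re.compile(r'twitter\.com/([^/?]+)')

usernames = []
for link in stringsList:
    m = pattern.search(link)
    if m:  # the instagram link has no "twitter.com/", so search returns None
        usernames.append(m.group(1))

print(usernames)  # ['test1', 'test2', 'test3', 'test4', 'test5']
```

Because the capture cannot run past a ?, this version does not need to check for the question mark first or use two separate searches.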
I have a test bit of code here, and I've been using it to work out the regex I need.
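For comparison, the same extraction can be done without a regex by letting the standard library's urllib.parse.urlsplit separate the query string from the path. This swaps in a different technique from the regex the question asks about. The helper name twitter_username and the expected list (with None for the instagram link) are hypothetical, and the checking loop is only a stand-in for a test harness:

```python
from urllib.parse import urlsplit

def twitter_username(link):
    """Hypothetical helper: first path segment of a twitter.com link, else None."""
    parts = urlsplit(link)
    host = parts.hostname or ""
    # Accept twitter.com itself and any subdomain (www., mobile., www.mobile.).
    if host != "twitter.com" and not host.endswith(".twitter.com"):
        return None
    # urlsplit has already moved everything after "?" into parts.query,
    # so for ".../test5?lang=en" the path is just "/test5".
    segments = [s for s in parts.path.split("/") if s]
    return segments[0] if segments else None

stringsList = [
    "https://twitter.com/test1",
    "http://twitter.com/test2",
    "https://www.twitter.com/test3?",
    "https://www.mobile.twitter.com/test4",
    "https://www.twitter.com/test5?lang=en",
    "https://www.instagram.com/test1insta",
]
expected = ["test1", "test2", "test3", "test4", "test5", None]

results = [twitter_username(link) for link in stringsList]
for link, got, want in zip(stringsList, results, expected):
    status = "ok" if got == want else "MISMATCH"
    print(f"{status}: {link} -> {got!r}")
```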