Use regular expressions to capture the information we want. Depending on which data object the data is stored in, and how the task fits into the larger workflow, we can apply regular expressions in a few different ways (I can look into this further if needed).
To start, we’ll build a pattern that matches the string you’re looking for and extracts the part you want.
# regular expression library
import re
# expression pattern as p (raw string, so \. reaches the regex engine as written)
p = r'https?://(.+\.com)'
# input string as s
s = 'https://www.stackoverflow.com'
# search once and keep the match object (None if the pattern is not found)
m = re.search(p, s)
# print the text captured by the parentheses
if m is not None:
    print(m.group(1))
Returns:
www.stackoverflow.com
A couple notes on this code:
Note that this expression uses re.search, which scans the input string and returns a match object for the first location where the pattern matches (or None if there is no match). If we needed to capture every match of one pattern, we would need a different re method, such as re.findall or re.finditer.
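As a rough illustration of the multiple-match case, here is a minimal sketch using re.finditer, which yields one match object per non-overlapping match. The sample text and the tightened pattern are assumptions made for this example; the character class [^\s/]+ is used instead of .+ so that a single match cannot run across several URLs.

```python
import re

# Hypothetical input containing more than one URL (illustrative only).
text = 'See https://www.stackoverflow.com and http://www.example.com for details.'

# [^\s/]+ stops at whitespace or a slash, so each URL is matched separately.
pattern = r'https?://([^\s/]+\.com)'

# finditer yields a match object for each non-overlapping match, left to right.
hosts = [m.group(1) for m in re.finditer(pattern, text)]
print(hosts)
```

Because the pattern has exactly one capture group, re.findall(pattern, text) would return the same list of captured strings directly.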
The capture happens in two parts. First, the parentheses in the expression pattern form a capture group. Second, the captured text is returned by calling the .group(1) method of the re match object (which is the m above). Calling .group(0) instead returns the entire matched text, here https://www.stackoverflow.com.
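To make the difference between the two group calls concrete, here is a small sketch using the same input string and a pattern that behaves the same way on it as the one above. The variable names whole and host are just illustrative.

```python
import re

m = re.search(r'https?://(.+\.com)', 'https://www.stackoverflow.com')
if m is not None:
    whole = m.group(0)  # everything the pattern matched
    host = m.group(1)   # only the text inside the first pair of parentheses
    print(whole)
    print(host)
```

This should print the full URL on the first line and www.stackoverflow.com on the second.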
Let me know if this works, and we can look at implementation if needed. Hope this helps!