I'm trying to extract URLs from a text column, but I only get the top-level domain (e.g. "com") instead of the full domain such as "amazon.com" or "google.com". I'm using the following regex:
import re
import pandas as pd

data = [['website is amazon.com'], ['url is google.com']]
reviews = pd.DataFrame(data, columns=['ALL_TEXT'])
reviews['regex_match'] = reviews['ALL_TEXT'].str.extract(r'[^@A-Z][-A-Z0-9:%_\+~#=]+\.(CO|COM|NET|ORG|GOV)\b', flags=re.IGNORECASE)
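A likely cause: `Series.str.extract` returns only what the capturing groups matched, not the whole match. In the pattern above, `(CO|COM|NET|ORG|GOV)` is the only group, so the result is just the TLD. A minimal sketch, assuming the same toy data: make the TLD alternation non-capturing with `(?:...)` and put one capturing group around the domain itself. Keeping `[^@A-Z]` outside the group means the character it consumes (here, the space) is matched but not returned.

```python
import re

import pandas as pd

data = [['website is amazon.com'], ['url is google.com']]
reviews = pd.DataFrame(data, columns=['ALL_TEXT'])

# One capturing group around the domain; the TLD alternation is
# non-capturing (?:...), so extract() produces a single column.
# [^@A-Z] sits outside the group: it is matched but not captured.
pattern = r'[^@A-Z]([-A-Z0-9:%_\+~#=]+\.(?:CO|COM|NET|ORG|GOV))\b'

# expand=False returns a Series when the pattern has exactly one group
reviews['regex_match'] = reviews['ALL_TEXT'].str.extract(
    pattern, flags=re.IGNORECASE, expand=False
)
print(reviews['regex_match'].tolist())  # ['amazon.com', 'google.com']
```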
I tried wrapping the full pattern in a capture group:
reviews['regex_match'] = reviews['ALL_TEXT'].str.extract(r'([^@A-Z][-A-Z0-9:%_\+~#=]+\.(CO|COM|NET|ORG|GOV)\b)', flags=re.IGNORECASE)
but then I get this error:
Wrong number of items passed 2, placement implies 1
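The error appears to come from the group count: the second pattern has two capturing groups (the new outer one plus the inner `(CO|COM|...)`), so `str.extract` returns a DataFrame with two columns, and two columns cannot be assigned to the single column `regex_match`. The exact wording of the error varies between pandas versions. A minimal sketch of one workaround, assuming the same toy data: keep both groups but assign only column `0`, which holds the outer group.

```python
import re

import pandas as pd

data = [['website is amazon.com'], ['url is google.com']]
reviews = pd.DataFrame(data, columns=['ALL_TEXT'])

# Two capturing groups -> two columns: 0 = outer group, 1 = inner TLD group
pattern = r'([^@A-Z][-A-Z0-9:%_\+~#=]+\.(CO|COM|NET|ORG|GOV)\b)'
groups = reviews['ALL_TEXT'].str.extract(pattern, flags=re.IGNORECASE)
print(groups.shape)  # (2, 2)

# Assign only the outer group; strip() drops the leading space matched by [^@A-Z]
reviews['regex_match'] = groups[0].str.strip()
print(reviews['regex_match'].tolist())  # ['amazon.com', 'google.com']
```

Making the inner group non-capturing with `(?:...)` avoids the second column altogether.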