
I was calling the findall method on a compiled regular expression object, but each result included the entire match as well, even though the pattern contains a group.

I am using Python 3.7.3.

import re

def emailfinder(spam):
    emailregx = re.compile(r'''(
    [a-zA-Z0-9%_+-.]+
    @
    [a-zA-Z0-9.-]+
    (\.[a-zA-Z]{2,4})
    )''', re.VERBOSE)
    return emailregx.findall(spam)

print(emailfinder('tara9090@gmail.com blah monkey tanbajg@chscv.in'))

The output is [('tara9090@gmail.com', '.com'), ('tanbajg@chscv.in', '.in')], but I was expecting ['.com', '.in'].

Chris
  • You have two groups; perhaps the first one is a typo? `r'''(` and `)'''` – Sundeep Apr 08 '19 at 05:45
  • http://regex101.com - put your text and expression in and read what it does in plain text – Patrick Artner Apr 08 '19 at 06:01
  • Valid email addresses are *by far* [more complicated](https://stackoverflow.com/questions/201323/how-to-validate-an-email-address-using-a-regular-expression) than one might think in the first place. Use a simple expression like `\S+@\S+` and actually write an email to that address. – Jan Apr 08 '19 at 06:16
  • Since you have wrapped the whole pattern with parentheses, you get the whole match, just what `re.findall` does. – Wiktor Stribiżew Apr 08 '19 at 07:02

2 Answers


You have a redundant pair of parentheses around the whole pattern, which results in two groups. Removing it works:

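import re

def emailfinder(spam):
    # Only the suffix is wrapped in a capturing group,
    # so findall returns just that part of each match.
    emailregx = re.compile(r'''
    [a-zA-Z0-9%_+.-]+      # local part; the hyphen goes last so +-. is not read as a range
    @
    [a-zA-Z0-9.-]+         # domain name
    (\.[a-zA-Z]{2,4})      # the only capturing group: the suffix
    ''', re.VERBOSE)
    return emailregx.findall(spam)

print(emailfinder('tara9090@gmail.com blah monkey tanbajg@chscv.in'))
# prints ['.com', '.in']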
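
To make the effect of the parentheses easier to see, here is a small sketch that runs the same pattern with zero, one, and two capturing groups. The names LOCAL, DOMAIN, and SUFFIX are only illustrative helpers, not anything from the question, and the character class is written with the hyphen last.

```python
import re

text = 'tara9090@gmail.com blah monkey tanbajg@chscv.in'

# Illustrative building blocks (hypothetical names, assumed for this sketch).
LOCAL = r'[a-zA-Z0-9%_+.-]+'
DOMAIN = r'[a-zA-Z0-9.-]+'
SUFFIX = r'\.[a-zA-Z]{2,4}'

# No capturing group: findall returns the whole matches as strings.
no_group = re.findall(rf'{LOCAL}@{DOMAIN}{SUFFIX}', text)

# One capturing group: findall returns only that group's text.
one_group = re.findall(rf'{LOCAL}@{DOMAIN}({SUFFIX})', text)

# Two capturing groups (outer and inner): findall returns tuples,
# one element per group, which is what the question's code produced.
two_groups = re.findall(rf'({LOCAL}@{DOMAIN}({SUFFIX}))', text)

print(no_group)    # ['tara9090@gmail.com', 'tanbajg@chscv.in']
print(one_group)   # ['.com', '.in']
print(two_groups)  # [('tara9090@gmail.com', '.com'), ('tanbajg@chscv.in', '.in')]
```

So the shape of the result depends only on how many capturing groups the pattern has, not on where findall is called from.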
Chris

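When a pattern contains capturing groups, findall returns only the text captured by those groups instead of the whole match. Your outer parentheses create an extra group that captures the entire address; keep only the group around the part you want.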

Python 3.7.1 (default, Dec 10 2018, 22:54:23) [MSC v.1915 64 bit (AMD64)] :: Anaconda, Inc. on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import re
>>> pattern = re.compile(r'[a-zA-Z0-9%_+.-]+@[a-zA-Z0-9.-]+(\.[a-zA-Z]{2,4})')
>>> matches = pattern.findall('tara9090@gmail.com blah monkey tanbajg@chscv.in')
>>> print(matches)
['.com', '.in']
>>>
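
If you sometimes need the full address and sometimes only the suffix, a possible alternative is to use finditer, which yields match objects, or a non-capturing group `(?:...)`. This is only a sketch; the variable names are made up for illustration, and the pattern is the same simplified one as above, not a real email validator.

```python
import re

text = 'tara9090@gmail.com blah monkey tanbajg@chscv.in'
pattern = re.compile(r'[a-zA-Z0-9%_+.-]+@[a-zA-Z0-9.-]+(\.[a-zA-Z]{2,4})')

# finditer yields match objects: group(0) is the whole match,
# group(1) is the first capturing group.
pairs = [(m.group(0), m.group(1)) for m in pattern.finditer(text)]
suffixes = [m.group(1) for m in pattern.finditer(text)]

# (?:...) groups without capturing, so findall sees no groups
# and returns the whole matches again.
non_capturing = re.compile(r'[a-zA-Z0-9%_+.-]+@[a-zA-Z0-9.-]+(?:\.[a-zA-Z]{2,4})')
addresses = non_capturing.findall(text)

print(pairs)      # [('tara9090@gmail.com', '.com'), ('tanbajg@chscv.in', '.in')]
print(suffixes)   # ['.com', '.in']
print(addresses)  # ['tara9090@gmail.com', 'tanbajg@chscv.in']
```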
Rarblack