I'm currently reading the book "Automate the Boring Stuff with Python" and got stuck on one line of code in the Chapter 7 project. I just cannot understand the author's logic here.
My actual question is at the end of this post. Project: Phone Number and Email Address Extractor. https://automatetheboringstuff.com/chapter7
The project outline is:
Your phone and email address extractor will need to do the following:
-Gets the text off the clipboard.
-Finds all phone numbers and email addresses in the text.
-Pastes them onto the clipboard.
Here's the code:
import re, pyperclip
# extracts phone numbers
phoneRegex = re.compile(r'''(
    (\d{3}|\(\d{3}\))?               # area code -> either 561 or (561)
    (\s|-|\.)?                       # separator (if there is one)
    (\d{3})                          # first 3 digits
    (\s|-|\.)                        # separator
    (\d{4})                          # last 4 digits
    (\s*(ext|x|ext\.)\s*(\d{2,5}))?  # extension
    )''', re.VERBOSE)
# extracts email addresses
emailRegex = re.compile(r'''(
    [a-zA-Z0-9._%+-]+    # username
    @                    # @ symbol
    [a-zA-Z0-9._%+-]+    # domain name
    (\.[a-zA-Z]{2,4})    # dot something
    )''', re.VERBOSE)
# Find matches in the clipboard text.
text = str(pyperclip.paste())  # read the clipboard contents into 'text'
matches = []
for groups in phoneRegex.findall(text):
    # groups[1] -> area code, groups[3] -> first 3 digits, groups[5] -> last 4 digits
    phoneNum = '-'.join([groups[1], groups[3], groups[5]])
    if groups[8] != '':  # groups[8] -> extension digits, empty if there is no extension
        phoneNum += ' x' + groups[8]
    matches.append(phoneNum)
for groups in emailRegex.findall(text):
    matches.append(groups[0])
# Copy the results to the clipboard (our new string).
if len(matches) > 0:
    pyperclip.copy('\n'.join(matches))
    print('Copied to clipboard:')
    print('\n'.join(matches))
else:
    print('No phone numbers or email addresses found.')
Where I'm stuck is this segment:
for groups in phoneRegex.findall(text):
    phoneNum = '-'.join([groups[1], groups[3], groups[5]])  # area code, first 3 digits, last 4 digits of phone number
    if groups[8] != '':
        phoneNum += ' x' + groups[8]
    matches.append(phoneNum)
The author explains that these are the area code, first 3 digits, and last 4 digits that were extracted from the phone number:
groups[1],groups[3],groups[5]
But this doesn't make sense to me. Notice that the for loop iterates over the list returned by findall(), so 'groups' is not the whole list, it's just one element of that list. So, as I understand it, groups[1] would be the second character of that element, not a whole part of the phone number.
Just to illustrate my problem better, here's another example:
num = re.compile(r'(\d+)')
for groups in num.findall('Extract all 23 numbers 444 from 2414 at, 1'):
    print(groups)
output:
23
444
2414
1
for groups in num.findall('Extract all 23 numbers 444 from 2414 at, 1'):
    print(groups[0])
output:
2
4
2
1
So groups[0] is not the element itself, just the first digit of the element.
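A minimal sketch to confirm what is being indexed in this example. With a single capturing group, findall() returns a list of plain strings, so indexing one element gives a single character. The names `results` and `first` are only for illustration and are not from the book.

```python
import re

# Same single-group pattern as in the example above.
num = re.compile(r'(\d+)')
results = num.findall('Extract all 23 numbers 444 from 2414 at, 1')

print(results)                    # ['23', '444', '2414', '1']
print(type(results[0]).__name__)  # str

# Indexing a string returns one character, which is why groups[0] printed '2'.
first = results[0]
print(first[0])                   # 2
```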
Hopefully this makes sense, because I'm having a lot of trouble understanding his reasoning. Any help would be appreciated.
UPDATE: It seems that when the pattern has more than one group, each element is a tuple, and groups[0] is the first element of that tuple:
num = re.compile(r'(\d+)\D+(\d+)\D+(\d+)')
for groups in num.findall('Extract all 23 numbers 444 from 2414 at, 10,434,555'):
    print(groups[0])
output:
23
10
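The same check can be run against the phone regex from the project. The sketch below assumes a made-up sample string (`sample`) and uses illustrative names (`phone_regex`, `results`) rather than the clipboard. The pattern contains nine capturing groups, so each element returned by findall() should be a tuple of nine strings, which would explain why groups[1], groups[3], groups[5] and groups[8] refer to whole parts of the number.

```python
import re

# Phone pattern from the project; the group index is noted on each line.
phone_regex = re.compile(r'''(       # group 0: the whole match
    (\d{3}|\(\d{3}\))?               # group 1: area code
    (\s|-|\.)?                       # group 2: separator
    (\d{3})                          # group 3: first 3 digits
    (\s|-|\.)                        # group 4: separator
    (\d{4})                          # group 5: last 4 digits
    (\s*(ext|x|ext\.)\s*(\d{2,5}))?  # groups 6, 7, 8: extension, marker, digits
    )''', re.VERBOSE)

# Hypothetical sample text, used here instead of the clipboard contents.
sample = 'Call 415-555-1234 x99 today'
results = phone_regex.findall(sample)

for groups in results:
    print(type(groups).__name__, len(groups))  # tuple 9
    for index, value in enumerate(groups):
        print(index, repr(value))
```

For this sample, index 1 holds '415', index 3 holds '555', index 5 holds '1234', and index 8 holds '99'. Groups that did not take part in a match come back as empty strings.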