EDIT: Figured it out. I just did the following:
import sys
sys.setrecursionlimit(1500) #This raises Python's recursion limit (1000 by default in CPython),
#so the deeply nested call chain no longer exceeds it.
Check out this post for more info: What is the maximum recursion depth in Python, and how to increase it?
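For anyone trying the same fix, here is a minimal sketch of raising the limit only around the call that needs it and then putting the old value back. The `depth` function is a made-up stand-in for whatever deep call triggers the error, and 3000 is an arbitrary value I picked for illustration, not a recommendation.

```python
import sys

def depth(n):
    """Hypothetical stand-in for a deeply recursive call: recurse n levels."""
    return 0 if n == 0 else 1 + depth(n - 1)

old_limit = sys.getrecursionlimit()            # remember the current limit
sys.setrecursionlimit(max(old_limit, 3000))    # raise it (never lower it here)
try:
    reached = depth(2000)                      # would fail under a limit of 1000
finally:
    sys.setrecursionlimit(old_limit)           # restore the previous limit
```

Restoring the limit afterwards keeps the rest of the script protected, since a limit set too high can crash the interpreter instead of raising a clean error.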
--------------ORIGINAL QUESTION-----------------
I'm currently scraping webpages for dates. So far, I'm successfully pulling the dates in the format I'm searching for using re.findall, but once I get to about the 33rd link, I get a "Maximum recursion depth exceeded while calling a Python object" error, and the traceback keeps pointing to the line dates = re.findall(regex, str(webpage)).
From what I've read, I need to employ a loop within my code so that I can get rid of the recursion, but as a novice, I'm unsure how to change the piece of code dealing with the RegEx and re.findall from recursive to iterative. Thanks in advance for any insights.
import urllib2
from bs4 import BeautifulSoup as BS
import re
#All code is correct between imports and the start of the For loop
for url in URLs:
    ...
    #Open and read the URL and specify html.parser as the parsing agent so that the parsing method remains uniform across systems
    webpage = BS(urllib2.urlopen(req).read(), "html.parser")
    #Create a list to store the regex patterns to search for
    regex = []
    #Append to the list a pattern for dates that have the end year "2011"
    regex.append("((?:January|February|March|April|May|June|July|August|September|October|November|December|Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Sept|Oct|Nov|Dec)[\.]*[,]*[ ](?:0?[1-9]|[12][0-9]|3[01])[,|\.][ ](?:(?:20|'|`)[1][1]))")
    #Join the patterns in the list into one string (with a single pattern, this leaves it unchanged)
    regex = ','.join(regex)
    #Find the matching date format from the opened webpage
    #[Recursion depth error happens here]
    dates = re.findall(regex, str(webpage))
    #If there aren't any dates that match, then go to the next link
    if dates == []:
        print "There was no matching date found in row " + CurrentRow
        j += 1
        continue
    #Print the dates that match the RegEx and the row that they are on
    print "A date was found in the link at row " + CurrentRow
    print dates
    j += 1
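A possible alternative to raising the limit, sketched under an assumption I have not verified against the failing page: the recursion may come from str(webpage), which has to walk the nested parse tree, rather than from re.findall itself. If so, running the pattern over a plain string (for example the raw HTML returned by read()) would avoid it. The pattern below is copied from the code above, written as a raw string and compiled once outside the loop; `DATE_2011`, `find_dates`, and `sample` are names made up for this example.

```python
import re

# Pattern copied from the question; compiled once so it is not rebuilt per URL.
DATE_2011 = re.compile(
    r"((?:January|February|March|April|May|June|July|August|September|October"
    r"|November|December|Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Sept|Oct|Nov|Dec)"
    r"[\.]*[,]*[ ](?:0?[1-9]|[12][0-9]|3[01])[,|\.][ ](?:(?:20|'|`)[1][1]))"
)

def find_dates(html_text):
    """Return every matching 2011 date found in a plain string of page text."""
    return DATE_2011.findall(html_text)

# Hypothetical page content standing in for the raw HTML of one URL.
sample = "<p>Posted on March 5, 2011 and updated Apr. 12, '11.</p>"
dates = find_dates(sample)
```

Inside the loop, this would mean passing the downloaded text to `find_dates` instead of converting the parsed page back to a string.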