I'm trying to prefill a form with data pulled from a PDF, and I'm stuck on how regex match objects and dictionaries fit together.
Here's the (abbreviated) code:
```python
import PyPDF2, regex, urllib.parse, webbrowser
# Using regex instead of re because I was nesting lookarounds, but I might not need to anymore.

### Define the field ids with sensible names
entering_email = 'field45865550'
uid_number = 'field45865570'
fname = 'field45865574-first'
lname = 'field45865574-last'
add_1 = 'field45865578-address'
city = 'field45865578-city'
state = 'field45865578-state'
zip = 'field45865578-zip'
```
I'm skipping over the PyPDF2 part, as that seems fine. Please forgive my naming conventions.
```python
### Open text file, search for field contents, and define them
# pdffile and url are defined in the omitted part of the script
with open(pdffile + '-text.txt', 'r') as text_file:
    text = text_file.read()
entering_email_value = regex.search(r'(?<=Email:\|)(.*?)(?=\|)(?=.*\|Manager Information:)', text) or ["---"]
uid_number_value = regex.search(r'(?<=UID Number:\|)(.*?)(?=\|)', text) or ["---"]
fname_value = regex.search(r'(?<=First Name:\|)(.*?)(?=\|)', text) or ["---"]
lname_value = regex.search(r'(?<=Last Name:\|)(.*?)(?=\|)', text) or ["---"]
add_1_value = regex.search(r'(?<=Last Name:\|.*)(?<=Address:\|)(.*?)(?=\|)(?=.*Employee Information:)', text) or ["---"]
city_value = regex.search(r'(?<=Last Name:\|.*)(?<=City & State:\|)(.*?)(?=,)(?=.*Employee Information:)', text) or ["---"]
state_value = regex.search(r'(?<=Last Name:\|.*)(?<=, )(.*?)(?= )(?=.*Employee Information:)', text) or ["---"]
zip_value = regex.search(r'(?<=Last Name:\|.*)(?<=[A-Z][A-Z] )(.*?)(?=\|)(?=.*Employee Information:)', text) or ["---"]

getVars = {entering_email: entering_email_value.group(),
           uid_number: uid_number_value.group(),
           fname: fname_value.group(),
           lname: lname_value.group(),
           add_1: add_1_value.group(),
           city: city_value.group(),
           state: state_value.group(),
           zip: zip_value.group()
           }
webbrowser.open(url + urllib.parse.urlencode(getVars), new=0, autoraise=True)
```
The regex syntax might look weird, but it works fine. I'm replacing "\n" with "|" because I didn't know about the DOTALL flag. My issue is that the `or ["---"]` appended to each `regex.search` call seems to be ignored. The source files will regularly be missing info, so I want to fill in a default of "" or "---" wherever there's no match. I'm currently looking into list comprehensions to do this.
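A minimal sketch of what is likely happening with the `or`, using the standard-library `re` module and made-up sample text (`regex.search` likewise returns `None` when nothing matches). The `or` does run, but it substitutes a plain list, and a list has no `.group()` method, so the later `.group()` call fails rather than producing "---". Checking the match object first avoids that.

```python
import re

# Made-up flattened PDF text, with newlines already replaced by "|"
text = "First Name:|Jane|Last Name:||City & State:|Springfield, IL 62701|"

# There is no "UID Number:" label in the text, so search() returns None
uid_match = re.search(r'(?<=UID Number:\|)(.*?)(?=\|)', text)

fallback = uid_match or ["---"]    # the `or` runs and yields the list ["---"]
print(type(fallback).__name__)     # list
print(hasattr(fallback, "group"))  # False, so fallback.group() raises AttributeError

# Test the match object before calling .group() instead
uid = uid_match.group() if uid_match else "---"
print(uid)                         # ---

fname_match = re.search(r'(?<=First Name:\|)(.*?)(?=\|)', text)
first = fname_match.group() if fname_match else "---"
print(first)                       # Jane

# A label with an empty value still matches, giving "" rather than the default
lname_match = re.search(r'(?<=Last Name:\|)(.*?)(?=\|)', text)
last = lname_match.group() if lname_match else "---"
print(repr(last))                  # ''
```

Match objects are always truthy, even when the matched text is empty, so `if m` only separates "no match" from "matched".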
Basically, my first question is: am I doing this "right-ish"? I'm sure it's hacky.
My other question, which I'm trying to run down now: is a list comprehension the right way to replace `None` with ""? Any help with the structure and syntax would be appreciated. It seems like I should be able to roll it into the dictionary declaration.
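One possible sketch of rolling the defaults into the dictionary declaration. Because the result is a dict keyed by field id, a dict comprehension is a natural fit. It assumes the field ids and their patterns can live together in one mapping; `FIELD_PATTERNS`, `extract_fields`, and the sample text are hypothetical. It uses the standard-library `re` with `re.DOTALL` so the `Manager Information:` lookahead can span raw newlines instead of relying on "|" replacement. Patterns with variable-width lookbehinds, like the address ones above, would still need the `regex` module, whose `search` also returns `None` on no match.

```python
import re

# Hypothetical mapping: form field id -> pattern, so both are maintained in one place
FIELD_PATTERNS = {
    'field45865550': r'(?<=Email:\n)(.*?)(?=\n)(?=.*Manager Information:)',
    'field45865570': r'(?<=UID Number:\n)(.*?)(?=\n)',
    'field45865574-first': r'(?<=First Name:\n)(.*?)(?=\n)',
    'field45865574-last': r'(?<=Last Name:\n)(.*?)(?=\n)',
}


def extract_fields(text, patterns, default="---"):
    """Return {field_id: matched text, or default when there is no match}."""
    # re.DOTALL lets `.` match newlines, so `.*Manager Information:` can look
    # ahead across lines without first replacing "\n" with "|"
    matches = {field: re.search(pattern, text, re.DOTALL)
               for field, pattern in patterns.items()}
    # search() returns None on no match; swap in the default in that case
    return {field: m.group() if m else default for field, m in matches.items()}


# Made-up extracted text: no UID Number or Last Name present
sample = "Email:\njane@example.com\nFirst Name:\nJane\nManager Information:\n"
getVars = extract_fields(sample, FIELD_PATTERNS)
print(getVars)
```

The resulting dict could go straight into `urllib.parse.urlencode` as in the original; passing `default=""` leaves unmatched fields empty instead of "---".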