Sorry to post another repetitive question, but I have been struggling with a fundamental concept, and despite trying to learn from others' examples I still do not understand it.
What I am trying to do is get the contents of a PDF using PyPDF2 and write them to a CSV, and I am slowly building and testing my program step by step. I am at the point where I want my program to do two things:
1. Grab the text from the PDF file.
2. Output the grabbed text to a single entry in a CSV file.
Now here is where my lack of fundamental programming concepts starts to show. Here's the code:
import csv
import os
import PyPDF2

os.chdir('C:/Users/User/Desktop')

def getText(happy_file):
    pdf_file_obj = open(happy_file, 'rb')
    pdf_reader = PyPDF2.PdfFileReader(pdf_file_obj)
    pdf_reader.numPages #optional
    page_obj = pdf_reader.getPage(0)
    return page_obj.extractText()

def writeToCSV(happy_file):
    output_file = open('myfinalfile.csv', 'w', newline='')
    output_writer = csv.writer(output_file)
    output_writer.writerow([str(getText())])
    output_file.close()
I have two functions to accomplish this task: getText and writeToCSV. My goal is to be able to call writeToCSV('anyfile.pdf') and have it use both functions to extract the data and put it into the CSV. happy_file is currently the parameter of both functions, but I know that needs to change. I am thinking that I need a third main() function that calls both functions so that the variables stay contained inside main(). That might be the fundamental aspect that I am not seeing. Another hunch is that there has to be a way to make the return value of getText usable as a variable in writeToCSV (that is actually the whole purpose of this post). I have used the 'global' keyword in front of a variable before to access it from other functions, but I have heard that it is a bad idea.
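To make the hunch about the return value concrete, here is a minimal sketch of the shape I think I am after. It is only an illustration under assumptions: get_text is a hypothetical stand-in that returns a fixed string instead of calling PyPDF2 (so it runs without a PDF), and the csv_path parameter is something I added for the example.

```python
import csv


def get_text(pdf_path):
    # Hypothetical stand-in for the PyPDF2 extraction step, so the
    # sketch runs without a real PDF on disk.
    return 'text extracted from ' + pdf_path


def write_to_csv(pdf_path, csv_path='myfinalfile.csv'):
    # Pass the same argument on to get_text and keep its return value
    # in a local variable; no global is involved.
    text = get_text(pdf_path)
    with open(csv_path, 'w', newline='') as output_file:
        csv.writer(output_file).writerow([text])
```

With this shape, a single call such as write_to_csv('anyfile.pdf') would run both steps, and the extracted text only ever lives in the local variable text.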
I get that I could just make it one function, but as things get more complex (namely, I want to loop through a bunch of PDFs), I would like to have my program in smaller chunks, each representing one step of the way. Maybe I am just really bad at understanding functions. Maybe seeing my actual code reformatted in the correct way will make it "click" for me.
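For the looping case, this is roughly the structure I imagine, sketched under assumptions: get_text is again a hypothetical stand-in rather than the real PyPDF2 call, and write_rows, main, and the two-column row layout are names and choices I made up for illustration.

```python
import csv


def get_text(pdf_path):
    # Hypothetical stand-in for the PyPDF2 extraction step.
    return 'text extracted from ' + pdf_path


def write_rows(rows, csv_path):
    # Knows nothing about PDFs; it only writes the rows it is handed.
    with open(csv_path, 'w', newline='') as output_file:
        csv.writer(output_file).writerows(rows)


def main(pdf_paths, csv_path='myfinalfile.csv'):
    # One row per PDF: [file name, extracted text].
    rows = [[path, get_text(path)] for path in pdf_paths]
    write_rows(rows, csv_path)
    return rows
```

Here main() is the only place that knows about both steps, so each smaller function could be tested or replaced on its own.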
Figuring this out would be a great step toward writing well-structured programs rather than just one huge list of directions for the computer to carry out.
Here is a list of other posts I researched:
Python - Passing a function into another function
using the output of a function as the input in another function python new to coding
Python - output from functions?
Python: accessing returned values from a function, by another function
Thanks!