Sorry to post another repetitive question, but I have been dealing with this fundamental concept, and despite trying to learn from others' examples I still do not understand it.

What I am trying to do is get the contents of a PDF using PyPDF2 and write them to a CSV, and I am slowly building and testing my program step by step. I am at the point where I want my program to do two things:

  1. Grab the text from the PDF file.

  2. Output the grabbed text to a single entry in a CSV file.

Now here is where my lack of fundamental programming concepts starts to show. Here's the code:

 import csv
 import os
 import PyPDF2

 os.chdir('C:/Users/User/Desktop')

 def getText(happy_file):
     pdf_file_obj = open(happy_file, 'rb')
     pdf_reader = PyPDF2.PdfFileReader(pdf_file_obj)
     pdf_reader.numPages #optional
     page_obj = pdf_reader.getPage(0)
     return page_obj.extractText()

 def writeToCSV(happy_file):
     output_file = open('myfinalfile.csv', 'w', newline ='')
     output_writer = csv.writer(output_file)
     output_writer.writerow([str(getText())])
     output_file.close()

I have two functions to accomplish this task, getText and writeToCSV. My goal is to write the program so that all I need to do is call writeToCSV('anyfile.pdf') and have it use both functions to extract the data and put it into the CSV. happy_file is currently the argument for both functions, but I know that needs to change. I am thinking that I need a third main() function that combines both functions so that the variables are contained inside main(). That might be the fundamental aspect that I am not seeing. Another hunch is that there has to be a way to make the return value of getText a usable variable in writeToCSV (actually, that is the whole purpose of this post). I have used the 'global' keyword in front of a variable before to access variables in other functions, but I have heard that it is a bad idea.

I get that I could just make it one function but as things get more complex (namely I want to loop through a bunch of pdfs), I would like to have my program in smaller chunks, each representing a step of the way. Maybe I am just really bad at understanding functions. Maybe seeing my actual code reformatted in the correct way will make it "click" for me.

Figuring this out would be a great step in the right direction of writing well structured programs rather than just one huge list of directions for the computer to carry out.

Here is a list of other posts I researched:

Python - Passing a function into another function

using the output of a function as the input in another function python new to coding

Python - output from functions?

Python: accessing returned values from a function, by another function

Thanks!

Kevin
  • What exactly are you asking? Do you need a way to get what you have to work properly, or are you looking for a better/different way to solve your problem altogether? – Chrygore May 05 '17 at 16:15
  • @AndrewMcKernan I just need a way to make it work properly, to help cement my understanding of how to use variables from one function in another. – Kevin May 07 '17 at 05:47

1 Answer

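You need to pass happy_file to getText when you call it inside writeToCSV. In your version getText() is called with no argument, so it never receives the filename and Python raises a TypeError.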

You can then call writeToCSV as shown at the bottom of the code example.

 import csv
 import os
 import PyPDF2

 os.chdir('C:/Users/User/Desktop')

 def getText(happy_file):
     pdf_file_obj = open(happy_file, 'rb')
     pdf_reader = PyPDF2.PdfFileReader(pdf_file_obj)
     pdf_reader.numPages #optional
     page_obj = pdf_reader.getPage(0)
     return page_obj.extractText()

 def writeToCSV(happy_file):
     output_file = open('myfinalfile.csv', 'w', newline ='')
     output_writer = csv.writer(output_file)
     output_writer.writerow([str(getText(happy_file))])  # pass the filename on to getText
     output_file.close()

 writeToCSV("anyfile.pdf")
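
If it helps to see the data flow spelled out, the return value can first be stored in a local variable and then handed to the CSV writer. This is only a minimal sketch, not your actual code: get_text is a stand-in that returns a fixed string instead of calling PyPDF2, and write_to_csv takes the output path as an extra, hypothetical parameter so the example runs on any machine.

```python
import csv
import os
import tempfile


def get_text(pdf_path):
    # Stand-in for the PyPDF2 extraction step (assumption: it returns a string).
    return "text extracted from " + pdf_path


def write_to_csv(pdf_path, csv_path):
    # The returned value lives in a local variable; no global is needed.
    text = get_text(pdf_path)
    with open(csv_path, "w", newline="") as output_file:
        csv.writer(output_file).writerow([text])


# Usage: write to a temporary directory so nothing on the Desktop is touched.
csv_path = os.path.join(tempfile.mkdtemp(), "myfinalfile.csv")
write_to_csv("anyfile.pdf", csv_path)
```

The filename travels in through the parameter and the text travels out through return, which is all that is needed to share a value between the two functions.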

Alternatively, if for whatever reason you'd prefer a main() function you could do it like this:

 import csv
 import os
 import PyPDF2

 os.chdir('C:/Users/User/Desktop')

 def getText(happy_file):
     pdf_file_obj = open(happy_file, 'rb')
     pdf_reader = PyPDF2.PdfFileReader(pdf_file_obj)
     pdf_reader.numPages #optional
     page_obj = pdf_reader.getPage(0)
     return page_obj.extractText()

 def writeToCSV(happy_file):
     output_file = open('myfinalfile.csv', 'w', newline ='')
     output_writer = csv.writer(output_file)
     output_writer.writerow([str(getText(happy_file))])
     output_file.close()

 def main():
     writeToCSV("anyfile.pdf")

 if __name__ == "__main__":
     main()
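
Since you mention wanting to loop through a bunch of PDFs later, here is one possible shape for that, again as a hedged sketch rather than a definitive implementation. fake_get_text, write_all_to_csv and the extract parameter are hypothetical names introduced for illustration; in your program you would pass your real getText in place of the stand-in.

```python
import csv
import os
import tempfile


def fake_get_text(pdf_path):
    # Hypothetical stand-in for getText; swap in the real function.
    return "contents of " + os.path.basename(pdf_path)


def write_all_to_csv(pdf_paths, csv_path, extract=fake_get_text):
    # Open the CSV once, then write one row per PDF: filename, extracted text.
    with open(csv_path, "w", newline="") as output_file:
        writer = csv.writer(output_file)
        for pdf_path in pdf_paths:
            writer.writerow([pdf_path, extract(pdf_path)])


def main():
    # Write to a temporary directory so the example has no side effects.
    out_path = os.path.join(tempfile.mkdtemp(), "myfinalfile.csv")
    write_all_to_csv(["a.pdf", "b.pdf"], out_path)
    return out_path


if __name__ == "__main__":
    main()
```

Each function still does one step, and main() only decides which files go in and where the result goes.
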
Cov