
I have more than 200 files on Drive, mostly text, .doc and PDF. I need to extract the email addresses from them into a spreadsheet.

Is there a script (Python, PHP?) that can make my life easier?

Mogsdad

1 Answer


No, it doesn't seem possible to do this trivially without a fair amount of code. Personally, I would open each file in Google Docs, search for @, and copy the addresses one at a time (how many emails do you have to find?).

If you need to do this with Python, you will have to download all the files first (easy enough with Google Drive). Then go through each file format and use the method specific to it.
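
A minimal sketch of that overall flow, assuming the Drive files have already been downloaded into a local folder: map each file extension to a reader function, scan the text with a simple regex, and write (file, email) rows to a CSV file, which Excel or Google Sheets can open as a spreadsheet. The names `READERS`, `collect_emails` and `write_csv` are hypothetical, and the regex is a deliberately loose approximation rather than a full address validator. Only `.txt` is wired up here; PDF and .doc readers would be added to the same mapping.

```python
import csv
import re
from pathlib import Path

# Loose pattern (an assumption): matches ordinary addresses such as name@example.com,
# not every form RFC 5322 allows.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def read_txt(path: Path) -> str:
    # errors="ignore" skips bytes that are not valid UTF-8 instead of crashing
    return path.read_text(encoding="utf-8", errors="ignore")


# Hypothetical dispatch table: file extension -> function returning the file's text.
# PDF and .doc readers would be registered here the same way.
READERS = {".txt": read_txt}


def collect_emails(folder: Path) -> list[tuple[str, str]]:
    """Return sorted, de-duplicated (relative file path, email) pairs."""
    rows = set()
    for path in folder.rglob("*"):
        reader = READERS.get(path.suffix.lower())
        if reader is None or not path.is_file():
            continue  # unsupported format, or a directory
        for email in EMAIL_RE.findall(reader(path)):
            rows.add((str(path.relative_to(folder)), email))
    return sorted(rows)


def write_csv(rows: list[tuple[str, str]], out_path: Path) -> None:
    # newline="" is what the csv module expects when writing files
    with out_path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["file", "email"])
        writer.writerows(rows)
```

For example, `write_csv(collect_emails(Path("drive_download")), Path("emails.csv"))`, where `drive_download` is a hypothetical folder holding the downloaded files. Keeping one row per (file, email) pair makes it easy to see where each address came from; a spreadsheet filter or pivot can then reduce it to unique addresses.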

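For text files, just read each file with `with open("filename") as file:` and search it line by line for emails with a regular expression. Use `re.findall` rather than `re.search`, since `re.search` returns only the first match on each line.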
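
A sketch of the text-file step, using `re.findall` so that several addresses on one line are all caught. The pattern is an assumption that covers ordinary addresses like `name@example.com`, not every form RFC 5322 allows; `emails_in_text_file` is a hypothetical helper name.

```python
import re

# Simple pattern (an assumption, not a full RFC 5322 validator): a local part,
# "@", then a domain that ends in a dot and at least two letters.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def emails_in_text_file(filename: str) -> list[str]:
    """Return each distinct address in a text file, in order of first appearance."""
    found: dict[str, None] = {}  # dict keys keep insertion order and drop repeats
    # errors="replace" keeps reading even if the file is not clean UTF-8
    with open(filename, encoding="utf-8", errors="replace") as file:
        for line in file:
            # findall returns every match on the line; re.search stops at the first
            for email in EMAIL_RE.findall(line):
                found[email] = None
    return list(found)
```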

For PDF, use PyPDF to extract the text first.
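
A sketch of the PDF step, assuming the current `pypdf` package (the maintained successor of PyPDF2, installed with `pip install pypdf`). `PdfReader` and `page.extract_text()` are part of its API; `emails_in_pdf` and `find_emails` are hypothetical helpers. Extraction quality depends on how the PDF was produced: scanned pages yield no text, and an address can occasionally come out split by a space or line break.

```python
import re

# Simple pattern (an assumption): local part, "@", domain ending in a dot plus 2+ letters.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def find_emails(text: str) -> list[str]:
    # dict.fromkeys drops duplicates while keeping first-seen order
    return list(dict.fromkeys(EMAIL_RE.findall(text)))


def emails_in_pdf(filename: str) -> list[str]:
    # Imported inside the function so the module still loads without pypdf
    from pypdf import PdfReader

    reader = PdfReader(filename)
    # extract_text() yields little or nothing for scanned, image-only pages;
    # those would need OCR, which this sketch does not attempt
    text = "\n".join(page.extract_text() or "" for page in reader.pages)
    return find_emails(text)
```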

For .doc, first convert the file to plain text with catdoc (a command-line tool), and then read the result as text.
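
A sketch of the .doc step that calls catdoc through `subprocess` and scans its standard output. It assumes catdoc is installed and on the `PATH` (otherwise `subprocess.run` raises `FileNotFoundError`) and that the files are legacy binary `.doc` files, which is what catdoc targets; the helper names are hypothetical.

```python
import re
import subprocess

# Simple pattern (an assumption), the same idea as for plain text files.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def find_emails(text: str) -> list[str]:
    # dict.fromkeys drops duplicates while keeping first-seen order
    return list(dict.fromkeys(EMAIL_RE.findall(text)))


def emails_in_doc(filename: str) -> list[str]:
    # catdoc prints the text of a Word .doc file to standard output.
    # Raises FileNotFoundError if catdoc is not installed, and
    # CalledProcessError (because of check=True) if it exits with an error.
    result = subprocess.run(
        ["catdoc", filename],
        capture_output=True,
        text=True,
        errors="replace",  # tolerate bytes that do not decode in the locale's encoding
        check=True,
    )
    return find_emails(result.stdout)
```

Passing the arguments as a list (rather than a shell string) avoids quoting problems with file names that contain spaces.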

If you need to automate all of this entirely, you may want to look into either web automation packages for the download step, or building a Google Drive/Docs extension with the API.
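
If you go the API route, the download step could look roughly like this sketch, which assumes google-api-python-client and an already authorized Drive v3 `service` object. It lists non-trashed files page by page, exports native Google Docs as plain text with `files().export_media`, downloads uploaded files (.txt, .pdf, .doc) with `files().get_media`, and skips other Google-native types such as Sheets. `download_all` and `local_name` are hypothetical names, and Drive file names are used as-is, so duplicates would overwrite each other.

```python
import io
from pathlib import Path

# Native Google Docs have no binary content and must be exported instead.
GOOGLE_DOC = "application/vnd.google-apps.document"


def local_name(file: dict) -> str:
    """Pick a local filename; native Google Docs are exported as plain text."""
    if file["mimeType"] == GOOGLE_DOC:
        return file["name"] + ".txt"
    return file["name"]  # used as-is; names containing "/" would need sanitizing


def download_all(service, dest: Path) -> None:
    # Imported lazily; requires google-api-python-client
    from googleapiclient.http import MediaIoBaseDownload

    dest.mkdir(parents=True, exist_ok=True)
    page_token = None
    while True:
        resp = service.files().list(
            q="trashed = false and mimeType != 'application/vnd.google-apps.folder'",
            fields="nextPageToken, files(id, name, mimeType)",
            pageToken=page_token,
        ).execute()
        for f in resp.get("files", []):
            if f["mimeType"] == GOOGLE_DOC:
                request = service.files().export_media(fileId=f["id"], mimeType="text/plain")
            elif f["mimeType"].startswith("application/vnd.google-apps."):
                continue  # Sheets, Slides, etc. are skipped in this sketch
            else:
                request = service.files().get_media(fileId=f["id"])
            with io.FileIO(str(dest / local_name(f)), "wb") as fh:
                downloader = MediaIoBaseDownload(fh, request)
                done = False
                while not done:
                    _, done = downloader.next_chunk()
        page_token = resp.get("nextPageToken")
        if page_token is None:
            break
```

For example, with `from googleapiclient.discovery import build` and credentials obtained through Google's OAuth flow, `download_all(build("drive", "v3", credentials=creds), Path("drive_download"))` would fill a local folder that the per-format extraction can then scan.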

CornSmith