I've read an article in the link below:

Batch convert jpg to Google Docs

In it, someone posted a script designed to batch-convert photos to Google Docs using OCR.

It was a great answer, but what about the 6-minute limit? Can you do something about that? Or at least make it so that a second execution doesn't repeat the photos that were already converted and just continues where it left off.

Sorry, I wouldn't bother you if I knew programming. I've searched a lot and found a bunch of scripts, but whatever I did, I couldn't edit them to fit the script I wanted.

Thank you

  • First, I deeply apologize that my answer was not useful for your situation. Unfortunately, from your question, I cannot understand your current issue and your goal. I apologize for this. Can I ask you about the details? – Tanaike Jan 24 '21 at 01:00
  • Thank you for replying. What is your current issue and your goal? – Tanaike Jan 24 '21 at 01:22
  • Thank you for your consideration, sir. My problem is that when I have many JPEG files in my folder and run the script, it takes more than 6 minutes to convert them to Docs because of the number of photos. Google doesn't allow scripts to run for more than 6 minutes, so only half of the files get converted. If I run the script again, it converts the same files again instead of the remaining unconverted files. It doesn't continue where it left off. – Amir Fotouhi Jan 24 '21 at 01:30
  • I read on another page that this 6-minute limitation can be worked around with a script. Here is the link: https://stackoverflow.com/questions/41971806/how-can-i-increase-the-6-minute-execution-limit-in-google-apps-script Can you work around the limitation in this "converting jpg to doc" script using some sort of code? – Amir Fotouhi Jan 24 '21 at 01:30
  • Sorry for the bad English. It's not my first language. – Amir Fotouhi Jan 24 '21 at 02:27
  • Thank you for replying. Based on your reply, I proposed a sample script as an answer. Could you please confirm it? Unfortunately, your question does not state the number or size of the JPEG files you want to convert, so I'm not sure whether the sample script can directly resolve your issue. If my proposed answer was not useful for your situation, I apologize. – Tanaike Jan 24 '21 at 02:51
  • 8:50:33 AM ---Notice--- Execution started--- 8:56:33 AM ---Error--- Exceeded maximum execution time--- This is my problem. I tried to convert 113 JPEG files (45 MB in total, about 500 KB each) to Google Docs using this script: https://stackoverflow.com/questions/53687444/batch-convert-jpg-to-google-docs/65854662#65854662 – Amir Fotouhi Jan 24 '21 at 06:18
  • Thank you for replying. Unfortunately, from your reply, I couldn't understand your response to the answer I proposed for this question. This is due to my poor English skill. I deeply apologize for this. Can I ask you about the details? If my proposed answer was not useful for your situation, I apologize again. – Tanaike Jan 24 '21 at 13:09

2 Answers

I believe your goal is as follows.

  • You want to convert JPEG files to Google Documents using Google Apps Script.
  • In your situation, there are a lot of JPEG files, so you want to reduce the process cost of the conversion script (in Batch convert jpg to Google Docs).

Issue and workaround:

In this case, I would like to propose using a batch request to achieve your goal. You can see a sample script for the batch request here. However, creating the request body of a batch request is a bit complicated. So, in this answer, I propose achieving your goal with a Google Apps Script library (BatchRequest). Of course, if you don't want to use the library, you can create the script by modifying the sample script linked here.

Usage:

1. Install Google Apps Script library.

You can see it at https://github.com/tanaikech/BatchRequest#how-to-install.

2. Sample script.

The sample script using the library is as follows. Please copy and paste it into the script editor, set the variables srcFolderId and dstFolderId for your situation, enable Drive API at Advanced Google services, and then run myFunction.

function myFunction() {
  const srcFolderId = "###";  // Please set the folder ID of the folder including JPEG files.
  const dstFolderId = "###";  // Please set the folder ID of the destination folder.

  // 1. Retrieve file list of JPEG files using files.list method in Drive API.
  const headers = {authorization: `Bearer ${ScriptApp.getOAuthToken()}`};
  const q = `'${srcFolderId}' in parents and mimeType='${MimeType.JPEG}' and trashed=false`;
  // The search query contains spaces and quotes, so it is URL-encoded here.
  const url = `https://www.googleapis.com/drive/v3/files?pageSize=1000&q=${encodeURIComponent(q)}&fields=${encodeURIComponent("nextPageToken,files(id)")}`;
  let pageToken = "";
  let files = [];
  do {
    const res = UrlFetchApp.fetch(url + "&pageToken=" + pageToken, {headers: headers, muteHttpExceptions: true});
    if (res.getResponseCode() != 200) throw new Error(res.getContentText());
    const obj = JSON.parse(res.getContentText());
    files = files.concat(obj.files);
    pageToken = obj.nextPageToken || "";
  } while(pageToken);

  // 2. Convert JPEG files to Google Document using files.copy method in Drive API. In this case, this is run with the batch process.
  const requests = files.map(({id}) => ({
    method: "POST",
    endpoint: `https://www.googleapis.com/drive/v3/files/${id}/copy`,
    requestBody: {parents: [dstFolderId], mimeType: MimeType.GOOGLE_DOCS},
  }));
  const res = BatchRequest.EDo({batchPath: "batch/drive/v3", requests: requests});
  console.log(res);
}

Note:

  • When a batch request is used, the requests are processed asynchronously. So I think the process cost will be lower than that of https://stackoverflow.com/a/53698250. Unfortunately, your question does not state the number or size of the JPEG files you want to convert, so I'm not sure whether the sample script above can directly resolve your issue. If an error occurs, please show it. Also, if some files cannot be converted, it may be that they cannot be converted by Drive API. Please be careful about this.
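
For the second part of the question (continuing where the previous run left off), one possible approach is to remember the IDs of the files that were already converted and to stop before the execution time limit is reached. The following is only a minimal sketch under assumptions: convertPending, resumeConversion, the "doneIds" property key, and the listJpegs and convertToDoc callbacks are hypothetical names for illustration. They are not part of the BatchRequest library or of the sample script above, and the 5-minute budget is an arbitrary safety margin.

```javascript
// Hypothetical helper (not from the original answers).
// files:      array of objects with an `id` property (the source JPEG files)
// doneIds:    array of file IDs that were converted in earlier runs
// convertOne: function(file) that converts a single file and throws on failure
// budgetMs:   stop starting new conversions after this many milliseconds
// now:        clock function, replaceable for testing
function convertPending(files, doneIds, convertOne,
                        {budgetMs = 5 * 60 * 1000, now = Date.now} = {}) {
  const start = now();
  const done = new Set(doneIds);
  const failed = [];
  let timedOut = false;

  for (const file of files) {
    if (done.has(file.id)) continue; // already converted in a previous run
    if (now() - start >= budgetMs) { // out of time: leave the rest for the next run
      timedOut = true;
      break;
    }
    try {
      convertOne(file);
      done.add(file.id);
    } catch (err) {
      // Keep going so that one bad file does not block the others.
      failed.push({id: file.id, message: String(err)});
    }
  }
  return {doneIds: [...done], failed, finished: !timedOut && failed.length === 0};
}

// Possible wiring in Apps Script (assumption: the caller supplies `listJpegs`,
// which returns the file list, and `convertToDoc`, which converts one file).
function resumeConversion(listJpegs, convertToDoc) {
  const props = PropertiesService.getScriptProperties();
  const saved = JSON.parse(props.getProperty("doneIds") || "[]");
  const result = convertPending(listJpegs(), saved, convertToDoc);
  props.setProperty("doneIds", JSON.stringify(result.doneIds));
  console.log(result.finished ? "All files converted." : "Run again to continue.");
}
```

With this shape, each run skips the recorded IDs and processes only the remaining files. Script properties have size limits, so for a very large folder the list of IDs may need to be stored somewhere else (for example, in a spreadsheet).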

Tanaike
  • Thank you for your reply. I tried this twice. The first time, it converted only 10 JPEG files out of 113. On the second try, it converted 39 JPEG files out of 113. Here is the log: https://docs.google.com/document/d/e/2PACX-1vRxmTcTfZQ6fSJQsaXHglYub5AhwstyrnVB2TkGBCbqacKKg_OXHM_rTUFykNP6hJHVOMRALh9LowN5/pub – Amir Fotouhi Jan 26 '21 at 18:45
  • @Amir Fotouhi Thank you for replying. I apologize for the inconvenience. Unfortunately, I cannot replicate your situation. This is due to my poor skill. I deeply apologize for this. In order to correctly understand your current issue, can you provide detailed information for replicating it? With that, I would like to confirm it. – Tanaike Jan 27 '21 at 00:33

Do you need to do this in Google Apps Script? If not, you can call the Google API directly. For example, create credentials in the Google developer console and then use your favourite programming language to send requests to the Google endpoints. Unlike Google Apps Script, this method has no execution timeout.

Edit

Per your request. To begin, go here and follow the Step 1 and Step 2 instructions to get the needed credentials and to install the Google client library for Python (Python 3.8 in my case).

I have split the source code into two files. Copy each section of the code below and save it in its respective Python file (.py file extension).

credential.py (file name)

from __future__ import print_function
import pickle
import os.path
from googleapiclient.discovery import build
from google_auth_oauthlib.flow import InstalledAppFlow
from google.auth.transport.requests import Request

# If modifying these scopes, delete the file token.pickle.
SCOPES = ['https://www.googleapis.com/auth/drive']

def cred():
    
    creds = None
    # The file token.pickle stores the user's access and refresh tokens, and is
    # created automatically when the authorization flow completes for the first
    # time.
    if os.path.exists('token.pickle'):
        with open('token.pickle', 'rb') as token:
            creds = pickle.load(token)
    # If there are no (valid) credentials available, let the user log in.
    if not creds or not creds.valid:
        if creds and creds.expired and creds.refresh_token:
            creds.refresh(Request())
        else:
            flow = InstalledAppFlow.from_client_secrets_file(
                'credentials.json', SCOPES)
            creds = flow.run_local_server(port=0)
        # Save the credentials for the next run
        with open('token.pickle', 'wb') as token:
            pickle.dump(creds, token)

    drive = build('drive', 'v2', credentials=creds)

    return drive

image_to_doc.py (file name)

"""
Convert image file to doc
"""

from googleapiclient import errors

# The module name must match the file name used above (credential.py).
from credential import cred

def retrieve_all_files():
    """Retrieve a list of File resources. 
    Returns:
        List IDs of File resources.
    """
    FOLDER_ID = 'YOUR_FOLDER_ID'  # Replace with your folder ID. Make sure only image files are in this folder.
    result_tmp = []
    page_token = None
    while True:
        try:
            param = {}
            if page_token:
                param['pageToken'] = page_token
            files = cred().children().list(folderId=FOLDER_ID, **param).execute()

            result_tmp.extend(files['items'])
            page_token = files.get('nextPageToken')
            if not page_token:
                break
        except errors.HttpError as error:
            print(f'An error occurred: {error}')
            break
    result = [r['id'] for r in result_tmp]
    return result

def convert(ids):
    """
    Create a Google Doc copy (using OCR) of each file ID in `ids`.

    Files in the folder that were deleted but are still in the trash
    (less than 30 days, if you have the feature turned on) are included in
    the files to be converted. To avoid this, delete them from the trash
    folder (delete forever) first.
    """
    drive = cred()
    for file_id in ids:
        # Handle errors per file so that one failure does not stop the rest.
        try:
            drive.files().copy(
                fileId=file_id,
                body=None, ocr=True).execute()
        except errors.HttpError as error:
            print(f'An error occurred for {file_id}: {error}')

def main():
    files = retrieve_all_files()
    convert(files)

if __name__ == "__main__":
    main()

You can then run it with python image_to_doc.py

This creates a Google Doc copy (using OCR) of each image in the folder.

solomonk