I use Javascript and Google Drive API to get PDF, but I can't parse the file correctly: the number of pages is correct, but the content is blank.

I'm fairly sure the function that feeds the typed array to pdf.js works, because the same function also handles PDFs picked through the HTML input tag.

I have found this Q&A, but this method is no longer useful. It seems that Google has changed the method of API.

I also found this, and then tried using `new File()` rather than `new Blob()`. Unfortunately, that failed as well.

By the way, the webViewLink and the webContentLink both work. I can use them to download or preview the PDF.

Is there any other way to solve the problem? Thank you for the help.

Here's my code

        createPicker() {
            const view = new window.google.picker.View(window.google.picker.ViewId.DOCS);
            view.setMimeTypes('application/pdf');
            const picker = new window.google.picker.PickerBuilder()
                .enableFeature(window.google.picker.Feature.NAV_HIDDEN)
                .enableFeature(window.google.picker.Feature.MULTISELECT_ENABLED)
                .setDeveloperKey(process.env.GOOGLE_CONFIG.PICKER_API_KEY)
                .setAppId(process.env.GOOGLE_CONFIG.APP_ID)
                .setOAuthToken(this.accessToken)
                .addView(view)
                .addView(new window.google.picker.DocsUploadView())
                .setCallback(this.pickerCallback)
                .build();
            picker.setVisible(true);
        },
        async pickerCallback(data) {
            if (data.action === window.google.picker.Action.PICKED) {
                const document = data[window.google.picker.Response.DOCUMENTS][0];
                const fileId = document[window.google.picker.Document.ID];

                try {
                    const response = await window.gapi.client.drive.files.get({
                        fileId,
                        alt: 'media',
                    });
                    const blob = new Blob([response.body], { type: document.mimeType });
                    const arrayBuffer = await blob.arrayBuffer();
                    const typedarray = new Uint8Array(arrayBuffer);
                    // TODO: use pdf.js read file
                } catch (error) {
                    console.error('Error getting file content:', error);
                }
            }
        },
Jessie Ho
  • First, about `I have found this Q&A, but this method is no longer useful.`, I apologize that my answer was not useful for your situation. I also have a question about your post: I cannot understand the relationship between `I use Javascript and Google Drive API to get PDF, but I can't parse the file correctly: the number of pages is correct, but the content is blank.` and your script. Can I ask you for more details? – Tanaike Aug 01 '23 at 23:05
  • And also, can I ask you about the detail of `It seems that Google has changed the method of API.`? – Tanaike Aug 01 '23 at 23:05
  • @KJ Sorry, I'm not familiar with converting a file to a different kind of bits. How can I check what kind of bits I get from Google? And if I want to research this further, what keywords should I search for? Thank you. – Jessie Ho Aug 02 '23 at 04:35
  • @Tanaike I appreciate that you are willing to join the discussion. It's my fault that I didn't explain clearly about the relationship between my code and the context above. – Jessie Ho Aug 02 '23 at 07:19
  • @Tanaike In fact, I use the Google Picker UI to get the details of the file; as you can see in my script above, I wrote a function named `createPicker()`. I then use those details from the Google Picker API to get the response from the Google Drive API. According to the [Google documents](https://developers.google.com/drive/api/guides/manage-downloads), the response from the Google Drive API is considered a blob. In the end, I tried to convert the blob to a Uint8Array so that pdf.js can read it and show the content on the webpage. – Jessie Ho Aug 02 '23 at 07:19
  • @Tanaike and the answer about `It seems that Google has changed the method of API.? ` is that we should use `files.get` rather than `files.export` to download a blob file stored on Google Drive. And more details are also mentioned in [Google documents](https://developers.google.com/drive/api/guides/manage-downloads). If you have any ideas or other questions, please contact me. – Jessie Ho Aug 02 '23 at 07:27
  • Thank you for replying. From your reply, I proposed a modified script as an answer. Please confirm it. If that was not useful, I apologize. – Tanaike Aug 02 '23 at 07:46

1 Answer

Based on your reply in the comments and the script you showed, how about the following modification?

From:

    const response = await window.gapi.client.drive.files.get({
        fileId,
        alt: 'media',
    });
    const blob = new Blob([response.body], { type: document.mimeType });
    const arrayBuffer = await blob.arrayBuffer();
    const typedarray = new Uint8Array(arrayBuffer);

To:

    const { body } = await window.gapi.client.drive.files.get({ fileId, alt: 'media' });
    const typedarray = new Uint8Array(body.length).map((_, i) => body.charCodeAt(i));

  • When I tested this modified script with `pdfjsLib.getDocument(typedarray).promise.then(({numPages}) => console.log(numPages))`, I confirmed that the correct number of pages could be obtained.

  • If you want to retrieve the blob from this response, you can use `const blob = new Blob([new Uint8Array(body.length).map((_, i) => body.charCodeAt(i))])`.
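
As a rough illustration of why the conversion above helps: `response.body` appears to be a binary string (one character per byte), and when a string is passed to `new Blob([...])` it is encoded as UTF-8, so every byte above 0x7F becomes two bytes. That would be consistent with a PDF whose ASCII structure (and page count) survives while its binary streams are damaged. The sketch below uses `TextEncoder` to stand in for the Blob's string handling; `binaryStringToBytes` and the `sample` data are hypothetical names and values made up for this example, not part of the Drive API.

```javascript
// Convert a binary string (one character per byte, code units 0-255)
// into the raw bytes it represents.
function binaryStringToBytes(body) {
  const bytes = new Uint8Array(body.length);
  for (let i = 0; i < body.length; i++) {
    bytes[i] = body.charCodeAt(i) & 0xff; // keep only the low 8 bits
  }
  return bytes;
}

// Hypothetical stand-in for response.body: "%PDF" followed by four high bytes.
const sample = String.fromCharCode(0x25, 0x50, 0x44, 0x46, 0xe2, 0xe3, 0xcf, 0xd3);

// Direct conversion: one byte per character.
const direct = binaryStringToBytes(sample);

// UTF-8 encoding of the same string, as happens when a string goes into a Blob.
const viaUtf8 = new TextEncoder().encode(sample);

console.log(direct.length);  // 8
console.log(viaUtf8.length); // 12 (each of the four high bytes became two bytes)
```

The ASCII part of the data is identical in both arrays; only the bytes above 0x7F differ, which is why the damage shows up in the page content rather than in the document structure.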

Note:

  • This modification assumes that you can retrieve the file ID of the PDF file on Google Drive and that you can access the file. Please be careful about this.
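
As a possible alternative, and only a sketch under assumptions: the same `alt=media` download can be requested with `fetch` against the Drive v3 files endpoint, which lets the response be read with `arrayBuffer()` so no string conversion is needed. The helper name `downloadDriveFileBytes` and its `fetchImpl` parameter are hypothetical, introduced here for illustration; the access token is assumed to be the same one passed to the picker.

```javascript
// Download the raw bytes of a Drive file without going through a string.
// `fetchImpl` is a hypothetical seam so the function can be exercised without
// network access; it defaults to the global fetch.
async function downloadDriveFileBytes(fileId, accessToken, fetchImpl = globalThis.fetch) {
  const url =
    `https://www.googleapis.com/drive/v3/files/${encodeURIComponent(fileId)}?alt=media`;
  const res = await fetchImpl(url, {
    headers: { Authorization: `Bearer ${accessToken}` },
  });
  if (!res.ok) {
    throw new Error(`Drive download failed with status ${res.status}`);
  }
  // arrayBuffer() yields the bytes as sent, with no text decoding in between.
  return new Uint8Array(await res.arrayBuffer());
}
```

In the picker callback this might be called as `const typedarray = await downloadDriveFileBytes(fileId, this.accessToken);`, assuming `this.accessToken` holds a valid OAuth token with access to the file.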
Tanaike
  • I'm so excited!!!!! It did work!!!!!!! Thank you very much for your kindness and assistance :) – Jessie Ho Aug 02 '23 at 08:22
  • @Jessie Ho Thank you for replying and testing it. I'm glad your issue was resolved. I could correctly understand your question with your cooperation. Thank you, too. – Tanaike Aug 02 '23 at 08:26