
With reference to PDF to GitHub Flavored Markdown:

Now that GitHub can preview PDF files, I have a PDF file (generated by my own txt2pdf converter) that GitHub does not display correctly, although it opens fine in Adobe Reader and Google Chrome.

Is this an issue with GitHub's PDF preview or with my own converter? (I do not know which channel to report it to, hence this post on SO.)

My PDF file is v1.4.

Example PDF file: https://github.com/txt2pdf/pdfdump/blob/master/sample.pdf

Thanks @VonC and @mkl for your kind feedback. I have fixed the program so that it recalculates the xref table, but this sample2.pdf still has some unknown issue that an online PDF repair tool could not detect.

https://github.com/txt2pdf/pdfdump/blob/master/sample2.pdf
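mkl's diagnosis in the comments (offsets counted as if each line break were one byte, while the written file used CR LF) points to a general rule: measure every xref offset on the exact bytes that end up in the file. The Python sketch below is not the pdfdump code; `build_pdf` is a hypothetical helper, it assumes object 1 is the document catalog, and it only shows the offset bookkeeping (a real file also needs page, content, and font objects).

```python
def build_pdf(objects: list[bytes]) -> bytes:
    """Assemble a PDF 1.4 file whose xref offsets match the bytes actually written.

    objects[i] is the body of object number i + 1, without "n 0 obj"/"endobj".
    Assumption: object 1 is the document catalog (used as /Root).
    """
    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for num, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset of the "num 0 obj" keyword line
        out += b"%d 0 obj\n" % num + body + b"\nendobj\n"

    xref_pos = len(out)  # startxref must point at the "xref" keyword
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"  # entry for object 0, head of the free list
    for off in offsets:
        out += b"%010d 00000 n \n" % off  # every entry is exactly 20 bytes
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_pos  # "%%%%" formats to "%%"
    return bytes(out)
```

Since each offset is read from `len(out)` just before the object is appended, the table stays correct whichever line ending is chosen, provided the bytes reach the disk unchanged. That means writing in binary mode: a text-mode write on Windows (for example C's `fopen(path, "w")`) turns each `\n` into `\r\n` after the offsets were computed.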

LATEST UPDATE: I removed the "T*" operator from each text block (EDIT: and also used the capitalized name "/F1" instead of "/f1") when generating the output PDF file. Now it is shown correctly on GitHub, so the issue was with my converter, not with GitHub.

https://github.com/txt2pdf/pdfdump/blob/master/sample3.pdf
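PDF names are case-sensitive, so a content stream that selects `/f1` with the `Tf` operator does not refer to a font resource declared as `/F1`. The Python sketch below flags such mismatches. `undefined_fonts` is a hypothetical helper; it assumes you already have the uncompressed content stream bytes and the set of keys in the page's `/Font` resource dictionary, and its regular expression is a rough scan rather than a full content-stream parser.

```python
import re

# Matches a font selection such as "/F1 12 Tf": name operand, size operand, Tf.
# Rough scan only: it does not skip "Tf" text that appears inside string literals.
TF_PATTERN = re.compile(rb"/([^\s/\[\]<>(){}%]+)\s+[-+]?(?:\d+\.?\d*|\.\d+)\s+Tf\b")


def undefined_fonts(content: bytes, font_resources: set[str]) -> set[str]:
    """Return font names selected with Tf that are not keys of the /Font resources.

    PDF names are case-sensitive, so /f1 and /F1 are two different names.
    """
    used = {m.group(1).decode("latin-1") for m in TF_PATTERN.finditer(content)}
    return used - font_resources
```

For instance, `undefined_fonts(b"BT /f1 12 Tf (Hi) Tj ET", {"F1"})` returns `{"f1"}`, while the same stream using `/F1` returns an empty set.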

  • Please share an example PDF that illustrates the issue. – mkl Sep 30 '20 at 14:25
  • I should have stated that GitHub displays a blank PDF. @mkl Thanks for trying to solve the issue. The link to the sample PDF is now included in my original post. –  Sep 30 '20 at 18:25
  • I checked out the project. The file in question is broken, the cross references are incorrect. It looks like they have been calculated counting line breaks as single bytes but in the final file they are represented as two bytes, CR LF. – mkl Oct 01 '20 at 10:45
  • This is a helpful remark. I redid the program, and a PDF repair tool reported no issues with the generated PDF files. But GitHub still displays the new `sample2.pdf` as a blank page. –  Oct 01 '20 at 14:24
  • Is there an option in text2pdf that would avoid generating those T* in the first place? – VonC Oct 01 '20 at 15:37
  • The T* is a text positioning operator according to the [PDF 1.7 file format spec](https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/PDF32000_2008.pdf). I have removed it. I actually made two changes just now and forgot to mention the second one: I also used /F1 instead of /f1 in each text block so that the PDF reader would render the font correctly. BTW, the Text2PDF mentioned in your answer is not mine. My actual Win32 program is hosted on GitHub only, as [pdfdump](https://www.github.com/txt2pdf/pdfdump). –  Oct 01 '20 at 15:50
  • Thank you. I have updated the answer accordingly. – VonC Oct 01 '20 at 16:01

1 Answer


The uploaded sample.pdf, when downloaded, opens as a blank page in Chromium's PDF reader.

Check whether you have a setting such as `core.autocrlf` set to `true` (see `git config core.autocrlf`). That could have changed the EOL (end-of-line) characters in the file.
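To see why an EOL conversion breaks a PDF, recall that the xref table stores absolute byte offsets. If Git classifies the file as text and rewrites LF as CR LF on checkout, every object after the first line break moves by one byte per preceding line break. A small Python illustration, where `lf_to_crlf` is a hypothetical helper that only approximates Git's conversion:

```python
def lf_to_crlf(data: bytes) -> bytes:
    """Approximate a checkout-time EOL conversion: every LF becomes CR LF."""
    return data.replace(b"\r\n", b"\n").replace(b"\n", b"\r\n")


original = b"%PDF-1.4\n1 0 obj\n<< /Type /Catalog >>\nendobj\n2 0 obj\n<< >>\nendobj\n"
converted = lf_to_crlf(original)

# The xref table generated for `original` records where object 2 starts...
recorded = original.index(b"2 0 obj")
# ...but after the conversion it starts one byte later per preceding line break.
actual = converted.index(b"2 0 obj")
print(recorded, actual)
```

This prints `45 49`: the four line breaks before object 2 each gained a CR, so the recorded offset now lands inside the preceding `endobj` line.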

Clone the repository and check the SHA-1 of the cloned PDF:

```shell
git ls-files -s sample.pdf
```

Compare it with the hash of your original PDF (the one that opens with its content instead of a blank page):

```shell
git hash-object original/sample.pdf
```

That way, you will know whether the file was altered when it was added, committed, or pushed.
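The ids being compared are Git blob hashes: SHA-1 over the header `blob <size>`, a NUL byte, and then the content. A Python sketch with hypothetical function names follows. Note that `git hash-object <file>` applies input filters such as `core.autocrlf` before hashing, so this raw-byte version corresponds to `git hash-object --no-filters <file>`.

```python
import hashlib


def git_blob_sha1(data: bytes) -> str:
    """Return the object id Git gives a blob: SHA-1 of b"blob <size>\\0" + content."""
    header = b"blob %d\0" % len(data)
    return hashlib.sha1(header + data).hexdigest()


def git_blob_sha1_of_file(path: str) -> str:
    """Hash a file's raw bytes, like `git hash-object --no-filters <path>`."""
    with open(path, "rb") as f:
        return git_blob_sha1(f.read())
```

For example, `git_blob_sha1(b"hello\n")` returns `ce013625030ba8dba906f756967f9e9ca394464a`, the id Git assigns to a blob containing `hello` followed by LF.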

I have cloned the repository, and I do see the content of the PDF file when I open it with Adobe Acrobat Reader.

As noted by the OP, the issue was with the txt2pdf (pdfdump) tool:

  • Remove the "T*" operator from each text block. T* is a text-positioning operator defined in the ISO 32000-1:2008 specification (PDF32000_2008.pdf); it moves to the start of the next line.
  • Use the capitalized name "/F1" instead of "/f1".
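In ISO 32000-1, T* is defined as equivalent to `0 -Tl Td`, where Tl is the text leading set by the TL operator (0 unless set), and Td replaces the text line matrix with a translation multiplied by the current one. A small Python model of those two rules, with hypothetical names, makes the dependence on TL concrete; it does not explain why removing T* fixed GitHub's preview, which the thread leaves open.

```python
from typing import NamedTuple


class Matrix(NamedTuple):
    """PDF matrix [a b c d e f], i.e. the 3x3 matrix [[a b 0] [c d 0] [e f 1]]."""

    a: float
    b: float
    c: float
    d: float
    e: float
    f: float


def td(tlm: Matrix, tx: float, ty: float) -> Matrix:
    """Td: new text line matrix = [[1 0 0] [0 1 0] [tx ty 1]] x Tlm."""
    return tlm._replace(
        e=tx * tlm.a + ty * tlm.c + tlm.e,
        f=tx * tlm.b + ty * tlm.d + tlm.f,
    )


def t_star(tlm: Matrix, leading: float) -> Matrix:
    """T*: same effect as "0 -Tl Td", where Tl is the leading set by TL (initially 0)."""
    return td(tlm, 0, -leading)
```

Starting from `td(Matrix(1, 0, 0, 1, 0, 0), 72, 720)`, one `t_star(..., 14)` moves the line origin from y = 720 to y = 706, while `t_star(..., 0)` leaves it at 720. A generator that writes an explicit `Td` for every line does not depend on TL at all.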

That being said, I would recommend `git config --global core.autocrlf false`, to make sure that Git does not introduce any other modification.

VonC
  • Hi @VonC. The cloned `sample.pdf` is identical to the original `sample.pdf` in my project folder. I used `fc /b` and found no binary differences. What happens when you click "Raw" on the blank PDF file? It starts downloading and opens with the correct PDF content in my browser. My `git config core.autocrlf` was set to `true`. –  Oct 01 '20 at 08:40
  • @BooKhanMing When I click "Raw" on the blank PDF file, it does indeed start downloading and... opens as blank in my Chrome browser (Win10, Version 85.0.4183.121) – VonC Oct 01 '20 at 08:41
  • Thanks @VonC for the helpful feedback. It is strange: I can see the PDF content, and I am using the same version (32-bit) of Chrome as you. –  Oct 01 '20 at 08:48
  • @BooKhanMing Leaving Chrome aside, did you try to clone your repository and open the PDF? – VonC Oct 01 '20 at 08:49
  • Yes @VonC. Although I am new to Git, I managed to run `git clone url`. Then I opened a new command prompt window, changed directory to the cloned folder, and typed `start sample.pdf`. It opened in the default PDF reader and I could see the content. The `sample.pdf` is 10357 bytes (both the original and the cloned copy). –  Oct 01 '20 at 08:57
  • @BooKhanMing I agree, and I have updated the answer: the pdf reader of Chrome seems to be the culprit. – VonC Oct 01 '20 at 09:46
  • Thanks @VonC for the update. It does look like the PDF generated by my txt2pdf is not perfect either. –  Oct 01 '20 at 10:11
  • @Boo following mkl's comment, can you `git rm` the file, then `git add` it again, after typing `git config --global core.autocrlf false`? – VonC Oct 01 '20 at 10:49
  • Now I know that this `autocrlf` configuration setting can affect the final file size. My `sample2.pdf` uses Unix LF instead of Windows CR+LF. When I set it to `false`, the file size is unchanged, but when I set it to `true`, the file size becomes bigger. My previous `sample.pdf` is not affected by this setting because it was already generated with Windows CR+LF. –  Oct 01 '20 at 14:22
  • @BooKhanMing Always set it to false: `git config --global core.autocrlf false` (https://stackoverflow.com/a/50771174/6309) – VonC Oct 01 '20 at 14:28