A table containing almost four thousand records includes a mediumblob field for each record that contains the record's associated PDF report. Under both MySQL Workbench and phpMyAdmin the relevant DOCUMENT column displays the data as a BLOB button or link. In the case of phpMyAdmin the link also indicates the size of the data the Blob contains.

The issue is what happens when the BLOB button/link is clicked. Under MySQL Workbench, opening any of the files in the SQL Editor only displays the raw BLOB data. Under phpMyAdmin, the link only allows the BLOB data to be saved as a .bin file instead of displaying or saving it as a viewable PDF file. All previous attempts to retrieve the original PDFs using PHP have failed; see the related earlier thread: Extract Pdf from MySql Dump Saved as Text.

The filename field in the table shows that all the stored files are PDF files. Further research and tests indicate that the mediumblob data has been stored as application/octet-stream.

My question is how can the original PDFs be retrieved as readable PDFs? Is it possible for a .bin file saved from the database to be converted or used to recover the original PDF file?

Any assistance would be greatly appreciated.

ridgedale
  • Try renaming the .bin to .pdf, but I wouldn't get my hopes up. How did you insert the data into the database? – nbk Jan 13 '22 at 16:10
  • @ K J: Thanks for your reply. I've run a test encoding and decoding of a PDF file from the cmd line and can confirm that the base64 encoded file's text does start with JVBER... The file also decoded back to a readable PDF. I'm beginning to wonder if the files were added to the database as an emailed attachment and the mediumblob might include all that additional overhead. – ridgedale Jan 13 '22 at 16:58
  • @ nbk: Changing the file extension from .bin to .pdf does not allow the file to be opened. Adobe Reader just spits out the following error: "Adobe Acrobat Reader could not open 'application.pdf' because it is either not a supported file type or because the file has been damaged (for example, it was sent as an email attachment and wasn't correctly decoded)." I did not insert the data. I'm trying to recover the files from a previously supplied backup. – ridgedale Jan 13 '22 at 17:00
  • @ K J: Thanks again for your feedback. The relevant data itself only goes back to 2017, so it's not that old. I had assumed that blobs would contain data encoded in a common format. I'm also trying to get hold of one of the software developers to see if s/he may be able to shed some light on how the data was encoded. – ridgedale Jan 13 '22 at 19:03

2 Answers

In line with my assumption and Isaac's suggestion, the only solution was to speak to one of the software developers. It transpires that the documents were zipped using a third-party library, and the header was removed, before being stored in the database. The third-party library used is version 2.0.50727 of Chilkat, available from www.chilkatsoft.com. That version no longer appears to be available, but hopefully at least one of the later versions may do the job. Thanks again for everyone's input and assistance.
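
For anyone in a similar position without access to the original library, here is a rough sketch of one thing that may be worth trying first. It assumes (and this is only an assumption, not something the developers confirmed) that the "zipped" data is a DEFLATE stream and that the removed header was the zlib or gzip wrapper. The function name `try_inflate` is made up for illustration. It uses only Python's standard `zlib` module and tries the three container settings in turn.

```python
import zlib


def try_inflate(blob: bytes):
    """Try to decompress blob as zlib, gzip, or headerless (raw) DEFLATE.

    Returns (label, data) for the first setting that decodes a complete
    stream, or None if none of them works.
    """
    candidates = [
        ("zlib", zlib.MAX_WBITS),           # standard zlib header and trailer
        ("gzip", zlib.MAX_WBITS | 16),      # gzip header and trailer
        ("raw deflate", -zlib.MAX_WBITS),   # no header at all
    ]
    for label, wbits in candidates:
        decomp = zlib.decompressobj(wbits)
        try:
            data = decomp.decompress(blob) + decomp.flush()
        except zlib.error:
            continue  # wrong container type, try the next one
        if decomp.eof:  # only accept a stream that ended cleanly
            return label, data
    return None
```

If this returns `None` for a .bin file saved from phpMyAdmin, the data is probably in a format specific to the library (or encrypted), and the original library is still needed. If it returns data, check whether the result looks like a PDF before trusting it, since the comments suggest part of the file itself may also have been stripped.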

ridgedale
  • @KJ: Perhaps I did not explain clearly in the feedback above. It was explained that the file headers were stripped before the files were zipped and the resulting data then added to the database mediumblob field. That is why the line starts were not constant. Hope that helps. – ridgedale Jan 21 '22 at 15:19

Based on the discussion in the comments, it sounds like you'll need to either refer to the original source code or consult with the original developer to determine exactly how the data was stored.

Using phpMyAdmin to download the mediumblob data will produce a .bin file in many cases. I don't recall exactly how it determines the content type: a PNG file, for instance, will download with a .png extension, but most other binary files, PDF included, simply download as .bin when phpMyAdmin isn't sure what the extension should be. So the behavior you're seeing from phpMyAdmin is expected and correct, but since the .bin file doesn't work when it's renamed to .pdf, something has probably gone wrong with the import or upload.

BLOB data is often stored in a pretty standardized way, but it seems your data doesn't follow that method.

Without seeing the code directly, we would only be guessing at what exactly happened when the data was stored.
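
That said, a quick way to narrow things down is to look at the first few bytes of a saved .bin file. Below is a minimal sketch in Python; the `sniff` function and the signature table are my own illustration and cover only a few well-known prefixes, so "unknown" just means the data matches none of them.

```python
# Well-known leading bytes for a few formats a PDF might be stored in.
SIGNATURES = [
    (b"%PDF", "PDF"),
    (b"JVBER", "base64-encoded PDF"),   # base64 of "%PDF-" starts with JVBER
    (b"PK\x03\x04", "ZIP archive"),
    (b"\x1f\x8b", "gzip"),
    (b"\x78\x9c", "zlib (default compression)"),
]


def sniff(blob: bytes) -> str:
    """Return a best-guess label for blob based on its leading bytes."""
    for magic, label in SIGNATURES:
        if blob.startswith(magic):
            return label
    return "unknown"
```

For example, reading a saved file with `open("application.bin", "rb").read()` (the filename here is just a placeholder) and passing the result to `sniff` would report "PDF" for an intact file. A result of "unknown" would be consistent with the data having been transformed in some custom way before it was stored.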

Isaac Bennetch
  • Thank you for your reply, Isaac. I have contacted the developer but have so far received no response. That does not bode well unless some obscure method was used to strip the filetype from the files inserted into the database. It would appear that every mediumblob field contains unreadable data. No filetype information appears to be included. – ridgedale Jan 18 '22 at 15:00