0

Context: I am attempting to automate the inspection of EPS files to detect a list of attributes, such as whether the file contains locked layers, embedded bitmap images, etc.

So far we have found that some of these things can be detected by inspecting the raw EPS file data and its accompanying metadata (similar to the information returned by ImageMagick). However, in files created by Illustrator 9 and above, the vast majority of this information is encoded within the "AI9_DataStream" portion of the file. This data is compressed and then encoded as Ascii85. We have had some success getting at it by using https://github.com/huandu/node-ascii85 to decode and Node's zlib library to decompress / unzip. (Our project is written in Node / JavaScript.) However, in roughly half of our test files the unzipping step fails, throwing Z_DATA_ERROR / "incorrect data check".

Our method responsible for trying to decode:

import zlib from 'zlib';
import ascii85 from 'ascii85'; // https://github.com/huandu/node-ascii85

export const decode = eps =>
   new Promise((resolve, reject) => {
     // Each line of the data stream is prefixed with '%'.
     const lineDelimiters = /\r\n%|\r%|\n%/g;
     const internal = eps.match(
       /(%AI9_DataStream)([\s\S]*?)(AI9_PrivateDataEnd)/
     );

     // No data stream: return early so internal[2] is never read on null.
     if (!internal) {
       resolve('');
       return;
     }

     const encoded = internal[2].replace(lineDelimiters, '');

     try {
       const decoded = ascii85.decode(encoded);
       zlib.unzip(decoded, (err, buffer) => {
         // Some files fail to decompress; for now we need to allow it.
         if (err) resolve('');
         else resolve(buffer.toString('utf8'));
       });
     } catch (err) {
       reject(err);
     }
   });

I am wondering if anyone out there has had any experience with this issue and has some insight into what might be causing this and whether there is an alternative avenue to explore for reliably decoding this data. Information on this topic seems a bit sparse so really anything that could get us going in the right direction would be very much appreciated.

Note: The buffers produced by the Ascii85 decoding all start with the same 78 9c header, which should indicate standard zlib compression (and the data does in fact decompress into parsable text about half the time without error).
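For reference, the 78 9c observation can be checked programmatically. The sketch below tests the two-byte zlib header as described in RFC 1950; the function name `looksLikeZlib` is made up for illustration. A valid header says nothing about the end of the stream: "incorrect data check" is reported when the Adler-32 checksum at the tail of the zlib stream does not match, which would be consistent with stray or missing bytes near the end of the decoded buffer rather than a problem at the start.

```javascript
// Returns true when the first two bytes form a valid zlib (RFC 1950) header.
// Hypothetical helper, not part of Node's zlib module.
function looksLikeZlib(buffer) {
  if (buffer.length < 2) return false;
  const cmf = buffer[0]; // compression method and flags
  const flg = buffer[1]; // flags, including the header check bits
  const isDeflate = (cmf & 0x0f) === 8;          // method 8 = deflate
  const checkOk = ((cmf << 8) | flg) % 31 === 0; // header must be a multiple of 31
  return isDeflate && checkOk;
}
```

Calling `looksLikeZlib(decoded)` before `zlib.unzip` only separates "not zlib at all" from "zlib, but damaged further in"; it cannot predict whether decompression will succeed.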

Hub
  • Hi, are you still working on it? Would you like to cooperate? :) You can get in touch with me at support@photopea.com – Ivan Kuckir Jan 15 '19 at 17:51
  • Hello. Thanks for the offer but I think the answer I added there at the top solves this particular issue. If you had other questions about it I can try to answer. – Hub Feb 22 '19 at 00:04
  • I mean, after decompressing the PostScript-ish data, did you try to analyze them further? I thought you wanted to make the full editor of AI files. – Ivan Kuckir Feb 22 '19 at 00:07
  • Oh haha no not attempting to build an editor, I suppose you could if so inclined. We are just trying to extract information such as whether any layers are locked, if there are unexpanded patterns or plugins, etc. In that we have been successful and can share some regexes if you are curious. – Hub Feb 22 '19 at 01:21

2 Answers

1

It turns out we were misreading the Ascii85 encoding. The encoded block ends with a ~> delimiter, and the match has to stop there: the delimiter and everything after it must be left out of the string before decoding and the subsequent unzipping.

So instead of:

/(%AI9_DataStream)([\s\S]*?)(AI9_PrivateDataEnd)/

Use:

/(%AI9_DataStream)([\s\S]*?)(~>)/

And you can get to the correct encoded / compressed data. So far this has produced human readable / regexable data in all of our current test cases so unless we are thrown another curve that seems to be the answer.
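Put together, the corrected extraction might look like the sketch below. To keep it self-contained it uses a small hand-written Ascii85 decoder instead of the ascii85 package, and the synchronous `zlib.unzipSync` instead of the callback form; the names `decodeAscii85`, `pushGroup` and `decodeDataStream` are invented for this example, and it assumes every line of the stream is prefixed with `%` as in our files.

```javascript
const zlib = require('zlib');

// Convert five base-85 digits to bytes and append the first `count` of them.
function pushGroup(bytes, digits, count) {
  let value = 0;
  for (const d of digits) value = value * 85 + d;
  const four = [
    Math.floor(value / 2 ** 24) % 256,
    Math.floor(value / 2 ** 16) % 256,
    Math.floor(value / 2 ** 8) % 256,
    value % 256,
  ];
  for (let i = 0; i < count; i++) bytes.push(four[i]);
}

// Minimal Ascii85 decoder (illustrative stand-in for the ascii85 package).
function decodeAscii85(text) {
  const bytes = [];
  let group = [];
  for (const ch of text) {
    if (/\s/.test(ch)) continue; // whitespace is not data
    if (ch === 'z' && group.length === 0) {
      bytes.push(0, 0, 0, 0); // 'z' abbreviates four zero bytes
      continue;
    }
    const code = ch.charCodeAt(0);
    if (code < 33 || code > 117) throw new Error(`invalid Ascii85 character: ${ch}`);
    group.push(code - 33);
    if (group.length === 5) {
      pushGroup(bytes, group, 4);
      group = [];
    }
  }
  if (group.length === 1) throw new Error('truncated Ascii85 group');
  if (group.length > 1) {
    const count = group.length - 1; // n + 1 characters encode n bytes
    while (group.length < 5) group.push(84); // pad with 'u'
    pushGroup(bytes, group, count);
  }
  return Buffer.from(bytes);
}

// Extract, decode and decompress the AI9_DataStream block; '' if absent.
function decodeDataStream(eps) {
  const match = eps.match(/%AI9_DataStream([\s\S]*?)~>/);
  if (!match) return '';
  const encoded = match[1].replace(/\r\n%|\r%|\n%/g, ''); // strip '%' line prefixes
  return zlib.unzipSync(decodeAscii85(encoded)).toString('utf8');
}
```

Stopping the lazy match at ~> is safe because `~` falls outside the Ascii85 alphabet (`!` through `u`, plus `z`), so it cannot occur inside the encoded data itself. `zlib.unzipSync` throws on corrupt input, so callers that want the original "resolve with an empty string" behaviour would wrap the call in try/catch.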

Hub
0

The only reliable method for getting content from PostScript is to run it through a PostScript interpreter, because PostScript is a programming language.

If you stick to a specific workflow with well understood input, then you may have some success in simple parsing, but that's about the only likely scenario which will work.

Note that EPS files don't have 'layers' and certainly don't have 'locked' layers.

You haven't actually pointed to a working example, but I suspect the content of the AI9_DataStream is not relevant to the EPS. It's probably a means for Illustrator to include its own native file format inside the EPS file without affecting a PostScript interpreter. This is how it works with AI-produced PDF files.

This means that when you reopen the EPS file with Adobe Illustrator, it ignores the EPS and uses the embedded native file, which magically grants you the ability to edit the file, including features like layers which cannot be represented in the EPS.

KenS
  • Yes, I believe you are correct. The data stream portion of the file is the portion that retains the Illustrator editing capabilities; unfortunately, that is most of what we are interested in inspecting. We are pretty confident that information like that can be gleaned from the encoded data, we just can't reliably decompress it. – Hub Sep 05 '18 at 15:22
  • Well, it's pretty hard to help without seeing an example... It's also not an EPS or PostScript question, I might add, because whatever is in there is Adobe proprietary stuff. Though older versions of Illustrator used something not entirely unlike PostScript and PDF. – KenS Sep 05 '18 at 16:49
  • Yes, this particular issue is only relevant to adobe generated eps files from illustrator 9 and above. – Hub Sep 05 '18 at 16:55