I have routines that collect and analyze file/folder data in relation to database content; the system has been in place and working well for many years. These routines use VBScript/Access VBA to collect file information and prepare/load records into a SQL Server database. I'm not currently storing the filestreams in SQL Server, just their paths and data about the files. Now I need to extract XML metadata from some of these files, which I haven't had to work with before.

The files are JPEG 2000 (JP2) images derived from TIFFs. They are generated in batch, and metadata from the original TIFFs is added to the JP2s. I can see the XML using JP2 Meta Editor:

[Screenshot: j2k tool]

The XML looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<!-- Originating Facility -->
<TIFF>
   <METADATA>
      <FILENAME>L145Y1921I001S0005.tif</FILENAME>
      <SEPARATOR>\</SEPARATOR>
      <PARENT>I:\Processing_Unit\L145\Box127</PARENT>
      <CANONICALPATH>I:\Processing_Unit\L145\Box127\L145Y1921I001S0005.tif</CANONICALPATH>
      <ABSOLUTEPATH>I:\Processing_Unit\L145\Box127\L145Y1921I001S0005.tif</ABSOLUTEPATH>
      <PATH>I:\Processing_Unit\L145\Box127\L145Y1921I001S0005.tif</PATH>
      <FILE>true</FILE>
      <DIRECTORY>false</DIRECTORY>
      <FILELENGTH>18462952</FILELENGTH>
      <HIDDEN>false</HIDDEN>
      <ABSOLUTE>true</ABSOLUTE>
      <URL>file:/I:/Processing_Unit/L145/Box127/L145Y1921I001S0005.tif</URL>
      <URI>file:/I:/Processing_Unit/L145/Box127/L145Y1921I001S0005.tif</URI>
      <READ>true</READ>
      <WRITE>true</WRITE>
      <EXTENSION>tif</EXTENSION>
      <MODIFIED>2009-04-02 11:17:31</MODIFIED>
      <DATE>20090402</DATE>
      <DATEPATTERN>yyyyMMdd</DATEPATTERN>
      <TIME>111731987</TIME>
      <TIMEPATTERN>HHmmssSSS</TIMEPATTERN>
      <TYPE>image/tiff</TYPE>
      <PID>null</PID>
      <OID>null</OID>
      <FID>null</FID>
      <PROCESSOR>unknown</PROCESSOR>
   </METADATA>
   <HEADER>
      <LITTLEENDIAN>true</LITTLEENDIAN>
      <VERSION>1.0</VERSION>
   </HEADER>
   <IMAGEFILEDIRECTORY>
      <ELEMENT>
         <NAME>NewSubfileType</NAME>
         <TAG>254</TAG>
         <LENGTH>1</LENGTH>
         <TYPE>4</TYPE>
         <VALUE>0</VALUE>
      </ELEMENT>
      <ELEMENT>
         <NAME>ImageWidth</NAME>
         <TAG>256</TAG>
         <LENGTH>1</LENGTH>
         <TYPE>3</TYPE>
         <VALUE>2705</VALUE>
      </ELEMENT>
      <ELEMENT>
         <NAME>ImageLength</NAME>
         <TAG>257</TAG>
         <LENGTH>1</LENGTH>
         <TYPE>3</TYPE>
         <VALUE>2275</VALUE>
      </ELEMENT>
      <ELEMENT>
         <NAME>BitsPerSample</NAME>
         <TAG>258</TAG>
         <LENGTH>3</LENGTH>
         <TYPE>3</TYPE>
         <VALUE>8</VALUE>
         <VALUE>8</VALUE>
         <VALUE>8</VALUE>
      </ELEMENT>
      <ELEMENT>
         <NAME>Compression</NAME>
         <TAG>259</TAG>
         <LENGTH>1</LENGTH>
         <TYPE>3</TYPE>
         <VALUE>1</VALUE>
      </ELEMENT>
      <ELEMENT>
         <NAME>PhotometricInterpretation</NAME>
         <TAG>262</TAG>
         <LENGTH>1</LENGTH>
         <TYPE>3</TYPE>
         <VALUE>2</VALUE>
      </ELEMENT>
      <ELEMENT>
         <NAME>DocumentName</NAME>
         <TAG>269</TAG>
         <LENGTH>22</LENGTH>
         <TYPE>2</TYPE>
         <VALUE>L145Y1921I001S0005.tif</VALUE>
      </ELEMENT>
      <ELEMENT>
         <NAME>ImageDescription</NAME>
         <TAG>270</TAG>
         <LENGTH>6</LENGTH>
         <TYPE>2</TYPE>
         <VALUE>paper</VALUE>
      </ELEMENT>
      <ELEMENT>
         <NAME>Make</NAME>
         <TAG>271</TAG>
         <LENGTH>10</LENGTH>
         <TYPE>2</TYPE>
         <VALUE>Phase One</VALUE>
      </ELEMENT>
      <ELEMENT>
         <NAME>Model</NAME>
         <TAG>272</TAG>
         <LENGTH>6</LENGTH>
         <TYPE>2</TYPE>
         <VALUE>P 30+</VALUE>
      </ELEMENT>
      <ELEMENT>
         <NAME>Orientation</NAME>
         <TAG>274</TAG>
         <LENGTH>1</LENGTH>
         <TYPE>3</TYPE>
         <VALUE>1</VALUE>
      </ELEMENT>
      <ELEMENT>
         <NAME>SamplesPerPixel</NAME>
         <TAG>277</TAG>
         <LENGTH>1</LENGTH>
         <TYPE>3</TYPE>
         <VALUE>3</VALUE>
      </ELEMENT>
      <ELEMENT>
         <NAME>RowsPerStrip</NAME>
         <TAG>278</TAG>
         <LENGTH>1</LENGTH>
         <TYPE>3</TYPE>
         <VALUE>2275</VALUE>
      </ELEMENT>
      <ELEMENT>
         <NAME>XResolution</NAME>
         <TAG>282</TAG>
         <LENGTH>1</LENGTH>
         <TYPE>5</TYPE>
         <VALUE>300.0</VALUE>
      </ELEMENT>
      <ELEMENT>
         <NAME>YResolution</NAME>
         <TAG>283</TAG>
         <LENGTH>1</LENGTH>
         <TYPE>5</TYPE>
         <VALUE>300.0</VALUE>
      </ELEMENT>
      <ELEMENT>
         <NAME>PlanarConfiguration</NAME>
         <TAG>284</TAG>
         <LENGTH>1</LENGTH>
         <TYPE>3</TYPE>
         <VALUE>1</VALUE>
      </ELEMENT>
      <ELEMENT>
         <NAME>ResolutionUnit</NAME>
         <TAG>296</TAG>
         <LENGTH>1</LENGTH>
         <TYPE>3</TYPE>
         <VALUE>2</VALUE>
      </ELEMENT>
      <ELEMENT>
         <NAME>Software</NAME>
         <TAG>305</TAG>
         <LENGTH>51</LENGTH>
         <TYPE>2</TYPE>
         <VALUE>Capture One 4 Windows; Adobe Photoshop CS3 Windows</VALUE>
      </ELEMENT>
      <ELEMENT>
         <NAME>DateTime</NAME>
         <TAG>306</TAG>
         <LENGTH>20</LENGTH>
         <TYPE>2</TYPE>
         <VALUE>2009:03:26 11:23:36</VALUE>
      </ELEMENT>
      <ELEMENT>
         <NAME>Artist</NAME>
         <TAG>315</TAG>
         <LENGTH>33</LENGTH>
         <TYPE>2</TYPE>
         <VALUE>Preservation Center</VALUE>
      </ELEMENT>
      <ELEMENT>
         <NAME>Custom</NAME>
         <TAG>34665</TAG>
         <LENGTH>1</LENGTH>
         <TYPE>4</TYPE>
         <VALUE>null</VALUE>
      </ELEMENT>
      <ELEMENT>
         <NAME>Custom</NAME>
         <TAG>34675</TAG>
         <LENGTH>560</LENGTH>
         <TYPE>7</TYPE>
         <VALUE>null</VALUE>
      </ELEMENT>
   </IMAGEFILEDIRECTORY>
</TIFF>

I need to extract the original document name--the parent TIFF name--from each of these JP2 files.

Is there a straightforward way to incorporate this into the existing file collection routine using VBA/VBScript? I will need to process hundreds of thousands of existing file records to get this additional value, and to include the extraction in folder scans going forward.
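
To make the task concrete, here is a minimal sketch of what the extraction involves. It is written in Python rather than VBScript, purely to illustrate the box layout, and it rests on an assumption that has not been verified against these files: that the batch process stores the XML uncompressed in a top-level JP2 `xml ` box, not in a UUID box or some other container. The function names `xml_boxes` and `original_tiff_path` are made up for this example.

```python
import struct
import sys
import xml.etree.ElementTree as ET


def xml_boxes(f):
    """Yield the payload of each top-level 'xml ' box, seeking past all other boxes."""
    while True:
        header = f.read(8)
        if len(header) < 8:
            return  # end of file (or truncated header)
        # Each box starts with a 4-byte big-endian length and a 4-byte type.
        length, box_type = struct.unpack(">I4s", header)
        header_len = 8
        if length == 1:  # a 64-bit extended length follows the type field
            ext = f.read(8)
            if len(ext) < 8:
                return
            length = struct.unpack(">Q", ext)[0]
            header_len = 16
        if length == 0:  # box extends to the end of the file
            if box_type == b"xml ":
                yield f.read()
            return
        if length < header_len:
            return  # malformed box; stop rather than loop forever
        payload_len = length - header_len
        if box_type == b"xml ":
            yield f.read(payload_len)
        else:
            f.seek(payload_len, 1)  # skip the payload without reading it


def original_tiff_path(f):
    """Return TIFF/METADATA/ABSOLUTEPATH from the first matching XML box, else None."""
    for payload in xml_boxes(f):
        try:
            root = ET.fromstring(payload.strip(b"\x00\r\n\t "))
        except ET.ParseError:
            continue  # assumption: ignore boxes that are not well-formed XML
        if root.tag == "TIFF":
            return root.findtext("METADATA/ABSOLUTEPATH")
    return None


if __name__ == "__main__":
    # Print "jp2 path <TAB> original TIFF path" for each file given on the command line.
    for jp2_path in sys.argv[1:]:
        with open(jp2_path, "rb") as jp2_file:
            print(jp2_path, original_tiff_path(jp2_file) or "", sep="\t")
```

Because it seeks past every box that is not `xml `, the sketch never reads the image codestream, which matters when scanning hundreds of thousands of files. If the assumption about the box type holds, the same walk (read 8 bytes, decode length and type, skip or read the payload) could presumably be ported to VBScript with a binary `ADODB.Stream` and an MSXML DOM document, or the script could be called from the existing routine as a command-line helper.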

Thanks in advance.

Erik A
spudchick
  • Correction: I need the original parent document path, including the name (ABSOLUTEPATH). – spudchick Sep 05 '17 at 21:20
  • I.e., you want this value "I:\Processing_Unit\L145\Box127\L145Y1921I001S0005.tif" from the XML you posted? – Gurmanjot Singh Sep 06 '17 at 06:22
  • The real problem is outlined in [this question](https://stackoverflow.com/questions/44220571/read-jpeg2000-metadata): there is still no way to read the metadata through an open-source command-line application. – Erik A Sep 06 '17 at 11:24
  • @Kira yes, that's what I'm after. – spudchick Sep 06 '17 at 14:33
  • @Erik I hope that isn't true, but I can buy something that can be integrated this way if not too costly. – spudchick Sep 06 '17 at 14:33
  • @spudchick if it isn't, you could answer the question I've linked to. [Kakadu](http://kakadusoftware.com/) might have a tool, but they're commercial, and their 6-month evaluation version already costs $500. – Erik A Sep 06 '17 at 15:09

0 Answers