20

Can I use PIL, like in this example?

I only need to read the data, and I'm looking for the simplest way to do it (I can't install pyexiv).

edit: I don't want to believe that the only way to do this is with some library (python-xmp-toolkit, pyexiv2, ...) that needs Exempi and Boost. There must be another option!

dolma33

9 Answers

14

Well, I was looking for something similar, then I came across the PHP equivalent question and I translated the answer to Python:

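f = 'example.jpg'
# Open in binary mode: a JPEG is not text, so search for byte strings
with open(f, 'rb') as fd:
    d = fd.read()
xmp_start = d.find(b'<x:xmpmeta')
xmp_end = d.find(b'</x:xmpmeta')
# 12 == len(b'</x:xmpmeta>'), so the slice includes the closing tag
xmp_str = d[xmp_start:xmp_end + 12].decode('utf-8')
print(xmp_str)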

You can then parse xmp_str with an XML API.
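
As a rough sketch of that parsing step, the standard library's xml.etree.ElementTree can read the extracted packet. The helper name `xmp_attributes` and the sample packet below are made up for illustration; real files carry other namespaces and properties, and properties stored as child elements rather than attributes are not collected here.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def xmp_attributes(xmp_str):
    """Merge the attributes of every rdf:Description element into one dict.

    Keys come back in ElementTree's '{namespace-uri}name' form.
    """
    root = ET.fromstring(xmp_str)
    attrs = {}
    for desc in root.iter("{%s}Description" % RDF_NS):
        attrs.update(desc.attrib)
    return attrs


# Hypothetical packet, shaped like what the snippet above extracts.
sample = (
    '<x:xmpmeta xmlns:x="adobe:ns:meta/">'
    '<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">'
    '<rdf:Description rdf:about="" '
    'xmlns:xmp="http://ns.adobe.com/xap/1.0/" xmp:Rating="5"/>'
    '</rdf:RDF></x:xmpmeta>'
)
attrs = xmp_attributes(sample)
print(attrs["{http://ns.adobe.com/xap/1.0/}Rating"])
```

Working with the full namespace URI instead of the `xmp:` prefix avoids depending on whichever prefix the writing application happened to choose.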

dirac3000
  • I like... always had problems with truncated keywords when using packages like PIL to access data. Another benefit is that reading it from the jpg results in no dependencies when writing a reusable package. – sthzg Mar 28 '14 at 21:23
  • 4
    I had to open with 'rb' and find(b' – Chris Sherwood Nov 01 '18 at 18:03
  • 1
    XMP can now be in multiple separate pieces spread through the jpeg file, a condition this solution won't cope with. – hippietrail Jul 23 '19 at 10:20
11

With PIL, the XMP metadata of a JPEG can be found in the image's applist attribute.

from PIL import Image

with Image.open(filename) as im:
    for segment, content in im.applist:
        # Segment payloads are bytes; skip any without a null separator
        if segment != 'APP1' or b'\x00' not in content:
            continue
        marker, body = content.split(b'\x00', 1)
        if marker == b'http://ns.adobe.com/xap/1.0/':
            # parse the XML string with any method you like
            print(body.decode('utf-8'))
  • Nice, is that documented anywhere? I only found https://github.com/python-pillow/Pillow/blob/da8f2737a8a325ed5bb1d24a777a0b4d3ddaa7d8/PIL/JpegImagePlugin.py#L57-L128 – Tobias Kienzler Jan 23 '17 at 10:20
  • 1
    This won't always work as some jpeg files have the APP1 marker "XMP\0://ns.adobe.com/xap/1.0/" for some reason, and that \0 will break the split() function. – hippietrail Jul 23 '19 at 10:22
  • If there are no further nulls in the body, you could do `content.rsplit(b'\x00', 1)` and `b'http://ns.adobe.com/xap/1.0/' in marker` instead. – mercator Nov 10 '20 at 18:35
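
For anyone who wants the same APP1 lookup without PIL, here is a minimal sketch that walks the JPEG header segments by hand. The names `iter_jpeg_segments` and `read_xmp_packet` are hypothetical, the code only looks for the standard `http://ns.adobe.com/xap/1.0/` header, and it does not reassemble XMP that has been split across several segments.

```python
import struct

XMP_HEADER = b"http://ns.adobe.com/xap/1.0/\x00"


def iter_jpeg_segments(data):
    """Yield (marker, payload) for each header segment before the image data."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG: missing SOI marker")
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            break  # lost sync with the marker stream; stop rather than guess
        marker = data[pos + 1]
        if marker == 0xFF:
            pos += 1  # fill byte before the real marker
            continue
        if marker in (0xD9, 0xDA):
            break  # end of image, or start of scan (compressed data follows)
        # The 2-byte big-endian length counts itself but not the marker.
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        yield marker, data[pos + 4:pos + 2 + length]
        pos += 2 + length


def read_xmp_packet(data):
    """Return the bytes of the first standard XMP packet, or None."""
    for marker, payload in iter_jpeg_segments(data):
        if marker == 0xE1 and payload.startswith(XMP_HEADER):  # 0xE1 == APP1
            return payload[len(XMP_HEADER):]
    return None
```

Typical use would be `read_xmp_packet(open(path, "rb").read())`, followed by any XML parser on the result when it is not None.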
3

I am also interested to know if there is a 'proper' easy way to do this.

In the meantime, I've implemented reading XMP packets using pure Python in PyAVM. The relevant code is here. Maybe this would be useful to you?

astrofrog
2
from bs4 import BeautifulSoup

with open(imgFileName, "rb") as fin:
    img = fin.read()

# Search the raw bytes; str(img) would give the escaped repr of the data
xmp_start = img.find(b'<x:xmpmeta')
xmp_end = img.find(b'</x:xmpmeta')
if xmp_start != -1 and xmp_end != -1:
    xmpString = img[xmp_start:xmp_end + 12].decode('utf-8')
    xmpAsXML = BeautifulSoup(xmpString, 'xml')  # the 'xml' parser needs lxml
    print(xmpAsXML.prettify())

Alternatively, you can use the Python XMP Toolkit (python-xmp-toolkit), which depends on Exempi.

Rocket Nikita
user1911091
  • 1
    This will break when XMP is in multiple parts due to the jpeg format only allowing 64k for each chunk of such data. – hippietrail Jul 23 '19 at 10:25
1

A search through the PIL source (1.1.7) tells me that it can recognize XMP information in TIFF files, but I cannot find any evidence of a documented or undocumented API for working with XMP information using PIL at the application level.

From the CHANGES file included in the source:

+ Support for preserving ICC profiles (by Florian Böch via Tim Hatch).

  Florian writes:

  It's a beta, so still needs some testing, but should allow you to:
  - retain embedded ICC profiles when saving from/to JPEG, PNG, TIFF.
     Existing code doesn't need to be changed.
  - access embedded profiles in JPEG, PNG, PSD, TIFF.

  It also includes patches for TIFF to retain IPTC, Photoshop and XMP
  metadata when saving as TIFF again, read/write TIFF resolution
  information correctly, and to correct inverted CMYK JPEG files.

So the support for XMP is limited to TIFF, and only allows XMP information to be retained when a TIFF image is loaded, possibly changed, and saved. The application cannot access or create XMP data.

wberry
0

Shout out to Chris Sherwood for the solution that I used. I came here to find a way to pull XMP data from DJI drone images, and I too did not want to install Exempi. So, for posterity, I pulled these easier methods together for people looking to extract values from XMP headers without a lot of hassle:

    # Extract XMP data
    with open(image_files[i], 'rb') as f:
        d = f.read()
    xmp_start = d.find(b'<x:xmpmeta')
    xmp_end = d.find(b'</x:xmpmeta')
    xmp_str = d[xmp_start:xmp_end + 12]  # 12 == len(b'</x:xmpmeta>')

    # Extract latitude
    search_str = b'Latitude="'
    value_start = xmp_str.find(search_str) + len(search_str)
    value_end = xmp_str.find(b'"', value_start)
    value = xmp_str[value_start:value_end]
    lat = value.decode('UTF-8')
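
The same lookup can be wrapped in a small helper so that other values are pulled out the same way. This is only a sketch: `xmp_attribute` is a made-up name and the sample bytes are invented, not copied from a real DJI file. Unlike the find() arithmetic above, it returns None when the attribute is missing instead of slicing from the wrong offset.

```python
import re


def xmp_attribute(xmp_bytes, name):
    """Return the value of the first attribute whose name ends in `name`.

    Returns None when no such attribute is present.
    """
    pattern = re.escape(name.encode("utf-8")) + rb'="([^"]*)"'
    match = re.search(pattern, xmp_bytes)
    return match.group(1).decode("utf-8") if match else None


# Hypothetical packet fragment, for illustration only.
sample = b'<rdf:Description GpsLatitude="48.1" GpsLongitude="11.5"/>'
print(xmp_attribute(sample, "Latitude"))
print(xmp_attribute(sample, "Altitude"))
```

Because the name is matched as a suffix, "Latitude" also finds "GpsLatitude"; pass the full attribute name if the packet contains several that end the same way.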
Rich
0

Based on the answers from @dirac, @Rich, @user1911091 and a note from @hippietrail, I came up with this solution. It is not very elegant, but it gets the data even when it is scattered across the file:

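from bs4 import BeautifulSoup

with open(self.filename, "rb") as f:
    d = f.read()

xmp_str = b""
pos = 0
while True:
    # Look for the next packet after the previous one
    xmp_start = d.find(b"<x:xmpmeta", pos)
    if xmp_start == -1:
        break
    xmp_end = d.find(b"</x:xmpmeta", xmp_start)
    if xmp_end == -1:
        break
    xmp_str += d[xmp_start : xmp_end + 12]
    pos = xmp_end + 12

xmpAsXML = BeautifulSoup(xmp_str, "xml")  # the "xml" parser needs lxml
print(xmpAsXML.prettify())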
dotz
0

Pillow (a PIL fork) can now return the XMP metadata as a dictionary via the getxmp() method.

It works for PNG, JPEG and TIFF images since version 8.3.

Documentation can be found here.

toto
0

As of Pillow 8.2.0, this can be achieved with the getxmp() Image method. It does require defusedxml to be installed, though.
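
A minimal usage sketch, assuming a Pillow version recent enough to provide getxmp() for the image format in question. The wrapper name `read_xmp` is made up for illustration.

```python
from PIL import Image


def read_xmp(path):
    """Return the image's XMP metadata as a nested dict ({} if it has none)."""
    with Image.open(path) as im:
        return im.getxmp()
```

The structure of the returned dictionary follows the XML in the file, so it is worth printing it once for a sample image before relying on particular keys.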

Martim Passos