Python script to get image metadata: UTF-8 UnicodeDecodeError after running 50%

Question

I have a Python script that reads the metadata from image files, and it works, somewhat. After it processes some of the files, I get a UnicodeDecodeError on the line data = data.decode(), and I am not sure why.

Here is the error:

PS C:\Users\Eddie\OneDrive\VS Code 1\GitHub Stuff\Scripts> python -u "c:\Users\Eddie\OneDrive\VS Code 1\GitHub Stuff\Scripts\metadata.py"
Traceback (most recent call last):
  File "c:\Users\Eddie\OneDrive\VS Code 1\GitHub Stuff\Scripts\metadata.py", line 39, in <module>
    data = data.decode()
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 48: invalid start byte
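Some background on the message: bytes.decode() with no arguments uses UTF-8 with errors="strict". In UTF-8, the bytes 0x80 to 0xBF are continuation bytes that may only follow a multi-byte lead byte, so a 0x80 at position 48 cannot start a character and decoding fails. That suggests the EXIF value being decoded is binary data rather than UTF-8 text. Below is a minimal reproduction using made-up bytes (the real tag value is not shown in the question), plus two of the standard errors= handlers:

```python
# Hypothetical value: 48 ASCII bytes followed by 0x80, mimicking the traceback.
raw = b"A" * 48 + b"\x80"

try:
    raw.decode()  # same as raw.decode("utf-8", errors="strict")
    message = "decoded fine"
except UnicodeDecodeError as exc:
    message = str(exc)

print(message)
# 'utf-8' codec can't decode byte 0x80 in position 48: invalid start byte

# Standard error handlers that never raise on bad bytes:
replaced = raw.decode("utf-8", errors="replace")          # 0x80 -> U+FFFD
escaped = raw.decode("utf-8", errors="backslashreplace")  # 0x80 -> the 4 chars \x80
print(replaced[-1] == "\ufffd", escaped.endswith("\\x80"))  # True True
```

Note that "replace" silently loses the original byte, while "backslashreplace" keeps it visible in the output.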

Here is the script:

from PIL import Image
from PIL.ExifTags import TAGS
import os
import os.path

rootdir = r"C:\Users\Eddie\Pictures\pics"

newfile = open('meta.txt', 'w')

for file in os.listdir(rootdir):
    # read the image data using PIL
    image = Image.open(os.path.join(rootdir, file))

    # extract other basic metadata
    info_dict = {
        "Filename": image.filename,
        "Image Size": image.size,
        "Image Height": image.height,
        "Image Width": image.width,
        "Image Format": image.format,
        "Image Mode": image.mode,
        "Frames in Image": getattr(image, "n_frames", 1)
    }

    for label, value in info_dict.items():
        #print(f"{label:25}: {value}")
        newfile.write(f"{label:25}: {value}"+'\n')

    # extract EXIF data
    exifdata = image.getexif()

    # iterating over all EXIF data fields
    for tag_id in exifdata:
        # get the tag name, instead of human unreadable tag id
        tag = TAGS.get(tag_id, tag_id)
        data = exifdata.get(tag_id)
        # decode bytes
        if isinstance(data, bytes):
            data = data.decode()
        #print(f"{tag:25}: {data}")
        newfile.write(f"{tag:25}: {data}"+'\n')
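One way to keep the loop above from stopping on binary values, sketched under the assumption that you only want something readable in meta.txt: try UTF-8 first and fall back to a short hex preview when the bytes are not text. The helper name format_exif_value and its max_bytes parameter are hypothetical, not part of Pillow.

```python
def format_exif_value(value, max_bytes: int = 32) -> str:
    """Return a printable string for an EXIF value without raising on binary data."""
    if isinstance(value, bytes):
        try:
            # Values that really are text decode cleanly; drop any trailing NUL padding.
            return value.decode("utf-8").rstrip("\x00")
        except UnicodeDecodeError:
            # Not valid UTF-8, so treat it as a binary blob and show a short hex preview.
            preview = value[:max_bytes].hex(" ")  # bytes.hex(sep) needs Python 3.8+
            suffix = " ..." if len(value) > max_bytes else ""
            return f"<{len(value)} bytes: {preview}{suffix}>"
    # Non-bytes values (ints, floats, tuples, rationals) just use their str() form.
    return str(value)
```

In the loop, the two decode lines and the write could then become newfile.write(f"{tag:25}: {format_exif_value(data)}\n"). Binary tags show up in meta.txt as <N bytes: ...>, which also reveals which tag was causing the crash.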

And here is one sample output:

Filename                 : C:\Users\Eddie\Pictures\pics\X01CJ0035.JPG
Image Size               : (600, 400)
Image Height             : 400
Image Width              : 600
Image Format             : JPEG
Image Mode               : RGB
Frames in Image          : 1
ResolutionUnit           : 2
ExifOffset               : 168
Software                 : Adobe Photoshop CC 2018 (Windows)
Orientation              : 1
DateTime                 : 2019:02:27 16:41:17
XResolution              : 72.0
YResolution              : 72.0

Also, does image.filename have to return the entire path along with the file name?

So my questions are: why do I get the error, and does image.filename always give the full path? Any help would be great!

That means the value is *not* UTF-8 and possibly not even text. Try using the techniques [in this question](https://stackoverflow.com/questions/4764932/in-python-how-do-i-read-the-exif-data-for-an-image) to print out all tags and their values without trying to decode them, to at least see what's going on — Panagiotis Kanavos, Jul 14 '22 at 14:44
Thank you, that helped. I skipped decoding and still got the information I needed. Do you know how to get just the file name instead of the whole path? — Edward Wynman, Jul 14 '22 at 14:52
It's better to use `pathlib` instead of `os.path` to work with paths. With pathlib, you can convert the path to a ... `Path` and get the name with `thatPath.name`, e.g. `Path(image.filename).name` — Panagiotis Kanavos, Jul 14 '22 at 15:01
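On the filename question: as far as I can tell, Pillow sets image.filename to whatever path is passed to Image.open, so it includes the directory here because the script passes os.path.join(rootdir, file); the loop variable file on its own is already just the name. A small sketch of splitting the path, using the sample path from the output above. PureWindowsPath is used only so the backslash-separated path parses identically on any OS; on Windows, Path(image.filename) should give the same results.

```python
from pathlib import PureWindowsPath

# Sample path copied from the output above. PureWindowsPath parses Windows-style
# paths on any OS; on Windows itself, Path(image.filename) behaves the same way.
full_path = PureWindowsPath(r"C:\Users\Eddie\Pictures\pics\X01CJ0035.JPG")

name = full_path.name        # file name with extension
stem = full_path.stem        # file name without extension
folder = full_path.parent    # directory part

print(name)    # X01CJ0035.JPG
print(stem)    # X01CJ0035
print(folder)  # C:\Users\Eddie\Pictures\pics
```

In the script, the dictionary entry could then be "Filename": Path(image.filename).name, after adding from pathlib import Path.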

