
I have a Hugging Face dataset with an image column:

ds["image"][0]

<PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=300x300 at 0x1682DD820>

When I save it to disk and load it later, I get the image column as bytes:

from datasets import load_from_disk

ds.save_to_disk("./dataset.hf")
ds = load_from_disk("./dataset.hf")  # keep the reloaded dataset
ds["image"][0]

{'bytes': b'\xff\xd8\xff\xe0\x00\x10JFIF\x00\x01\x01\x00\x00\x01\x00\x01\x00\x00\xff\xdb\x00C\x00\x08\x06\x06\x07\x06\x05\x08\x07\x07\x07\t\t\x08\n\x0c\x14\r\x0c\x0b\x0b\x0c\x19\x12\x13\x0f\x14\x1d\x1a\x1f\x1e\x1d\x1a\x1c\x1c $.\', 
'path': None}

Each entry in the image column is now a dict of raw bytes and a path instead of a PIL image.

How can I load the dataset and make sure my image column is PIL.JpegImagePlugin.JpegImageFile?
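
For reference, the dict shown above holds the encoded JPEG in its 'bytes' key, so it can at least be decoded by hand with Pillow. The following is only a minimal sketch: `decode_image_cell` is a hypothetical helper name, not part of `datasets`, and it assumes each cell is either already a PIL image or a dict with 'bytes' and 'path' keys like the one in the output.

```python
import io

from PIL import Image


def decode_image_cell(cell):
    """Return a PIL image for one cell of the image column.

    Hypothetical helper: assumes `cell` is either a PIL image already or a
    dict of the form {"bytes": ..., "path": ...} as in the output above.
    """
    if isinstance(cell, Image.Image):
        return cell  # already decoded, nothing to do
    if cell.get("bytes") is not None:
        # Image.open expects a file-like object, so wrap the raw bytes
        return Image.open(io.BytesIO(cell["bytes"]))
    if cell.get("path") is not None:
        return Image.open(cell["path"])
    raise ValueError("cell has neither 'bytes' nor 'path'")
```

With this, decode_image_cell(ds["image"][0]) should return a JpegImageFile when the bytes are JPEG data. Casting the column back with ds.cast_column("image", datasets.Image()) may also restore automatic decoding, assuming the image feature type was lost on reload.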

Vincent Claes
