I have a binary file containing image pixel data consisting only of bytes equal to 0 or 1 (0x00 or 0x01). I want to create a black and white image from this data.

A reproducible example of my code so far using Pillow in Python (note I would normally be loading the data from file rather than creating it in place):

import numpy as np
from PIL import Image, ImageOps

w = 128 # image could be much bigger, keeping it small for the example
h = 128

data = np.random.randint(2, size=w*h, dtype=np.uint8).tobytes()

img = Image.frombuffer('L', (w,h), data)  # Pillow sizes are (width, height)
img.show()

The problem is, the image is interpreting pixel data as grayscale, so a value of 1 is almost black. I want 1 to be white (i.e. 255 in grayscale) and 0 to be black.

Is there functionality within Pillow where it "just knows" that I want 1=white & 0=black (rather than 255=white & 0=black)?

EDIT

With help of answers below, I have found a number of solutions where I can modify my data to replace 1's with 255's... I have posted an answer with results of these solutions. Some of them are very fast as they are, so probably not much more performance to be gained.

But IF there is a nice solution which avoids this overhead altogether, i.e. one that tells Pillow directly to treat 1's as white and 0's as black, that would be ideal.

ogb119
  • Why not iterate over `data` and change 1 to 255? – zvi Nov 07 '20 at 22:27
  • I'd discarded that as I thought it would be too slow, but admittedly I didn't try it. I will give it a go, but need to work out how to iterate over a `bytes` object and modify in place... in reality i am dealing with thousands of images with thousands of pixels each, so speed is essential! Thanks for the suggestion – ogb119 Nov 07 '20 at 22:37
  • @ogb119 Can you share a [mcve] ? – AMC Nov 08 '20 at 01:10
  • I'm having a hard time understanding what the issue is. It seems it might be the speed? But you don't say how long anything takes, and it is unclear at which point you want to start and end timing, or what speed would be acceptable. It also seems unlikely that such a small image of 32x32 (or is it 128x128?) could take that long to do anything with. Please clarify your question. Thank you. – Mark Setchell Nov 08 '20 at 09:58
  • edited the question to clarify. I have also posted an answer which (I think) deals with the "change 1's to 255's" approach. I still think a solution which tells Pillow to directly treat 1's as white and 0's as black would be neat... but if that functionality doesn't exist then so be it! – ogb119 Nov 08 '20 at 11:35
  • Still not sure I understand what you are really trying to do, or why 2ms versus 12ms to load an image is so critical, but in answer to your headline question of *"how can I make PIL treat 0 as black and 1 as white"*, you could simply make a palette image where palette entry 0 is black and palette entry 1 is white. This may or may not be appropriate to your use case, but as I said, I am not sure what you are really driving at. Example here https://stackoverflow.com/a/64682849/2836621 – Mark Setchell Nov 08 '20 at 12:22
  • Thanks @MarkSetchell - I think your palette suggestion looks like it answers my question. I couldn't find good documentation for it when I originally looked but your link looks helpful and I'll give it a go. I think you do understand *what* I'm trying to do, but perhaps I haven't been explicit enough on *why*... performance was the issue with my original attempts, but as per my posted answer, I have found solutions for which performance is not really an issue now. – ogb119 Nov 08 '20 at 12:40
  • If you are looking for performance increases, it's possible that you have more than one image, so multi-threading or multi-processing could potentially yield far greater gains than anything discussed to date... – Mark Setchell Nov 08 '20 at 13:00
  • Your palette solution has worked nicely, @MarkSetchell. Thank you. See my updated answer below for details. And yes, I could look into multithreading, now that I seem to have the fastest per-thread solution! But it's probably fast enough as it is now. – ogb119 Nov 08 '20 at 13:42
  • Cool. Good luck with your project! – Mark Setchell Nov 08 '20 at 13:47

4 Answers

2

UPDATE

With thanks to the comment from @MarkSetchell pointing me to https://stackoverflow.com/a/64682849/2836621, I have used a palette to tell Pillow directly to treat 0 as black and 1 as white.

Here is the code:

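# Relies on filename, w, h and the imports (partial, Image) from the test script below.
def create_images_palette():
    # Palette entry 0 is black (0,0,0); entry 1 is white (255,255,255).
    palette = [  0,  0,  0,
               255,255,255]
    # Pad to the full 256 RGB triples (768 values).
    palette = palette + [0]*(768-len(palette))
    imgs = []
    with open(filename, 'rb') as ifile:
        # Read one image (w*h bytes) at a time until EOF.
        for data in iter(partial(ifile.read, w*h), b''):
            img = Image.frombuffer('L', (w,h), data)  # size is (width, height)
            img.putpalette(palette)  # attaching a palette turns the 'L' image into mode 'P'
            imgs.append(img)
    return imgs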

Results compared with the winners from the tests below, this time with w=1024, h=1024, N=1000 (more realistic for my usage):

create_images3         0.42854620320013054
create_images6         0.32936501539988966
create_images7         0.31196588300008443
create_images_palette  0.21011565389999304

So the palette solution wins.
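For readers who want to try the palette idea without the file-reading harness, here is a minimal self-contained sketch. It is not the benchmarked function above: the tiny frame size and the in-memory random data are assumptions made purely for illustration, and it builds the image in mode 'P' directly rather than starting from 'L'.

```python
import random
from PIL import Image

w, h = 4, 3  # tiny hypothetical frame, for illustration only

# One byte per pixel, each 0x00 or 0x01, as in the question.
data = bytes(random.getrandbits(1) for _ in range(w * h))

# Palette entry 0 -> black, entry 1 -> white; pad to 256 RGB triples.
palette = [0, 0, 0, 255, 255, 255]
palette += [0] * (768 - len(palette))

# Pillow takes the size as (width, height).
img = Image.frombytes('P', (w, h), data)
img.putpalette(palette)

# Converting to RGB resolves the palette, so every 1 becomes (255, 255, 255).
rgb_bytes = img.convert('RGB').tobytes()
expected = b''.join(b'\xff\xff\xff' if b else b'\x00\x00\x00' for b in data)
```

The pixel bytes are never rewritten; only the 768-entry lookup table is attached, which is presumably why this approach avoids the per-pixel conversion cost.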


With the help of the answers, I have tested a number of solutions where I can modify my data to replace 1's with 255's. Here are the results of those tests.

I will accept an answer over this one that, as per the question, tells Pillow to directly treat 1's as white and 0's as black. Failing that, some of these solutions work and perform well for my needs.

Note that in my real world application, I can have data for numerous images back-to-back in one binary file. These solutions reflect that.

import numpy as np
import os
from functools import partial
from PIL import Image, ImageOps

w = 128 # image could be much bigger, keeping it small for the example
h = 128
N = 100

filename = 'byte_imgs.dat'
data = np.random.randint(2, size=w*h*N, dtype=np.uint8).tobytes()
with open(filename, 'wb') as f:
    f.write(data)
print("image data written to file")

def create_images1():
    imgs = []
    with open(filename, 'rb') as ifile:
        for data in iter(partial(ifile.read, w*h), b''):
            img = Image.frombuffer('L', (h,w), data)
            img = ImageOps.autocontrast(img)
            imgs.append(img)
    return imgs

def create_images2():
    imgs = []
    with open(filename, 'rb') as ifile:
        for data in iter(partial(ifile.read, w*h), b''):
            data = bytes([0 if b==0 else 255 for b in data])
            img = Image.frombuffer('L', (h,w), data)
            imgs.append(img)
    return imgs

def create_images3():
    imgs = []
    with open(filename, 'rb') as ifile:
        for data in iter(partial(ifile.read, w*h), b''):
            mem = memoryview(data).cast('B', shape=[w,h])
            arr = np.asarray(mem)
            img = Image.fromarray(arr*255)
            imgs.append(img)
    return imgs

def create_images4():
    data = bytearray(w*h)
    imgs = []
    with open(filename, "rb") as f:
        byte = f.read(1)
        while byte != b'':
            for i in range(w*h):
                data[i] = int.from_bytes(byte, "big") * 0xFF
                byte = f.read(1)
            img = Image.frombuffer('L', (h,w), bytes(data))
            imgs.append(img)
    return imgs

def create_images5():
    imgs = []
    with open(filename, "rb") as f:
        i = 0
        data = bytearray()
        byte = f.read(1)
        while byte != b'':
            if byte != b'\x00':
                data.append(0xff)
            else:
                data.append(0x00)
            byte = f.read(1)
            i+=1
            if i == w*h:
                img = Image.frombuffer('L', (h,w), bytes(data))
                imgs.append(img)
                i=0
                data = bytearray()
    return imgs

def create_images6():
    imgs = []
    with open(filename, 'rb') as ifile:
        while True:
            arr = np.fromfile(ifile, dtype=np.uint8, count=w*h)
            if arr.size < w*h:
                break
            img = Image.fromarray(arr.reshape(w,h)*255)
            imgs.append(img)
    return imgs

def create_images7():
    imgs = []
    with open(filename, 'rb') as ifile:
        for dat in iter(partial(ifile.read, w*h), b''):
            arr = np.frombuffer(dat, dtype=np.uint8).reshape((w,h))
            img = Image.fromarray(arr*255)
            imgs.append(img)
    return imgs

def create_images8():
    imgs = []
    data = np.fromfile(filename, dtype=np.uint8)  # uint8 so that 1*255 fits without overflow or upcasting
    n = data.size // (w*h)
    for i in range(n):
        offset = i*w*h
        state = np.reshape(data[offset:offset+w*h], (w, h))
        img = Image.fromarray(state*255)
        imgs.append(img)
    return imgs


def create_images9():
    os.system(r"bbe -e 's/\x01/\xff/g' byte_imgs.dat > byte_imgs_new.dat")
    imgs = []
    with open('byte_imgs_new.dat', 'rb') as ifile:
        for data in iter(partial(ifile.read, w*h), b''):
            img = Image.frombuffer('L', (h,w), data)
            imgs.append(img)
    return imgs

import timeit
number = 10
print("create_images1", timeit.timeit('[func() for func in (create_images1,)]', number=number, globals=globals()) / number)
print("create_images2", timeit.timeit('[func() for func in (create_images2,)]', number=number, globals=globals()) / number)
print("create_images3", timeit.timeit('[func() for func in (create_images3,)]', number=number, globals=globals()) / number)
print("create_images4", timeit.timeit('[func() for func in (create_images4,)]', number=number, globals=globals()) / number)
print("create_images5", timeit.timeit('[func() for func in (create_images5,)]', number=number, globals=globals()) / number)
print("create_images6", timeit.timeit('[func() for func in (create_images6,)]', number=number, globals=globals()) / number)
print("create_images7", timeit.timeit('[func() for func in (create_images7,)]', number=number, globals=globals()) / number)
print("create_images8", timeit.timeit('[func() for func in (create_images8,)]', number=number, globals=globals()) / number)
print("create_images9", timeit.timeit('[func() for func in (create_images9,)]', number=number, globals=globals()) / number)

RESULTS

Average runtime of each function, in seconds. create_images3() and create_images7() are the clear winners in this test.

create_images1 0.012226119600018136
create_images2 0.09197459420001905
create_images3 0.0021811368000271615
create_images4 0.30249598119999066
create_images5 0.3393335546000344
create_images6 0.0033311289999801374
create_images7 0.0021913534999839614
create_images8 0.015457254699958867
create_images9 0.044248268000046664
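
One more variant that was not part of the timings above, offered only as an untested-for-speed sketch: `Image.fromarray` turns a boolean NumPy array into a mode '1' (bilevel) image, where False is black and True is white. The small frame size and the in-memory data here are assumptions for illustration.

```python
import numpy as np
from PIL import Image

w, h = 8, 4  # hypothetical small frame

# Bytes of 0x00/0x01, one per pixel, as in the question.
data = np.random.randint(2, size=w * h, dtype=np.uint8).tobytes()

# NumPy shapes are (rows, columns), i.e. (height, width).
pixels = np.frombuffer(data, dtype=np.uint8).reshape(h, w)

# A boolean array becomes a mode '1' image: False -> black, True -> white.
img = Image.fromarray(pixels.astype(bool))

# Round-trip through 'L' to inspect the grayscale values Pillow assigns.
gray = np.asarray(img.convert('L'))
```

Whether this beats the `arr*255` variants would need to be measured with the same harness; the `astype(bool)` call still makes a copy of each frame.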
ogb119
  • In your `create_images9` you don't need the line `img = ImageOps.autocontrast(img)`. This was the point of my variant. – Yuri Khristich Nov 08 '20 at 12:00
  • apologies -- the perils of copy and paste. Corrected (but still not that fast relative to some others unfortunately) – ogb119 Nov 08 '20 at 12:09
  • It's okay. ) But I think if you could apply `bbe` to all your data files before you start making images, it would be the fastest variant, since there would be no extra computation. – Yuri Khristich Nov 08 '20 at 12:21
  • I am doing post-processing on image data straight after they are created by a main program. I have a shell script which runs my main program and then this post-processing script... so yes, I could add bbe directly to the shell script before I run the post-processing, but I don't think that will make a massive overall difference. – ogb119 Nov 08 '20 at 12:45
1

You can change the data while reading it, something like this:

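data = bytearray()  # mutable buffer for the converted pixels
with open("file.dat", "rb") as f:
    byte = f.read(1)
    while byte != b"":
        # f.read(1) returns a bytes object, so compare against bytes, not the int 0
        if byte != b"\x00":
            data.append(255)
        else:
            data.append(0)
        byte = f.read(1)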
zvi
  • how would the "append to data 255" work... say I begin with an empty, pre-allocated bytes object `data = bytes(rows*cols)`. How would I write a byte to that object and increment the pointer? – ogb119 Nov 07 '20 at 22:55
  • Use `bytearray` as it is mutable. – zvi Nov 07 '20 at 23:04
  • unfortunately this approach turned out to be orders of magnitude slower than my original approach in real world tests. – ogb119 Nov 08 '20 at 00:12
1

Try to do it this way, changing the buffer before sending to PIL:

import numpy as np
from PIL import Image, ImageOps

w = 128
h = 128

data = np.random.randint(2, size=w*h, dtype=np.uint8).tobytes()
data = bytes([0 if b==0 else 255 for b in data])
img = Image.frombuffer('L', (h,w), data)
img.show()
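
The same mapping can also be written with `bytes.translate`, which applies a 256-entry lookup table without a Python-level loop over the pixels. This is a sketch rather than a measured improvement; the names `TABLE` and `to_grayscale` are made up for the example.

```python
# Lookup table: byte value 0 stays 0, every other value maps to 255.
TABLE = bytes([0] + [255] * 255)

def to_grayscale(data: bytes) -> bytes:
    """Map 0x00 -> 0x00 and any non-zero byte -> 0xFF."""
    return data.translate(TABLE)

converted = to_grayscale(b'\x00\x01\x01\x00')
```

The result can be passed to `Image.frombuffer('L', ...)` exactly like the output of the list comprehension above.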
zvi
1

You can try replacing every 0x01 byte with 0xff in your source data using bbe https://sourceforge.net/projects/bbe-/ or a similar tool.

bbe -e 's/\x01/\xff/g' file.dat > file_new.dat

It changes:

\x01\x00\x00\x01\x01\x01\x00\x00\x00\x00...

to:

\xff\x00\x00\xff\xff\xff\x00\x00\x00\x00...
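
If installing bbe is not an option, the same substitution can be sketched in pure Python with `bytes.replace`. This assumes the data contains only 0x00 and 0x01 bytes, as stated in the question; `replace_ones` is a hypothetical helper name.

```python
def replace_ones(data: bytes) -> bytes:
    """Replace every 0x01 byte with 0xff, leaving other bytes untouched."""
    return data.replace(b'\x01', b'\xff')

sample = b'\x01\x00\x00\x01\x01\x01\x00\x00\x00\x00'
converted = replace_ones(sample)
```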


Here is my test code (just to be sure):

import random
from PIL import Image, ImageOps
import os

w = 128
h = 128

# make the data array
data = b''.join([random.randint(0,1).to_bytes(1,'big') for _ in range(w*h)])

# save the data to a file
with open('file.dat', 'wb') as file:
    file.write(data)

# make an image with autocontrast
img = Image.frombuffer('L', (h,w), data)
img = ImageOps.autocontrast(img)
img.save('img.png')

# replace bytes in the data file
os.system(r"bbe -e 's/\x01/\xff/g' file.dat > file_new.dat")

# read the new data
with open('file_new.dat', 'rb') as file:
    data_new = file.read()

# make an image with no autocontrast
img = Image.frombuffer('L', (h,w), data_new)
img.save('img_new.png')

Output (img.png / img_new.png):

Yuri Khristich