
I want to normalize the exposure and color palettes of a set of images. For context, this is for training a neural net in image classification on medical images. I'm also doing this for hundreds of thousands of images, so efficiency is very important.

So far I've been using VIPS, specifically PyVIPS, and would prefer a solution using that library. After finding this answer and looking through the documentation, I tried

x = pyvips.Image.new_from_file('test.ndpi')
x = x.hist_norm()
x.write_to_file('test_normalized.tiff')

but that seems to always produce a pure-white image.

Twiffy

1 Answer


You need `hist_equal` for histogram equalisation.

The main docs are here:

https://libvips.github.io/libvips/API/current/libvips-histogram.html

However, that will be extremely slow for large slide images. It will need to scan the whole slide once to build the histogram, then scan again to equalise it. It would be much faster to find the histogram of a low-res layer, then use that to equalise the high-res one.

For example:

#!/usr/bin/env python3

import sys
import pyvips

# open the slide image and get the number of layers ... we are not fetching 
# pixels, so this is quick
x = pyvips.Image.new_from_file(sys.argv[1])
levels = int(x.get("openslide.level-count"))

# find the histogram of the highest level ... again, this should be quick
x = pyvips.Image.new_from_file(sys.argv[1], 
                               level=levels - 1)
hist = x.hist_find()

# from that, compute the transform for histogram equalisation
equalise = hist.hist_cum().hist_norm()

# and use that on the full-res image
x = pyvips.Image.new_from_file(sys.argv[1])

x = x.maplut(equalise)

x.write_to_file(sys.argv[2])
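For intuition, `hist_cum().hist_norm()` is just the classic equalisation transform: take the cumulative histogram, then rescale it so the last bin lands at the top of the pixel range. A plain-Python sketch of that step (assuming an 8-bit image, so the target range is 0–255):

```python
def equalise_lut(hist):
    """Build an equalisation look-up table from a histogram:
    cumulative sum, rescaled so the final bin maps to 255."""
    total = sum(hist)
    lut, cum = [], 0
    for count in hist:
        cum += count
        lut.append(round(cum * 255 / total))
    return lut

# toy 4-bin histogram with most pixels in the darkest bin:
# dark values get spread out, bright values get compressed
print(equalise_lut([100, 55, 50, 50]))  # [100, 155, 205, 255]
```

`maplut` then replaces each pixel value `v` with `lut[v]`, which is why a single histogram built from the low-res level can drive the transform of the full-res image.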

Another factor is that histogram equalisation is non-linear, so it will distort lightness relationships. It can also distort colour relationships and make noise and compression artifacts look crazy. I tried that program on an image I have here:

$ ~/try/equal.py bild.ndpi[level=7] y.jpg

(equalised output: visible horizontal stripes and colour fringes)

The stripes are from the slide scanner and the ugly fringes from compression.

I think I would instead find image max and min from the low-res level, then use them to do a simple linear stretch of pixel values.

Something like:

# get the number of pyramid levels ... no pixels fetched, so quick
x = pyvips.Image.new_from_file(sys.argv[1])
levels = int(x.get("openslide.level-count"))

# find the extremes on the lowest-resolution level, as before
x = pyvips.Image.new_from_file(sys.argv[1],
                               level=levels - 1)
mn = x.min()
mx = x.max()

# stretch the full-res image so mn..mx fills the 8-bit range
x = pyvips.Image.new_from_file(sys.argv[1])
x = (x - mn) * (255 / (mx - mn))
x.write_to_file(sys.argv[2])
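The stretch itself is simple per-pixel arithmetic: `mn` maps to 0 and `mx` to the top of the 8-bit range. A quick sanity check of the same formula in plain Python (using 255 as the scale so the maximum lands exactly on the top 8-bit value):

```python
def stretch(value, mn, mx):
    """Linear contrast stretch: mn -> 0, mx -> 255."""
    return (value - mn) * 255 / (mx - mn)

print(stretch(50, 50, 200))   # darkest pixel   -> 0.0
print(stretch(125, 50, 200))  # midpoint        -> 127.5
print(stretch(200, 50, 200))  # brightest pixel -> 255.0
```

In pyvips the same expression runs over whole image bands at once; `mn` and `mx` measured on the low-res level should be close enough to the true extremes for this purpose.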

Did you find the new Region feature in pyvips? It makes generating patches for training MUCH faster, up to 100x faster in some cases:

https://github.com/libvips/pyvips/issues/100#issuecomment-493960943

jcupitt
  • Thanks! The Region feature looks great. For hist_equal, I'm also getting artifacts. Can you say more about how to get max and min and do a linear stretch? – Twiffy Nov 04 '19 at 21:14
  • Also, how does one write a region to a file? – Twiffy Nov 04 '19 at 23:15
  • 1
    I added a mx/mn example. There's no point using fetch if you are going via files -- it'll be incredibly slow whatever you do. Fetch is handy if you want to feed arrays of pixels directly to pytorch etc. -- you'll see an enormous speedup. – jcupitt Nov 05 '19 at 10:53
  • 1
    Oh, I guess you mean for debugging? You can wrap an image around an array of byte values with `new_from_memory`. – jcupitt Nov 05 '19 at 10:54