4

I have hundreds of images of DNA nanotubes from fluorescence microscopy experiments and I would like to measure the distribution of tube lengths in an automated way using image processing. Here is an example microscope image:

example DNA nanotube image

I have tried a few feature extraction methods using python and skimage. I have tried using Canny edge detection, which successfully creates an outline of each nanotube, however it is unclear to me how to go from these outlines to a definitive measure of length. After applying Canny edge detection I have tried using a probabilistic Hough transform to fit straight lines to the curves, which would make length measurement straightforward. As you can see in these results though:

Canny edge detection and hough transform results

line fitting is inconsistent and multiple lines are created in parallel for the same tube structure.

Does anyone know of a straightforward method to measure these tube lengths?

Spektre
  • 49,595
  • 11
  • 110
  • 380
mpacella
  • 41
  • 3
  • 1
    This is way outside of my wheelhouse, but it seems like you want to find the center lines of the shapes created by the Canny edge detection. Here's a related question with OpenCV; perhaps the links there will give you some ideas? https://stackoverflow.com/questions/21039535/opencv-extract-path-centerline-from-arbitrary-area – Jordan Running Jun 21 '17 at 15:43
  • 1
    I tried a Canny edge detection, followed by a morphological closing to fill the gaps, then a skeletonisation to find the centrelines of each shape. I overlaid what I found on top of the original in red. See what you think of the results... http://thesetchells.com/StackOverflow/nano.png – Mark Setchell Jun 21 '17 at 16:27
  • Thank you both! Skeletonize definitely looks like the way to go. @MarkSetchell this looks like precisely what I want! Some of the tubes have small branches forking off of the main line but those should be easy enough to remove by just filtering based on line length. Is there an easy way to access the beginning and end points of each line, as well as the total length? – mpacella Jun 21 '17 at 18:01
  • 1
    why not just binarize + fill the gaps with morphology operators and then just count the pixels per opbject with flood fill and divide by average tube thickness ? The thickness can be also found if you use A* like flood fill you can find 2 most distant points so the thickness will be perpendicular to it ...so just count the pixels in that direction ... Also A* filling will get you the length directly if cast from endpoint to endpoint. Also A* will detect if you got intersected tubes (more than 2 local max) – Spektre Jun 22 '17 at 06:54
  • @Spektre great idea! It is taking advantage of the fact that I know the tubes should be of consistent width. As you mentioned, intersecting tubes would create a problem for this though. What do you mean "Also A* will detect if you got interested tubes"? If there was a way to detect intersected tubes and either discard both of figure out a way to measure the individual lengths that would be ideal. – mpacella Jun 22 '17 at 17:29
  • @mpacella in A* filling you are increasing the filled value to unfilled areas. So all the endpoints should be a local max value if there is more then 2 then you got intersection. ... so find all local maxima (all neighbors are lesser then the cell value itself) ... to make this robust you need to do A* twice. first one to find any endpoint and second to start from endpoint to avoid edge case problems. That also gives you all the endpoints so you can do combinations of found paths to deduce which part is which tube ... – Spektre Jun 22 '17 at 19:15
  • @Spektre thanks again for the help! I haven't had any luck searching for "A*" filling, is there another term it might be known by? Thanks! – mpacella Aug 03 '17 at 15:26
  • @mpacella hard to say English is not my native language and many therms are very different after translation. May be this would help [Backtracking in A star](https://stackoverflow.com/a/28317199/2521214) If you want I can create an answer with my C++ implementation (that counts the area size in pixels) but not today (its too late) maybe tomorrow – Spektre Aug 03 '17 at 17:53
  • @mpacella added answer with some detail... – Spektre Aug 04 '17 at 08:29

1 Answers1

2

I would start like this:

  1. binarize image
  2. find each set pixel of tube
  3. flood fill that position by tube color

    use any filling algorithm with 8-neighbors and during filling also count recolored pixels in some counter cnt.

    if area size cnt too small recolor it to background otherwise account its size cnt/average_tube_width to the histogram.

Here simple C++ example of this:

picture pic0,pic1;
    // pic0 - source img
    // pic1 - output img
//                    0xAARRGGBB
const DWORD col_backg=0x00202020;   // gray
const DWORD col_tube =0x00FFFFFF;   // white
const DWORD col_done =0x0000A0FF;   // aqua
const DWORD col_noise=0x00000080;   // blue
const DWORD col_error=0x00FF0000;   // red  (too smal _hist value)
const DWORD col_hist =0x00FFFF00;   // yellow
const DWORD col_test =0x01000000;   // A* filling start color (must be bigger then all other colors used)
int x,y,xx,yy,i;
DWORD c;
const int _hist=256;    // max area size for histogram
int hist[_hist];        // histogram  

// copy source image to output
pic1=pic0;
pic1.enhance_range();               // maximize dynamic range <0,255>^3
pic1.pixel_format(_pf_u);           // convert to grayscale <0,765>
pic1.threshold(100,766,col_backg,col_tube); // threshold intensity to binarize image
pic1.pf=_pf_rgba;                   // set as RGBA (without conversion)

// clear histogram
for (i=0;i<_hist;i++) hist[i]=0;
// find all tubes
for (y=0;y<pic1.ys;y++)
 for (x=0;x<pic1.xs;x++)
  if (pic1.p[y][x].dd==col_tube)
    {
    pic1.Astarfill(x,y,col_test);   // fill count area (8 neighbors)
    if (pic1._floodfill_n>5)        // if valid size
        {
        c=col_done;                 // set recolor color to done
        // update histogram
        if (pic1._floodfill_n<_hist) hist[pic1._floodfill_n]++;
         else c=col_error;
        }
    else c=col_noise;
    // recolor filled bbox with c
    for (yy=pic1._floodfill_y0;yy<=pic1._floodfill_y1;yy++)
     for (xx=pic1._floodfill_x0;xx<=pic1._floodfill_x1;xx++)
      if (pic1.p[yy][xx].dd>=col_test)
       pic1.p[yy][xx].dd=c;
    }
// render histogram
for (x=0;x<_hist;x++)
 for (i=0,y=pic1.ys-1;(y>=0)&&(i<hist[x]<<2);y--,i++)
   pic1.p[y][x].dd=col_hist;

The result for your input image is:

result

The yellow lines are the lengths distribution (x axis is tube length and y is probability)

I use my own picture class for images so some members are:


xs,ys is size of image in pixels
p[y][x].dd is pixel at (x,y) position as 32 bit integer type
clear(color) clears entire image with color
resize(xs,ys) resizes image to new resolution
bmp is VCL encapsulated GDI Bitmap with Canvas access
pf holds actual pixel format of the image:

enum _pixel_format_enum
    {
    _pf_none=0, // undefined
    _pf_rgba,   // 32 bit RGBA
    _pf_s,      // 32 bit signed int
    _pf_u,      // 32 bit unsigned int
    _pf_ss,     // 2x16 bit signed int
    _pf_uu,     // 2x16 bit unsigned int
    _pixel_format_enum_end
    };


color and pixels are encoded like this:

union color
    {
    DWORD dd; WORD dw[2]; byte db[4];
    int i; short int ii[2];
    color(){}; color(color& a){ *this=a; }; ~color(){}; color* operator = (const color *a) { dd=a->dd; return this; }; /*color* operator = (const color &a) { ...copy... return this; };*/
    };


The bands are:

enum{
    _x=0,   // dw
    _y=1,

    _b=0,   // db
    _g=1,
    _r=2,
    _a=3,

    _v=0,   // db
    _s=1,
    _h=2,
    };

I also use mine dynamic list template so:


List<double> xxx; is the same as double xxx[];
xxx.add(5); adds 5 to end of the list
xxx[7] access array element (safe)
xxx.dat[7] access array element (unsafe but fast direct access)
xxx.num is the actual used size of the array
xxx.reset() clears the array and set xxx.num=0
xxx.allocate(100) preallocate space for 100 items

Now the A* filling is implemented like this:

// these are picture:: members to speed up recursive fillings
int   _floodfill_rn;                                            // anti stack overflow recursions
List<int> _floodfill_xy;                                        // anti stack overflow pendng recursions
int   _floodfill_a0[4];                                         // recursion filled color and fill color
color _floodfill_c0,_floodfill_c1;                              // recursion filled color and fill color
int   _floodfill_x0,_floodfill_x1,_floodfill_n;                 // recursion bounding box and filled pixel count
int   _floodfill_y0,_floodfill_y1;

// here the filling I used
void picture::Astarfill(int x,int y,DWORD id)
    {
    _floodfill_c0=p[y][x];
    _floodfill_c1.dd=id;
    _floodfill_n=0;
    _floodfill_x0=x;
    _floodfill_y0=y;
    _floodfill_x1=x;
    _floodfill_y1=y;
    _floodfill_rn=0;
    _floodfill_xy.num=0;

    if ((x<0)||(x>=xs)||(y<0)||(y>=ys)) return;

    int i;
    List<int> xy0,xy1,*p0,*p1,*pp;
    // first point
    p0=&xy0;
    p1=&xy1;
    p0->num=0;
    p0->add(x); p0->add(y); p[y][x].dd=id; _floodfill_n++;
    for (;p0->num;)
        {
        p1->num=0; id++;
        for (i=0;i<p0->num;)
            {
            x=p0->dat[i]; i++;
            y=p0->dat[i]; i++;
            x--; if ((x>=0)&&(y>=0)&&(x<xs)&&(y<ys)&&(p[y][x].dd==_floodfill_c0.dd)){ p1->add(x); p1->add(y); p[y][x].dd=id; _floodfill_n++; if (_floodfill_x0>x) _floodfill_x0=x; if (_floodfill_y0>y) _floodfill_y0=y; if (_floodfill_x1<x) _floodfill_x1=x; if (_floodfill_y1<y) _floodfill_y1=y; }
            y--; if ((x>=0)&&(y>=0)&&(x<xs)&&(y<ys)&&(p[y][x].dd==_floodfill_c0.dd)){ p1->add(x); p1->add(y); p[y][x].dd=id; _floodfill_n++; if (_floodfill_x0>x) _floodfill_x0=x; if (_floodfill_y0>y) _floodfill_y0=y; if (_floodfill_x1<x) _floodfill_x1=x; if (_floodfill_y1<y) _floodfill_y1=y; }
            x++; if ((x>=0)&&(y>=0)&&(x<xs)&&(y<ys)&&(p[y][x].dd==_floodfill_c0.dd)){ p1->add(x); p1->add(y); p[y][x].dd=id; _floodfill_n++; if (_floodfill_x0>x) _floodfill_x0=x; if (_floodfill_y0>y) _floodfill_y0=y; if (_floodfill_x1<x) _floodfill_x1=x; if (_floodfill_y1<y) _floodfill_y1=y; }
            x++; if ((x>=0)&&(y>=0)&&(x<xs)&&(y<ys)&&(p[y][x].dd==_floodfill_c0.dd)){ p1->add(x); p1->add(y); p[y][x].dd=id; _floodfill_n++; if (_floodfill_x0>x) _floodfill_x0=x; if (_floodfill_y0>y) _floodfill_y0=y; if (_floodfill_x1<x) _floodfill_x1=x; if (_floodfill_y1<y) _floodfill_y1=y; }
            y++; if ((x>=0)&&(y>=0)&&(x<xs)&&(y<ys)&&(p[y][x].dd==_floodfill_c0.dd)){ p1->add(x); p1->add(y); p[y][x].dd=id; _floodfill_n++; if (_floodfill_x0>x) _floodfill_x0=x; if (_floodfill_y0>y) _floodfill_y0=y; if (_floodfill_x1<x) _floodfill_x1=x; if (_floodfill_y1<y) _floodfill_y1=y; }
            y++; if ((x>=0)&&(y>=0)&&(x<xs)&&(y<ys)&&(p[y][x].dd==_floodfill_c0.dd)){ p1->add(x); p1->add(y); p[y][x].dd=id; _floodfill_n++; if (_floodfill_x0>x) _floodfill_x0=x; if (_floodfill_y0>y) _floodfill_y0=y; if (_floodfill_x1<x) _floodfill_x1=x; if (_floodfill_y1<y) _floodfill_y1=y; }
            x--; if ((x>=0)&&(y>=0)&&(x<xs)&&(y<ys)&&(p[y][x].dd==_floodfill_c0.dd)){ p1->add(x); p1->add(y); p[y][x].dd=id; _floodfill_n++; if (_floodfill_x0>x) _floodfill_x0=x; if (_floodfill_y0>y) _floodfill_y0=y; if (_floodfill_x1<x) _floodfill_x1=x; if (_floodfill_y1<y) _floodfill_y1=y; }
            x--; if ((x>=0)&&(y>=0)&&(x<xs)&&(y<ys)&&(p[y][x].dd==_floodfill_c0.dd)){ p1->add(x); p1->add(y); p[y][x].dd=id; _floodfill_n++; if (_floodfill_x0>x) _floodfill_x0=x; if (_floodfill_y0>y) _floodfill_y0=y; if (_floodfill_x1<x) _floodfill_x1=x; if (_floodfill_y1<y) _floodfill_y1=y; }
            }
        pp=p0; p0=p1; p1=pp;
        }
    _floodfill_rn=id-1;
    }

If you want to improve your counting based on the size then if you got a multiple of avg size you got intersected tubes. So you can either try to compute how many of them are there and account the avg size to the histogram instead of using the full size or us A* filling and locate the endpoints. If you find more than 2 endpoints you can try to distinct between tubes.

A* filling

So first use A* fill to find local max and then A* fill again from that position (so you start filling from endpoint). Then find all local maxs and based on avg size and actual size of tube and number of endpoints you can determine how many tubes are grouped together and how many of them are interconnected. Then you can try to do all possible combinations between endpoints and the one closest to avg size of tube per each tube is the most "correct" one. That should enhance precision a lot more.

In case you do not know the avg tube thickness you can use A* filling for non intersecting tubes directly to acquire the length. So in the second fill (from endpoint) when the filling stops the last filled ID is the length of tube in pixels.

Spektre
  • 49,595
  • 11
  • 110
  • 380