
I have a drone that can fly autonomously and I want it to land safely. It has a depth camera underneath, but I can't find a way to detect flat regions to tell the drone to land on them. I'm using a RealSense D435i camera. This is a sample depth image from the downwards-facing camera.

Can you help, please?

[first image] [new image: result of the debugging]

  • 1. Reconstruct the 3D points (relative to the camera, by reversing the camera projection, so you need either the FOV or the focal length and resolution ... then for each pixel compute `x,y,z`). 2. Group neighboring points with the "same" normal, for example with `dot(n1,n2) >= 0.9`. If you want something more in-depth, you should add more info to your question, like a sample image, depth camera properties, etc. ... – Spektre Jan 19 '21 at 07:44
  • see bullet #3 in [Does Kinect Infrared View Have an offset with the Kinect Depth View](https://stackoverflow.com/a/19905805/2521214) – Spektre Jan 19 '21 at 08:53
  • Hi @Spektre, thanks a lot for your answer. I added more info to my question as you suggested, but I do not follow what you suggest! Can you please elaborate? Thanks. – lofy Jan 20 '21 at 03:38
  • What is on the image? What are the ranges? Your sensor has a `0.3-3m` range only, so if those are buildings they would be out of range, and if not, the output is weirdly logarithmic ... Also, does darker mean further away? Which color intensities are valid and which are out of range or errors? After reconstruction the output does not make any sense to me without more context. I added a preliminary answer with a few images, so check which one is closer to what you got ... – Spektre Jan 20 '21 at 10:15
  • Also, a calibration image would be a good idea to post, like an image of a known geometric shape with known dimensions (a cube box, a sphere ...) taken at a few distances, so it is possible to infer the 3D reconstruction equations as accurately as possible – Spektre Jan 20 '21 at 10:27
  • Please don't make more work for other people by vandalizing your posts. By posting on the Stack Exchange network, you've granted a non-revocable right, under the [CC BY-SA 4.0 license](//creativecommons.org/licenses/by-sa/4.0/), for Stack Exchange to distribute that content (i.e. regardless of your future choices). By Stack Exchange policy, the non-vandalized version of the post is the one which is distributed. Thus, any vandalism will be reverted. If you want to know more about deleting a post please see: [How does deleting work?](//meta.stackexchange.com/q/5221) – gre_gor Jun 10 '22 at 20:08

1 Answer


According to this D435i infosheet:

RGB FOV  (1920x1080): 64° × 41° × 77° (±3°) // H x V x D (D is presumably the diagonal FOV)
depth FOV(1280x720):  86° × 57° (±3°)
min depth(1280x720):  26cm
max depth:            3m

However, I see no info on whether the depth frame is post-processed into a linear output or left in perspective. You can infer it from the images: if objects get smaller with distance, it is still in perspective. I do not know for sure, but I will assume that is the case.

Your image is 848x480, so I will assume it is just the 1280x720 depth frame linearly scaled by 1/1.5 (848·1.5 = 1272 ≈ 1280, 480·1.5 = 720), hence the FOV is the same.

  1. linearize depth into a regular grid of sampled points (a mesh)

    Each pixel must be converted into a 3D position relative to the camera. Simply cast a ray from the camera focal point through the pixel and stop at its measured distance; the resulting position is the 3D point we want. Doing this for each pixel results in a 2D grid of connected points (a surface mesh). However, we do not know whether the depth is the Euclidean distance or the perpendicular one (cos-corrected), so we need to try both and choose the one with the better result (buildings are boxes, not pyramids ...).

    From the info I have at hand, this is the best I could get from your input image:

    [3D preview]

    I took all the pixels from your image, converted each x,y position into spherical longitude/latitude angles (by linear interpolation from the depth camera FOV and the image resolution), converted the pixel intensity into depth, and converted that to Cartesian coordinates.

     // depth -> PCL
     for (y=0;y<ys;y++)
         {
         pd=(int*)depth->ScanLine[y];                // pointer to y-th scan line in depth image
         b=depthFOVy*(((double(y)-depthy0)/double(ys-1))-0.5);       // latitude
         for (x=x3=0;x<xs;x++,x3+=3)
             {
             a=depthFOVx*(((double(x)-depthx0)/double(xs-1))-0.5);   // longitude
             r=pd[x]&255;                            // intensity from depth image (just a single channel)
             r/=255.0;                               // linear scale
             if (int(depthcfg&_DepthImage_invert_z)) r=1.0-r;
             r=depthz0+(r*(depthz1-depthz0));        // range <z0,z1>
             // spherical -> cartesian
             p[0]=cos(a)*cos(b);         // x
             p[1]=sin(a)*cos(b);         // y
             p[2]=       sin(b);         // z
             if (int(depthcfg&_DepthImage_cos_correct_z)) r/=p[0];
             pnt[y][x3+0]=r*p[1];
             pnt[y][x3+1]=r*p[2];
             pnt[y][x3+2]=r*p[0];
             }
         }
    

    where pnt[][] is a 2D array of 3D points stored as consecutive x,y,z, and:

    pcl.depthz0=0.8;
    pcl.depthz1=1.00;
    pcl.depthcfg=_DepthImage_invert_z|_DepthImage_cos_correct_z;
    

    The image resolution is xs,ys.


  2. compute normal for each point (surface) from its neighbors

    A simple cross product will do; however, it is important to use a consistent order of operands so the resulting normals all point in the same direction rather than opposing ones ... otherwise we would need to deal with it later by using the absolute value of the dot product.
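
    A minimal sketch of this step, with the cross product and normalization written out inline so no particular vector library is assumed (a, b, c stand for pnt[y][x], pnt[y][x-1], pnt[y-1][x] from step 1, and the operand order matches the convention used in the full code further below):

     #include <math.h>
     // normal at grid point a from its left neighbor b and top neighbor c;
     // using the same operand order everywhere keeps all normals pointing one way
     void point_normal(double n[3],const double a[3],const double b[3],const double c[3])
         {
         double u[3],v[3],l;
         int i;
         for (i=0;i<3;i++){ u[i]=a[i]-b[i]; v[i]=a[i]-c[i]; }
         n[0]=(v[1]*u[2])-(v[2]*u[1]);       // n = cross(v,u)
         n[1]=(v[2]*u[0])-(v[0]*u[2]);
         n[2]=(v[0]*u[1])-(v[1]*u[0]);
         l=sqrt((n[0]*n[0])+(n[1]*n[1])+(n[2]*n[2]));
         if (l>1e-10){ n[0]/=l; n[1]/=l; n[2]/=l; }  // normalize to unit length
         }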

  3. Do a histogram on the normals (aligned to a limited set of directions)

    A histogram is a relation between some value of a variable and its probability or occurrence. For example, an image histogram tells us how many pixels have each color (ranges/bins). For our purposes, flat areas (planes) should share the same normal, so the normals of flat zones should be very frequent and hence show up as peaks in the histogram. (Your depth image contains too much noise, but I assume your gradient function that converts depth into color is weirdly scaled and non-linear, emphasizing the noise, as standard depth cameras do not have such big noise.) First we should convert the 3D normal into a single integer value (to simplify the code) and also limit the possible directions to a fixed set (to rule out noise and compress the output data size).

    The normalized normal is a 3D vector with coordinates in the range <-1,+1>, so we can convert that range to an unsigned integer of some bit width and pack 3 of those into a single integer. For example, with 5 bits per coordinate:

    normal = (nx,ny,nz)
    nx = 31.0*(nx+1.0)*0.5; // <-1,+1> -> <0,31>
    ny = 31.0*(ny+1.0)*0.5; // <-1,+1> -> <0,31>
    nz = 31.0*(nz+1.0)*0.5; // <-1,+1> -> <0,31>
    int ni = int(nx) | (int(ny)<<5) | (int(nz)<<10);
    

    Any direction will thus be converted to an integer in <0,2^15), i.e. <0,32767>, which is nicely manageable. So simply create an array int his[32768] and reset all its values to zero. Then for each pixel convert its normal to such an integer ni and increment its counter his[ni]++;. Beware: if your image resolution is big, you need to keep possible counter overflow in mind. For a 32-bit int the resolution must be less than 2^31 pixels ... otherwise use a bigger bit depth, or add a condition to avoid incrementing when the max value has already been reached.
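
    A minimal sketch of the fill loop under those assumptions (5 bits per coordinate; nor[y][x] is assumed to hold the double[3] unit normal from step 2):

     // his[ni] counts how many pixels share the quantized normal direction ni
     int his[32768]={0};                     // 3 x 5 bits = 15 bits -> 32768 bins
     for (int y=0;y<ys;y++)
      for (int x=0;x<xs;x++)
         {
         const double *n=nor[y][x];          // unit normal of pixel (x,y)
         int nx=int(31.0*(n[0]+1.0)*0.5);    // <-1,+1> -> <0,31>
         int ny=int(31.0*(n[1]+1.0)*0.5);
         int nz=int(31.0*(n[2]+1.0)*0.5);
         int ni=nx|(ny<<5)|(nz<<10);         // pack into a single 15 bit index
         if (his[ni]<0x7FFFFFFF) his[ni]++;  // guard against counter overflow
         }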

  4. remove points with very low occurrence of their normal

    This will allow us to ignore parts of the image which are small surfaces and/or not flat enough. We can set those points to some specific normal like (0,0,0), or keep a separate array of flags for this.
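
    For example (a sketch: quantize() stands for the 15 bit packing from step 3, and the 0.1% threshold is just an illustrative pick, not a tuned value):

     // flag points whose normal direction is too rare to belong to a big plane
     int thr=(xs*ys)/1000;                   // hypothetical threshold: 0.1% of pixels
     for (int y=0;y<ys;y++)
      for (int x=0;x<xs;x++)
         if (his[quantize(nor[y][x])]<thr)   // quantize = packing from step 3
             { nor[y][x][0]=0.0; nor[y][x][1]=0.0; nor[y][x][2]=0.0; } // "no surface"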

  5. find a big enough continuous area with the same normal

    You can brute-force search for a box (the size of your drone plus some margin) in the data (so 2 nested for loops for the position and another 2 for the test), and if all the points have the same normal, you have found a possible landing zone.

    Normals n1,n2 are the "same" if:

    dot(n1,n2)/(|n1|.|n2|) >= cos(margin)
    

    where margin is the allowed angular difference between normals, in degrees or radians depending on your cos implementation.
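
    In code this test might look like the following sketch (assuming both normals are already unit length, so the |n1|.|n2| denominator drops out):

     #include <math.h>
     // true if two unit normals differ by less than margin [rad]
     bool same_normal(const double n1[3],const double n2[3],double margin)
         {
         double d=(n1[0]*n2[0])+(n1[1]*n2[1])+(n1[2]*n2[2]);    // dot(n1,n2)
         return d>=cos(margin);
         }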

[Edit1] Your data is too noisy for the normal-based approach

Here is an image (after smoothing the data to reduce noise) showing the occurrence of the individual normal directions (with this gradient): blue is the lowest occurrence, red the highest, and green is in between ...

[histogram]

As you can see, the highest occurrence belongs to pixels that are out of range or invalid values (hard to say which), and to the tops of the boxes that were measured correctly. However, most of the surface in your view has a weird, very big noise (in both amplitude and radius ... those blobs) which I have never seen on depth images before. My bet is that your conversion from raw depth to a "color" gradient is non-linear and emphasizes the noise (which is most likely also the reason I had such a hard time reconstructing your scene back to 3D), plus JPEG lossy compression artifacts.

So unless you provide a clear equation for the conversion between pixel color and depth, this is a no-go, and you need a different, slower approach instead. I would do a windowed "brute force" search:

  1. define a window

    Your drone has some size, so we need a rectangle of a defined size (slightly bigger than your drone's area projected onto the ground; how much bigger depends on the landing trajectory and flight control stability & precision ...).

    So we need a rectangle of size wx * wy in meters (or whatever units you reconstruct the scene in).

  2. test "all" possible positions of window

    Simply start at the top-left corner and test whether that window of the point cloud is flat enough; if yes, you have found your solution, if not, test a position slightly moved to the right ... until the end of the point cloud is reached. Then start again from the left, but moved slightly towards the bottom. This is just 2 nested for loops incrementing by some value smaller than the window size (it does not need to be a single pixel).

  3. for each window location detect if the area is flat

    Obtain the average position of each corner of the window. That defines a plane, so simply test all the points in the window by computing their perpendicular distance to the plane. If it is bigger than some threshold, the area is not flat enough. If all points pass, you have found your landing zone.

Here is a preview of the result of this approach:

[preview]

The red means invalid color (black in your original image) and the green is the first found landing zone. I modified the code from the normal-based approach to do this, so there are quite a few remnants that are no longer necessary, like color, normals, etc. ... You only need the point cloud point positions. Here is the relevant C++/OpenGL code:

First, a simple depth-image-to-pointcloud class:

const int _DepthImage_invert_z      =0x00000001;    // brighter colors are closer
const int _DepthImage_cos_correct_z =0x00000002;    // depth is perpendicular distance instead of euclid
const int _DepthImage_smooth        =0x00000004;    // average position to reduce noise

class DepthPoint
    {
public:
    double pos[3];  // position (x,y,z)
    double nor[3];  // normal vector (x,y,z)
    double col[3];  // color (r,g,b)
    DepthPoint(){};
    DepthPoint(DepthPoint& a)   { *this=a; }
    ~DepthPoint(){};
    DepthPoint* operator = (const DepthPoint *a) { *this=*a; return this; }
    //DepthPoint* operator = (const DepthPoint &a) { ...copy... return this; }
    };

class DepthImage
    {
public:
    // resolution
    int xs,ys;                  // resolution
    // point cloud ys x xs
    DepthPoint **pnt;           // pnt[y][x] point for each pixel(x,y) of image
    // depth camera properties
    double depthFOVx,depthFOVy, // [rad]    FOV angles
           depthx0,depthy0,     // [pixels] FOV offset
           depthz0,depthz1;     // [m] intensity <0,255> -> distance <z0,z1>
    int    depthcfg;            // configuration flags
    double error;               // invalid depth
    // color camera properties
    double colorFOVx,colorFOVy, // [rad]    FOV angles
           colorx0,colory0;     // [pixels] FOV offset
    int    colorcfg;            // configuration flags

    DepthImage()
        {
        pnt=NULL; _free();
        depthFOVx=86.0*deg; colorFOVx=86.0*deg;
        depthFOVy=57.0*deg; colorFOVy=57.0*deg;
        depthx0=0.0;        colorx0=0.0;
        depthy0=0.0;        colory0=0.0;
        depthz0=0.0;
        depthz1=6.0;
        depthcfg=0;         colorcfg=0;
        }
    DepthImage(DepthImage& a)   { *this=a; }
    ~DepthImage()   { _free(); }
    DepthImage* operator = (const DepthImage *a) { *this=*a; return this; }
    //DepthImage* operator = (const DepthImage &a) { ...copy... return this; }

    void _free()
        {
        if (pnt)
            {
            if (pnt[0]) delete[] pnt[0];
            delete[] pnt;
            }
        pnt=NULL; xs=ys=0;
        }
    void load(AnsiString _depth,AnsiString _color)
        {
        _free();
        Graphics::TBitmap *depth,*color;
        int i,x,y,*pd,*pc;
        double p[3],q[3],a,b,r;
        // load depth image
        depth=new Graphics::TBitmap;
        picture_load(depth,_depth,NULL);
        depth->HandleType=bmDIB;
        depth->PixelFormat=pf32bit;
        xs=depth->Width;
        ys=depth->Height;
        // allocate PCL
        pnt   =new DepthPoint*[   ys];
        pnt[0]=new DepthPoint [xs*ys];
        for (y=0;y<ys;y++) pnt[y]=pnt[0]+(y*xs);
        // depth -> PCL
        error=depthz1*0.99;
        for (y=0;y<ys;y++)
            {
            pd=(int*)depth->ScanLine[y];                // pointer to y-th scan line in depth image
            b=depthFOVy*(((double(y)-depthy0)/double(ys-1))-0.5);       // latitude
            for (x=0;x<xs;x++)
                {
                a=depthFOVx*(((double(x)-depthx0)/double(xs-1))-0.5);   // longitude
                r=pd[x]&255;                            // intensity from depth image (just a single channel)
                r/=255.0;                               // linear scale
                if (int(depthcfg&_DepthImage_invert_z)) r=1.0-r;
                r=depthz0+(r*(depthz1-depthz0));        // range <z0,z1>
                // spherical -> cartesian
                p[0]=cos(a)*cos(b);         // x
                p[1]=sin(a)*cos(b);         // y
                p[2]=       sin(b);         // z
                if (int(depthcfg&_DepthImage_cos_correct_z)) r/=p[0];
                // scale and reorder coordinates to match my view
                pnt[y][x].pos[0]=r*p[1];
                pnt[y][x].pos[1]=r*p[2];
                pnt[y][x].pos[2]=r*p[0];
                }
            }

        // smooth positions a bit to reduce noise (FIR averaging smooth/blur filter)
        if (int(depthcfg&_DepthImage_smooth))
         for (i=0;i<10;i++)
            {
            for (y=1;y<ys;y++)
             for (x=1;x<xs;x++)
                {
                vector_mul(p,pnt[y][x].pos,3.0);
                vector_add(p,p,pnt[y][x-1].pos);
                vector_add(p,p,pnt[y-1][x].pos);
                vector_mul(pnt[y][x].pos,p,0.2);
                }
            for (y=ys-1;y>0;y--)
             for (x=xs-1;x>0;x--)
                {
                vector_mul(p,pnt[y-1][x-1].pos,3.0);
                vector_add(p,p,pnt[y-1][x].pos);
                vector_add(p,p,pnt[y][x-1].pos);
                vector_mul(pnt[y][x].pos,p,0.2);
                }
             }
        // compute normals
        for (y=1;y<ys;y++)
         for (x=1;x<xs;x++)
            {
            vector_sub(p,pnt[y][x].pos,pnt[y][x-1].pos);// p = pnt[y][x] - pnt[y][x-1]
            vector_sub(q,pnt[y][x].pos,pnt[y-1][x].pos);// q = pnt[y][x] - pnt[y-1][x]
            vector_mul(p,q,p);                          // p = cross(q,p)
            vector_one(pnt[y][x].nor,p);                // nor[y][x] = p/|p|
            }
        // copy missing edge normals
        for (y=1;y<ys;y++) vector_copy(pnt[y][0].nor,pnt[y][1].nor);
        for (x=0;x<xs;x++) vector_copy(pnt[0][x].nor,pnt[1][x].nor);

        // set color
        for (y=0;y<ys;y++)
         for (x=0;x<xs;x++)
            {
            DepthPoint *p=&pnt[y][x];
            vector_ld(p->col,0.5,0.5,0.5);
            if (p->pos[2]>=error) vector_ld(p->col,1.0,0.0,0.0);    // invalid depths as red
            }
        delete depth;
        }
    bool find_flat_rect(double* &p0,double* &p1,double* &p2,double* &p3,double wx,double wy,double dmax)
        {
        int i,x,y,x0,y0,x1,y1,dx=1,dy=1;
        double d,p[3],n[3];
        // whole image
        for (y0=0;y0<ys;y0+=dy)
         for (x0=0;x0<xs;x0+=dx)
            {
            dx=1; dy=1;
            // first corner
            p0=pnt[y0][x0].pos; if (p0[2]>=error) continue;
            // find next corner wx distant from p0
            for (x1=x0+1;x1<xs;x1++)
                {
                p1=pnt[y0][x1].pos; if (p1[2]>=error) continue;
                vector_sub(p,p1,p0);    // p = p1-p0
                d=vector_len(p);        // d = |p|
                if (d>=wx) break;
                } if (x1>=xs){ x0=xs; continue; }
            // find next corner wy distant from p0
            for (y1=y0+1;y1<ys;y1++)
                {
                p3=pnt[y1][x0].pos; if (p3[2]>=error) continue;
                vector_sub(p,p3,p0);    // p = p3-p0
                d=vector_len(p);        // d = |p|
                if (d>=wy) break;
                } if (y1>=ys){ p0=p1=p2=p3=NULL; return false; }
            // diagonal corner
            p2=pnt[y1][x1].pos;
            // window position increments are 1/4 of window size
            dx=1+((x1-x0)>>2);
            dy=1+((y1-y0)>>2);
            // plane normal
            vector_sub(p,p1,p0);    // p = p1-p0
            vector_sub(n,p3,p0);    // n = p3-p0
            vector_mul(n,n,p);      // n = cross(n,p)
            vector_one(n,n);        // n = n/|n|
            // test distances to plane
            i=1;
            for (y=y0;y<=y1;y++)
             for (x=x0;x<=x1;x++)
                {
                if (pnt[y][x].pos[2]>=error) continue;
                vector_sub(p,pnt[y][x].pos,p0); // p = pnt[y][x] - p0
                d=fabs(vector_mul(p,n));        // d = |dot(p,n)| ... perpendicular distance
                if (d>dmax){ i=0; x=x1+1; y=y1+1; break; }
                }
            if (i) return true;
            }
        p0=p1=p2=p3=NULL;
        return false;
        }

    void gldraw()
        {
        int x,y;
        DepthPoint *p;
        for (y=1;y<ys;y++)
            {
            glBegin(GL_QUAD_STRIP);
            for (x=0;x<xs;x++)
                {
                p=&pnt[y-1][x];
                glColor3dv(p->col);
                glNormal3dv(p->nor);
                glVertex3dv(p->pos);
                p=&pnt[y][x];
                glColor3dv(p->col);
                glNormal3dv(p->nor);
                glVertex3dv(p->pos);
                }
            glEnd();
            }
        glColor3f(1.0,1.0,1.0);
        }
    };
//---------------------------------------------------------------------------

And usage:

DepthImage pcl;         // pointcloud
double *q0,*q1,*q2,*q3; // pointers to landing zone rectangle points
pcl.depthz0=0.8;        // some constants for reconstruction
pcl.depthz1=1.00;
pcl.depthcfg=
     _DepthImage_invert_z
    |_DepthImage_cos_correct_z
    ;
pcl.load("depth.jpg",""); // this creates the pointcloud from image
pcl.find_flat_rect(q0,q1,q2,q3,0.40,0.25,0.005); // finds a rectangle of 0.4 x 0.25 m with at most 0.005 m deviation from a flat surface (returns true if found)
// q0,q1,q2,q3 are pointers to the 3D points of the found rectangle, or NULL if none was found

The image loader is from my OpenGL engine (partially based on VCL and PNGDelphi), so you need to use whatever you have in your environment.

The only important stuff (apart from the 3D reconstruction) is the function find_flat_rect, which works exactly as described in this approach.

The only thing left to add is a test that the landing zone's slope is not too big (via the dot product between the plane normal n and the down vector from the accelerometer or gyros). In case your drone is always in a horizontal orientation, you can use the z axis instead.
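
That test might look like the following sketch (an assumption of mine, not part of the code above: down is a unit gravity-direction vector from the IMU, and fabs handles the normal's sign ambiguity):

#include <math.h>
// true if the landing plane is tilted less than maxslope [rad] from horizontal
bool slope_ok(const double n[3],const double down[3],double maxslope)
    {
    double d=fabs((n[0]*down[0])+(n[1]*down[1])+(n[2]*down[2]));    // |dot(n,down)|
    return d>=cos(maxslope);    // flat ground -> n parallel to down -> |dot| ~ 1
    }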

Here is the function that converts a 3D point into 2D depth image coordinates:

void DepthImage::project(double &x,double &y,double* p)     // p[3] -> (x,y) ... 3D to 2D perspective projection
    {
    x=atan2(p[0],p[2])/depthFOVx;
    y=atan2(p[1],p[2])/depthFOVy;
    if (int(depthcfg&_DepthImage_cos_correct_z))
     y*=(double(ys)*depthFOVx)/(depthFOVy*double(xs));
    x=(x+0.5)*xs;
    y=(y+0.5)*ys;
    }

However, due to the imperfect 3D reconstruction (the depth encoding used is unknown), this is not very accurate, so the result is slightly distorted. Beware: results might be slightly out of image bounds, so you need to limit them to <0,xs) and <0,ys) if you want to use them for pixel access ...
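
For example, a minimal sketch of that clamping, using the q0 corner returned by find_flat_rect:

double px,py;
pcl.project(px,py,q0);  // 3D corner -> 2D image coordinates [pixels]
int ix=int(px); if (ix<0) ix=0; if (ix>=pcl.xs) ix=pcl.xs-1;    // clamp to <0,xs)
int iy=int(py); if (iy<0) iy=0; if (iy>=pcl.ys) iy=pcl.ys-1;    // clamp to <0,ys)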

  • "[still work in progress please wait until I code/test some stuff...]" In order to avoid feedback before you are satisfied with your post please delete it until you are done. – Yunnosch Jan 20 '21 at 08:46
  • @lofy if you want to get GL working see: [How to render an openGL frame in C++ builder?](https://stackoverflow.com/a/20679773/2521214) Also beware: in the window constructor the window is still not fully operational, so you cannot render in it. You can add a Timer component, set its interval to 40ms, and render in its event instead. Or in the `OnPaint` event ... I am still using the old BDS2006 (IIRC RAD started in 2009), so the things I described might change slightly in newer IDE versions ... PS: I hope you like the Borland/Embarcadero IDE; once you get used to it, all other IDEs will seem so backward ... – Spektre Jan 27 '21 at 07:51
  • @lofy btw you can load JPEGs into a bitmap like this: [TJPEGImage](https://stackoverflow.com/a/22779808/2521214). Some components are capable of loading all registered image formats; however, I like my own loaders more. This might interest you too: [opening image file on c++ , PNG , JPEG](https://stackoverflow.com/a/37340970/2521214). It's what I used, but it uses the `PngDelphi` lib, which must be downloaded first! However, RAD is newer, so it might have it integrated already (similar to jpg) – Spektre Jan 27 '21 at 08:04
  • @lofy 1. use [4x4 homogenous transform matrices](https://stackoverflow.com/a/28084380/2521214) 2. best is to not convert to color at all and use the distance directly. Your depth camera most likely has image formats where it sends the distance as an integer or float instead of a color gradient (for example, on Kinect there is the RAW depth frame). If you must use a gradient, then use a linear one with more than just 8 bits, so either more bpp (bits per pixel) or colored gradients. To infer the reverse equation you can measure a sloped plane covering the whole depth range and reconstruct it – Spektre Jan 28 '21 at 07:38
  • @Spektre, sorry, but when I apply these equations to the four points of the rectangle (p0,p1,p2,p3) I get very big values (way bigger than the image dimensions) – lofy Jan 30 '21 at 09:42
  • @lofy hmm, weird. The standard perspective projection result is very distorted (due to wrong depth decoding, as I do not know your encoding equations), so I changed it from triangle similarity to `atan2` and it works ... I added the `project` function into the `DepthImage` class and it works as-is in my code; also slightly distorted, but still usable. It returns the position in pixels. I added its code to the end of the answer – Spektre Jan 30 '21 at 10:45
  • @Spektre, sorry, but the result is still confusing: DepthImage::project([ 3.87, 51.03, 0.82]) gives me a result of [x= 1192.38, y= 880.56] where my image is only (480*848). I do not know what limiting function can put this within bounds – lofy Jan 30 '21 at 13:45
  • @lofy then you must have messed something up ... do the 3D render; that will reveal what's wrong. Without it, it's just guesswork – Spektre Jan 30 '21 at 15:41
  • @Spektre, thanks a lot. The Python code has worked, and the C++ code has worked too after some substitution of libraries. However, the two codes give different results, so I guess I will spend some time trying to figure this out. – lofy Feb 05 '21 at 12:35