5

I have a problem where I receive a lot of histograms from some database images. Those histograms are represented as vectors (0...255), and I have to identify and work with the bimodal histograms.

Is there a formula to automatically identify which histograms are bimodal and which aren't? Since they are numeric vectors, I could use a programming language (Java/C#) to work with it.

Is there a criterion on literature to identify bimodal histograms by software?


Here are 3 examples of histograms and format inputs I'm working with. Each histogram is a vector with 256 (0...255) positions.

Histogram 1
8029, 41, 82, 177, 135, 255, 315, 591, 949, 456, 499, 688, 446, 733, 712, 1595, 2633, 3945, 6134, 9755, 9236, 11911, 11888, 9450, 13119, 8819, 5991, 4399, 6745, 2017, 3747, 1777, 2946, 1623, 2151, 454, 3015, 3176, 2211, 1080, 391, 580, 750, 473, 10424, 334, 559, 621, 340, 2794, 1094, 5274, 2822, 204, 389, 728, 268, 15, 1060, 58, 113, 2728, 52, 3166, 11, 103, 522, 107, 351, 97, 66, 565, 315, 444, 3305, 245, 647, 306, 147, 112, 103, 672, 69, 317, 61, 224, 71, 52, 479, 62, 106, 166, 215, 132, 137, 321, 998, 427, 846, 787, 542, 1054, 1429, 615, 697, 580, 642, 768, 1244, 462, 4107, 1701, 2394, 4954, 4869, 1841, 1807, 1032, 3075, 331, 488, 627, 1281, 233, 1010, 1178, 727, 830, 1619, 728, 1428, 1849, 4826, 351, 745, 320, 888, 335, 741, 1151, 734, 689, 2143, 1130, 2482, 3609, 4779, 5678, 4186, 2654, 1668, 1290, 702, 1093, 476, 438, 445, 271, 98, 368, 226, 90, 75, 26, 33, 62, 16, 824, 21, 37, 34, 24, 54, 42, 101, 112, 18, 24, 17, 15, 3, 50, 7, 6, 54, 3, 58, 9, 10, 66, 12, 11, 10, 6, 25, 11, 7, 172, 13, 18, 21, 9, 8, 9, 42, 16, 15, 6, 12, 17, 7, 591, 6, 7, 14, 24, 7, 7, 19, 87, 18, 8, 9, 9, 35, 55, 4, 17, 10, 18, 22, 46, 8, 852, 15, 14, 12, 11, 9, 3, 50, 163, 12, 4, 18, 129, 6, 35, 47, 14, 18, 150, 21, 46, 24, 0

Histogram 2
8082, 4857, 1494, 2530, 1604, 1636, 1651, 1681, 1630, 1667, 1636, 1649, 1934, 1775, 1701, 1691, 1478, 1649, 1449, 1449, 1503, 1475, 1497, 1398, 1509, 1747, 1301, 1539, 1575, 1496, 1754, 1432, 1759, 1786, 1679, 1816, 2435, 1174, 1780, 1344, 1749, 2026, 1779, 1742, 1722, 1835, 2306, 1662, 1965, 1885, 2212, 2139, 1930, 2306, 2707, 2289, 2307, 2082, 2360, 2216, 2480, 2243, 2222, 1824, 4555, 1918, 2116, 2275, 2615, 2240, 2703, 2481, 2626, 2708, 3008, 2696, 2561, 2906, 3625, 2419, 3137, 2793, 2747, 2861, 2774, 4124, 3155, 3243, 3523, 3432, 3277, 3456, 2984, 2902, 2819, 2778, 3158, 2997, 2591, 2717, 2553, 2464, 3657, 2296, 2352, 2046, 2124, 1965, 2014, 2096, 1664, 1373, 1607, 1322, 1272, 1113, 1156, 1055, 924, 881, 1019, 669, 929, 636, 590, 463, 524, 177, 1267, 378, 409, 413, 415, 435, 385, 379, 267, 413, 266, 282, 499, 194, 360, 199, 337, 92, 986, 183, 160, 230, 124, 213, 188, 334, 164, 159, 130, 143, 135, 331, 25, 118, 114, 98, 74, 301, 92, 119, 94, 72, 192, 38, 64, 100, 138, 30, 98, 65, 226, 23, 46, 78, 78, 61, 55, 234, 26, 36, 95, 31, 49, 214, 25, 34, 58, 37, 101, 20, 41, 34, 150, 16, 50, 25, 53, 18, 30, 67, 27, 36, 42, 23, 60, 12, 21, 36, 12, 45, 21, 58, 53, 18, 51, 16, 25, 9, 24, 15, 18, 30, 33, 20, 19, 12, 23, 16, 14, 21, 14, 10, 20, 13, 12, 9, 6, 9, 7, 10, 7, 2, 0, 0, 0, 0, 0, 2087

Histogram 3
50, 226, 857, 2018, 1810, 1795, 1840, 1929, 1942, 1693, 1699, 1547, 1564, 1556, 1451, 1439, 1448, 1357, 1428, 1419, 1383, 1705, 1670, 1777, 1826, 1865, 1897, 1924, 2003, 1973, 1813, 1801, 1827, 1696, 1717, 1654, 1678, 1705, 1621, 1523, 1494, 1559, 1434, 1370, 1358, 1385, 1348, 1380, 1368, 1367, 1389, 1445, 1514, 1471, 1465, 1461, 1475, 1484, 1390, 1403, 1324, 1339, 1426, 1432, 1487, 1460, 1469, 1460, 1546, 1504, 1425, 1373, 1391, 1391, 1382, 1311, 1368, 1354, 1325, 1323, 1263, 1325, 1363, 1357, 1325, 1322, 1429, 1419, 1412, 1371, 1266, 1179, 1166, 1076, 1100, 1083, 1103, 1053, 1116, 1080, 1071, 1025, 1088, 1060, 1011, 984, 958, 959, 954, 937, 982, 950, 1001, 963, 965, 875, 1010, 954, 990, 894, 959, 972, 963, 1101, 971, 1042, 1064, 1075, 1029, 1088, 1090, 1068, 1073, 1058, 1102, 1105, 1009, 1062, 1005, 1048, 973, 998, 1034, 1013, 961, 1006, 983, 948, 1031, 972, 952, 1013, 954, 964, 970, 881, 887, 967, 941, 928, 994, 1019, 1106, 1056, 1113, 1071, 1158, 1108, 1178, 1071, 1080, 1074, 1050, 1076, 1106, 1048, 973, 1042, 997, 1034, 934, 863, 935, 845, 839, 803, 764, 782, 787, 771, 766, 751, 745, 804, 789, 765, 681, 658, 690, 672, 650, 635, 695, 619, 572, 499, 535, 565, 564, 520, 516, 568, 530, 479, 507, 424, 446, 455, 380, 395, 371, 360, 391, 373, 351, 388, 426, 349, 417, 421, 400, 443, 470, 485, 456, 495, 452, 484, 457, 518, 519, 631, 652, 693, 762, 771, 807, 906, 991, 1138, 1433, 1545, 2467, 4907, 6743, 1921
IanS
  • 15,771
  • 9
  • 60
  • 84
Marcel James
  • 834
  • 11
  • 20
  • 1
    See [this on CrossValidated](http://stats.stackexchange.com/questions/5960/how-to-identify-a-bimodal-distribution). It's exactly the same idea. – Ami Tavory Jun 10 '15 at 23:10
  • smooth histogram (to eliminate small local min max) then find all local min max and test if you have 2 distinct big peaks ... – Spektre Jun 11 '15 at 06:37
  • @Spektre thanks for your answer. Looks like its simples for your comment. But ive seen some articles about it and it doesnt looks to be trivial. Do you have an example of an application that does it? Thanks again! – Marcel James Jun 11 '15 at 10:46
  • no i do not use 3th party apps for DIP. Yes the implementation looks trivial but when you start adding things like adaptive smoothing or adaptive tresholding of insignificant peaks the code gets much more complicated. (local peaks finding on generic histogram is not trivial at all) there are sure many research papers about this but each is probably suited to specific type of images or task. in DIP/CV is hard (near impossible) to come with reliable algorithm for any task with generic/arbitrary input. so to be more specific you need to show us your input data set – Spektre Jun 11 '15 at 11:31
  • @Spektre i made an edit on question, and put a sample of my histograms models. Since histogram are occrency index, they can be easily represented by vectors. Thanks – Marcel James Jun 12 '15 at 02:34
  • added answer also added a dot line to your hist code so it is easily selectable (the last histogram was not selectable whole at once) – Spektre Jun 12 '15 at 07:16

2 Answers2

4
  1. smooth histogram

    this filter out small local min max and noise. Use symmetric smoothing to avoid shifting to one side. I smooth from left then from the right which lower the shifting a lot.

  2. find/count the local max peaks

    Count only big enough peaks (by some treshold). If peak count is not 2 then it is not a bimodal histogram unless you have different definition of bimodal like:

    • noise count too
    • no matter how many peaks but single gap must be present

    It depends on what for the histograms are used

Here is some code in C++ I busted for this:

void histograms(Graphics::TBitmap *bmp,int xs,int ys,int **pyx)
    {
    // clear buffer
    bmp->Canvas->Brush->Color=clBlack;
    bmp->Canvas->FillRect(TRect(0,0,xs,ys));
    bmp->Canvas->Font->Color=clAqua;
    bmp->Canvas->TextOutA(  5,5,"Raw histogram");
    bmp->Canvas->TextOutA(285,5,"Smoothed histogram");

    int his1[256]={ 8029, 41, 82, 177, 135, 255, 315, 591, 949, 456, 499, 688, 446, 733, 712, 1595, 2633, 3945, 6134, 9755, 9236, 11911, 11888, 9450, 13119, 8819, 5991, 4399, 6745, 2017, 3747, 1777, 2946, 1623, 2151, 454, 3015, 3176, 2211, 1080, 391, 580, 750, 473, 10424, 334, 559, 621, 340, 2794, 1094, 5274, 2822, 204, 389, 728, 268, 15, 1060, 58, 113, 2728, 52, 3166, 11, 103, 522, 107, 351, 97, 66, 565, 315, 444, 3305, 245, 647, 306, 147, 112, 103, 672, 69, 317, 61, 224, 71, 52, 479, 62, 106, 166, 215, 132, 137, 321, 998, 427, 846, 787, 542, 1054, 1429, 615, 697, 580, 642, 768, 1244, 462, 4107, 1701, 2394, 4954, 4869, 1841, 1807, 1032, 3075, 331, 488, 627, 1281, 233, 1010, 1178, 727, 830, 1619, 728, 1428, 1849, 4826, 351, 745, 320, 888, 335, 741, 1151, 734, 689, 2143, 1130, 2482, 3609, 4779, 5678, 4186, 2654, 1668, 1290, 702, 1093, 476, 438, 445, 271, 98, 368, 226, 90, 75, 26, 33, 62, 16, 824, 21, 37, 34, 24, 54, 42, 101, 112, 18, 24, 17, 15, 3, 50, 7, 6, 54, 3, 58, 9, 10, 66, 12, 11, 10, 6, 25, 11, 7, 172, 13, 18, 21, 9, 8, 9, 42, 16, 15, 6, 12, 17, 7, 591, 6, 7, 14, 24, 7, 7, 19, 87, 18, 8, 9, 9, 35, 55, 4, 17, 10, 18, 22, 46, 8, 852, 15, 14, 12, 11, 9, 3, 50, 163, 12, 4, 18, 129, 6, 35, 47, 14, 18, 150, 21, 46, 24, 0 };
    int his2[256]={ 8082, 4857, 1494, 2530, 1604, 1636, 1651, 1681, 1630, 1667, 1636, 1649, 1934, 1775, 1701, 1691, 1478, 1649, 1449, 1449, 1503, 1475, 1497, 1398, 1509, 1747, 1301, 1539, 1575, 1496, 1754, 1432, 1759, 1786, 1679, 1816, 2435, 1174, 1780, 1344, 1749, 2026, 1779, 1742, 1722, 1835, 2306, 1662, 1965, 1885, 2212, 2139, 1930, 2306, 2707, 2289, 2307, 2082, 2360, 2216, 2480, 2243, 2222, 1824, 4555, 1918, 2116, 2275, 2615, 2240, 2703, 2481, 2626, 2708, 3008, 2696, 2561, 2906, 3625, 2419, 3137, 2793, 2747, 2861, 2774, 4124, 3155, 3243, 3523, 3432, 3277, 3456, 2984, 2902, 2819, 2778, 3158, 2997, 2591, 2717, 2553, 2464, 3657, 2296, 2352, 2046, 2124, 1965, 2014, 2096, 1664, 1373, 1607, 1322, 1272, 1113, 1156, 1055, 924, 881, 1019, 669, 929, 636, 590, 463, 524, 177, 1267, 378, 409, 413, 415, 435, 385, 379, 267, 413, 266, 282, 499, 194, 360, 199, 337, 92, 986, 183, 160, 230, 124, 213, 188, 334, 164, 159, 130, 143, 135, 331, 25, 118, 114, 98, 74, 301, 92, 119, 94, 72, 192, 38, 64, 100, 138, 30, 98, 65, 226, 23, 46, 78, 78, 61, 55, 234, 26, 36, 95, 31, 49, 214, 25, 34, 58, 37, 101, 20, 41, 34, 150, 16, 50, 25, 53, 18, 30, 67, 27, 36, 42, 23, 60, 12, 21, 36, 12, 45, 21, 58, 53, 18, 51, 16, 25, 9, 24, 15, 18, 30, 33, 20, 19, 12, 23, 16, 14, 21, 14, 10, 20, 13, 12, 9, 6, 9, 7, 10, 7, 2, 0, 0, 0, 0, 0, 2087 };
    int his3[256]={ 50, 226, 857, 2018, 1810, 1795, 1840, 1929, 1942, 1693, 1699, 1547, 1564, 1556, 1451, 1439, 1448, 1357, 1428, 1419, 1383, 1705, 1670, 1777, 1826, 1865, 1897, 1924, 2003, 1973, 1813, 1801, 1827, 1696, 1717, 1654, 1678, 1705, 1621, 1523, 1494, 1559, 1434, 1370, 1358, 1385, 1348, 1380, 1368, 1367, 1389, 1445, 1514, 1471, 1465, 1461, 1475, 1484, 1390, 1403, 1324, 1339, 1426, 1432, 1487, 1460, 1469, 1460, 1546, 1504, 1425, 1373, 1391, 1391, 1382, 1311, 1368, 1354, 1325, 1323, 1263, 1325, 1363, 1357, 1325, 1322, 1429, 1419, 1412, 1371, 1266, 1179, 1166, 1076, 1100, 1083, 1103, 1053, 1116, 1080, 1071, 1025, 1088, 1060, 1011, 984, 958, 959, 954, 937, 982, 950, 1001, 963, 965, 875, 1010, 954, 990, 894, 959, 972, 963, 1101, 971, 1042, 1064, 1075, 1029, 1088, 1090, 1068, 1073, 1058, 1102, 1105, 1009, 1062, 1005, 1048, 973, 998, 1034, 1013, 961, 1006, 983, 948, 1031, 972, 952, 1013, 954, 964, 970, 881, 887, 967, 941, 928, 994, 1019, 1106, 1056, 1113, 1071, 1158, 1108, 1178, 1071, 1080, 1074, 1050, 1076, 1106, 1048, 973, 1042, 997, 1034, 934, 863, 935, 845, 839, 803, 764, 782, 787, 771, 766, 751, 745, 804, 789, 765, 681, 658, 690, 672, 650, 635, 695, 619, 572, 499, 535, 565, 564, 520, 516, 568, 530, 479, 507, 424, 446, 455, 380, 395, 371, 360, 391, 373, 351, 388, 426, 349, 417, 421, 400, 443, 470, 485, 456, 495, 452, 484, 457, 518, 519, 631, 652, 693, 762, 771, 807, 906, 991, 1138, 1433, 1545, 2467, 4907, 6743, 1921 };
    int *his,tmp[256],a,x0,y0,x,y,h,tr=12,sm=10,peak[256],peaks;
    // loop through histograms
    for (y0=20,h=0;;h++)
        {
        x0=5;if (h==0) his=his1;
        else if (h==1) his=his2;
        else if (h==2) his=his2;
        else break;
        // rescale his <0,?> to tmp <0-100>
        for (y=his[0],x=0;x<256;x++) if (y<his[x]) y=his[x];    // y=max
        for (         x=0;x<256;x++) tmp[x]=(his[x]*100)/y;
        // draw tmp
        for (x=0;x<256;x++) for (pyx[y0+100][x0+x]=0x00004040,y=0;y<tmp[x];y++) pyx[y0+100-y][x0+x]=(40+x)*0x00010101;
        x0+=280;
        // smooth tmp few times
        for (y=0;y<sm;y++)
            {
            // from both directions to avoid shifting to one side
            for (x=0;x<255;x++) tmp[x]=((90*tmp[x])+(10*tmp[x+1]))/100;
            for (x=255;x>0;x--) tmp[x]=((90*tmp[x])+(10*tmp[x-1]))/100;
            }
        // find (count) peaks
        for (peaks=0,a=0,y=0,x=0;x<255;x++)
            {
                 if (tmp[x]<tmp[x+1]){ if ((y< 0)&&(a-tmp[x]>tr)){ a=tmp[x]; }                         y=+1; }
            else if (tmp[x]>tmp[x+1]){ if ((y>=0)&&(tmp[x]-a>tr)){ a=tmp[x]; peak[peaks]=x; peaks++; } y=-1; }
            }
        // draw tmp
        for (x=0;x<256;x++) for (pyx[y0+100][x0+x]=0x00004040,y=0;y<tmp[x];y++) pyx[y0+100-y][x0+x]=(40+x)*0x00010101;
        // draw peaks
        bmp->Canvas->Pen->Color=clAqua;
        bmp->Canvas->Brush->Color=clAqua;
        for (a=0;a<peaks;a++)
            {
            x=x0+peak[a];
            y=y0+100-tmp[peak[a]];
            bmp->Canvas->Ellipse(x-5,y-5,x+5,y+5);
            }
        // draw cross for not bimodal histograms
        if (peaks!=2)
            {
            bmp->Canvas->Pen->Color=clRed;
            bmp->Canvas->MoveTo(x0    ,y0    );
            bmp->Canvas->LineTo(x0+256,y0+100);
            bmp->Canvas->MoveTo(x0+256,y0    );
            bmp->Canvas->LineTo(x0    ,y0+100);
            }
        y0+=128;
        }
    }
  • you can ignore the pyx[y][x] and bmp-> stuff it is just rendering
  • pyx[y][x] is direct 32bit pixel access of bitmap bmp
  • and bmp->Canvas is VCL encapsulated Windows GDI interface of bitmap bmp
  • xs,ys is bitmap resolution
  • set treshold tr and smooth sm to suite your needs best

If you have too much different types of histogram then you need to apply dynamic tresholding or different approach for peak finding this is how it looks like for your histograms:

example

Where Histogram 1 is the top one. Hope the code is clear enough if not comment me... if you rescale to power of 2 instead of 100 then you can change the multiplications and divisions to bit shifts to speed this a bit. I choose 100 for more clear selection of tresholds and smoothing coefficients...

Spektre
  • 49,595
  • 11
  • 110
  • 380
  • Wow man, thats amazing! I will check this out and try to implement on my Java code, just a question, are u sure histograms 2 and 3 are bimodal? i mean, i got ur logic, but they dont look as that bimodal as another samples of bimodal histograms ive seen. Anyways, thats just personal perception. – Marcel James Jun 12 '15 at 10:44
  • @MarcelJames as I mentioned that depend on what the input images are and what for is the histogram used. for example overexposured images usually does not have any gap between peaks. if you have not natural images then the peaks are usually just single line (noise like image and you need to preserve the noise). for example if you want to binarise image usually the gap is more important then the peaks (the case you can have many peaks but if only one gap is between the histogram is considered bimodal) so you always need to take in mind what you have and what you want ... – Spektre Jun 12 '15 at 11:03
  • @MarcelJames so when you know what histograms to consider bimodal for your task then tweak this to suite your needs. the program is simple enough to make changes on your own I hope – Spektre Jun 12 '15 at 11:06
  • yes im using those histogram to binarize digital images. Those histograms represent the HSV(just another space color like RGB...) from a digital image. ill use the images from a [single object segmentation database](http://www.wisdom.weizmann.ac.il/~vision/Seg_Evaluation_DB/1obj/index.html), and ill use the colored images.The segmentation will be based on a global threshold. So for this case of problem, do you think histograms like 2 or 3 can be considered bimodal? – Marcel James Jun 12 '15 at 11:18
  • @MarcelJames for binarizing image you can consider all 3 bimodal up to a point. The first one certainly yes and those other two may be the starting peak can be either useful information or background colors or just irregularity anyway sometimes the result is good sometimes not depends on the image composition. see [2D HSV histogram in C++](http://stackoverflow.com/a/29286584/2521214) it might be interesting for you Also you could try another approach like [fast color quantisation in C++](http://stackoverflow.com/a/30265253/2521214) because by going to grayscale you could loss significant data – Spektre Jun 12 '15 at 11:42
  • @MarcelJames anyway for your task I think you can smooth much more then I originaly set `sm` to ... or change the coeficients from `90% 10% ` to `60% 40%` ... – Spektre Jun 12 '15 at 11:44
  • thanks again! By the way, do you have the souce code or project of your C++ aplication? I might want to do some tests before implementing this in Java!! Thanks! – Marcel James Jun 12 '15 at 12:25
  • @MarcelJames I do in BDS2006 but it is really just an empty form app with single bitmap created at start and calling that histogram code every repaint ... – Spektre Jun 12 '15 at 13:49
1

I don't think there is a simple and straightforward solution for that.

If you think each peak in your histogram as a cluster, you can try to implement some sort of clustering algorithm that is able to automatically detect the number of peaks in your histogram.

You can start looking at here:

Determining the number of clusters in a data set

Kmeans without knowing the number of clusters?

AUTOMATIC DETERMINATION OF THE NUMBER OF CLUSTERS USING SPECTRAL ALGORITHMS

Community
  • 1
  • 1
Andres
  • 3,324
  • 6
  • 27
  • 32