
I'd like an algorithm to arrange a 2D cloud of points in front of a bar graph so that a viewer can easily see the spread of the data. The y-location of each point must equal (or be proportional to) the data value, but the x-location doesn't matter and would be determined by the algorithm. I imagine a good strategy would be to minimize the overlap among the points and to center them on the bar.
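A non-overlapping arrangement like this is commonly called a beeswarm plot. Below is a minimal sketch of one greedy way to build it (a swapped-in technique, not code from this thread): sort the values, then give each point the x-offset closest to the center (0, +step, -step, +2*step, ...) at which it does not overlap any point already placed. The marker diameter `d` and the search `step` are hypothetical parameters in data units. Treating markers as circles in data units is an assumption that only holds if the x and y axes have comparable scaling on screen.

```matlab
% Greedy beeswarm-style placement: y-values are kept exactly,
% x-offsets are chosen so that no two markers overlap.
% ASSUMPTION: markers are circles of diameter d in data units.
y = 5 + 3*rand(20,1);     % example data, like one column of the question's y
d = 0.15;                 % hypothetical marker diameter (data units)
step = d/4;               % hypothetical search step for candidate offsets

[ys, order] = sort(y);    % place points from bottom to top
xs = zeros(size(ys));
for k = 1:numel(ys)
    placed = 1:k-1;
    % only already-placed points within d vertically can collide
    near = placed(abs(ys(placed) - ys(k)) < d);
    offset = 0;
    m = 0;
    % try offsets 0, +step, -step, +2*step, -2*step, ... until no collision
    while any(hypot(xs(near) - offset, ys(near) - ys(k)) < d)
        m = m + 1;
        offset = ceil(m/2)*step*(2*mod(m,2) - 1);
    end
    xs(k) = offset;
end
x = zeros(size(y));
x(order) = xs;            % map offsets back to the original order of y
```

To overlay the points on bar `i` of the question's plot, something like `plot(i + x, y, 'ro')` should work; if the swarm gets wider than the bar, reducing `d` (and the marker size accordingly) keeps it within bounds.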

Here is an example of such a plot without organizing the points:

[Figure: bar graph with the data points not yet organized]

I generate my bar graphs, with the points in front of them, in MATLAB, but I'm only interested in the best way to choose the x-values of the points.

I have been organizing the points by hand afterwards in Adobe Illustrator, which is time-consuming. Any recommendations? Is this a sub-problem of an already solved problem? What is this kind of plot called?

For large sample sizes, I imagine something like the following would work better than a cloud of points:

[Figure: example of the suggested alternative arrangement for large samples]

Mathematically, I think the problem is this: given an array of y-values, rearrange the order of its elements so as to maximize the sum of the differences between every pair of elements, with each difference weighted inversely by the distance between the two elements' positions in the array.
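Reading "difference" as the absolute difference (an assumption), the objective for an ordering p of n values is J(p) = sum over i < j of |y(p(i)) - y(p(j))| / (j - i). The sketch below evaluates J and improves an ordering by pairwise-swap hill climbing. The hill climbing is a generic local search swapped in for illustration; it only reaches a local maximum, not necessarily the best ordering.

```matlab
% Objective: J = sum_{i<j} |v(i) - v(j)| / (j - i) for an ordered vector v.
y = 5 + 3*rand(20,1);                    % example data
n = numel(y);
[I, Jdx] = ndgrid(1:n, 1:n);
W = zeros(n);
mask = Jdx > I;                          % only pairs with i < j
W(mask) = 1 ./ (Jdx(mask) - I(mask));    % weight 1/(j - i)
objective = @(v) sum(sum(abs(bsxfun(@minus, v(:), v(:).')) .* W));

% Hill climbing: swap pairs of positions, keep any swap that increases J.
p = (1:n)';
best = objective(y(p));
improved = true;
while improved
    improved = false;
    for a = 1:n-1
        for b = a+1:n
            q = p;
            q([a b]) = q([b a]);         % swap positions a and b
            val = objective(y(q));
            if val > best + 1e-12
                p = q;
                best = val;
                improved = true;
            end
        end
    end
end
y_arranged = y(p);                       % values in the improved order
```

The resulting order only fixes the sequence, not the spacing; it could be combined with the evenly spaced x-values from the code below, e.g. `plot(linspace(i-0.3,i+0.3,n), y_arranged, 'ro')`. Note that this objective does not look at vertical distances directly, so it is not guaranteed to avoid overlapping markers.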

Here is the MATLAB code I used to generate the graph:

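y = zeros(20,6);                       % 20 observations in each of 6 groups
yMean = zeros(1,6);
for i=1:6
    y(:,i) = 5 + (8-5).*rand(20,1);    % uniform random values in [5, 8]
    yMean(i) = mean(y(:,i));
end

figure
hold on
bar(yMean,0.5)                         % bars at x = 1..6, width 0.5
for i=1:6
    x = linspace(i-0.3,i+0.3,20);      % evenly spaced x-values, unrelated to y
    plot(x,y(:,i),'ro')
end
axis([0,7,0,10])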
Sardar Usama
    Something like an [`errorbar`](https://www.mathworks.com/help/matlab/ref/errorbar.html) is what I'd normally expect to see as a way of representing the variance (or "spread") of the data, and it can be [added to bar charts](https://stackoverflow.com/q/15717139/52738). – gnovice Jun 21 '17 at 04:20
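A minimal sketch of that suggestion, assuming the spread is summarized as plus or minus one standard deviation (an assumption; the standard error or a confidence interval would be equally valid choices), drawn on the same kind of data as the question:

```matlab
% Bar chart of column means with +/- 1 standard deviation error bars.
y = 5 + 3*rand(20,6);   % example data shaped like the question's y (20x6)
yMean = mean(y);        % 1x6 column means
ySD = std(y);           % 1x6 column standard deviations
figure
bar(yMean, 0.5)
hold on
errorbar(1:6, yMean, ySD, 'k', 'LineStyle', 'none')   % bars only, no connecting line
hold off
axis([0, 7, 0, 10])
```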

1 Answer


Here is one way to determine the x-locations by grouping the points into histogram bins. The result is similar to, e.g., the plot in https://stackoverflow.com/a/1934882/4720018, but it retains the original y-values. For convenience the points are sorted, but they could be displayed in their original order using the bin_index. Whether this is "the best way" of choosing the x-coordinates depends on what you are trying to achieve.

% Create some dummy data
dummy_data_y = 1+0.1*randn(10,3);

% Create bar plot (assuming you are interested in the mean)
bar_obj = bar(mean(dummy_data_y));

% Obtain data size info
n = size(dummy_data_y, 2);

% Algorithm that creates an x vector for each data column
sorted_data_y = sort(dummy_data_y, 'ascend'); % for convenience
sorted_data_x = zeros(size(sorted_data_y));    % preallocate x-locations
number_of_bins = 5;
for j=1:n
    % Get histogram information
    [bin_count, ~, bin_index] = histcounts(sorted_data_y(:, j), number_of_bins);

    % Create x-location data for current column
    xj = [];
    for k = 1:number_of_bins
        xj = [xj 0:bin_count(k)-1];
    end

    % Collect x locations per column, scale and translate
    sorted_data_x(:, j) = j + (xj-(bin_count(bin_index)-1)/2)'/...
                              max(bin_count)*bar_obj.BarWidth;
end

% Plot the individual data points
line(sorted_data_x, sorted_data_y, 'linestyle', 'none', 'marker', '.', 'color', 'r')  
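To show the points in their original order instead, one option (sketched here with the second output of `sort`, rather than via `bin_index`) is to compute the x-locations for the sorted data exactly as above and then undo the sort column by column. The binning is repeated so the sketch runs on its own; `bar_width = 0.8` assumes the default `BarWidth` of `bar`.

```matlab
data_y = 1 + 0.1*randn(10,3);            % dummy data, as in the answer
bar_width = 0.8;                         % assumed: default BarWidth of bar()
number_of_bins = 5;
[sorted_y, order] = sort(data_y, 'ascend');   % order(i,j): original row of sorted_y(i,j)
sorted_x = zeros(size(sorted_y));
for j = 1:size(data_y, 2)
    [bin_count, ~, bin_index] = histcounts(sorted_y(:, j), number_of_bins);
    xj = [];
    for k = 1:number_of_bins
        xj = [xj 0:bin_count(k)-1];      % positions within each bin
    end
    sorted_x(:, j) = j + (xj - (bin_count(bin_index)-1)/2)'/max(bin_count)*bar_width;
end

% Undo the sort: sorted_x(i,j) belongs to row order(i,j) of data_y
data_x = zeros(size(sorted_x));
for j = 1:size(data_y, 2)
    data_x(order(:, j), j) = sorted_x(:, j);
end

figure
bar(mean(data_y))
line(data_x, data_y, 'LineStyle', 'none', 'Marker', '.', 'Color', 'r')
```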

Whether this is a good way to display your data remains open to discussion.

djvg
  • This is a better replacement for my example code, but doesn't really answer the question I was trying to ask. – flailandsail Jun 22 '17 at 23:20
  • Sorry, it appears I misunderstood your question. Perhaps you could clarify by showing us your manually organized plot? Do you need every data point to be clearly distinguishable? Would it be something like [this](https://stackoverflow.com/a/1934882/4720018) (rotated 90 degrees and applied to each bar)? – djvg Jun 23 '17 at 06:47
  • Updated the answer based on the above. – djvg Jun 23 '17 at 08:17