How can I turn the following tabular dataset into a simple 2D density plot that shows a loc-number distribution? I am new to gnuplot and have worked through a tutorial. A simple x,y plot with multiple columns of data works fine, of course. I then tried this answer, but ran into the errors listed below, even though the x values are defined. I suspect my dataset is fundamentally lacking something. What am I doing wrong, and how can I get a simple 2D contour from the data below?

Update, based on the suggestions in the comments (the original aim is unchanged): the following is the sample input data. The file is single-space delimited. x = x, y = y, z1 = locid (1 to n) or z2 = loctype (scuba, shower, swimming, restrooms, sushi, cafe, restaurant, etc.)

Input data:

ametype amename X(1000) Y(1000)  km-to-carpark
Scuba   SCUB1   10.72   49.01   
Scuba   SCUB2   13.88   47.32   
Scuba   SCUB3   14.58   46.46   
Scuba   SCUB4   14.52   48.23   
Scuba   SCUB5   13.05   47.23   
Scuba   SCUB6   12.21   47.95   
Scuba   SCUB7   12.66   46.19   
Cafe    CAFE1   13.97   47.45   
Cafe    CAFE4   31.63   30.3    
Playground  PARK2   31.57   30.2    
Playground  PARK1   27.51   31.87   
Cafe    CAFE5   67.71   109.09  
Scuba   SCUB8   68.58   109.54  
Scuba   SCUB9   67.14   109.99  
Cafe    CAFE2   13.83   46.24   
SUSHI   SUSH1   79.59   41.22   
SUSHI   SUSHI2  73.81   54.14   
SUSHI   SUSHI3  72.87   55.47   
SUSHI   SUSHI4  75.05   56.51   
RESTROOM    RESTR1  74.1    56.05   
RESTROOM    RESTR2  74.96   57.9    
RESTROOM    RESTR3  75.06   55.59   
RESTAURANT  RESTAU1 76.57   56.33   
RESTAURANT  RESTAU1 76.95   55.1    
RESTAURANT  RESTAU2 77.75   54.69   
RESTAURANT  RESTAU2 76.15   54.34   

Code I tried on a different dataset, where x and y were not coordinates:

set view map
set contour
set isosample 250, 250
set cntrparam level incremental 1, 0.1
set palette rgbformulae 33,13,10
splot 'data.dat' with lines nosurface
#splot for [col=1:10]  'data.dat' u ($1):(column(col) > 2 ? 1/0 : column(col)):3

errors:

1) All points x value undefined
2) Tabular output of this 3D plot style not implemented

Updated: a) increased the number of data points; c) added a rough hand sketch to give a simple impression.

Expecting a distribution density map like this.

[Image: hand-drawn sketch of the expected density map]

bonCodigo
  • With the data and script you provided I cannot reproduce your error messages; I get different ones. Which gnuplot version are you running? In any case, `splot 'data.dat' with lines` will not work: if you don't specify the columns explicitly, gnuplot takes `u 1:2:3` by default, and since column 1 contains only the text `y`, what should gnuplot plot? So I am still confused: the x-values are in column 2 from 1...12, the y-values are loc101, loc201, ... locN, and the z-values are the values from about 1.08 to 8.18, correct? Could you please provide more data? It's too little to see any pattern. – theozh Aug 05 '22 at 07:06
  • @theozh I appreciate the suggestions and have updated the question accordingly. I would like to know how to define the axes explicitly, as per your comment. I think the `y` in the 1st column misled you, apologies. I was trying to _indicate_ that the data with decimals are for y and the column names are z. Secondly, I thought the simplest x,y-based 2D contour is possible even without z, e.g. `f(x,y) = 1/(x^2 + y^2)` where x and y fall in the positive range of the Cartesian plane, yeah? – bonCodigo Aug 05 '22 at 11:36
  • Thanks for the data. However, if I plot the data (column 1 as x and columns 2 to 4 as y-values), I see points wobbling around y-levels of 1, 3 and 6. How do you want to get a 2D density map and contour lines from this? – theozh Aug 05 '22 at 18:56
  • @theozh, thanks for trying. If we add bin sizes at 1, 3, 6, will it make a 2D density map any better? Can you show me what you got? I think the dataset format is not suited to the expected goal. I looked through a number of other questions/answers here, some of which you were involved in. My dataset appears to be already in a pivoted tabular format and it has lost an important quality, `the direction`. I am thinking of a 2D heatmap with x,y for the location of amenities from a central point of the area. – bonCodigo Aug 06 '22 at 05:28
  • With your illustrative sketch, I guess I understand what your goal is. However, for this you need x,y coordinates of your locations of interest. Well, you have x,y data, but x seems to be an ID number, not a real x coordinate. gnuplot can do contours from an x,y,z surface plot. For your case this would require the data x, y, z = "density of points in the x-y plane". Hence, you first have to calculate the density of your locations. I'm not sure if gnuplot has a function for that, but I remember a similar question. I will check. – theozh Aug 06 '22 at 05:43
  • @theozh I was about to update the question. Since my original question remains unresolved and the aim is the same, I am happy to upload the x,y coordinates. In place of the z axis, I would have the amenityID or amenitytype. I would like to start with a heatmap and then expand from there, as a 2D heatmap may allow the amenityID/type to be present while the heat colours would correspond to the density based on x,y. What do you think? – bonCodigo Aug 06 '22 at 08:56
  • OK, I guess we are getting a clearer picture. So, for each of the 6 (or more?) types "Cafe, Scuba, Playground, ..." you would like to get a distribution density map and contour lines? I think this can be done, but the question is how crowded and confusing the graph will get. I guess a density plot starts to make sense if you have a few 10s or 100s of each type in a certain area. – theozh Aug 06 '22 at 19:59
  • @theozh thanks for following up. Can we stick to just 3 types for now for a start: Cafe, Scuba, Sushi? I am ok if you wish to generate random x,y for some of these 3 types. Is this a few lines of code or more lines? In my view bin size would be 0.1, but what do you suggest? – bonCodigo Aug 07 '22 at 01:29
  • Yes, actually that's what I was planning to do for a minimal example: 3 types (easily expandable to N types) and about 50 random coordinates per type. I would not put the locations into bins but determine a "density" for each type like here: https://stackoverflow.com/a/53751188/7295599 So, I am working on it (as soon as I find some more time). Thank you for your patience. – theozh Aug 07 '22 at 19:50
  • @theozh I could try something on my end as well, so maybe you can correct it or make suggestions. I have seen the question you linked. The issue in this case is that I managed to use Python before to show the densities, and it shows all 3 types cumulatively in the density plot (using the plotly package). But it doesn't let me separate them into density groups per type, e.g. in the north area I may have 3 clusters of high-density cafe and scuba groups, and so on. – bonCodigo Aug 09 '22 at 01:27
  • @theozh will you be available to add a chat room please? I am trying out your code. I can post code questions here, but in order to use the code, I would like to get some info from you. Cheers – bonCodigo Aug 17 '22 at 14:55
  • ok, but I haven't found out yet how to enter a chat... ok, here we go: https://chat.stackoverflow.com/rooms/247341/so73244095 – theozh Aug 17 '22 at 14:58

1 Answer

This is an interesting plotting challenge. The input data format is straightforward, but the data needs some processing before the desired contour lines can be plotted with gnuplot.

Comments:

  • The data is all in one file. The entries of the different types can be in any order.

  • The example below first creates some random test data with the types "Cafe, Scuba, Sushi" and 20 entries of each (set by the inner loop). Skip this part if you want to use your own file.

  • The rest of the script assumes nothing about the content of the data file (number of types, type names, coordinate ranges, etc.); everything is determined automatically.

  • Create a unique list of types. The list will be in the order of first occurrence.

  • Define a grid (here dx=0.2, dy=0.2, i.e. reasonable values within the data range) and, for each grid point and each type, count the entries within a certain radius (here: 0.5). Calculate the density by dividing the count by the area of that circle (pi*Radius**2).

  • For each type, create the contour lines by plotting (via `set table`) into a file whose name ends in a two-digit index. So far, I don't know how one would easily write this into indexed datablocks to avoid files on disk.

  • Finally, plot the contour line files and the original data points, using a filter on the type column to get the right color.
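
To make the counting step concrete, here is a minimal sketch of the density at a single grid point. The datablock `$Mini` and its four entries are made up for illustration and are not from the question; the radius, the `Dist()` function and the counting expression follow the script below.

```gnuplot
### sketch: density at one grid point (hypothetical data, not from the question)
$Mini <<EOD
Cafe  CAFE1 1.0 1.0
Cafe  CAFE2 1.2 1.1
Cafe  CAFE3 3.0 3.0
Scuba SCUB1 1.1 0.9
EOD

Radius = 0.5
Dist(x0,y0,x1,y1) = sqrt((x1-x0)**2 + (y1-y0)**2)

# count the "Cafe" entries within Radius of the grid point (1,1)
c = 0
stats $Mini u (Dist(1.0,1.0,$3,$4)<=Radius && (strcol(1) eq "Cafe") ? c=c+1 : 0) nooutput

d = c / (pi * Radius**2)    # count divided by the circle area
print sprintf("count = %d, density = %.3f", c, d)
### end of sketch
```

Two of the three cafes lie inside the circle, so the count should be 2 and the density 2/(pi*0.25), roughly 2.55 per unit area. The Scuba entry is also inside the circle but is excluded by the type filter.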

One thing I haven't figured out yet is `set cntrparam levels 2`: I would like to have exactly 2 contour lines per type, but it seems gnuplot still treats this like `set cntrparam levels auto 2` and adjusts the number of levels itself.
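
If exactly two contour lines are needed, one option worth trying is `levels discrete`, which takes the z values explicitly instead of a number of levels. As far as I know, `levels auto <n>` treats `<n>` only as a nominal count and adjusts it to get round level values. The following is a self-contained sketch on an analytic surface that stands in for one type's density. The surface and the level values 0.3 and 0.7 are arbitrary assumptions, and writing to the datablock `$Cont` should need gnuplot 5.2 or later.

```gnuplot
### sketch: exactly two contour levels via "levels discrete"
set xrange [-3:3]
set yrange [-3:3]
set isosamples 41,41

unset surface
set contour
set cntrparam cubicspline
set cntrparam levels discrete 0.3, 0.7    # explicit z values, so exactly two levels

# write the contour lines into a datablock instead of a file
set table $Cont
    splot exp(-(x**2 + y**2)/2)
unset table

plot $Cont u 1:2 w l notitle
### end of sketch
```

For the script below, the two values could be derived per type from the density range, for example as fixed fractions of the maximum found with `stats $Densities u 3 index nt-1`. I have not tested that variant against the script.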

As you can imagine, this graph will probably look pretty confusing with 10 or more types.

There is certainly room for improvement, and no guarantee that the script is free of bugs. Look at it as a starting point for further optimization. Suggestions for improvements are welcome!

Script:

### plot density contours from simple x,y location file
reset session

FILE = "SO73244095.dat"

# create some random test data
myTypes = "Cafe Scuba Sushi"
set print FILE
    do for [p=1:words(myTypes)] {
        a  = word(myTypes,p)
        x0 = rand(0)*5
        y0 = rand(0)*5
        do for [i=1:20] {
            print sprintf("%s %s%d %.3g %.3g",a,a,i,invnorm(rand(0))+x0,invnorm(rand(0))+y0)
        }
    }
set print

# create a unique list of types
# and extract min, max data
addToList(list,col) = list.(_s='"'.strcol(col).'"', strstrt(list,_s)>0 ? '' : _s)
myTypes   = ''
myType(i) = word(myTypes,i)
stats FILE u (myTypes=addToList(myTypes,1),$3):4 name "DATA" nooutput
Nt = words(myTypes)
print sprintf("%d types found: %s",Nt,myTypes)

# get densities for each type
dx     = 0.2    # adjust the grid as you like...
dy     = 0.2    # ... time for graph creation will increase with finer grid 
Radius = 0.5    # adjust radius to a reasonable value
Nx     = ceil((DATA_max_x-DATA_min_x)/dx)
Ny     = ceil((DATA_max_y-DATA_min_y)/dy)

Dist(x0,y0,x1,y1) = sqrt((x1-x0)**2 + (y1-y0)**2)
print "Please wait..."
set print $Densities
    do for [nt=1:Nt] {
        do for [ny=0:Ny] {
            do for [nx=0:Nx] {
                c = 0
                x = DATA_min_x+nx*dx
                y = DATA_min_y+ny*dy
                stats FILE u (Dist(x,y,$3,$4)<=Radius && (strcol(1) eq word(myTypes,nt)) ? c=c+1 : 0) nooutput
                d = c / (pi * Radius**2)    # density per unit area
                print sprintf("%g %g %g",x,y,d)
            }
            print ""         # empty line
        }
        print ""; print ""   # two empty lines
    }
set print

# get contour lines via splot into files
myContFile(n) = sprintf("%s.cont%02d",FILE,n)
unset surface
set contour
set cntrparam cubicspline levels 2    # cubicspline for "nice" round curves
do for [nt=1:Nt] {
    set table myContFile(nt)
        splot $Densities u 1:2:3 index nt-1
    unset table
}

# set size ratio -1        # uncomment if equal x,y scale is important
set grid x,y
set key out noautotitle
set xrange[:] noextend
set yrange[:] noextend
set colorsequence classic

myFilter(colD,colF,valF) = strcol(colF) eq valF ? column(colD) : NaN

plot for [i=1:Nt] myContFile(i) u 1:2 w l lc i, \
     for [i=1:Nt] FILE u 3:(myFilter(4,1,myType(i))) w p pt 7 lc i ti myType(i)
### end of script

Result: (a few random examples)

[Four images: example outputs of the script for different random test data]

theozh
  • Man! This is some work. I am trying to digest it and will get back to you. I appreciate your efforts. Seriously. – bonCodigo Aug 13 '22 at 13:23
  • Do you reckon that, if a z coordinate is used, the plot could be lifted to a 3D contour? It's not that important, just a question. I just checked into the chat. Catch you in about 16 hours if you happen to see this message. – bonCodigo Aug 20 '22 at 18:15