0

I have a table with lots of city names as column and locations as rows. Now I want to know what is the next city for a location.

 location     Berlin     Hamburg    Stuttgart
location 1      5            6           7 
location 2      4            9           6  
location 3      2            1           3

Now I want to know for location 1 what is the nearest city. I can use a "LEAST" query (like here: How to get min value from a single row in mysql), however this gives me only the value 5 for location 1. But I want to get "Berlin" as result. Is this possible?

In PHP I also tried "mysqli_result::fetch_field", which gave me my sql request string, but not what I need.

==> Seems it is not possible to do with SQL queries, solution is to use another database structure, see answers below, or combine it with the programming language if you don't want to change the database, see my answer below.

Jonny
  • 241
  • 3
  • 10
  • Please [edit] your question to include the mysql table structures and sample data, see https://meta.stackoverflow.com/questions/333952/why-should-i-provide-a-minimal-reproducible-example-for-a-very-simple-sql-query – Progman Jul 27 '19 at 18:10
  • @Jonny what Strawberry is saying is that you (or somebody else) have designed a spreadsheet-like table, which looks good as it is but is very problematic (as you already discovered) when you need to get the info that you want. If it's possible it's better to change this design to something that meets the relational criteria. Your problem as it is can be solved with a long long case statement but is this really a solution? What will happen when you add a new City? You will have to make changes every time. So think about it. – forpas Jul 28 '19 at 07:53
  • @forpas Thank you! The main usecase of the table is to find all locations with a maximum distance to any of several cities. For example find all locations that are max 5km to Hamburg or Stuttgart. Is there a better way for a database design to solve this? The usecase described in this question is just the second usecase. – Jonny Jul 28 '19 at 07:56
  • 2
    @Jonny this is the worst design for this case. Consider a table with Locations(id, name), another table with Cities(id, name) and the table with the relation between theses 2 tables like Distances(idLocation, idCity, distance). This is how you do it when using relational databases. – forpas Jul 28 '19 at 08:01
  • @forpas thank you! I read a bit about relational databases before, however thought I had the best solution for my usecase (For example find all locations that are max 5km to Hamburg or Stuttgart. Note: The cities will not change in the future.). After thinking about my problem and your proposal, I get a bit nearer to understanding the reasons for relational databases. It would solve both usecases. However would need 2 instead of 1 SQL queries to get a result. – Jonny Jul 28 '19 at 08:11
  • I don't see how you would solve both usecases, having the current design with just 1 query when it's hard to solve just the 1st usecase. – forpas Jul 28 '19 at 08:21
  • Yes, when I created the table I only had the first usecase in mind. Then the second came up which I found no solution for to solve it with SQL query only... – Jonny Jul 28 '19 at 08:24

3 Answers3

2

Consider the following:

DROP TABLE IF EXISTS my_table;

CREATE TABLE my_table
(id SERIAL PRIMARY KEY
,location_id INT  NOT NULL
,city VARCHAR(12) NOT NULL
,score INT NOT NULL
,UNIQUE KEY(location_id,city)
);

INSERT INTO my_table VALUES
(1,1,'Berlin',5),
(2,2,'Berlin'    ,4),
(3,3,'Berlin'    ,2),
(4,1,'Hamburg'   ,6),
(5,2,'Hamburg'   ,9),
(6,3,'Hamburg'   ,1),
(7,1,'Stuttgart' ,7),
(8,2,'Stuttgart' ,6),
(9,3,'Stuttgart' ,3);

SELECT * FROM my_table WHERE location_id = 1 ORDER BY score LIMIT 1;
+----+-------------+--------+-------+
| id | location_id | city   | score |
+----+-------------+--------+-------+
|  1 |           1 | Berlin |     5 |
+----+-------------+--------+-------+

In practice, something like the following would be more useful...

DROP TABLE IF EXISTS cities;

CREATE TABLE cities 
(city_id SERIAL PRIMARY KEY
,city VARCHAR(12) NOT NULL UNIQUE
,lat DECIMAL(7,4) NOT NULL
,lon DECIMAL(7,4) NOT NULL
);

INSERT INTO cities VALUES
(101,'Berlin',52.5200,13.4050),
(102,'Hamburg',53.5511, 9.9937),
(103,'Stuttgart',48.7758, 9.1829);

DROP TABLE IF EXISTS locations;

CREATE TABLE locations
(location_id SERIAL PRIMARY KEY
,location_name VARCHAR(100) NOT NULL UNIQUE
,lat DECIMAL(7,4) NOT NULL
,lon DECIMAL(7,4) NOT NULL
);

INSERT INTO locations VALUES
(1,'Signal Iduna Park',51.4926,7.4519),
(2,'Allianz Arena',48.2188,11.6247),
(3,'Olympiastadion Berlin',52.5147,13.2395);

I have a stored function for calculating distances on Earth in km...

DELIMITER $$

CREATE FUNCTION geo_distance_km(lat1 double, lon1 double, lat2 double, lon2 double) RETURNS double
begin
   declare R int DEFAULT 6372.8;
   declare phi1 double;
   declare phi2 double;
   declare d_phi double;
   declare d_lambda double;
   declare a double;
   declare c double;
   declare d double;
   set phi1 = radians(lat1);
   set phi2 = radians(lat2);
   set d_phi = radians(lat2-lat1);
   set d_lambda = radians(lon2-lon1);
   set a = sin(d_phi/2) * sin(d_phi/2) +
         cos(phi1) * cos(phi2) *
         sin(d_lambda/2) * sin(d_lambda/2);
   set c = 2 * atan2(sqrt(a), sqrt(1-a));
   set d = R * c;
   return d;
end$$
DELIMITER ;

And so to the query...

SELECT l.location_name
     , c.city nearest_city
  FROM locations l
  JOIN cities c
  JOIN ( SELECT x.location_id
              , MIN(geo_distance_km(x.lat,x.lon,y.lat,y.lon)) dist
           FROM locations x
           JOIN cities y
          GROUP
             BY x.location_id
       ) n
    ON n.location_id = l.location_id
   AND geo_distance_km(l.lat,l.lon,c.lat,c.lon) = n.dist;

+-----------------------+--------------+
| location_name         | nearest_city |
+-----------------------+--------------+
| Olympiastadion Berlin | Berlin       |
| Signal Iduna Park     | Hamburg      |
| Allianz Arena         | Stuttgart    |
+-----------------------+--------------+

..and if we add two more cities to that list...

INSERT INTO cities VALUES
(104,'Munich',48.1351,11.5820),
(105,'Dortmund',51.5136,7.4653);

SELECT l.location_name
     , c.city nearest_city
  FROM locations l
  JOIN cities c
  JOIN ( SELECT x.location_id
              , MIN(geo_distance_km(x.lat,x.lon,y.lat,y.lon)) dist
           FROM locations x
           JOIN cities y
          GROUP
             BY x.location_id
       ) n
    ON n.location_id = l.location_id
   AND geo_distance_km(l.lat,l.lon,c.lat,c.lon) = n.dist;
+-----------------------+--------------+
| location_name         | nearest_city |
+-----------------------+--------------+
| Olympiastadion Berlin | Berlin       |
| Allianz Arena         | Munich       |
| Signal Iduna Park     | Dortmund     |
+-----------------------+--------------+   

We can further optimize this by using bounding boxes, but that's for another day.

Strawberry
  • 33,750
  • 13
  • 40
  • 57
  • Wow, thank you very much for all the details and description! I didn't know one can use functions within a SQL query... very interesting! In my case, as I have around several ten thousend entries, I assume it would last too long to calculate the distance in each query. Or do you have different experience? – Jonny Jul 28 '19 at 18:56
  • 1
    It would be pretty quick. But this is where bounding boxes come in - so you know you're only crunching (potentially) relevant data. – Strawberry Jul 28 '19 at 19:17
2

Check this relational schema:

create table locations(
  id int primary key, 
  name varchar(30) unique
);
insert into locations(
  id, name) values
(1, 'location1'), (2, 'location2'), (3, 'location3');

create table cities(
  id int primary key, 
  name varchar(30) unique
);
insert into cities(id, name) values                                            
(1, 'Berlin'),  (2, 'Hamburg'), (3, 'Stuttgart');

create table distances(
  idlocation int, 
  idcity int, 
  distance int,
  primary key(idlocation, idcity),
  foreign key(idlocation) references locations(id),
  foreign key(idcity) references cities(id)  
);
insert into distances(idlocation, idcity, distance) values
(1, 1, 5), (1, 2, 6), (1, 3, 7), 
(2, 1, 4), (2, 2, 9), (2, 3, 6),
(3, 1, 2), (3, 2, 1), (3, 3, 3);

Now to solve your problem you can use any of the 2 below queries:

select l.name, c.name
from locations l 
inner join distances d on d.idlocation = l.id
inner join cities c on c.id = d.idcity
where l.name = 'location1'
and d.distance = (select min(distance) from distances where idlocation = l.id);

or:

select l.name, c.name
from locations l 
inner join distances d on d.idlocation = l.id
inner join cities c on c.id = d.idcity
where l.name = 'location1'
and not exists (
  select 1 from distances
  where idlocation = l.id and distance < d.distance
);

See the demo.
Results:

| name      | name   |
| --------- | ------ |
| location1 | Berlin |


This schema is just for guidance.
You may adjust it to fit your needs.

forpas
  • 160,666
  • 10
  • 38
  • 76
0

As there is no answer yet, I see the following possibility. It is not ideal and not with SQL Query directly (not possible?) but will do what I need in a reasonable time. Any better proposals will be appreciated:

  1. Get all column names and save them in an 2D array in row 1(see https://www.codexworld.com/how-to/get-column-names-from-table-in-mysql-php/ and MySQL query to get column names?)

  2. Get all values of the row "location 1" and add them to the array in 1. into row 2. (sth like SELECT * from table WHERE location = location 1)

  3. Now solve the problem in the corresponing programming language based on the array instead of solving it with a MySQL query.

or

  1. Get the min value of a row with the above mentioned LEAST query (How to get min value from a single row in mysql). Find this value in row2 from 2. and get the corresponding value with same index in row1.
Jonny
  • 241
  • 3
  • 10