2

I'm working with an old database and I want to select data spread across different tables but in an recurring manner based on multiple values in one cell.

There are three tables (here with fictional data): treaties, parties, related_parties. The treaties table includes information about specific treaties as well as the ID of the parties that signed it. The parties table comprises information about the parties, whereas the related_parties table consists of information about parties that were related (the related_partner_id being the IDs used the parties table).

Table 1: treaties

id name party_id
1 Peace of Westphalia 49, 80
2 Peace of Rijswijk 49, 50, 81

Table 2: parties

party_id party_name
49 Holy Roman Empire
50 Dutch Republic
51 Mainz
52 Cologne
80 France
81 Sweden
82 Paris
83 Bordeaux

Table 3: related_parties

party_id related_party_id
49 51, 52
80 82, 83

What I want as the output is something like this, where information about the parties is gathered for every value in the relevant cells. So for the first treaty (1) this would be:

id name party_id party_name related_party_id related_party_name
1 Peace of Westphalia 49 Holy Roman Empire 51 Mainz
1 Peace of Westphalia 49 Holy Roman Empire 52 Cologne
1 Peace of Westphalia 80 France 82 Paris
1 Peace of Westphalia 80 France 83 Bordeaux

Is this at all doable? So far every query I've created only retrieves the data pertaining to the first value in a cell.

Jaap
  • 87
  • 6
  • 1
    What data type are your `partner_id` and `related_partner_id` columns? Can you post the queries you've tried so far? – robere2 Jan 24 '23 at 14:36
  • 1
    I'd like to resonate with robere2, asking whether `partner_id` and `related_partner_id` are comma-separated-values (strings) or arrays, but also ask if you can have mixed values, like 81 too, and some other random number, in the `partner_id` field. In that case what would the expected output be? – lemon Jan 24 '23 at 14:51
  • `party_id` and `related_party_id` are integers (sorry, in the orginal question I mixed up the names, instead of partner_id it should be party_id, I've changed this now. – Jaap Jan 25 '23 at 07:43
  • Here's a faulty query that I came up with (focusing on one particular treaty): `SELECT treaty.id, treaty.partners, treaty_partners.id, treaty_partners.partner, treaty_partner_cascading.included_partners FROM _ieg_treaty as treaty INNER JOIN _ieg_treaty_partners treaty_partners ON treaty.partners = treaty_partners.id INNER JOIN _ieg_treaty_partner_cascading treaty_partner_cascading ON treaty_partners.id = treaty_partner_cascading.partner WHERE treaty.id = "1" ` – Jaap Jan 25 '23 at 07:46

1 Answers1

0

To achieve this, the first thing to do would be divide columns into multiple rows. You have to define the maximum amount of divisions your column can have, but i came with this query which becomes pretty handy for this task:

Split string into multiple rows in SQL

Applied to this particular case:

select t5.*,t6.party_name as related_party_name from (
select t3.id,t3.name,t3.party_id, t4.party_name,related_party_id
from
(
select t2.id, t2.name, t1.party_id,t1.related_party_id from
(
select * from (
select
  party_id,(SUBSTRING_INDEX(SUBSTRING_INDEX(related_party_id, ',', NS.n), ',', -1)) as related_party_id
from (
  select 1 as n union all
  select 2 union all
  select 3 union all
  select 4 union all
  select 5 union all
  select 6 union all
  select 7 union all
  select 8 union all
  select 9 union all
  select 10
) NS
inner join related_parties B ON NS.n <= CHAR_LENGTH(B.related_party_id) - CHAR_LENGTH(REPLACE(B.related_party_id, ',', '')) + 1
) divided_related_parties
)t1
left join 

(
select * from (
select
  id,name,(SUBSTRING_INDEX(SUBSTRING_INDEX(party_id, ',', NS.n), ',', -1)) as party_id
from (
  select 1 as n union all
  select 2 union all
  select 3 union all
  select 4 union all
  select 5 union all
  select 6 union all
  select 7 union all
  select 8 union all
  select 9 union all
  select 10
) NS
inner join treaties B ON NS.n <= CHAR_LENGTH(B.party_id) - CHAR_LENGTH(REPLACE(B.party_id, ',', '')) + 1
) divided_treaties
)t2
using(party_id)

)t3

left join 

(
select * from parties
)t4
using(party_id)
)t5
left join 
(
select * from parties
)t6
on(t5.related_party_id=t6.party_id)
order by 1,3

This is exactly what you want (add more elements to union all if you expect more than 10 parties in a treaty).

General Grievance
  • 4,555
  • 31
  • 31
  • 45
LSG
  • 127
  • 1
  • 12
  • Many thanks for this (and sorry for the tardy reply, I'm on a part-time contract). I'm trying to understand your solution, but which tables does the select statement refer to, given that my example contains three tables, but I see references to t4, t5, and t6? Moreover, t3.name doesn't exist (if t3 refers to the table related_parties). Or am I missing something here? – Jaap Jan 30 '23 at 07:43
  • Query has been tested and it is working. t2..t6 are tags/aliases for particular query parts. Ex: SELECT T5.COL FROM ( SELECT COL FROM TABLE1)T1 LEFT JOIN (SELECT COL FROM TABLE2)T5 USING(COL). T5 would make reference to "SELECT COL FROM TABLE2". Related_parties is only referenced directly in t1. You can test query by yourself and if it works, accept it as an answer. – LSG Jan 31 '23 at 08:48