0

In the satellite table there are duplicate satellites from different sources.

╔═════════╦══════════╦═══════╦═════════╦════════════╦══════════╦════════╗
║ _sat_id ║   name   ║ norad ║ intldes ║ un_country ║ operator ║ source ║
╠═════════╬══════════╬═══════╬═════════╬════════════╬══════════╬════════╣
║       1 ║ ISS      ║ 25544 ║ 98067A  ║ null       ║ Frank    ║ s_t    ║
║       2 ║ Int stat ║ 25544 ║ 98067A  ║ null       ║ null     ║ ucs    ║
║       3 ║ zarya    ║     0 ║ 98067A  ║ US         ║ John     ║ nasa   ║
║       4 ║ iss      ║ 25544 ║ 98067A  ║ Sovjet     ║ John     ║ celes  ║
║       5 ║ ISS      ║ 25544 ║ null    ║ Russia     ║ null     ║ other  ║
╚═════════╩══════════╩═══════╩═════════╩════════════╩══════════╩════════╝

How do I merge(not group concat) this in MySQL following a priority list? For example priority list:

  1. s_t

  2. ucs

  3. celes

  4. nasa

    so the merged row will contain all 1.s_t data, all nulls are 2.ucs data, if there are still nulls then 3.celes data etc.

I tried using the following MySQL query:

SELECT 
group_concat(`sources`) as sources,
max(`_sat_id`) as _sat_id,
max(`off_name`) as off_name,
max(`norad`) as norad,
max(`intldes`) as intldes,
max(`un_reg_country`) as un_reg_country,
max(`operator_country`) as operator_country,
max(`operator`) as operator,
max(`contractor_country`) as contractor_country,
max(`contractor`) as contractor,
max(`users`) as users,
max(`contact_info`) as contact_info,
max(`operational_status`) as operational_status,
max(`application`) as application,
max(`period`) as period,
max(`has_propulsion`) as has_propulsion,
max(`power`) as power,
max(`dry_mass`) as dry_mass,
max(`orbit_class`) as orbit_class,
max(`orbit_type`) as orbit_type,
max(`expected_life_time`) as expected_life_time,
max(`decay_date`) as decay_date,
max(`longitude`) as longitude,
max(`perigee`) as perigee,
max(`apogee`) as apogee,
max(`eccentricity`) as eccentricity,
max(`inclination`) as inclination,
max(`launch_date`) as launch_date,
max(`launch_mass`) as launch_mass,
max(`launch_site`) as launch_site,
max(`launch_vehic`) as launch_vehic,
max(`description`) as description,
group_concat(`comments`) as comments
    FROM satellite
    GROUP BY intldes

But with this I don't know which row gets the priority.

I also tried doing it with Java, but this takes over 40 seconds per 500 rows...

Thx in advance...

Community
  • 1
  • 1
Frank
  • 99
  • 1
  • 6
  • Did not quite understand what you want. Would you add a small data sample (a little bigger than the current one) and provide what would be the desired result from it please? – Jorge Campos Feb 11 '16 at 17:10
  • Sounds like you're looking for [coalesce](http://dev.mysql.com/doc/refman/5.7/en/comparison-operators.html#function_coalesce). However, I'm not sure how you are identifying duplicate satellite entries which are actually the same satellite. If that logic was spelled out I have no doubt the community could provide a query that would fit your needs. – Andreas Feb 11 '16 at 17:15

3 Answers3

1

This returns the best match over your priority:

SELECT 
   intldes,
   coalesce(max(case when source = 's_t'   then `_sat_id` end)
           ,max(case when source = 'ucs'   then `_sat_id` end)
           ,max(case when source = 'celes' then `_sat_id` end)
           ,max(case when source = 'nasa'  then `_sat_id` end)) as _sat_id,
   coalesce(max(case when source = 's_t'   then `off_name ` end)
           ,max(case when source = 'ucs'   then `off_name ` end)
           ,max(case when source = 'celes' then `off_name ` end)
           ,max(case when source = 'nasa'  then `off_name ` end)) as off_name,
...
FROM satellite
GROUP BY intldes

A lot of cut&paste&modify and probably not really efficient. But hopefully this is a one-time-only job.

dnoeth
  • 59,503
  • 4
  • 39
  • 56
1

I believe something like the following query will do the trick:

SELECT
    COALESCE(s_t.off_name, ucs.off_name, celes.off_name, nasa.off_name) AS off_name
FROM (SELECT intldes FROM satellite GROUP BY intldes) all
LEFT JOIN satellite s_t ON all.intldes = s_t.intldes AND s_t.source = 's_t'
LEFT JOIN satellite ucs ON all.intldes = ucs.intldes AND ucs.source = 'ucs'
LEFT JOIN satellite celes ON all.intldes = celes.intldes AND celes.source = 'celes'
LEFT JOIN satellite nasa ON all.intldes = nasa.intldes AND nasa.source = 'nasa';

I showed just one attribute as example, you can extend it for the rest of them.

reaanb
  • 9,806
  • 2
  • 23
  • 37
0

Not completely sure what the result should look like, it might be helpful to put that in - Would the ROW_NUMBER function help, here ? like in ROW_NUMBER() in MySQL and then choose the row accordingly to your priority ?

Community
  • 1
  • 1
matthieu lieber
  • 662
  • 1
  • 17
  • 30