-2

Background: I have a number of sources that reference a relationship. I have calculated the percentage of the sources in which any particular relationship is found (since multiple sources sometimes reference the same relationship). Any relationship that is found in less than 10% of the sources is considered a "unique relationship".

I now want to find the metric for each source that is the ratio of the amount of unique relationships found in that source over the total number of relationships found in that source.

I have successfully obtained a table for the numerator values with the following code:

    (SELECT `Source ID`, COUNT(`Source ID`) 
        AS "Numerator"
        FROM
        (
            SELECT statements.`Source ID`, observations.`Relationship ID`, 
                ROUND((COUNT(statements.`Source ID`) / 
                (SELECT COUNT(sources.`Source ID`) FROM sources)),3) * 100
                AS "Ubiquity Percentage"
            FROM statements
            JOIN observations ON statements.`Old Observation ID` = observations.`Old Obs ID`
            GROUP BY `Relationship ID`
            ORDER BY `Relationship ID`
            )
            AS subsubquery
        WHERE `Ubiquity Percentage` < 10
        GROUP BY `Source ID`
        ORDER BY `Source ID`
    )

Which outputs the following: enter image description here

I have also successfully obtained a table for the denominator values with the following code:

SELECT `Source ID`,
    #`Ubiquity Percentage`,
    COUNT(`Source ID`) 
        AS "Denominator"

FROM
(
    SELECT statements.`Source ID`, observations.`Relationship ID`, 
        ROUND((COUNT(statements.`Source ID`) / 
        (SELECT COUNT(sources.`Source ID`) FROM sources)),3) * 100
        AS "Ubiquity Percentage"
    FROM statements
    JOIN observations ON statements.`Old Observation ID` = observations.`Old Obs ID`
    GROUP BY `Relationship ID`
    ORDER BY `Relationship ID`
)
AS subquery
GROUP BY `Source ID`
ORDER BY `Source ID`
;

Which outputs the following: enter image description here

But I am unsure how to combine these into a single table, because when I try to run both under a single select, I get error 1242 that says the queries return more than one row and cannot be combined.

nbk
  • 45,398
  • 8
  • 30
  • 47
  • Not sure on MySQL workbench but you can check out UNION. Select statement UNION 2nd select statement. Must have same number and data types for columns. – Brad May 10 '23 at 16:16
  • is the source_id always the same for both resultsets – nbk May 10 '23 at 16:22
  • this could woud not work with a FULL_G'ROUP_BY – nbk May 10 '23 at 16:31

2 Answers2

0

I would simply run the 2 statements on the same line with a innerr join in the middle. That should solve it.

  • i doubt that would be the result the user wants he needs a FULL OUTER JOIN or an INNER JOIN – nbk May 10 '23 at 16:34
  • @nbk is correct. I don't want the values UNION'd in the same column. The numerator column and the denominator column must be next to each other, with the correct corresponding Source ID. But how can I join resultant tables together in a single query? – Tyson Humphrey May 10 '23 at 16:37
  • Ok I understand why as it could dupe the source column and can be inefficient. I have changed my answer. – Rene Yibowei May 10 '23 at 16:38
0

I case that both queries the same sourc_id you would use a INNER JOIN

So the result would be

SELECT A.`Source ID`, A.`Numerator`, B. `Denominator`
FROM
    (SELECT `Source ID`, COUNT(`Source ID`) 
        AS "Numerator"
        FROM
        (
            SELECT statements.`Source ID`, observations.`Relationship ID`, 
                ROUND((COUNT(statements.`Source ID`) / 
                (SELECT COUNT(sources.`Source ID`) FROM sources)),3) * 100
                AS "Ubiquity Percentage"
            FROM statements
            JOIN observations ON statements.`Old Observation ID` = observations.`Old Obs ID`
            GROUP BY statements.`Source ID`, observations.`Relationship ID`
            )
            AS subsubquery
        WHERE `Ubiquity Percentage` < 10
        GROUP BY `Source ID`
    ) A
    INNER JOIN 
    (SELECT `Source ID`,
    #`Ubiquity Percentage`,
    COUNT(`Source ID`) 
        AS "Denominator"

FROM
(
    SELECT statements.`Source ID`, observations.`Relationship ID`, 
        ROUND((COUNT(statements.`Source ID`) / 
        (SELECT COUNT(sources.`Source ID`) FROM sources)),3) * 100
        AS "Ubiquity Percentage"
    FROM statements
    JOIN observations ON statements.`Old Observation ID` = observations.`Old Obs ID`
    GROUP BY statements.`Source ID`, observations.`Relationship ID`
)
AS subquery
GROUP BY `Source ID`
) B ON A.`Source ID` = B.`Source ID`
ORDER BY `Source ID`

If the soure_ids are not the same you need a FULL OUTER JOIN with mysql doesn't provide

for the sake of it here is a FULL OUTER JOIN solution

SELECT A.`Source ID`, A.`Numerator`, B. `Denominator`
FROM
    (SELECT `Source ID`, COUNT(`Source ID`) 
        AS "Numerator"
        FROM
        (
            SELECT statements.`Source ID`, observations.`Relationship ID`, 
                ROUND((COUNT(statements.`Source ID`) / 
                (SELECT COUNT(sources.`Source ID`) FROM sources)),3) * 100
                AS "Ubiquity Percentage"
            FROM statements
            JOIN observations ON statements.`Old Observation ID` = observations.`Old Obs ID`
            GROUP BY statements.`Source ID`, observations.`Relationship ID`
            )
            AS subsubquery
        WHERE `Ubiquity Percentage` < 10
        GROUP BY `Source ID`
    ) A
    LEFT JOIN 
    (SELECT `Source ID`,
    #`Ubiquity Percentage`,
    COUNT(`Source ID`) 
        AS "Denominator"

FROM
(
    SELECT statements.`Source ID`, observations.`Relationship ID`, 
        ROUND((COUNT(statements.`Source ID`) / 
        (SELECT COUNT(sources.`Source ID`) FROM sources)),3) * 100
        AS "Ubiquity Percentage"
    FROM statements
    JOIN observations ON statements.`Old Observation ID` = observations.`Old Obs ID`
    GROUP BY statements.`Source ID`, observations.`Relationship ID`
)
AS subquery
GROUP BY `Source ID`
) B ON A.`Source ID` = B.`Source ID`
UNION 
SELECT A.`Source ID`, A.`Numerator`, B. `Denominator`
FROM
       (SELECT `Source ID`,
    #`Ubiquity Percentage`,
    COUNT(`Source ID`) 
        AS "Denominator"

FROM
(
    SELECT statements.`Source ID`, observations.`Relationship ID`, 
        ROUND((COUNT(statements.`Source ID`) / 
        (SELECT COUNT(sources.`Source ID`) FROM sources)),3) * 100
        AS "Ubiquity Percentage"
    FROM statements
    JOIN observations ON statements.`Old Observation ID` = observations.`Old Obs ID`
    GROUP BY statements.`Source ID`, observations.`Relationship ID`
)
AS subquery
GROUP BY `Source ID`
)  A
    LEFT JOIN 
(SELECT `Source ID`, COUNT(`Source ID`) 
        AS "Numerator"
        FROM
        (
            SELECT statements.`Source ID`, observations.`Relationship ID`, 
                ROUND((COUNT(statements.`Source ID`) / 
                (SELECT COUNT(sources.`Source ID`) FROM sources)),3) * 100
                AS "Ubiquity Percentage"
            FROM statements
            JOIN observations ON statements.`Old Observation ID` = observations.`Old Obs ID`
            GROUP BY statements.`Source ID`, observations.`Relationship ID`
            )
            AS subsubquery
        WHERE `Ubiquity Percentage` < 10
        GROUP BY `Source ID`
    ) B ON A.`Source ID` = B.`Source ID`
ORDER BY `Source ID`
nbk
  • 45,398
  • 8
  • 30
  • 47