0

I have a small database, holding the details of just under 400 ponies. I wish to query that table and return a table showing the pertinant details of each pony, and it's owner's and breeder's names. The data is held primarily like so:

  1. profiles - a table holding all info assigned to each individual pony, including it's sire's and dam's reg numbers, and it's owner's and breeder's DB assigned id's.
  2. contacts - a table for the people's info. Joined as 'owner' and again as 'breeder' in the query below.
  3. prm_* - multiple parameter tables, holding broad details such as colour, breed, etc.

Where I am running into trouble is when trying my first self join: querying the profiles table three times in order to retrieve the names of the sire and dam for each profile, as well as the pony's own name to begin with. When I run the query, it returns duplicate rows for many (not all) profiles. Using DISTINCT eliminated most of these, but the issue remains with the non-identical results, particularly for those ponies where no sire or dam is on record.

I have googled the problem, and it does appear here and there, but I cant quite grasp what happening in the solutions given. I'm not even certain why the problem occurs at all. Can someone please step me through the issue and the solving of it? I'd be most grateful.

My query as it stands (returns 408 results, from only 387 ponies!):

include 'conn.php';
        ?>
<table class="admin-display">
<thead><tr><th>No:</th><th>Name:</th><th>Sire:</th><th>Dam:</th><th>Age:</th><th>Colour:</th><th>Gender:</th><th>Owner:</th><th>Breeder:</th></tr></thead>  
<?php
$i=1;

$sql = mysql_query("SELECT DISTINCT p.ProfileID, p.ProfileOwnerID, p.ProfileBreederID, p.ProfilePrefix, p.ProfileSireReg, p.ProfileDamReg,
                p.ProfileGenderID, p.ProfileAdultColourID, p.ProfileColourModifierID, p.ProfileYearOfBirth, 
                p.ProfileYearOfDeath, p.ProfileLocalRegNumber, p.ProfileName,
                sire.ProfileName AS sireName, sire.ProfilePrefix AS sirePrefix,
                dam.ProfileName AS damName, dam.ProfilePrefix AS damPrefix,
                owner.ContactFirstName AS owner_fname, owner.ContactLastName AS owner_lname,
                breeder.ContactFirstName AS breeder_fname, breeder.ContactLastName AS breeder_lname,
                BreedGender, BreedColour, BreedColourModifier
                FROM profiles AS p
                    LEFT JOIN profiles AS sire
                        ON p.ProfileSireReg = sire.ProfileLocalRegNumber
                    LEFT JOIN profiles AS dam
                        ON p.ProfileDamReg = dam.ProfileLocalRegNumber
                            LEFT JOIN contacts AS owner
                                ON p.ProfileOwnerID = owner.ContactID
                            LEFT JOIN contacts AS breeder
                                ON p.ProfileBreederID = breeder.ContactID
                    LEFT JOIN prm_breedgender
                                ON p.ProfileGenderID = prm_breedgender.BreedGenderID
                            LEFT JOIN prm_breedcolour
                                ON p.ProfileAdultColourID = prm_breedcolour.BreedColourID
                            LEFT JOIN prm_breedcolourmodifier
                                ON p.ProfileColourModifierID = prm_breedcolourmodifier.BreedColourModifierID
                          WHERE p.ProfileName != 'Unknown'
                          ORDER BY p.ProfileID ASC");

while($row = mysql_fetch_array($sql)) {

    $id = $row['ProfileID'];
    $name = $row['ProfilePrefix'] . ' ' . $row['ProfileName'];
    if ($row['ProfileYearOfDeath'] > 0000) { $age = ($row['ProfileYearOfDeath'] - $row['ProfileYearOfBirth']); }
    elseif ($row['ProfileYearOfDeath'] <= 0000) { $age = (date('Y') - $row['ProfileYearOfBirth']); }
    $reg = $row['ProfileLocalRegNumber'];
    $sire = $row['sirePrefix'] . ' ' . $row['sireName'];
    $dam = $row['damPrefix'] . ' ' . $row['damName'];
    $colour = $row['BreedColour'];
    $gender = $row['BreedGender'];
    $owner = $row['owner_fname'] . ' ' . $row['owner_lname'];
    $breeder = $row['breeder_fname'] . ' ' . $row['breeder_lname'];

    echo '<tr><td>' . $i++ . '</td><td>' . $name . '</td><td>' . $sire . '</td>';
    echo '<td>' . $dam . '</td><td>' . $age . '</td><td>' . $colour . '</td><td>' . $gender. '</td>';
    echo '<td>' . $owner . '</td><td>' . $breeder. '</td></tr>';
}
echo '</table>';

mysql_close($con);
Eamonn
  • 1,338
  • 2
  • 21
  • 53

2 Answers2

0

The problem is going to be in the data - one of the tables that you're joining against has multiple rows on associated to the join key.

I recommend executing the query in stages. Start with the base query (taking out the field list):

SELECT count(*)
    FROM profiles AS p
    WHERE p.ProfileName != 'Unknown'

And then add the join tables in one at a time until you see the count increase...

SELECT count(*) 
    FROM profiles AS p
    LEFT JOIN profiles AS sire
        ON p.ProfileSireReg = sire.ProfileLocalRegNumber
    WHERE p.ProfileName != 'Unknown'

You should then be able to see where the duplicate is. If you want to easily see which record is duplicated, you can run this query:

SELECT p.Profile_id, count(*) cnt
    FROM profiles AS p
    LEFT JOIN profiles AS sire
        ON p.ProfileSireReg = sire.ProfileLocalRegNumber
    -- (all other joins)
    WHERE p.ProfileName != 'Unknown'
    GROUP BY p.Profile_id
    HAVING count(*) > 1

Then you can look at the details of the duplicated records.

Doug Kress
  • 3,537
  • 1
  • 13
  • 19
  • Hi Doug - is this to check IF any records are actually duplicated? because there are no actual duplicates, just duplicated returns. Eg: profile 'Flashy Fox', with no sire or dam on record, and only in the DB once, is listed four times. – Eamonn Aug 19 '11 at 16:13
  • @Eamonn - The group by solution 'hides' the problem. The problem is that one ore more of the tables into which you are joining has duplicate entries. Whether this is by design or not is your call. The steps I've given above will help you locate which table(s) have the duplicate entries. Setting up unique keys on each table for what you know should be unique could also help. – Doug Kress Aug 19 '11 at 16:59
0

Use GROUP BY over DISTINCT:

http://msmvps.com/blogs/robfarley/archive/2007/03/24/group-by-v-distinct-group-by-wins.aspx

rickyduck
  • 4,030
  • 14
  • 58
  • 93
  • Worked like a charm - was going to suggest you post it as an answer. I still dont know why the problem occured at all though... – Eamonn Aug 19 '11 at 16:22
  • Maybe this can help? http://stackoverflow.com/questions/426723/sql-group-by-versus-distinct – rickyduck Aug 19 '11 at 16:25