0

I am trying to search 1 table in the database and count the number of unique records where 3 fields do not match. I have the below but it doesn't work at all. I am an SQL noob so any help really is appreciated!

This is what I have so far

<? php
SELECT COUNT(*) 
FROM (
SELECT  DISTINCT field1, field2, field3
FROM table1);
$result = $query;
$row = mysql_fetch_array($result);
echo $row;
?>

Thanks for any help!

EDIT: I dont think this syntax does what I need it to.

I need to count the unique records in "table 1" on the basis that the fields "title","firstname", "surname" do not match another rows contents. For example the table below

+---------+-----------+-----------+-----------+
|   ID    |   Title   | Firstname | Surname   |
+---------+-----------+-----------+-----------+
|    1    |    Mr     |    J      |   Doe     |
|    2    |    Mrs    |    J      |   Doe     |
|    3    |    Mr     |    A      |   James   |
|    4    |    Mr     |    J      |   Doe     |
+---------+-----------+-----------+-----------+

The query would need to return the answer 3. There is only 1 row in the table above where "title", "firstname" & "surname" match and therefore does not get counted.

I hope that is a little clearer. I think I must be confused about what DISTINCT does!

EDIT AGAIN:

The "real world" scenario is I have a table with peoples details and want to extract them to send mail out but I do not want duplicates.

5ummer5
  • 153
  • 1
  • 13

4 Answers4

2
<?php
    $query = "select distinct field1, field2, field3 from table1"; 
    $results = mysqli_query($link, $query);    
    echo $results->num_rows;
?>

But to answer your question in SQL only,

select count(*) from (select distinct field1, field2, field3 from table1) as x;

because "every derived table must have its own alias."

Update: your understanding of distinct is accurate, actually. You can also use group by to do similar work, but group by is used for aggregating by columns, like this:

select count(*) as count, city, state from airports group by city, state;

this will give you a row for each unique city, state combination, and a single column for the number of occurrences of each, e.g.

count    city     state
--------|--------|---------|
3        Boise    ID
1        Marion   OH
24       Chicago  IL
2        Newark   DE
1        Corbin   KY
1        Ames     IA
2        Stuart   FL
...

but if you wanted just to get the count of the rows returned from that query, you'd do it a lot like you did, with a subselect:

select count(*) as rows from (select count(*) from airports group by city, state) as x;

rows
--------
2324

but there would be no reason for the overhead of doing the aggregation in the subselect. According to this guy, the two "generate the same query plan."

UPDATE 2: So the critical issue with your question is that you've asked for a solution to a problem you don't really have. Counting the rows will not help you send mail to anyone.

So what you probably want to think about is what level of "sameness" you're looking to filter out. Imagine your rows look like this:

Piet, J, Mondrian, 123 Main St, Columbus, Ohio, 43209
P, NULL, Mondrian, 123 Main St Apt. 3, Columbus, Ohio, 43209
P, Jan, Mondrian, 123 Main Street #3, Bexley, Ohio 43209

You'd probably be correct to assume this is the same person in all three cases, but which address is the best one? Your query above would send to all three. But if you group by last name and zip code, this would provide you only one of the three above. Is that the right answer? No. It's only right for those three. If your data is dirty, (which it is, if you have duplicates in it), only you know the right answer, and it depends on how dirty it is and how it got that way. But I can tell you this much: either you are going to risk sending some duplicates or you are going to spend a lot of time combing through dirty data. You have to choose which is a higher priority, your time or your appearance, because there isn't a one-size-fits-all query that's going to solve this problem. In order to create such a query, you'd need to manually go through the data to find out what the problem is. Obviously, if you did that you might as well have filtered it in Excel or something like that.

Community
  • 1
  • 1
tjb1982
  • 2,257
  • 2
  • 26
  • 39
  • This potentially sends a large amount of unused data from the server. Count the rows on the server and only send that number. – BonzaiThePenguin May 17 '14 at 01:37
  • All php code is run on the server. As far as I understand it, the MySQL database server gives the php client code a pointer/cursor to a result-set and some other data, one of which is the number or rows returned from the query. – tjb1982 May 17 '14 at 01:55
  • Thanks for the help, that works and returns a value but I think I am confused with what DISTINCT does. I have updated my question to hopefully make more sense! Thanks again for the help! – 5ummer5 May 17 '14 at 09:57
2

I'm not sure about your PHP code but the SQL query is correct (needs only an alias):

SELECT COUNT(*) 
FROM
  ( SELECT DISTINCT field1, field2, field3
    FROM table1
  ) AS x ;                                 -- you need to alias the derived table

You could also use this:

SELECT COUNT(*) 
FROM
  ( SELECT 1                                -- doesn't matter what is here
    FROM table1
    GROUP BY field1, field2, field3
  ) AS x ; 

or this query:

SELECT COUNT(DISTINCT field1, field2, field3) 
FROM table1 ; 
ypercubeᵀᴹ
  • 113,259
  • 19
  • 174
  • 235
0

I think this is what you want?

SELECT COUNT(*) FROM table1 WHERE field1 != field2 AND field2 != field3 AND field3 != field1;

EDIT: Oh wait, this is what you want:

SELECT * FROM People GROUP BY title, firstname, surname

You can play around with it here: http://sqlfiddle.com/#!2/f489e2/3

BonzaiThePenguin
  • 1,413
  • 13
  • 19
  • Thanks for the reply. I don't think I was clear enough with my question! I have updated it above to make more sense! Thanks again! – 5ummer5 May 17 '14 at 09:57
  • I'm still not entirely sure what you're asking, since your example has two rows where title, firstname, and surname do not match – 2 and 3. I edited my answer to show how to find those rows. – BonzaiThePenguin May 17 '14 at 16:25
  • Thanks for the reply. I have a table with peoples details and want to extract them to send mail out but I do not want duplicates. – 5ummer5 May 17 '14 at 20:25
  • Oh wait, I misread your question this time around! Just remove the HAVING part and it will do what you wanted. – BonzaiThePenguin May 17 '14 at 21:32
  • Thanks for the reply. I tried it on the sqlfiddle and it worked great but it doesn't work with my database for some reason. It just returns all of the records, not like it did in your example. – 5ummer5 May 18 '14 at 00:03
  • Apparently SQLFiddle uses SQL Server, while you're using MySQL. Not sure if that'd explain the difference in functionality. – BonzaiThePenguin May 18 '14 at 18:45
  • Wait never mind, SQLFiddle lets you choose which version of SQL you want to use, in the upper-left corner. Never noticed that option before! – BonzaiThePenguin May 18 '14 at 18:54
0

Here's a clue - both in terms of how to ask a question on SO, and how to figure out the answer to this one...

DROP TABLE IF EXISTS my_table;

CREATE TABLE my_table
(ID    INT NOT NULL AUTO_INCREMENT PRIMARY KEY
,Title   VARCHAR(12) NOT NULL
,Firstname CHAR(1) NOT NULL 
,Surname  VARCHAR(12) NOT NULL 
);

INSERT INTO my_table VALUES
(1,'Mr', 'J','Doe'),
(2,'Mrs','J','Doe'),
(3,'Mr', 'A','James'),
(4,'Mr', 'J','Doe');

You give two, contradictory, requirements...

First, the thing that DISTINCT can do...

SELECT x.*
     , y.id 
  FROM my_table x 
  LEFT 
  JOIN my_table y 
    ON y.id <x.id 
   AND y.title = x.title 
   AND y.firstname = x.firstname 
   AND y.surname = x.surname;
+----+-------+-----------+---------+------+
| ID | Title | Firstname | Surname | id   |
+----+-------+-----------+---------+------+
|  1 | Mr    | J         | Doe     | NULL |
|  2 | Mrs   | J         | Doe     | NULL |
|  3 | Mr    | A         | James   | NULL |
|  4 | Mr    | J         | Doe     |    1 |
+----+-------+-----------+---------+------+

Second, the thing that DISTINCT cannot do...

SELECT x.*
     , y.id 
  FROM my_table x 
  LEFT 
  JOIN my_table y 
    ON y.id <> x.id 
   AND y.title = x.title 
   AND y.firstname = x.firstname 
   AND y.surname = x.surname;
+----+-------+-----------+---------+------+
| ID | Title | Firstname | Surname | id   |
+----+-------+-----------+---------+------+
|  1 | Mr    | J         | Doe     |    4 |
|  2 | Mrs   | J         | Doe     | NULL |
|  3 | Mr    | A         | James   | NULL |
|  4 | Mr    | J         | Doe     |    1 |
+----+-------+-----------+---------+------+
Strawberry
  • 33,750
  • 13
  • 40
  • 57
  • I have a table with peoples details and I want to send a mailout, I need to make sure people with the same surname, address & postcode do not get 2 letters. I tried to find a query that was suitable but I am a TOTAL mySQL noob so appreciate the help! – 5ummer5 May 17 '14 at 20:30
  • So why do you need to count stuff? – Strawberry May 17 '14 at 22:16
  • I also need a total number of unique records that are in the table as well as getting the records for a mailout. I thought I would ask the count question first then try and adapt it after to work for the mailout rather than asking too many questions. – 5ummer5 May 17 '14 at 23:52