0

I have a table where column A and column B have some kind of relationship. The relationship is defined by the rule that the majority of all rows where column A is X column B will be Y.

The table may look like this:

| columnA | columnB  |
+---------+----------+
|  Katze  |   Cat    |
|  Katze  |   Cat    |
|  Katze  |   hkjhkj |
|  Katze  |   Cat    |
|  Hund   |   Dog    |
|  Hund   |   Dog    |
|  Hund   |   werw   |

How can I find the non fitting rows 3 and 7 here?

AdamMc331
  • 16,492
  • 10
  • 71
  • 133
midnight
  • 112
  • 10
  • 1
    Use group with count, and select the top row of a DESC order on the count – Spacemonkey Apr 29 '15 at 19:50
  • 1
    Start by writing a query that finds the most common value of `columnB` for each `columnA`. Put this in a subquery. Then find the rows that don't match this, as described [here](http://stackoverflow.com/questions/21633115/return-row-only-if-value-doesnt-exists-mysql?lq=1) – Barmar Apr 29 '15 at 19:50

2 Answers2

1

What you can do is count the number of times each pair exists like this:

SELECT columnA, columnB, COUNT(*) AS numAppearances
FROM myTable
GROUP BY columnA, columnB;

If you only want to see rows that appear once, use a HAVING clause to get those:

SELECT columnA, columnB, COUNT(*) AS numAppearances
FROM myTable
GROUP BY columnA, columnB
HAVING COUNT(*) = 1;

Here is an SQL Fiddle example.

AdamMc331
  • 16,492
  • 10
  • 71
  • 133
  • Great first suggestion, Thank you. But first one shows me all. And second one just works if I have not to identical wrong pairs like "Katze | hkjhkj". I think about finding a threshold like HAVING COUNT(*) < 4; – midnight Apr 29 '15 at 20:17
  • @midnight the first one was meant to be a demonstration on how to get the number of combinations, and the second one is my answer for how to get ones that do not have good pairs. Isn't that what you wanted? You said rows 3 and 7 and I returned them. – AdamMc331 Apr 29 '15 at 20:19
  • Dont get me wrong. I am already pretty happy. But in the real table I probably have several identical wrong, so COUNT(*) = 1 will not work. – midnight Apr 29 '15 at 20:24
  • I think http://sqlfiddle.com/#!9/f3304/4/0 will work for me with count <50 because the good rows will have a count bigger 50. – midnight Apr 29 '15 at 21:03
  • @midnight The value used for having can be changed to whatever you need. Does the query itself work for you though? – AdamMc331 Apr 29 '15 at 21:16
1

http://sqlfiddle.com/#!9/ac67a/2

SELECT t4.id, t4.columnA, t4.columnB 
FROM (
SELECT t.columnA,MAX(t1.major) correct
FROM table1 t
LEFT JOIN (
  SELECT *, COUNT(*) as major
  FROM table1
  GROUP BY columnA, columnB
) t1
ON t.columnA = t1.columnA
  AND t.columnB = t1.columnB
GROUP BY t.columnA) t3
LEFT JOIN (
  SELECT *, COUNT(*) as major
  FROM table1
  GROUP BY columnA, columnB
) t4
ON t3.columnA = t4.ColumnA
  AND t3.correct>t4.major
Alex
  • 16,739
  • 1
  • 28
  • 51