0

I have a table in which each column represent a text from a manuscript, here is a simple example:

mss1  | mss2  | mss3
------------------------
The   |  The  |  A
big   |  big  |  big
black |       |
dog   |  dog  |  dog

I would like to display rows where two columns have different values (or the same values), for instance I want to see the differences between mss1 and mss3. The result should look like:

mss1  | mss3
---------------
The   |  A
black |

These seemed to be good solution candidates :

SELECT mss1, mss3 FROM table WHERE mss1 != mss3;

SELECT mss1, mss3 FROM table WHERE mss1 NOT LIKE mss3;

However it is not working, even after converting all columns from text to varchar of the same length.

I also tried LOCATE (See here) to find same values: if I can locate mss1 in mss3 and vice-versa, they must have the same value, right? But that was not successful either. Any idea? It seems like it should be easy, but I can't figure it out...

Community
  • 1
  • 1
Ely
  • 498
  • 5
  • 14
  • 2
    The first query you mentioned should work fine actually. – raina77ow Sep 19 '14 at 16:06
  • 1
    So what do you get when you try the first query? How is it "Not working?" – xQbert Sep 19 '14 at 16:08
  • It returns just every row in my table, and the opposite (mss1 = mss3) returns no row. – Ely Sep 19 '14 at 16:09
  • 2
    @Ely: then the the values in the rows are different; likely there's extra non-displayable characters in one or the other (most likely suspects are newline `\n`, carriage return `\r`, and tab `\t`. Also note that inequality comparison to NULL value will return NULL, not TRUE. The NULL-safe comparison operator (`<=>`) is a convenient way to get equality comparison between NULL values, e.g. `WHERE NOT ( mss1 <=> mss3 )` – spencer7593 Sep 19 '14 at 16:11
  • @Ely: You could use the `HEX` function as a way to "dump" the column contents for a closer inspection. `SELECT HEX(mss1), HEX(mss3)`. If the two values are not equal, then the hex representations will not be equal, and you can look for the actual difference; again, most likely it's a non-display character. (It's also possible, but much less likely, that the columns are defined with different charactersets and/or there's a characterset translation going on.) – spencer7593 Sep 19 '14 at 16:18
  • @spencer7593: Thanks for the help! mss3 terminates with 0D, which should be the carriage return. So I added `LINE TERMINATED BY '\r'`when I create the table, and it is fine. – Ely Sep 19 '14 at 16:39

1 Answers1

0

Perhaps the issue with your query is that NULL values are filtered out. For your purposes, I think this may do what you want:

SELECT mss1, mss3
FROM table
WHERE coalesce(mss1, '') <> coalesce(mss3, '');
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • This also returned every row in the table... now I wonder if something can be wrong in the values themselves? I loaded the data from a csv file, but I cannot see anything special. – Ely Sep 19 '14 at 16:13
  • This answer also returns the result which you need – Ullas Sep 19 '14 at 16:15
  • @Ely . . . Can you set up a SQL Fiddle with some examples of data that should match but do not? – Gordon Linoff Sep 19 '14 at 16:35