1

Basicaly my problem is that I need to remove duplicate values which are not correct (please see below). I Can't use Destinct function as it will remove some correct values. If you have any suggestions I would appreciate it. If you need any more clarification please let me know :)

I have two tables.

----------------------------------------------------------
CUSTOMER | Ammount | Invoice number | Time Stamp
----------------------------------------------------------
 A       | 57000,2 | 631            | Time Stamp
 A       | 56000   | 631            | Time Stamp
 A       | 55000,1 | 632            | Time Stamp
 A       | 54000   | 632            | Time Stamp

And

--------------------------------------------------------------------------
CUSTOMER |        FREE TEXT           |Invoice number| Time Stamp
--------------------------------------------------------------------------
 A       | 57.000,2 invoice number 631 | 631          | Time Stamp
 A       | 56.000   invoice number 631 | 631          | Time Stamp
 A       | 55.000,1 invoice number 632 | 632          | Time Stamp
 A       | 54.000   invoice number 632 | 632          | Time Stamp

I use this query:

Select A.CUTOMER, A.AMMOUNT, B.FREE_TEXT, B.Invoice_number   
FROM Table1 A,
Table2 B
WHERE A.CUSTOMER = B.CUSTOMER
AND A.Invoice_number = B.Invoice_number
AND B.Invoice_number IN ('631','632')
AND A.CUSTOMER = 'A'
AND B.Time_stamp >= TIMESTAMP('2015-01-01 00:00:00')
AND A.Time_stamp >= TIMESTAMP('2015-01-01 00:00:00')

And results are with duplicates, and 1 of them is incorrect, result looks like this:

 A       | 57000,2 | 57.000,2 invoice number 631  | 631 
 A       | 56000   | 57.000,2 invoice number 631  | 631
 A       | 57000,2 | 56.000   invoice number 631  | 631 
 A       | 56000   | 56.000   invoice number 631  | 631 
 A       | 55000,1 | 55.000,1 invoice number 632  | 632 
 A       | 54000   | 54.000   invoice number 632  | 632 
 A       | 55000,1 | 55.000,1 invoice number 632  | 632 
 A       | 54000   | 54.000   invoice number 632  | 632 

And I want it to be like:

 A       | 57000,2 | 57.000,2 invoice number 631  | 631  |
 A       | 56000   | 56.000   invoice number 631  | 631  |
 A       | 55000,1 | 55.000,1 invoice number 632  | 632  |
 A       | 54000   | 54.000   invoice number 632  | 632  |
Saugiklis
  • 19
  • 2
  • Try reading up about [`DISTINCT`](http://www.tutorialspoint.com/sql/sql-distinct-keyword.htm) – ydaetskcoR May 27 '15 at 17:43
  • But when use distinct, you will end up missing some values in last column as you will get only one value – Pratik Joshi May 27 '15 at 17:44
  • It's always important to know why your duplicating records. In this case you have a record for the invoice 631 and 632 in each table twice. If there's no other field you can join on to eliminate the duplicates, then you're only choice is to use a group by or a distinct. – xQbert May 27 '15 at 17:51
  • possible duplicate of [Finding duplicate values in a SQL table](http://stackoverflow.com/questions/2594829/finding-duplicate-values-in-a-sql-table) – David Manheim May 27 '15 at 20:15

3 Answers3

0

Use Group by condition.

Select A.CUTOMER, A.AMMOUNT, B.FREE_TEXT, B.Invoice_number   
FROM Table1 A,
Table2 B
WHERE A.CUSTOMER = B.CUSTOMER
AND A.Invoice_number = B.Invoice_number
AND B.Invoice_number IN ('631','632')
AND A.CUSTOMER = 'A'
AND B.Time_stamp >= TIMESTAMP('2015-01-01 00:00:00')
AND A.Time_stamp >= TIMESTAMP('2015-01-01 00:00:00')
GROUP BY A.AMMOUNT 
Pratik Joshi
  • 11,485
  • 7
  • 41
  • 73
  • I was forced to: GROUP BY A.CUTOMER, A.AMMOUNT, B.FREE_TEXT, B.Invoice_number and this gave me the same result :( – Saugiklis May 28 '15 at 11:03
0

In this case I'd like to understand how these tables relate. based on your comments it's by both customer, invoice_number, and amount, though amount is in a free form text field in one table.

if we assume formatting is consistent in this free_text field.... and we assume mySQl vs Oracle.

SELECT A.CUTOMER, A.AMMOUNT, B.FREE_TEXT, B.Invoice_number   
FROM Table1 A 
INNER JOIN Table2 B
 on A.CUSTOMER = B.CUSTOMER
AND A.Invoice_number = B.Invoice_number
and concat_ws(' ', A.Amount, 'invoice number', A.invoice_number)= B.Free_Text
where 
AND B.Invoice_number IN ('631','632')
AND A.CUSTOMER = 'A'
AND B.Time_stamp >= TIMESTAMP('2015-01-01 00:00:00')
AND A.Time_stamp >= TIMESTAMP('2015-01-01 00:00:00')

though performance may be sluggish due to string concatenation. and inability to use indexes on the amount.

xQbert
  • 34,733
  • 2
  • 41
  • 62
  • Yes this looked a possible solution, but I can't use it as in freetext field there is used other number format (I have updated it in question). As in ammount colum thousands are not separated but in free text field they all are separeted by dot. So this doesn't help, or maybe I can somehow update the query ? – Saugiklis May 28 '15 at 11:07
-1

DISTINCT will not work for this problem.

It looks like you need to restrict rows based on Ammount as compared to the FREE_Text

The concatenation below might need to be adjusted depending on what database engine you are using.

AND B.FREE_TEXT LIKE '%' + A.Ammount + '%'

or

AND B.FREE_TEXT LIKE CONCAT('%', A.Ammount,'%')

A warning: This may not perform well if you are working with a lot of rows.

C.Heyer
  • 1
  • 2
  • Thanks for your answer, this looked a possible solution, but I can't use it as in freetext field there is used other number format (I have updated it in question). As in ammount colum thousands are not separated but in free text field they all are separeted by dot. So this doesn't help, or maybe I can somehow update the query ? – Saugiklis May 28 '15 at 11:08