0

I have inherited a very large SQL Server database to work on (1,000,000+ lines of data). The data shows purchases and sales of items the details of the items are not important. I need to look for anomalies in this data by comparing data rows which occur close together in Date and Time.

The data as it looks at the moment is shown on the Left of the following picture.

enter image description here

However what I want to do is demonstrated on the right, with consecutive items being matched according to what was bought and what was sold. I produced this example manually by rearranging the order of the data rows in Excel, pairing up matching 'Bought' and 'Sold' items. This indicated one anomalous sale which has no corresponding purchase. I need to reproduce this type of analysis right across the database for all million plus rows of data.

The corresponding Bought and Sold transactions should take place within minutes of each other, if they don't that is potentially another indicator of an anomalous transaction.

To achieve this in SQL is currently beyond my scripting skills, and I ask if anyone out there might be able to give me some helpful hints / tips, or point me at an article that might get me started with solving this problem.

PJW
  • 5,197
  • 19
  • 60
  • 74

1 Answers1

1

Without knowing your table structure and having business logic it will be impossible to provide exact query. But what you need are Analytic Functions that allow you to read data from multiple rows and compare it current rows. Particular functions you need to look at

LAG() OVER (PARTITION BY ORDER BY)
LEAD() OVER (PARTITION BY ORDER BY)
SUM() OVER (ORDER BY)
SUM() OVER (PARTITION BY ORDER BY ROWS BETWEEN UNBOUND PRECEDING AND CURRENT ROW)

If you are running there are even more functionality available to like doing running total refer to this question Calculate a Running Total in SQL Server.

Community
  • 1
  • 1