I have inherited a very large SQL Server database to work on (1,000,000+ lines of data). The data shows purchases and sales of items the details of the items are not important. I need to look for anomalies in this data by comparing data rows which occur close together in Date and Time.
The data as it looks at the moment is shown on the Left of the following picture.
However what I want to do is demonstrated on the right, with consecutive items being matched according to what was bought and what was sold. I produced this example manually by rearranging the order of the data rows in Excel, pairing up matching 'Bought' and 'Sold' items. This indicated one anomalous sale which has no corresponding purchase. I need to reproduce this type of analysis right across the database for all million plus rows of data.
The corresponding Bought and Sold transactions should take place within minutes of each other, if they don't that is potentially another indicator of an anomalous transaction.
To achieve this in SQL is currently beyond my scripting skills, and I ask if anyone out there might be able to give me some helpful hints / tips, or point me at an article that might get me started with solving this problem.