There are lots of ways one can validate data. Much of it depends on three things:
How much time do you have for validation?
What are your processing capabilities?
Is the data on a QA or Production SQL server?
If you are in QA and have lots of processing power, you can do basic checks:
- Where there any warnings or errors during the data load?
- Count the total number of items in the database vs. the raw file
- Count the total number of null records in the database
- Check the total number of columns vs. the raw file
- Check the length of the variables. Are they as expected?
- Are any character columns unexpectedly truncated?
- Are numeric columns out to the correct number of significant digits?
- Are dates reasonable? For example, if you expected dates from 2004, do they say 1970?
- How many duplicates are there?
- Check if the data in the columns make sense. A few questions you can ask: are any rows "shifted?" Are numeric variables in numeric columns? Is the key column actually a key? Do the column names make sense? Your check of null records should help detect these things.
- Can you manually calculate any columns and compare your calculation to the one in the file?
If you are low on processing power or are on a production server and do not want to risk degrading performance for other users, you can do many of the above checks with a simple random sample. Take, say, 100,000 rows at a time.; or, stratify it if needed.
These are just a few checks you can do. The more comparisons and sanity checks, the better off you are.
Most importantly, communicate these findings and anything that seems strange to the file owner. They should be able to give you additional insight to the data load is correct, or if they even gave you the right file in the first place.
You're loading the data and providing as many reasonable checks as possible. If they're satisfied with the outcome, and you're satisfied with the outcome, you should consider the data valid.