Hello dear boys and girls, I apologize if this question is in the wrong place (that is, if it belongs on a different forum such as Stack Overflow).
I can use Python and R at a roughly intermediate level. I have been wondering about the following for a while:
- If I have a data set that I can build a statistical model on, then all is well. I build the model, test it, test it again, make a scorecard, and I am done.
- What I want to know is: is there a way (theoretically or even practically) to detect irregularities/outliers in data without a previous data set that you could, for example, build a statistical model on? I mean a way that does not involve manually checking 400 million records, flagging the irregular ones as such, and only then doing something productive.
Is this possible? Can such records be identified without a solid, predefined notion of what counts as abnormal for the given data set? Let's take accounting records as an example. I have "x" records and I want to detect any that are not "natural" for the data set. Is there a way to code a system that does that, given that I have no prior data in which such records are already flagged as not normal?
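To make the question concrete, the kind of label-free check I have in mind looks roughly like the sketch below. It is only an illustration under my own assumptions: the function name `flag_outliers`, the example amounts, and the 3.5 cutoff on a median-based score are all hypothetical, and I do not know how well something this simple would hold up on real accounting data.

```python
from statistics import median


def flag_outliers(values, threshold=3.5):
    """Return the indices of values that sit far from the median.

    Uses a modified z-score built on the median absolute deviation (MAD),
    so no previously labeled "abnormal" records are needed.
    The threshold of 3.5 is an assumed, commonly quoted cutoff.
    """
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # At least half the values equal the median; the score is undefined.
        return []
    return [
        i
        for i, v in enumerate(values)
        if 0.6745 * abs(v - med) / mad > threshold
    ]


# Hypothetical transaction amounts; one entry is clearly out of line.
amounts = [120.0, 135.5, 110.0, 128.0, 9800.0, 131.0, 125.0]
suspicious = flag_outliers(amounts)
print(suspicious)
```

This only looks at one numeric column at a time, which is why I am asking whether there is a more general approach for whole records.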