
How were the very large KDD Cup 1999 and DARPA 1998/99 public intrusion detection datasets generated? Does anyone know which software tool was used to classify the sessions in these raw datasets and keep track of their state? In other words, once the network data has been generated, how are sessions labeled as anomalous (intrusion) or normal? Is there a special software tool or machine that does this?

Desta Haileselassie Hagos
  • Duplicate of http://stackoverflow.com/questions/22602174/convert-http-request-to-kdd-cup-data-format-with-41-parameters/22603120 http://stackoverflow.com/questions/22500525/nsl-kdd-features-from-raw-live-packets/22522174 http://stackoverflow.com/questions/34758999/how-can-i-transform-the-tcpdump-data-to-kddcup99-intrusion-detection-dataset-for – Has QUIT--Anony-Mousse Jan 24 '16 at 14:52
  • And one more: http://stackoverflow.com/questions/14090121/how-to-derive-kdd99-features-from-darpa-pcap-file – Has QUIT--Anony-Mousse Jan 24 '16 at 15:04

1 Answer

Stop using this data set.

It is simulated, and not realistic.

Modern attacks look nothing like the early '90s kind of attacks simulated there, and these attacks can be detected with trivial filters; there is no need for machine learning.
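To illustrate what "trivial filters" can mean here, the following is a minimal rule-based sketch. The `Record` type is hypothetical and only loosely modeled on KDD Cup '99 features (`protocol_type`, `src_bytes`, and `count`, the number of connections to the same host in the past two seconds); both thresholds are made-up assumptions, not values derived from the dataset.

```python
from dataclasses import dataclass

# Hypothetical thresholds: illustrative assumptions, not tuned on any dataset.
MAX_NORMAL_ICMP_BYTES = 1000  # typical ping payloads are far smaller than this
ICMP_FLOOD_COUNT = 100        # ICMP connections to one host within 2 seconds

@dataclass
class Record:
    """Simplified connection record loosely modeled on KDD Cup '99 features."""
    protocol_type: str  # "icmp", "tcp" or "udp"
    src_bytes: int      # bytes sent from source to destination
    count: int          # connections to the same host in the past 2 seconds

def trivial_filter(r: Record) -> str:
    """Return a coarse verdict from two hand-written rules, with no learning."""
    if r.protocol_type == "icmp" and r.src_bytes > MAX_NORMAL_ICMP_BYTES:
        return "oversized ICMP (ping-of-death style)"
    if r.protocol_type == "icmp" and r.count > ICMP_FLOOD_COUNT:
        return "ICMP flood (smurf style)"
    return "normal"

print(trivial_filter(Record("icmp", 1480, 1)))  # oversized ICMP (ping-of-death style)
print(trivial_filter(Record("icmp", 64, 500)))  # ICMP flood (smurf style)
print(trivial_filter(Record("tcp", 300, 5)))    # normal
```

Two `if` statements are enough to separate these attack types because they differ from normal traffic in obvious header-level statistics, which is exactly why such data says little about a learner's real-world value.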

This data set has a bad reputation in the ML community:

> As a result, we strongly recommend that (1) all researchers stop using the KDD Cup '99 dataset, (2) The KDD Cup and UCI websites include a warning on the KDD Cup '99 dataset webpage informing researchers that there are known problems with the dataset, and (3) peer reviewers for conferences and journals ding papers (or even outright reject them, as is common in the network security community) with results drawn solely from the KDD Cup '99 dataset.

Whatever you do with this synthetic data set, it is useless.

Apart from that, read the documentation of the data. It seems they used BSM (Sun's Basic Security Module audit data), in case you still happen to have a SunOS (now Oracle) computer somewhere...
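On the labeling question itself: in a simulated dataset the attacks are launched deliberately, so the ground truth is known in advance, and one plausible way to label sessions is to match them against a record of the injected attacks. The sketch below assumes a hypothetical ground-truth list of (start, end, source IP, destination IP, attack name) entries; the IPs, date, and matching rule are illustrative assumptions, not the tool actually used to build the DARPA or KDD Cup data.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class Session:
    start: datetime
    src_ip: str
    dst_ip: str

@dataclass
class InjectedAttack:
    """Hypothetical ground-truth entry recorded when an attack was launched."""
    start: datetime
    end: datetime
    src_ip: str
    dst_ip: str
    name: str

def label_sessions(sessions: list[Session], attacks: list[InjectedAttack]) -> list[str]:
    """Label each session with the first matching attack name, else 'normal'."""
    labels = []
    for s in sessions:
        label = "normal"
        for a in attacks:
            same_hosts = a.src_ip == s.src_ip and a.dst_ip == s.dst_ip
            if same_hosts and a.start <= s.start <= a.end:
                label = a.name
                break
        labels.append(label)
    return labels

t0 = datetime(1998, 6, 1, 8, 0)
attacks = [InjectedAttack(t0, t0 + timedelta(minutes=5), "10.0.0.5", "10.0.0.9", "portsweep")]
sessions = [
    Session(t0 + timedelta(minutes=1), "10.0.0.5", "10.0.0.9"),   # inside the attack window
    Session(t0 + timedelta(minutes=10), "10.0.0.5", "10.0.0.9"),  # after the attack window
]
print(label_sessions(sessions, attacks))  # ['portsweep', 'normal']
```

No classifier is involved: the "labels" come from bookkeeping done while generating the traffic, which is only possible because the data is synthetic.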

Has QUIT--Anony-Mousse
  • Part of that point is: these features are meaningless nowadays, too! The days of the "ping of death" and "smurf" attacks (Wikipedia) are long gone. **The features do not work anymore, either.** – Has QUIT--Anony-Mousse Jan 24 '16 at 22:07
  • Today's attacks are SQL injections, cross-site scripting (XSS), and similar attacks in the *content layer*, no longer at the network layer of TCP/IP. – Has QUIT--Anony-Mousse Jan 24 '16 at 22:10
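A toy illustration of the content-layer point: the two hypothetical HTTP request lines below have exactly the same length, so a size feature such as `src_bytes` cannot tell them apart, and only inspecting the payload reveals the injection. The regex is a deliberately naive assumption for demonstration, not a usable SQL injection detector.

```python
import re

# Two hypothetical HTTP request lines of identical length (32 characters):
# a size feature such as src_bytes would not distinguish them.
benign = "GET /item?id=1234567890 HTTP/1.1"
attack = "GET /item?id=1'OR'1'='1 HTTP/1.1"

# Toy signature for a classic tautology-style SQL injection; real detection
# needs proper parsing and context, this only shows the payload must be read.
SQLI_TAUTOLOGY = re.compile(r"'\s*or\s*'", re.IGNORECASE)

def looks_like_sqli(request_line: str) -> bool:
    """Return True if the naive tautology pattern appears in the request."""
    return SQLI_TAUTOLOGY.search(request_line) is not None

print(len(benign) == len(attack))  # True
print(looks_like_sqli(benign))     # False
print(looks_like_sqli(attack))     # True
```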