The main goal of this project was to detect novel intrusions against computer systems. Our approach was to analyze the behavior of programs and determine if they are misbehaving; this let us detect potential intrusions. In order to detect novel intrusions against systems, we used an anomaly detection based approach. That is, our approach involves learning the normal behavior of a program in order to determine when the program is misbehaving. This approach lends itself to detecting novel attacks against systems. In order to learn the normal behavior of programs, we used machine learning techniques in our intrusion detection algorithms. Specifically, we have developed three machine learning techniques that for use in anomaly detection an Elman neural network, a finite state automaton approach, and a string transducer.
The goal in using Elman neural networks for intrusion detection is to be able to generalize from incomplete data and be able to classify online data as being normal or intrusive. An artificial neural network is composed of simple processing units, or nodes, and connections between them. The connection between any two units has some weight, which is used to determine how much one unit will affect the other. A subset of the units of the network acts as input nodes, and another subset acts as output nodes. By assigning a value, or activation, to each input node, and allowing the activations to propagate through the network, a neural network performs a functional mapping from one set of values (assigned to the input nodes) to another set of values (retrieved from the output nodes). The mapping itself is stored in the weights of the network.
A string transducer is an algorithm that associates a sequence of input symbols with a series of output symbols. String transducers are most often used in computational biology and computational linguistics, where they are usually implemented using finite automata whose transitions or states are associated with output symbols. In the current context we use automata as well, but the input sequence is a string of BSM events, and the output sequence is a prediction for the next several events. Our use of string transducers as intrusion detectors is based on an examination of the probabilities of the output symbols at each state. During training, we estimate the probability distribution of the symbols at each state, and during testing, deviations from this probability distribution are taken as evidence of anomalous behavior.
The third system we developed to perform program-based intrusion detection we call a state tester. The state tester is a learning algorithm to infer a finite automaton to represent normal program behavior. The finite automaton merely has to accept any training sequence that is not abnormal. Of course, it should also reject abnormal BSM sequences, but since there are no abnormal BSM sequences in the training data this requirement cannot be formalized within the learning algorithm itself. Rather, we will evaluate the performance of the FAs empirically.
The strength of our system is in the detection of user-to-root (U2R) attacks, since these often involve exploitation of vulnerabilities in existing programs. We also wanted to see how well we would do on remote-to-local attacks, so we evaluated our systems on those attacks as well.