Spam emails are flooding the Internet. Research to prevent spam is an ongoing concern. SMTP traffic was collected from different sources in real networks and analyzed to determine the difference regarding SMTP traffic characteristics of legitimate email clients, legitimate email servers and spam relays. It is found that SMTP traffic from legitimate sites and non-legitimate sites are different and could be distinguished from each other. Some methods, which are based on analyzing SMTP traffic characteristics, were purposed to identify spam relays in the network in this thesis. An autonomous combination system, in which machine learning technologies were employed, was developed to identify spam relays in this thesis. This system identifies spam relays in real time before spam emails get to an end user by using SMTP traffic characteristics never involving email real content. A series of tests were conducted to evaluate the performance of this system. And results show that the system can identify spam relays with a high spam relay detection rate and an acceptable ratio of false positive errors.
A Doctoral Thesis. Submitted in partial fulfilment of the requirements for the award of Doctor of Philosophy of Loughborough University.