Abstract
The deployment of honeypots is one of the methods used to collect data about attack trends in computer networks. Two or more honeypots on a network form a honeynet. The lack of a standard format for data representation makes it difficult to exchange and centralize data generated by different technologies, and it restricts the correlation and analysis of this information. This paper presents HIDEF (Honeypots Information and Data Exchange Format), a proposed format for representing and exchanging the data and information produced by organizations using honeypots and honeynets.
Introduction
In the past few years the number of computer security incidents on Internet-connected networks has continuously increased. As a result, there is an increasing need for attack data correlation and for tools that help understand attacks and identify trends. The deployment of sensors in computer networks to gather malicious traffic is one of the methods used by researchers and security professionals to collect data for attack trend analysis. One type of sensor is the honeypot, a security resource whose value lies in being probed, attacked or compromised. Another type of sensor is the honeynet, a network specifically designed to be compromised, with control mechanisms that prevent it from being used as a base for attacks against other networks.
Honeypot technologies have developed considerably in the past few years, mainly because of the increase in research activity and the development of new ways to collect data about attacks. However, it is very important to enable a more robust and complete analysis of the traffic captured by honeypots and honeynets that use different technologies and are deployed in different locations around the world. This analysis would make it possible to better understand how attacks are distributed and how they interrelate. To allow the correlation of information from different honeypot and honeynet implementations, it is necessary to have not only a system to collect and analyze these data, but also a format to represent this information. A standard format would facilitate the exchange of data, because it would enable the development of tools that automate the generation and retrieval of information from different sources.
Existing research on honeypot data collection and analysis concentrates on visualizing and correlating data from a single honeynet or from a set of honeypots using similar technologies. None of it considers the interoperability problem of analyzing data generated in different architectures by different technologies. On the other hand, standard formats for representing data related to attacks and intrusion detection, such as the Intrusion Detection Message Exchange Format (IDMEF) and the Incident Object Description Exchange Format (IODEF), are not adequate to represent data from honeypots. To contribute to this research area, this work proposes HIDEF (Honeypots Information and Data Exchange Format), a format to represent and exchange data and information produced by organizations using honeypots and honeynets. The proposed format is intended to solve the interoperability problem among different honeypot implementations while preserving compatibility with other standards such as IDMEF and IODEF.
This paper introduces the Honeynet, the technology used by the Pakistan Honeynet Project to gather information about the motives and tactics of the Black Hat community targeting Pakistan’s networks. By the end of the paper, I hope you will understand what a honeynet is, what it can do for you, its benefits, its risks and issues, the types of honeynet technologies available, and how you can deploy them in your network.
Overview
Normally, because of the volume of traffic and activity on a production network, we cannot log the level of detail that a security practitioner often needs. Honeynets are a way to get much more detailed logging of certain malicious activity than would be possible with normal logging. Suppose you have a firewall that is properly configured to stop attacks on port 111. That is good, but you will not be able to learn anything about the attack, which can be bad. There are situations in which you want to see the content of the traffic: when you want to know the intentions of the attackers and how much they know about your network, when a particular system is getting lots of probes, or when you think that a new attack or technique has been used to exploit your network.
A Honeynet is a high-interaction network of honeypots, that is, a network of actual systems running real operating systems and services. Because an attacker can compromise these real systems and services, they let you learn more about attacks and attackers. Unlike low-interaction honeypots, which emulate services, they run everything that comes with the operating system. Honeynets are like real networks comprising different systems; the difference is that they have no production value and all activity is logged and analyzed. They do not run any production services, so they have no production activity or interaction. As a result, any activity that happens on the honeynet is presumed to come from an attacker.
A Honeynet is a highly controlled network of systems designed for capturing attackers’ activity. It can be built to mirror the existing network infrastructure, giving the attacker the feel of the real network she is interacting with. It all depends on the services you want to provide. It can be anything from a clustered Exchange server on Windows 2003 to an ISO Linux environment. It all depends on you!
Honeynet is not a single product but composed of multiple technologies and products. It is an architecture. The goal is to create a highly controlled environment in which everything is monitored and logged. Once the architecture is created, you put your target systems inside it. Normally target systems are default installations of widespread operating systems placed on external networks.
Mechanism
A Honeynet is not a single product that we install and that is then ready to go. It is an architecture composed of multiple technologies and products. The architecture is up to you, but the deployment can be very complex, and an improper deployment can get you into trouble. A proper Honeynet deployment has three requirements: Data Control, Data Capture, and Data Collection.
Data Control is the management and tracking of activity to and from the Honeynet. You will not like getting emails about unauthorized scans from your network, or having attackers use the Honeynet to harm other systems. Data Control is used to prevent attackers from attacking other systems; it has been observed that attackers use compromised systems to discover other vulnerable systems on the Internet. There are different approaches to implementing data control. We do not want attackers to know that they are in a controlled environment, but we also do not want to give them full freedom. Normally three techniques are used for data control: connection control, bandwidth control and intrusion prevention. Together they make up a powerful data control system. Connection control limits the outbound connections from the Honeynet. Inbound connections to the Honeynet are usually not controlled; we just do not want an attacker to harm other systems through the Honeynet. A limit is set for outbound connections and, once the limit is reached, all further outbound connections are blocked. This minimizes the risk of the Honeynet being used for network attacks. Bandwidth control manages the inbound and outbound network bandwidth of the Honeynet. You do not want an attacker to choke your network pipe with a DoS attack, so bandwidth control lets you set a limit on the amount of network bandwidth your Honeynet can consume. Intrusion prevention blocks known attacks. Each packet is inspected at the gateway and, if it matches an IDS rule, the packet is dropped or modified and an alert is raised. These techniques help minimize the risk of an attacker harming other systems, but they cannot completely eliminate it.
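The connection control idea described above can be sketched in a few lines. This is a minimal illustration only, not the Honeynet Project's data control script: the class name `OutboundLimiter`, the per-honeypot counting and the fixed time window after which the counter resets are all assumptions made for the example.

```python
from collections import defaultdict


class OutboundLimiter:
    """Allow up to `limit` outbound connections per honeypot per time window.

    The time window and the per-source counting are assumptions of this sketch.
    """

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window = window_seconds
        self.window_start = {}         # honeypot IP -> start time of current window
        self.count = defaultdict(int)  # honeypot IP -> connections seen in window

    def allow(self, src_ip, now):
        """Return True if a new outbound connection from src_ip may pass."""
        start = self.window_start.get(src_ip)
        if start is None or now - start >= self.window:
            # First connection, or the previous window expired: count afresh.
            self.window_start[src_ip] = now
            self.count[src_ip] = 0
        if self.count[src_ip] >= self.limit:
            return False               # limit reached: block further connections
        self.count[src_ip] += 1
        return True


limiter = OutboundLimiter(limit=3, window_seconds=3600)
decisions = [limiter.allow("192.168.1.101", now=t) for t in (0, 10, 20, 30, 40)]
print(decisions)  # [True, True, True, False, False]
```

Inbound traffic is deliberately not passed through the limiter, which mirrors the policy of leaving inbound connections uncontrolled.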
Data Capture is the logging of all of the attacker’s activity in the Honeynet. The purpose of the Honeynet is to learn about and analyze the attacker’s activity, so a Honeynet without logging has no value. There are different techniques and approaches for capturing data. The goal is to log as much information as we can without the attackers knowing it. Logging is done on multiple layers to avoid a single point of failure. Normally there are three layers of data capture: firewall activity, network activity, and system activity. Firewall activity is logged by the data control script, which records all inbound and outbound connections in /var/log/messages. Firewall logs give an overview of the activity and provide the first indication that the Honeynet has been compromised. Network activity is logged by a network sniffer such as snort or tcpdump. The purpose of logging network activity is to capture every packet, with its full payload, that crosses the Honeynet. System activity logging is the most complex and critical task to accomplish. It yields a large amount of valuable information, because the activity is captured on the honeypot itself. The advantage of capturing activity on the honeypot is that it makes encryption ineffective: because we are logging at the system level, we capture everything unencrypted. Every mechanism carries risks, so we also have to minimize the risk of attackers discovering that they are being logged. There are different techniques for reducing the chances of attackers detecting the data capture mechanism. Normally, we modify the honeypots by installing customized data capture patches and kernel modules. Sebek, one of the tools used for logging the attacker’s activity on the honeypot, is installed as a hidden kernel module. It is also recommended to store the captured data on a secured remote system rather than locally, which reduces the chances of attackers detecting the captured data and deleting or modifying it. These techniques and mechanisms can reduce the risks but can never completely eliminate them.
Data Collection is the collection of data from multiple honeynets at a central location. It is not a requirement for a standalone Honeynet deployment. The purpose of data collection is to centrally gather and combine the information captured by multiple Honeynet deployments.
Architectures
We have discussed the three requirements of a Honeynet architecture. There are different ways to implement them, but we will discuss the two architectures developed by the Honeynet Project, known as GenI (first generation) and GenII (second generation). GenI was the first Honeynet architecture, deployed by the project in 1999. GenII followed in 2002, after the lessons, problems and issues of the GenI architecture had been identified.
GenI (1st Generation)
GenI Honeynets were developed in 1999 by the Honeynet Project. The purpose of a GenI Honeynet was to capture the maximum amount of attacker activity while giving attackers the feel of a real network. The architecture of a GenI Honeynet is uncomplicated. The approach used for data capture and data control is simple, which sometimes makes it detectable by attackers. However, it can capture a great deal of information and can even help in capturing unknown attacks. Its ability to control and capture attacks makes this architecture very effective against known, automated, and beginner-level attacks. A GenI Honeynet is not effective in capturing advanced attacks, because advanced attackers can easily detect it.
The purpose of data control is to keep track of the attacker’s activity to and from the Honeynet. We do not want her to harm other, non-honeypot systems on the network. Data Control is done by putting a layer three firewall in front of the honeypots. The firewall works as a gateway in NAT (Network Address Translation) mode and controls all inbound and outbound connections. It allows all inbound connections but limits outbound connections. The firewall keeps track of all the outbound connections an attacker makes and, when a certain limit is reached, blocks all outgoing connections from the Honeynet. The firewall we use for the data control mechanism is IPTables, an Open Source firewall.
The purpose of data capture is to log as much information as we can without attackers knowing it. In a GenI Honeynet, logging is done on multiple layers to avoid a single point of failure. The first layer of logging is the firewall. Firewall logs are not very detailed, but any available information is helpful in a Honeynet. They give the source and destination IP addresses, the source and destination ports, the protocol, and the date and time. The second layer of logging is the network. An Intrusion Detection System (IDS) is deployed on the gateway and logs every packet, with its payload, traveling on the wire. IDS logs provide the most useful information, as every packet traveling to and from the Honeynet is captured. The IDS also raises an alert when it sees suspicious activity. Most IDSs have a signature database of known attacks, and if a packet on the wire matches a signature, the IDS generates an alert. The IDS used for the data capture mechanism is Snort, an Open Source IDS. The third layer of logging is the system: we log the attacker’s activity on the honeypot itself. Keystrokes and screenshots are captured by installing a modified version of bash or a kernel module. The logs are securely forwarded to a remote server over the network. The disadvantage of transferring the logs over the network is that an advanced attacker can easily detect the transfer.
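As an illustration of the first logging layer, the fields named above (addresses, ports, protocol, time) can be pulled out of an iptables-style log line with a short parser. This is a rough sketch: the function name `parse_firewall_line` is made up for the example, and the sample line is the one quoted in the Alerting section of this paper, joined onto a single line.

```python
import re

# KEY=value pairs as they appear in iptables log lines (e.g. SRC=..., DPT=...).
FIELD = re.compile(r"\b([A-Z]+)=(\S+)")


def parse_firewall_line(line):
    """Extract time, addresses, ports and protocol from one firewall log line."""
    fields = {}
    for key, value in FIELD.findall(line):
        # LEN appears twice (IP length, then UDP length); keep the first one.
        fields.setdefault(key, value)
    return {
        "time": " ".join(line.split()[:3]),  # syslog timestamp: month, day, time
        "src": fields.get("SRC"),
        "dst": fields.get("DST"),
        "proto": fields.get("PROTO"),
        # ICMP entries carry no ports, so these may be None.
        "sport": int(fields["SPT"]) if "SPT" in fields else None,
        "dport": int(fields["DPT"]) if "DPT" in fields else None,
    }


SAMPLE = (
    "Apr 6 17:19:05 honeywall FIREWALL:OUTBOUND CONN UDP:IN=br0 "
    "PHYSIN=eth1 OUT=br0 PHYSOUT=eth2 SRC=192.168.1.101 "
    "DST=63.107.222.112 LEN=123 TOS=0x00 PREC=0x00 TTL=255 ID=43147 "
    "PROTO=UDP SPT=5353 DPT=79 LEN=103"
)
record = parse_firewall_line(SAMPLE)
print(record["src"], record["dst"], record["proto"], record["dport"])
```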
GenII (2nd Generation)
GenII Honeynets were developed in 2002 by the Honeynet Project after the problems and issues in the GenI architecture had been identified. These problems were solved by changing the architecture. In the GenI architecture the firewall works at layer three, which is easily detectable. GenII addresses this by making the gateway a layer two device, which is harder to detect. The firewall works in bridge mode and, as in the GenI architecture, controls all inbound and outbound connections. The new capability added to the gateway in the GenII architecture is the IPS (Intrusion Prevention System). An IPS works like an IDS but can also block and modify attacks. Most IDSs have a signature database of known attacks, so if a packet traveling on the wire matches a signature, the IPS can block or even modify that packet. This capability helps in distinguishing between legitimate and malicious activity. If an attacker tries to run an exploit against a non-honeypot system, the IPS can block or modify the attack even if the attacker is still under the connection limit. The IPS mechanism works with known attacks only, so unknown attacks can bypass it. That is why it is combined with the connection control mechanism: an attack that does not match any signature is still blocked once the limit is reached. This combination makes the Honeynet harder to detect.
The data capture mechanism in the GenII architecture is much the same as in GenI. Logging is done on three layers: the firewall layer, the network layer and the system layer. The most difficult part is capturing the attacker’s activity on the honeypot itself. The most significant data capture development of the GenII period is Sebek, a client-server tool designed to capture the attacker’s activity on the honeypot. The client is a hidden kernel module capable of tracking the attacker’s activity. Once the Sebek client is installed on the honeypot, it starts transmitting the data to its server over UDP, hiding this activity from the attacker. The Sebek server receives the activity from the client and logs it.
Data Collection and Alerting capabilities are also introduced in GenII Honeynets. The Data Collection mechanism lets you collect and analyze the data from distributed Honeynet deployments. Alerting notifies you when someone breaks into the Honeynet, which helps in keeping track of the activity.
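The drop and modify behaviours of the gateway IPS can be sketched as a small decision function. This is only an illustration of the idea, not snort_inline itself: the function `inspect`, the single signature and its replacement bytes are hypothetical, and the replacement is kept the same length as the pattern so that packet sizes do not change, which is an assumption of the sketch.

```python
def inspect(payload, signatures, mode):
    """Decide what an inline IPS does with one outbound payload.

    signatures maps a byte pattern to a same-length replacement.
    mode is "drop" or "replace". Returns (action, payload_or_None).
    """
    for pattern, replacement in signatures.items():
        if pattern in payload:
            if mode == "drop":
                return "drop", None
            if len(replacement) != len(pattern):
                raise ValueError("replacement must keep the payload length")
            # Disable the attack by rewriting the matched bytes in place.
            return "replace", payload.replace(pattern, replacement)
    # No signature matched: unknown attacks are left to connection control.
    return "pass", payload


SIGNATURES = {b"/bin/sh": b"/ben/sh"}  # hypothetical signature, illustration only
attack = b"GET /cgi-bin/x?cmd=/bin/sh HTTP/1.0"

dropped = inspect(attack, SIGNATURES, "drop")
replaced = inspect(attack, SIGNATURES, "replace")
clean = inspect(b"GET /index.html HTTP/1.0", SIGNATURES, "replace")
print(dropped[0], replaced[1], clean[0])
```

The "pass" branch is where the connection limit takes over: traffic that matches no signature still counts against the outbound limit.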
Virtual Honeynet
A Virtual Honeynet lets you run everything on a single computer. It is deployed with virtualization software that creates multiple virtual machines, each running a separate operating system. This technology is very effective when resources are limited. A Virtual Honeynet is also easier to manage than a traditional Honeynet, since everything runs on a single machine. There are limitations on the types of architecture and operating system you can use in a Virtual Honeynet, and there are risks involved in deploying one. If an attacker is able to compromise the operating system on which the virtualization software is running, he will be able to control the whole system. In addition, an attacker who compromises a system in your Virtual Honeynet may be able to detect that it is running in a virtual environment. The solutions that the Pakistan Honeynet Project has used and tested are VMware Workstation, VMware GSX Server, Microsoft Virtual PC and User Mode Linux. The advantage of User Mode Linux is that it is open source and free. All of these products have useful features and capabilities.
Alerting
There is one last element you need to consider before finishing your honeynet: alerting. Having someone break into your honeynet is a great learning experience, unless you are unaware that someone has broken into it. Ensuring that you are notified of a compromise, and that you respond to it, is critical for a successful honeynet. Ideally you would have round-the-clock monitoring by a seasoned admin. For organizations that cannot support 24/7 staff, one alternative is automated alerting. One option for automated monitoring is Swatch, the Simple Watcher. Swatch is an automated monitoring tool that can alert administrators to possible successful attacks on the honeynet. Swatch monitors log files for patterns described in a configuration file. When a pattern is found, it can send alerts via email, system bells or phone calls, and it can be extended to run other commands or programs. A simple Swatch rule contains the pattern to watch for, followed by a list of actions to take. By default Swatch includes in email alerts the line in the log file that matched the rule. An example email, here for a rule matching outbound connections, looks like this:
To: admin@honeynet.org
From: yourdatacontrol@yourdomain.org
Subject: ------ ALERT!: OUTBOUND CONN --------
Apr 6 17:19:05 honeywall FIREWALL:OUTBOUND CONN UDP:IN=br0
PHYSIN=eth1 OUT=br0 PHYSOUT=eth2 SRC=192.168.1.101
DST=63.107.222.112 LEN=123 TOS=0x00 PREC=0x00 TTL=255 ID=43147
PROTO=UDP SPT=5353 DPT=79 LEN=103
Even with the automated tools described in the Data Control section, an effective honeynet requires constant supervision. Properly configured, Swatch can be used to quickly notify administrators of events on their network. However, do not depend on outbound connections as your only source of alerting. For example, an attacker may compromise the system but never attempt an outbound connection. Be sure to monitor other sources of information, such as the keystrokes collected by the Sebek clients. More advanced detection, reporting, and alerting mechanisms are under development for the Honeywall CDROM.
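The pattern-then-action idea behind this kind of alerting can be illustrated with a small log scanner. This is not Swatch and does not use Swatch's configuration syntax; the `RULES` table, the `scan` function and the second alert subject are assumptions made for the sketch. Each alert carries the matching log line as its body, as described above.

```python
import re

# (pattern to watch for, subject of the alert to raise); illustrative rules only.
RULES = [
    (re.compile(r"OUTBOUND CONN"), "------ ALERT!: OUTBOUND CONN --------"),
    (re.compile(r"Drop TCP"), "------ ALERT!: DROP TCP --------"),
]


def scan(lines, rules=RULES):
    """Return one alert per log line that matches a rule (first match wins)."""
    alerts = []
    for line in lines:
        for pattern, subject in rules:
            if pattern.search(line):
                # The matching line becomes the body of the alert.
                alerts.append({"subject": subject, "body": line})
                break
    return alerts


log_lines = [
    "Apr 6 17:19:05 honeywall FIREWALL:OUTBOUND CONN UDP:IN=br0 SRC=192.168.1.101",
    "Apr 6 17:19:09 honeywall sshd: session opened",  # made-up non-matching line
]
alerts = scan(log_lines)
for alert in alerts:
    print(alert["subject"])
    print(alert["body"])
```

In a real deployment the alert would be handed to a mailer or pager; here it is only collected in a list so the matching logic stays visible.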
Testing
Once we have configured Data Control and Data Capture, the next step is to test the gateway. Below are some basic steps you can take to test your deployment. The Honeywall CDROM comes with a more thorough test plan, which can be found in its documentation section. To test the gateway, we need a system on the external interface, which we will call the test system. Based on the figure, we will use the system 192.168.1.20 as our test system. We begin by testing Data Control: does our honeynet successfully contain inbound and outbound activity? First, initiate a connection from the test system to one of the honeypots within the honeynet. Depending on your ruleset, this connection will most likely be allowed. If so, an entry for it will appear in /var/log/iptables.
Once you have confirmed that inbound connectivity is working, the next step is to test outbound connectivity. Begin by accessing one of the honeypots behind the gateway (we recommend you access honeypots from the console, as the honeypot will log any remote connections, such as SSH, locally). From there, initiate multiple outbound connections to the test system. This simulates a compromised honeypot from which an attacker is attempting to initiate outbound connections, and potentially an attack. The connections should be logged to /var/log/iptables on the Honeywall CDROM. In our case, we can attempt multiple outbound FTP connections to the test system on the production network. When our limit of 15 TCP connections is hit, a "Drop TCP" entry is logged, and the corresponding entries will appear in the log.
Next, we want to confirm that our NIPS technology, snort_inline, is working. Fortunately, the snort_inline toolkit comes with rules designed specifically for testing. Be sure to enable those rules before testing snort_inline. The rulebase you are running determines which test you run. If you are running a drop ruleset, enable the default test rule, restart snort_inline using the start script, then attempt an outbound Telnet connection. Snort_inline should detect, drop, and log the attempt. Your Telnet attempt should not work; it should simply time out. If you are running a replace ruleset, enable the default replace test rule, restart snort_inline, then attempt a simple HTTP GET command. Snort_inline should detect, modify, and log the attempt. To confirm that the modification happens, be sure to sniff the EXTERNAL interface eth0. Do NOT sniff the internal interface eth1, as the GET command is not modified until after it passes through the internal interface, but before it leaves the external interface. Once you are done testing, be sure to disable the test rules (or the bad guys could simply run the default tests too).
Once we have confirmed Data Control, we want to ensure that Data Capture is working. Remember, if our honeynet is not logging all activity, then the honeynet has no value. We confirm Data Capture by looking at the logs: did the Honeynet capture the Data Control test we just ran? We begin with the firewall logs. This test is simple: all of our connections should have been logged to /var/log/iptables. Specifically, we should first see the inbound connections from the test system to the honeypot. Second, we should see the outbound connections from the honeypot to the test system. Last, we should see an alert message indicating that the outbound limit has been met and that all further connections will be dropped. We already tested this when we confirmed Data Control. Next, we review the network logs. We want to be sure we captured every packet and the full payload of every connection, both inbound and outbound. Based on experience, we have found it best to rotate the logs on a daily basis. The log we are primarily interested in is the binary log capture, called snort.log.*. In addition, you may find various directories named after IP addresses. These contain the output of any packets with ASCII content, such as FTP commands or an .html page. You can confirm that Snort logged all packets by analyzing the binary log file, as follows:
honeywall #snort -vdr snort.log.*
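As a complementary check, the binary log can also be walked record by record to count packets and to spot any whose payload was cut short. The sketch below assumes the classic libpcap file layout (a 24-byte file header followed by 16-byte record headers with microsecond timestamps); the function names and the tiny in-memory demo capture are invented for the example, and other capture formats are not handled.

```python
import io
import struct


def count_pcap_records(stream):
    """Return (packets, captured_bytes, truncated) for a classic libpcap stream."""
    header = stream.read(24)
    if len(header) < 24:
        raise ValueError("file too short for a libpcap header")
    magic = struct.unpack("<I", header[:4])[0]
    if magic == 0xA1B2C3D4:
        endian = "<"   # written on a little-endian machine
    elif magic == 0xD4C3B2A1:
        endian = ">"   # written on a big-endian machine
    else:
        raise ValueError("not a classic libpcap file")
    packets = captured = truncated = 0
    while True:
        record = stream.read(16)
        if len(record) < 16:
            break      # end of file
        _sec, _usec, incl_len, orig_len = struct.unpack(endian + "IIII", record)
        data = stream.read(incl_len)
        if len(data) < incl_len:
            break      # incomplete final record
        packets += 1
        captured += incl_len
        if incl_len < orig_len:
            truncated += 1   # payload was cut off by the snapshot length
    return packets, captured, truncated


def build_demo_capture():
    """Build a two-packet capture in memory; the second packet is truncated."""
    out = struct.pack("<IHHiIII", 0xA1B2C3D4, 2, 4, 0, 0, 65535, 1)
    out += struct.pack("<IIII", 0, 0, 4, 4) + b"\x00" * 4
    out += struct.pack("<IIII", 1, 0, 2, 6) + b"\x00" * 2
    return out


summary = count_pcap_records(io.BytesIO(build_demo_capture()))
print(summary)  # (2, 6, 1)
```

A non-zero truncated count would mean that the sniffer did not keep the full payload of every packet. To inspect a real log, open it in binary mode and pass the file object to `count_pcap_records`.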
Finally, we review the Sebek logs. These are the keystrokes captured by the Sebek kernel module and then sent onto the network. These packets should have been captured by the network sniffer (Snort). It is highly recommended that you use Walleye, the GUI that comes with the Honeywall CDROM, for all Sebek analysis. If you were able to analyze the collected data, your Data Capture is working successfully! Also, check your email: Swatch should have alerted you to the tests you just ran. Now, since you are a true security professional, we know you are going to reboot your honeynet gateway and test Data Control and Data Capture one more time, just to be sure. We also encourage organizations to do more extensive testing, as documented in the Honeywall CDROM docs section.
What the honeynet collects
One of the best ways to demonstrate how a honeynet works (and its value) is to review a captured attack.
In February 2002, Honeynet Project member Michael Clark deployed a virtual honeynet using GenI technology, similar to the architecture in Figure 1. Several Linux-based honeypots served as intended victims. On 18 February, a standard FTP exploit compromised one of the Linux honeypots in the honeynet; in this case, the tool used was TESO's wu-ftpd massrooter, a well-known and highly effective automated hacking tool. The honeynet easily detected and captured this attack, including the attacker's initial keystrokes on the compromised system. By deconstructing all the network packets that Snort captured, we easily determined the attack's nature and the commands executed on the remote system.
Once the system was exploited, the attacker executed a command to download the binary foo from a remote system, install it as /usr/bin/mingetty, and execute it. The attacker then left the system. This demonstrates that capturing keystrokes alone does not give you all the information you need. What is the binary foo, and what is the attacker attempting to achieve?
Shortly after the binary was executed, the honeynet's data control filter (in this case, IPTables) logged inbound and outbound packets being sent to and from the hacked honeypot, but our sniffer, Snort, was not capturing or logging any of the activity. We had a failure. After analyzing the traffic, we identified our error. Someone was now sending nonstandard IP packets to the hacked honeypot; in this case, IP protocol 11 packets, otherwise known as Network Voice Protocol. The firewall logs captured this, but the sniffer did not. We had fallen into the trap of designing our honeynet to capture what we expected the bad guys to do, not everything that they actually could do. Fortunately, because the honeynet comprised multiple layers of data capture, when one layer failed, another layer picked up the traffic.
Once we had identified the mistake, we corrected it by reconfiguring Snort to capture and log all IP traffic, not just IP protocols 1, 6, and 17. We could now capture the packets and full packet payload of all the NVP traffic sent to and from the compromised Linux honeypot. At first, the packets were hard to understand; as Figure 4 shows, the packet payload appeared to be obfuscated or encrypted. Also, a variety of sources that appeared to be spoofed (such as army.mil) sent multiple identical packets.
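The capture failure described above comes down to a filter on the protocol field of the IPv4 header. The following sketch builds a few minimal IPv4 headers and shows which ones a filter limited to protocols 1, 6 and 17 would miss. The helper names and the addresses are sample values for illustration, and the header checksum is left at zero because it plays no part in the example.

```python
import struct

EXPECTED = {1: "ICMP", 6: "TCP", 17: "UDP"}  # what the sniffer originally kept


def ip_protocol(packet):
    """Return the protocol number from a raw IPv4 header."""
    if len(packet) < 20 or packet[0] >> 4 != 4:
        raise ValueError("not an IPv4 packet")
    return packet[9]  # the protocol field is the tenth byte of the header


def make_header(proto):
    """Build a minimal 20-byte IPv4 header (checksum left at zero)."""
    return struct.pack(
        "!BBHHHBBH4s4s",
        0x45, 0, 20,          # version 4 / header length 5 words, TOS, total length
        0, 0,                 # identification, flags and fragment offset
        64, proto, 0,         # TTL, protocol, checksum (not computed here)
        bytes([192, 168, 1, 101]),
        bytes([192, 168, 1, 20]),
    )


packets = [make_header(p) for p in (6, 17, 11, 1)]
seen = [ip_protocol(p) for p in packets]
kept = [p for p in seen if p in EXPECTED]
missed = [p for p in seen if p not in EXPECTED]
print(kept, missed)  # [6, 17, 1] [11]
```

Capturing everything, rather than an allow-list of expected protocols, is what closes the gap: the protocol 11 packet only shows up in the second list.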
Honeynet Project
The Honeynet Project maintains several honeynets around the world, whose data are all stored on a central server. To allow further correlation of the data, a set of requirements that must be fulfilled during data capture in all honeynets was defined:
• a record with the configuration of all active honeypots in the honeynet must be maintained;
• the data captured by the firewall, router or IDS (Intrusion Detection System) must be stored in GMT (Greenwich Mean Time) timezone;
• each honeynet must have a unique identifier, a naming convention and a mapping that allow its location and configuration to be identified.
Although a description of data collection to a central server is available, there is no information about how this server is structured or which type of correlation is performed.
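Two of the requirements above, timestamps in GMT and a unique honeynet identifier, can be illustrated with a short normalization step applied before a record is sent to a central server. The record layout, the function name and the identifier `honeynet-pk-01` are hypothetical and not part of any Honeynet Project specification; the date is a made-up example.

```python
from datetime import datetime, timedelta, timezone


def normalize_record(honeynet_id, local_time, utc_offset_hours, event):
    """Tag an event with its honeynet and convert its local timestamp to GMT/UTC."""
    local_zone = timezone(timedelta(hours=utc_offset_hours))
    aware = local_time.replace(tzinfo=local_zone)  # attach the sensor's offset
    return {
        "honeynet": honeynet_id,
        "time_utc": aware.astimezone(timezone.utc).isoformat(),
        "event": event,
    }


# A sensor five hours ahead of GMT logs an event at 17:19:05 local time.
record = normalize_record(
    "honeynet-pk-01",                      # hypothetical identifier
    datetime(2004, 4, 6, 17, 19, 5),
    5,
    "OUTBOUND CONN",
)
print(record["time_utc"])  # 2004-04-06T12:19:05+00:00
```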
Honeynet Research Alliance
The Honeynet Research Alliance is a forum whose participants are organizations from several countries that perform research on honeypots and honeynets. So far there is no method available for these institutions to share data or analyses with one another. However, some organizations send data to a central server maintained by the Honeynet Project. This central server can deal only with data generated by a specific set of capture and control tools: binary files from the libpcap library and Linux iptables firewall logs. This implies that only organizations using these technologies can send data to the central server. It is also important to note that this architecture does not enable the exchange of information directly among the organizations. There is also no information available about which kind of data analysis or correlation is being done with the data.
Honeynet Security Console
The Honeynet Security Console (HSC) is a tool developed by Jeff Bell, from the Florida Honeynet Project, to correlate events from a local network or honeynet. This tool allows storing and querying the following types of data: libpcap files generated by tcpdump, syslog logs, firewall logs stored by syslog and logs from the Sebek tool, which captures all commands executed by an intruder in a high-interaction honeypot. The HSC authors have chosen to store the data as simply as possible in a SQL database. This was achieved by using programs already available to convert data from each application to a relational database. Each one of these programs creates its own table structure, with different mapping and data types. Thus similar data created by different applications will be represented differently. To solve these inconsistencies HSC needs to perform data type conversions and correlate some data after it performs the queries to the database. In some of the cases if the data were mapped differently, the correlations could be done directly as queries to the database.
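The kind of inconsistency described above can be illustrated with IP addresses: one converter might store them as 32-bit integers and another as dotted text, so a join only works after both are brought to a common form. The two row layouts below are invented for the illustration and are not HSC's actual table structures.

```python
import ipaddress


def normalize_ip(value):
    """Return the dotted form of an address stored as an integer or as text."""
    return str(ipaddress.ip_address(value))


def correlate(sniffer_rows, firewall_rows):
    """Join the two hypothetical tables on the normalized source address."""
    actions_by_ip = {}
    for row in firewall_rows:
        actions_by_ip.setdefault(normalize_ip(row["source_ip"]), []).append(row["action"])
    result = []
    for row in sniffer_rows:
        ip = normalize_ip(row["src"])
        result.append((ip, row["bytes"], actions_by_ip.get(ip, [])))
    return result


# Same host, two representations (both layouts are assumptions of this sketch).
sniffer_rows = [{"src": 3232235877, "bytes": 123}]
firewall_rows = [{"source_ip": "192.168.1.101", "action": "OUTBOUND CONN"}]

joined = correlate(sniffer_rows, firewall_rows)
print(joined)  # [('192.168.1.101', 123, ['OUTBOUND CONN'])]
```

If both tables stored the address in the same form, this join could be expressed directly as a database query, which is the point made above about mapping the data consistently.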
The GenII (2nd generation) Honeynet is the next step in the evolution of honeynet technology. Based on combining old and new techniques, the GenII Honeynet can increase the flexibility, manageability, and security of honeynet deployments. This paper introduces the technology used in a GenII Honeynet architecture. However, before you proceed, it is assumed that you have already read and fully understand the concepts, risks, and issues of Honeynets outlined in Know Your Enemy: Honeynets. It is critical that you understand the basic concepts and risks of honeynets before covering the technical details.
Future
The plan for the future is to make Honeynet deployment and management easy. In the next phase the Honeynet Project will release a bootable CDROM that boots into a Honeynet gateway, or Honeywall. The bootable gateway will have all the Data Control and Data Capture mechanisms described above. Once you have booted the CDROM, all you will have to do is place your honeypots behind it. This will make Honeynet deployment easy and standardized.
Conclusion
The purpose of this paper was to help you understand what Honeynets are and why they are important. We discussed the mechanisms of a Honeynet: Data Control, Data Capture and Data Collection. Then we discussed the two Honeynet architectures, GenI and GenII. Finally, we discussed Virtual Honeynets and future plans. Honeynets are truly high-interaction honeypots that help you capture and analyze complex attacks.
We have just completed an overview of how to build and deploy a GenII Honeynet based on a bridging gateway using Linux. This deployment represents some of the more advanced features of honeynet technology. However, keep in mind that information security is similar to an arms race: as we release new technologies to capture attacker activity, attackers can develop their own countermeasures. If you are interested in deploying your own honeynet, we highly recommend the Honeywall CDROM.