PraMay's Blog: October 2008

Honeynet Technology

Abstract

The deployment of honeypots is one of the methods used to collect data about attack trends in computer networks. Two or more honeypots on a network form a honeynet. The lack of a standard format for data representation makes the exchange and centralization of data generated by different technologies difficult. This also restricts the correlation and analysis of this information. This paper presents the HIDEF ( Honeypots Information and Data Exchange Format), a proposal for a format to enable the representation and exchange of data and information produced by organizations using honeypots and honeynets

Introduction

In the past few years the number of computer security incidents on Internet connected networks has continuously increased . As a result, there is an increasing need for attack data correlation and tools to help understand attacks and identify trends. The deployment of sensors in computer networks to gather malicious traffic is one of the methods used by researchers and security professionals to collect data for attack trends analysis. One type of sensor that has been used is a honeypot, a security resource whose value lies in being probed, attacked or compromised. Another type of sensor used is a honeynet, a network specifically designed for the purpose of being compromised, that has control mechanisms to prevent it from being used as a base of attacks against other networks. The honeypot technologies have considerably developed in the past few years, mainly because of the increase in research activities and the development of new ways to collect data about attacks. However, it is very important to enable a more robust and complete analysis of the traffic captured by honeypots and honeynets that use different technologies and are deployed in different locations around the world. This analysis would make it possible to better understand the attacks’ distribution and how they interrelate. To allow the correlation of information from different honeypot and honeynet implementations, it is not only necessary to have a system to collect and analyze these data, but also to have a format to represent this information. The existence of a standard format would facilitate the exchange of data, because it would enable the development of tools to automate the generation and retrieval of information from different sources. The existing research in the area of honeypot data collection and analysis is concentrated in visualizing and correlating data from a unique honeynet or from a set of honeypots using similar technologies. However, none of them considers the interoperability issue of analyzing data generated in different architectures by different technologies. On the other hand, standard formats to represent data related to attacks and intrusion detection, such as the Intrusion Detection Message Exchange Format (IDMEF) and the Incident Object Description Exchange Format (IODEF) are not adequate to represent data from honeypots. To contribute to this research area this work proposes HIDEF (Honeypots Information and Data Exchange Format), a format to represent and exchange data and information produced by organizations using honeypots and honeynets. This proposed format intends to solve the interoperability issue among different honeypot implementations, while preserving compatibility with other standards like IDMEF and IODEF.

This paper would be introducing Honeynet. It is the technology used by Pakistan Honeynet Project to gather information about the motives and tactics of the Black Hat community targeting Pakistan’s’ networks. By the end of the paper, I hope you will understand what honeynet is, what it can do for you, its benefits, different risks / issues, types of honeynet technologies available, and how you can deploy them in your network.

Overview

Normally due to the size of traffic and activity on the production network, we cannot log the level of detail that security practitioner often needs. Honeynets are a way to get much more detailed logging for certain malicious situations than would be possible with normal logging. Suppose you have firewall, which is properly configured to stop attack on port 111. It is good, but you won’t be able to learn about the attack, which can be bad. There might be situations when you want to see the content of the traffic. It can be when you want to know the intentions of the attackers and how much they know about your network. It can be when a particular system is getting lots of probes. Also, when you think that a new attack or technique has been used to exploit your network.

Honeynet is a high-interaction network of honeypots. High-interaction Honeynet is a network of actual systems running real operating systems and services. They give the ability to learn more about the attacks and attackers since they are running actual operating system with real services, which an attacker can compromise. Unlike low-interaction honeypot emulation, they are running everything real that comes with the operating system. Honeynets are like real networks comprising different systems but the difference is that they don’t have any production value and all the activity is logged and analyzed. They don’t run any production services, so they don’t have any production activity or interaction. As a result, any activity happens on the honeynet is supposed to be from an attacker.

We know Honeynet is a highly controlled network of systems designed for capturing attackers’ activity. They can be created in accordance with the existing network infrastructure to give attacker a feel of real network to which she interacts with. It all depends on the services you want to provide. It can be from clustered Exchange server on a Windows 2003 to an ISO Linux environment. It all depends on you!

Honeynet is not a single product but composed of multiple technologies and products. It is an architecture. The goal is to create a highly controlled environment in which everything is monitored and logged. Once the architecture is created, you put your target systems inside it. Normally target systems are default installations of widespread operating systems placed on external networks.

Mechanism

Honeynet is a not a single product that we install and it is ready to go. Honeynet is an architecture composed of multiple technologies and products. The architecture depends on you but the deployment can be very complex. Improper deployment can get you into trouble. There are certain requirements for a proper deployment of Honeynet. The honeynet deployment emphasizes Data Control, Data Capture, and Data Collection.

Data Control is defined as management or tracking of the activity to and from the Honeynet. You won’t like getting emails regarding unauthorized scans from your network or attackers using Honeynet to harm other systems on the network. Data Control is used to prevent attackers from attacking other systems. It has been observed that attackers use compromised systems to discover other vulnerable systems on the internet. There are different approaches to implement data control. We do not want attackers to know that they are under a controlled environment but also we do not want to give them full freedom. Normally there are three techniques used for data control, i.e. connection control, bandwidth control and intrusion prevention. These three techniques make up a powerful data controlling system. Connection control is used to limit the outbound connections from the Honeynet. Usually inbound connections to Honeynet are not controlled. We just don’t want an attacker to harm other systems through Honeynet. A certain limit is set for outbound connections and once the limit is achieved, all outbound connections are blocked. This minimizes the risks of different network attacks. Bandwidth control is used to manage the inbound and outbound network bandwidth of Honeynet. You don’t want an attacker to choke your network pipe with a DoS attack. So bandwidth control allows you to set a limit on the amount of network bandwidth your Honeynet can consume. Intrusion prevention is used to block known attacks. It is done by inspecting each packet at the gateway, and if it matches with IDS rules, the packet is dropped or modified with an alert. These techniques cannot completely eliminate the possibility of an attacker harming others systems. They can only help you in minimizing the risks, but cannot completely eliminate them.

Data Capture is defined as the logging of the entire attacker’s activity in the Honeynet. The purpose of the Honeynet is to learn and analyze the attacker’s activity. Honeynet does not have any value without logging, it is useless. There are different techniques and approaches used for data capturing. The purpose of data capture is to log as much information as we can without attackers knowing it. Data logging is done on multiple layers to avoid single point of failure. Normally there are three layers of data capture, i.e. firewall activity, network activity, and system activity. Firewall activity is logged through data control script. It logs all inbound and outbound connections in /var/log/messages. Firewall logs give an overview of the activity and provide first indication of the Honeynet compromise. Network activity is logged through a network sniffer, i.e. snort or tcpdump. The purpose of logging network activity is to capture every packet (with its full payload) crossing Honeynet. System activity logging is the most complex and critical task to accomplish. They give you exciting and ample amount of information, as the activity is captured on the honeypot itself. The advantage of capturing activity on the honeypot itself is that it makes encryption ineffective, and as we are logging on the system level, we capture everything unencrypted. We know that different risks are involved with every mechanism; therefore we also have to minimize the risk of attackers knowing that they are being logged. There are different techniques used to reduce the chances of attackers detecting the data capture mechanism. Normally, we make changes on the honeypots by installing customized data capture patches and kernel modules. Sebek is one of the tools used for logging attacker’s activity on the honeypot, which is installed as a hidden kernel module. Secondly it is recommended to store the captured data on a secured remote system rather than storing locally. It reduces the chances of attackers detecting the captured data, and deleting or modifying it. These techniques and mechanism can never completely eliminate the risks, but can only reduce them.

Data Collection is defined as the collection of data from multiple honeynets to a central location. It is not the requirement for a standalone Honeynet deployment. The purpose of data collection is to centrally capture and combine the information collected from multiple Honeynet deployments.

Architectures

We have discussed the three requirements of Honeynet architecture. There are different ways you can implement these architectures but we will discuss two architectures evolved by the Honeynet Project. These two architectures are known as GenI (first generation) and GenII (second generation). GenI was the first Honeynet architecture deployed by the project in 1999. After learning the lessons, identifying the problems and issues in GenI architecture, GenII was evolved in 2002.

GenI (1^st Generation)

GenI Honeynets were developed in 1999 by the Honeynet Project. The purpose of GenI Honeynet was to capture the maximum amount of attacker activity and give them a feel of real network. The architecture of GenI Honeynet is uncomplicated. The approach used for data capture and data control is simple, which makes it detectable by attackers sometimes. However, it can capture great deal of information and even can help in capturing unknown attacks. The ability of this architecture to control and capture attacks makes it very effective in capturing known, automated, and beginner level attacks. GenI Honeynet is not effective in capturing advance attacks because it can be easily detected by advanced attackers. We know that the purpose of data control is to keep track of attacker’s activity to and from Honeynet. We don’t want her to harm other non-Honeypots systems on the network. Data Control is done by putting a layer three firewall in front of Honeypots. The firewall works as a gateway in NAT (Network Address Translation) mode and controls all the inbound and outbound connections. It allows all inbound connections, but limits outbound connections. The firewall keeps track of all the outbound connections an attacker makes, and when a certain limit is reached, it blocks all outgoing connections from the Honeynet. The firewall we use for setting up data control mechanism is IPTables, an Open Source firewall.

As we know that the purpose of data capture is to log as much information as we can without attackers knowing it. In GenI Honeynet, Data logging is done on multiple layers to avoid single point of failure. The first layer of data logging is firewall. We don’t get detailed information through firewall logs though, but any information available is helpful in the case of Honeynet. Firewall logs give information about the source / destination IP address, source / destination port, protocol, and data / time. The second layer of data logging is network. An Intrusion Detection System (IDS) is deployed on the gateway, which logs every packet and its payload traveling on the wire. IDS logs provide most useful information, as every packet traveling to and from the Honeynet is captured. Secondly, IDS alerts when it catches any suspicious activity. Most of the IDS have signatures database of known attacks, and if the packet traveling on the wire matches with the signature, the IDS generates an alert. The IDS used for setting up data capture mechanism is Snort, an Open Source IDS. The third layer of data logging is system. We log attacker’s activity on the Honeypot itself. The keystrokes and screenshots are captured by installing a modified version of bash or a kernel module. The logs are securely forwarded to a remote server over the network. The disadvantage of transferring the logs over the network is that it can be easily detected by an advanced attacker.

GenII (2^nd Generation)

GenII Honeynets were developed in 2002 by the Honeynet Project after identifying the problems and issues in GenI architecture. The problems in GenI were solved by changing the architecture of GenII Honeynet. In GenI architecture, firewall works on layer three which is easily detectable. This problem is addressed by making the gateway a layer two device, which makes it harder to detect. The firewall works in BRIDGE mode and controls all the inbound and outbound connections like GenI architecture. The new ability added to the gateway in GenII architecture is the IPS (Intrusion Prevention System). Basically, IPS works same as an IDS but has the capability to block and modify the attacks also. As we know most of the IDS have signatures database of known attacks. So, if the packet traveling on the wire matches with the signature, the IPS can block or even modify that packet. This capability helps in distinguishing between legitimate and malicious activity. If an attacker would try to run an exploit against a non-Honeypot system, the IPS would be able to block or modify the attack even if it is under connection limit. IPS mechanism works with know attacks only, so unknown attacks can bypass this technology. That is the reason it is combined with connection control mechanism, so that the attack can be blocked after a certain limit if it doesn’t matches with the signature. This mechanism makes the Honeynet harder to detect.

Data capture mechanism in GenI architecture is somewhat same as the GenII. Data logging is done on three layers, i.e. firewall layer, network layer and system layer. The most difficult part is to capture the attacker’s activity on the Honeypot itself. The newest and greatest development that has been done for data capturing during the GenII period is Sebek. Sebek is a client-server tool designed to capture attacker’s activity on the Honeypot. It is a hidden kernel module capable of tracking attacker’s activity. Once the Sebek client is installed on the Honeypot, it starts transmitting the data using UDP to its server. Sebek client hides its activity from the attacker. Sebek server captures the activity from the client and logs it.

The capability of Data Collection and Alerting is also introduced in GenII Honeynets. Data Collection mechanism lets you collect and analyze the data from distributed Honeynet deployments. Alerting notifies if someone breaks into the Honeynet, which helps in keeping track of the activity.

Virtual Honeynet

Virtual Honeynet lets you run everything on a single computer. It is deployed by running virtualization software, that allows creating multiple virtual machines and running separate operating systems on them. This technology is very effective when we have limited availability of the resources. Also, Virtual Honeynet is easier to manage as compared to traditional Honeynet, since everything runs on a single machine. There are certain limitations for the type of architecture and operating system you can use for Virtual Honeynet. Also, there are risks involved in Virtual Honeynet deployment. If an attacker is able to compromise the operating system on which virtualization software is running, he would be able to control the whole system. Secondly, if an attacker compromises the system in your Virtual Honeynet, he may be able to detect that the system is running in a virtual environment. The possible solutions that Pakistan Honeynet Project has used and tested are VMWare Workstation, VMWare GSX Server, Microsoft Virtual PC and User Mode Linux. The advantage of using User Mode Linux is that it is open source and free. All of these products have nice features and capabilities.

Alerting

There is one last element you need to consider before finishing your honeynet, alerting. Having someone break into your honeynet is a great learning experience, unless you are unaware that someone has broken into it. Ensuring that you are notified to a compromise (and responding to it) are critical for a successful honeynet. Ideally you could have round-the-clock monitoring by a seasoned admin. However, for organizations that cannot support 24/7 staff, one alternative is automated alerting. One option for automated monitoring is Swatch, the Simple Watcher. Swatch is an automated monitoring tool that is capable of alerting administrators of possible successful attacks on the honeynet. Swatch monitors log files for patterns described in a configuration file. When a pattern is found it can disseminate alerts via email, system bells, phone calls, and can be extended to run other commands/programs. A simple Swatch rule contains the pattern to watch for followed by a list of actions to take. By default Swatch will include in email alerts the line in the log file that matched the given rule. An example email for the above rule would look like the example below.

To: admin@honeynet.org

From: yourdatacontrol@yourdomain.org

Subject: ------ ALERT!: OUTBOUND CONN --------

Apr 6 17:19:05 honeywall FIREWALL:OUTBOUND CONN UDP:IN=br0

PHYSIN=eth1 OUT=br0 PHYSOUT=eth2 SRC=192.168.1.101

DST=63.107.222.112 LEN=123 TOS=0x00 PREC=0x00 TTL=255 ID=43147

PROTO=UDP SPT=5353 DPT=79 LEN=103

Even with the automated tools described in the Data Control section, an effective honeynet requires constant supervision. Properly configured, Swatch can be used to quickly notify administrators of events on their network. However, do not depend on outbound connections as your only source of alerting. For example, attacker may compromise the system, but never attempt an outbound connection. Be sure to montior other sources of information, such as keystrokes collected by the Sebek clients. More advanced detection, reporting, and alerting mechanisms are under development for the Honeywall CDROM.

Testing

Once we have configured Data Control and Data Capture, the next step will be to test the gateway. To test your deployment, below are some basics steps you can take. The Honeywall CDROM comes with a more thorough test plan, which can be found in the documentation section. To test the gateway, we will need a system on the external interface, we will call this the test system. Based onFigure, we will use the system 192.168.1.20 as our test system. We begin by first testing Data Control, does our honeynet successfully contain inbound and outbound activity?. First, initiate a connection from the test system to one of the honeypots within the honeynet. Based on your ruleset, this connection should have most likely been allowed. If so, you would have an entry similar to this in /var/log/iptables

Once you have confirmed inbound connectivity is working, the next step is to test outbound. Begin by accessing one of the honeypots behind the gateway, (we recommend you access honeypots from the console, as the honeypot will log locally any remote connections, such as SSH) . From there, initiate multiple outbound connections to the test system. This will replicate one of the honeypots has been compromised, and an attacker is attempting to initiate outbound connections, and potentially an attack. The connections should be logged to /var/log/iptables on the Honeywall CDROM. In our case, we can attempt multiple outbound FTP connections to the test system on the production network. When our limit of 15 TCP connections is hit, a "Drop TCP" entry is logged. You would most likely have entries similar to this.

Next, we will want to confirm that our NIPS technology, snort_inline is working. Fortunately, the snort-inline toolkit comes with test rules, designed specifically for test. Be sure to enable those rules before testing snort_inline. The rulebase you are running will also determine which test you run. If you are running a drop ruleset, you test by simply by first enabling the default test rule, restart snort_inline using the start script, then attempt an outbound Telnet connection. Snort_inline should detect, drop, and log the attempt. Your Telnet attempt should not work, it should simply time out. If you are running a replace ruleset, you test by enabling the default replace test rule, restart snort_inline, then attempt a simple HTTP GET command. Snort_inline should detect, modify, and log the attmempt. To confirm the modification happens, be sure to sniff the EXTERNAL interface eth0. Do NOT sniff the internal interface eth1, as the GET command is not modified until after it passes through the internal interface, but before it leaves the external interface. Be sure that once you are done testing, you disable the rules (or the bad guys could simply run the default tests also).

Once we confirm Data Control we then want to ensure that Data Capture is working. Remember, if our honeynet is not logging all activity, then the honeynet has no value. We confirm Data Capture by looking at the logs, did the Honeynet capture the Data Control test we just ran? We begin with the firewall logs. This test is simple, all of our connections should have been logged to /var/log/iptables. Specifically, we should first see the inbound connections from the test system to the honeypot. Second, we should see the outbound connections from the honeypot to the test system. Last, we should see an alert message indicating that the outbound limit has been met, and all further connections will be dropped. We already tested this when we confirmed Data Control Next, we review the network logs. We want to be sure we captured every packet and the full payload of every connection, both inbound and outbound. Based on experience, we have found its best to rotate the logs on daily basis. The log we are primarily interested in is the binary log capture, called snort.log.*. In addition, you may find various directories made up IP addressess. These contain the any output of packets containing ASCII content, such as FTP commands or an .html page. You can confirm Snort logged all packets by analyzing the binary log file, as follows:

honeywall #snort -vdr snort.log.*

Finally we review the Sebek logs. These are the keystrokes captured by the Sebek kernel module, then dumped onto the network. These packets should have been captured by the network sniffer (Snort). Its highly recommend you use the GUI interface WAlleye that comes with the Honeywall CDROM for all Sebek analysis. If you were able to analyze the collected data, you Data Capture is working successfully! Also, check your email, you should have been alerted to the test you just ran. Swatch should have alerted you to your own tests that you just ran. Now, since you are a true security professional, we know you are going to reboot your honeynet gateway and test Data Control and Data Capture one more time, just to be sure. Also, we encourage organizations to do more extensive testing, as documented in the Honeywall CDROM docs section.

What the honeynet collects

One of the best ways to demonstrate how a honeynet works (and its value) is to review a captured attack.

In February 2002, Honeynet Project member Michael Clark deployed a virtual using GenI technology, similar to the architecture in Figure 1. Several Linux-based honeypots served as intended victims. On 18 February, a standard FTP exploit compromised one of the Linux honeypots in the honeynet?in this case, the tool used was TESO's wu-ftpd massrooter, a well-known and highly effective automated hacking tool. The honeynet easily detected and captured this attack, including the attacker's initial keystrokes on the compromised system. By deconstructing all the network packets that Snort captured, we easily determined the attack's nature and the commands executed on the remote system.

Once exploited, the attacker executed a command to download the binary foo from a remote system, install it as /usr/bin/mingetty, and execute the binary. The attacker then left the system. This demonstrates that just capturing keystrokes does not give you all the information you need. What is the binary foo, and what is the attacker attempting to achieve?

Shortly after the binary was executed, the honeynet filter mechanism for data control (in this case, IPTables) logged inbound and outbound packets being sent to and from the hacked honeypot, but our sniffer Snort was not capturing or logging any of the activity. We had a failure. After analyzing the traffic, we identified our error. Someone was now sending nonstandard IP packets to the hacked honeypot?in this case, IP protocol 11 packets, otherwise known as Network Voice Protocol. The firewall logs captured this, but the sniffer didn't. We had fallen into the trap of designing our honeynet to capture what we expected the bad guys to do but not everything that they actually could do. Fortunately, because multiple layers of data capture comprised the honeynet, when one layer failed, the other layer picked up on the traffic.

Once identified, we corrected the mistake by reconfiguring Snort to capture and log all IP traffic?not just IP protocols 1, 6, and 17. We could now capture the packets and full packet payload of all the NVP traffic sent to and from the compromised Linux honeypot. At first, the packets were hard to understand; as Figure 4 shows, the packet payload appeared to be obfuscated or encrypted. Also, a variety of sources that appeared to be spoofed (such as army.mil) sent multiple identical packets.

Honeynet Project

The Honeynet Project maintains several honeynets around the world, whose data are all stored in a central server. To allow further correlation of the data, a set of requirements that must be fulfilled during the data capture in all honeynets was defined

• a record with the configuration of all active honeypots in the honeynet must be maintained;

• the data captured by the firewall, router or IDS (Intrusion Detection System) must be stored in GMT (Greenwich Mean Time) timezone;

• each honeynet must have a unique identifier, a name convention and a mapping, that allow to identify its location and configuration; Although the description of data collection to a central server is available, there is no information about how this server is structured or which type of correlation is performed.

Honeynet Research Alliance

The Honeynet Research Alliance is a forum where the participants are organizations from several countries, that perform research on honeypots and honeynets. Up to now there is no method available for these institutions to share data or analysis among them. However, some organizations send data to a central server maintained by the Honeynet Project. This central server can deal only with data generated by this specific set of capture and control tools: the libpcap library binary files and Linux iptables firewall logs. This implies that only organizations using these technologies can send data to the central server. It is also important to notice that this architecture does not enable the exchange of information directly among the organizations. There is also no available information about which kind of data analysis or correlation is being done with the data.

Honeynet Security Console

The Honeynet Security Console (HSC) is a tool developed by Jeff Bell, from the Florida Honeynet Project, to correlate events from a local network or honeynet. This tool allows storing and querying the following types of data: libpcap files generated by tcpdump, syslog logs, firewall logs stored by syslog and logs from the Sebek tool, which captures all commands executed by an intruder in a high-interaction honeypot. The HSC authors have chosen to store the data as simply as possible in a SQL database. This was achieved by using programs already available to convert data from each application to a relational database. Each one of these programs creates its own table structure, with different mapping and data types. Thus similar data created by different applications will be represented differently. To solve these inconsistencies HSC needs to perform data type conversions and correlate some data after it performs the queries to the database. In some of the cases if the data were mapped differently, the correlations could be done directly as queries to the database.

The GenII (2nd generation) Honeynet is the next step in the evolution of honeynet technology. Based on combining old and new techniques, the GenII Honeynet can increase the flexibility, manageability, and security of honeynet deployments. This paper introduces the technology used in a GenII Honeynet architecture. However, before you proceed, it is assumed that you have already read and fully understand the concepts, risks, and issues of Honeynets outlined in Know Your Enemy: Honeynets. It is critical that you understand the basic concepts and risks of honeynets before covering the technical details.

Future

The future plans are to make the Honeynet deployment and management easy. In next phase the Honeynet Project would be releasing a bootable CDROM that will boot into a Honeynet gateway or Honeywall. The bootable gateway would have all the Data Control and Data Capture mechanisms as defined above. Once you properly boot the CDROM, all you will have to do is to place your Honeypots behind it. This will make the Honeynet deployment easy and standardized.

Conclusion

The purpose of this paper was to help you understand what Honeynets are and their importance. We discussed the mechanisms of a Honeynet, i.e. Data Control, Data Capture and Data Collection. Then we discussed the two architectures of Honeynet, GenI and GenII. In the end we discussed Virtual Honeynet and the future. Honeynets are truly high-interaction Honeypots which helps you in capturing and analyzing complex attacks.

We have just completed an overview of how to build and deploy a GenII Honeynet based on a bridging gateway using Linux. This deployment represents some of the more advanced features of honeynet technology. However, keep in mind that information security is similar to an arms race, as we release new technologies to capture attacker activities, these very same threats can develop their own counter measures. If you are interested in deploying your own honeynet, we highly recommend theHoneywall CDROM.

AI & ES Famous Problems

Water Jug Problem

This is an old chestnut, one I'm sure you will recognise! You have two jugs, one of which holds 5 pints and the other of which holds 3 pints, and an endless supply of water. Your task is to get 4 pints of water in the 5 pint jug - but you have no way of measuring 4 pints directly! You can't put "roughly" 4 pints in and hope for the best.

Solution

The way to do it is to fill the jugs to their capacity (the only accurate measurements that you know) and to transfer water between the jugs. For instance, if the 5 pint jug were full and the 3 pint jug were empty, then filling the 3 pint

jug from the 5 pint jug would leave exactly 2 pints in the 5 pint jug. In fact, here are the t

hings that you can do:

1. Fill the 5 pint jug from the tap

2. Fill the 3 pint jug from the tap

3. Empty the 5 pint jug down the drain

4. Empty the 3 pint jug down the drain

5. Pour as much of the contents of the 5 pint jug into the 3 pint jug as will fit (what I term "filling" the 3 pint jug - even though it may not end up full).

6. Filling the 5 pint jug from the 3 pint j

ug in a similar manner.

The contents of the jugs are shown as numbers in column B and C and also graphically in columns E to I (for the 5 pint jug) and K to M (for the 3 pint jug). The initial contents of the jugs are given in cells B4 and C4 and all the values below them in the column are calculated from these initial values and the commands entered in column A.

The calculations are done as a series of nested IF statements, as shown in the screen dump above. Basically, these test to see if 1 or 2 has been entered in the column A cell, in which case the value should be 5 or 3 (depending on which jug it is), or whether 3 or 4 has been entered, in which case the appropriate jug contents is set to 0.

The difficulty comes with options 5 and 6. These pour as much of the contents of one jug into the other as will fit. This requires quite a lot of temporary working out, which is done in columns P to S. Columns P and Q give the contents of the jugs assuming that the 3 pint jug is being filled from 5 pint jug, and columns R and S give the contents of the jugs assuming that the 5 pint jug is being filled from the 3 pint jug. Columns P and R give the contents of the 5 pint jug in each case, columns Q and S give the contents of the 3 pint jug.

The cells in columns E to M have had borders put round them to indicate the different sized jugs. The text colour of the cells has been set to blue and they contain formulae which display $$$ (the closest representation for water I could think of!) when the appropriate values appear in the corresponding cells in columns B and C.

Finally, there is a formula copied down column N which displays the message "Well done! You have solved the puzzle!" if it detects that the value in the corresponding cell of column B is 4 (i.e. you have succeeded in getting 4 pints in the 5 pint jug).

YOU CAN DOWNLOAD THIS SPREADSHEET FROM THE LINK AS GIVEN BELOW.

http://richardbowles.tripod.com/excel/weird/jugs.htm

8 Queen Problem

The eight queens puzzle is the problem of putting eight chess queens on an 8×8 chessboard such that none of them is able to capture any other using the standard chess queen's moves. The queens must be placed in such a way that no two queens would be able to attack each other. Thus, a solution requires that no two queens share the same row, column, or diagonal. The eight queens puzzle is an example of the more general n queens puzzle of placing n queens on an n×n chessboard, where solutions exist only for n = 1 and n ≥ 4.

Solution for 8 queen problem

The problem can be quite computationally expensive as there are 4,426,165,368 (64×63×...×58×57/8!, where the "!" stands for Factorial) possible arrangements, but only 92 solutions. It is possible to use shortcuts that reduce computational requirements or rules of thumb that avoids brute force computational techniques. For example, just by applying a simple rule that constrains each queen to a single column (or row), though still considered brute force, it is possible to reduce the number of possibilities to just 16,777,216 (8^8) possible combinations, which is computationally manageable for n=8, but would be intractable for problems of n=1,000,000. Extremely interesting advancements for this and other toy problems is the development and application of heuristics (rules of thumb) that yield solutions to the n queens puzzle at an astounding fraction of the computational requirements. This heuristic solves nqueens for any n n ≥ 4 or n = 1:

1. Divide n by 12. Remember the remainder (n is 8 for the eight queens puzzle).

2. Write a list of the even numbers from 2 to n in order.

3. If the remainder is 3 or 9, move 2 to the end of the list.

4. Append the odd numbers from 1 to n in order, but, if the remainder is 8, switch pairs (i.e. 3, 1, 7, 5, 11, 9, …).

5. If the remainder is 2, switch the places of 1 and 3, then move 5 to the end of the list.

6. If the remainder is 3 or 9, move 1 and 3 to the end of the list.

7. Place the first-column queen in the row with the first number in the list, place the second-column queen in the row with the second number in the list, etc.

For n = 8 this results in the solution shown above. A few more examples follow.

14 queens (remainder 2): 2, 4, 6, 8, 10, 12, 14, 3, 1, 7, 9, 11, 13, 5.
15 queens (remainder 3): 4, 6, 8, 10, 12, 14, 2, 5, 7, 9, 11, 13, 15, 1, 3.
20 queens (remainder 8): 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 3, 1, 7, 5, 11, 9, 15, 13, 19, 17.

The Travelling and Salesman problem

Introduction

Caveat This has been very much an occasional hobby over recent years, and I have not had the time to keep abreast of the literature. Some of what I say might be out of date.

The Travelling Salesman Problem (TSP) is a deceptively simple combinatorial problem. It can be stated very simply:

A salesman spends his time visiting n cities (or nodes) cyclically. In one tour he visits each city just once, and finishes up where he started. In what order should he visit them to minimize the distance travelled?

Many TSP's are symmetric - that is, for any two cities A and B, the distance from A to B is the same as that from B to A. In this case you will get exactly the same tour length if you reverse the order in which they are visited - so there is no need to distinguish between a tour and its reverse, and you can leave off the arrows on the tour diagram.

If there are only 2 cities then the problem is trivial, since only one tour is possible. For the symmetric case a 3 city TSP is also trivial. If all links are present then there are (n-1)! different tours for an n city asymmetric TSP. To see why this is so, pick any city as the first - then there are n-1 choices for the second city visited, n-2 choices for the third, and so on. For the symmetric case there are half as many distinct solutions - (n-1)!/2 for an n city TSP. In either case the number of solutions becomes extremely large for large n, so that an exhaustive search is intractable.

The problem has some direct importance, since quite a lot of practical applications can be put in this form. It also has a theoretical importance in complexity theory, since the TSP is one of the class of "NP Complete" combinatorial problems. NP Complete problems have intractable in the sense that no one has found any really efficient way of solving them for large n. They are also known to be more or less equivalent to each other; if you knew how to solve one kind of NP Complete problem you could solve the lot.

The holy grail is to find a solution algorithm that gives an optimal solution in a time that has apolynomial variation with the size n of the problem. If you could find a method whose solution time varies like a quadratic expression, for example, then doubling n multiplies the solution time by 4 for large n. The best that people have been able to do, however, is to solve it in a time that varies exponentially with n. Such algorithms run out of puff at a certain level of n, more or less independently of computing power. If computation varies as 2^n, say, then a thousand-fold increase in computing power will only allow you to add another 10 nodes. So an algorithm that peters out at 50 cites now will probably never get you to 100 nodes, whatever happens to hardware technology.

Alternatively there are algorithms that seem to come up with a good solution quite quickly, but you cannot say just how far it is from being optimal.

Getting the feel of it

Recently I have been experimenting with some ideas that I had back in 1970. The graphic capability and speed of a PC (compared to the IBM 1130 I used in 1970) makes it a good tool for visualizing what is going on. The result is a small DOS program that you might like to play with. What it does is to generate a sequence of symmetric travelling salesman problems, and then solve them using two different algorithms. The nodes are placed at random positions on a rectangular grid and I simply use the straight line distances between them. The virtue of doing it this way is that the problems are easy to generate and easy to visualize. As each problem is generated, the nodes are displayed graphically on the screen, so you if you like can try your hand at guessing where the shortest tour might go.

The first algorithm looks for a true optimum tour by doing a truncated search of the entire solution space - known as the Branch and Bound technique. It reaches an initial tour very quickly, and then continues to search, finding better tours and/or establishing that no shorter tour exists. As each tour is found it is drawn on the screen, and details are entered in a file calledtsp.log. This algorithm is quite fast up to about 25 nodes, but becomes painfully slow beyond 40.

The good news is that you can truncate the search at any time by hitting any key. You can also set a % bound margin; setting this to 10 means that you wind up with a tour whose length is within 10% of the shortest - but you get there rather more quickly than if you insist on a true optimum (margin=0). If you like you can set the margin at 100, so it gives up immediately after finding the solution.

The second algorithm starts from a random tour and seeks to improve it using an algorithm base on dynamic programming. When no further improvement is obtainable the tour is displayed and details are logged. The process is repeated using 5 different random starting points. You will see that the algorithm is fast for problems with up to 100 nodes (the limit of the program). It often finds the true optimum tour and, when it doesn't, gets within a couple of percent of the shortest length. Because the first algorithm runs out of puff above 40 nodes, however, it is hard to say how well it sustains its performance for really large problems.

8-puzzle Problem

8 puzzle is a simple game which consists of eigth sliding tiles, numbered by digits from 1 to 8, placed in a 3x3 squared board of nine cells. One of the cells is always empty, and any adjacent (horizontally and vertically) tile can be moved into the empty cell. The objective of the game is to start from an initial configuration and end up in a configuration which the tiles are placed in ascending number order.

a screen shot of 8 puzzle solution finder and you can download it from this link

http://www.8puzzle.com/8_puzzle_solver_download.html

Missionaries and Cannibals Problem

In the missionaries and cannibals problem, three missionaries and three cannibals must cross a river using a boat which can carry at most two people, under the constraint that, for both banks, if there are missionaries present on the bank, they cannot be outnumbered by cannibals (if they were, the cannibals would eat the missionaries.) The boat cannot cross the river by itself with no people on board.

*Trip number*	*Starting bank*	*Travel*	*Ending bank*
(start)	Aa Bb Cc
1	Bb Cc	Aa →
2	Bb Cc	← A	A
3	A B C	bc →	A
4	A B C	← a	b c
5	Aa	BC →	b c
6	Aa	← Bb	Cc
7	a b	AB →	Cc
8	a b	← c	A B C
9	b	a c →	A B C
10	b	← B	Aa Cc
11		Bb →	Aa Cc
(finish)			Aa Bb Cc

In the jealous husbands problem, the missionaries and cannibals become three married couples, with the constraint that no woman can be in the presence of another man unless her husband is also present. Under this constraint, there cannot be both women and men present on a bank with women outnumbering men, since if there were, some woman would be husbandless. Therefore, upon changing women to cannibals and men to missionaries, any solution to the jealous husbands problem will also become a solution to the missionaries and cannibals problem

The Tower of Hanoi

The Tower of Hanoi or Towers of Hanoi (also known as The Towers of Benares) is amathematical game or puzzle. It consists of three rods, and a number of disks of different sizes which can slide onto any rod. The puzzle starts with the disks neatly stacked in order of size on one rod, the smallest at the top, thus making a conical shape.

The objective of the puzzle is to move the entire stack to another rod, obeying the following rules:

Only one disk may be moved at a time.
Each move consists of taking the upper disk from one of the pegs and sliding it onto another rod, on top of the other disks that may already be present on that rod.
No disk may be placed on top of a smaller disk.

Most toy versions of the puzzle have 8 disks. The game seems impossible to many novices, yet is solvable with a simple algorithm.

Solution:

Simple solution

The following solution is a simple solution for the toy puzzle.

Alternate moves between the smallest piece and a non- smallest piece. When moving the smallest piece, always move it in the same direction (either to the left or to the right, but be consistent). If there is no tower in the chosen direction, move it to the opposite end. When the turn is to move the non-smallest piece, there is only one legal move.

Recursive solution

As in many mathematical puzzles, finding a solution is made easier by solving a slightly more general problem: how to move a tower of h (h=height) disks from a starting peg f (f=from) onto a destination peg t (t=to), r being the remaining third peg and assuming (). First, observe that the problem is symmetric for permutations of the names of the pegs (symmetric group S3). If a solution is known moving from peg f to peg t, then, by renaming the pegs, the same solution can be used for every other choice of starting and destination peg. If there is only one disk (or even none at all), the problem is trivial. If h=1, then simply move the disk from peg f to peg t. If h>1, then somewhere along the sequence of moves, the largest disk must be moved from peg f to another peg, preferably to peg t. The only situation that allows this move is when all smaller h-1 disks are on peg r. Hence, first all h-1 smaller disks must go from f to r. Subsequently move the largest disk and finally move the h-1 smaller disks from peg r to peg t. The presence of the largest disk does not impede any move of the h-1 smaller disks and can temporarily be ignored. Now the problem is reduced to moving h-1 disks from one peg to another one, first from f to r and subsequently from r to t, but the same method can be used both times by renaming the pegs. The same strategy can be used to reduce the h-1 problem to h-2, h-3, and so on until only one disk is left. This is called recursion. This algorithm can be schematized as follows. Identify the disks in order of increasing size by the natural numbers from 0 up to but not including h. Hence disk 0 is the smallest one and disk h-1 the largest one.

The following is a procedure for moving a tower of h disks from a peg f onto a peg t, with r being the remaining third peg:

Step 1: If h>1 then first use this procedure to move the h-1 smaller disks from peg f to peg r.

Step 2: Now the largest disk, i.e. disk h-1 can be moved from peg f to peg t.

Step 3: If h>1 then again use this procedure to move the h-1 smaller disks from peg r to peg t.

The monkey and Bananas Problem

Problem

A monkey is in a room. Suspended from the ceiling is a bunch of bananas, beyond the monkey's reach. In the corner of the room is a box. How can the monkey get the bananas?

Solution:

The solution is that the monkey must push the box under the bananas, then stand on the box, and then grab the bananas. However, figuring this out requires a planning algorithm.

In other variants of the problem, the bananas are in a chest and the monkey first has to open the chest using a key.

Cryptarithmetic

WHAT IS CRYPTARITHMETIC?

Cryptarithmetic is the science and art of creating and solving cryptarithms.

A cryptarithm is a genre of mathematical puzzle in which the digits are replaced by letters of the alphabet or other symbols.

The invention of Cryptarithmetic has been ascribed to ancient China. This art was originally known as letter arithmetic or verbal arithmetic. In India, during the Middle Ages, were developed the arithmetical restorations or "skeletons" a type of cryptarithms in which most or all of the digits have been replaced by asterisks.

In 1864 the first cryptarithm appeared in the USA, in American Agriculturist.

The word cryptarithmetic ("cryptarithmie" in French) was introduced by M. Vatriquant, writing under the pseudonym Minos, in the May 1931 issue of Sphinx, a Belgian magazine of recreational mathematics published in French from 1931 to 1939.

A type of alphabetic addition puzzle termed "doubly-true" was introduced in 1945 by Alan Wayne. It is made up of "number words" that, when read, also form a valid sum.

In 1955, J. A. H. Hunter coined the word alphametic to designate a cryptarithm whose letters form sensible words or phrases.

The world's best known alphametic puzzle is undoubtedly SEND + MORE = MONEY. It was created by H. E. Dudeney and first published in the July 1924 issue of Strand Magazine associated with the story of a kidnapper's ransom demand.

Modernization by introducing innovations such as computers and the Internet is making quite an impact on cryptarithmetic. If you are interested in knowing more about this revolution read the article Will cryptarithmetic survive innovation?

HOW TO SOLVE A PUZZLE

1. Preparation

Rewrite the problem, expanding the interlinear space to make room for trial numbers that will be written under the letters.

For example, the puzzle SEND + MORE = MONEY, after solving, will appear like this:

S E N D

9 5 6 7

+ M O R E

1 0 8 5

---------

M O N E Y

1 0 6 5 2

2. Remember cryptarithmetic conventions

• Each letter or symbol represents only one digit throughout the problem;

• When letters are replaced by their digits, the resultant arithmetical operation must be correct;

• The numerical base, unless specifically stated, is 10;

• Numbers must not begin with a zero;

• There must be only one solution to the problem.

3. See subtractions as "upside-down" additions

Ease the analysis of subtractions by reading them as upside-down additions. Remember that you can check a subtraction by adding the difference and the subtracter to get the subtrahend: it's the same thing. This subtraction:

C O U N T

- C O I N

---------

S N U B

must be read from the bottom to the top and from the right to the left, as if it were this series of additions:

B + N = T + C1

U + I = N + C2

N + O = U + C3

S + C = O + C4

C1, C2, C3 and C4 are the carry-overs of "0" or "1" that are to be added to the next column to the left.

4. Search for "0" and "9" in additions or subtractions

A good hint to find zero or 9 is to look for columns containing two or three identical letters.

Look at these additions:

* * * A * * * B

+ * * * A + * * * A

------- -------

* * * A * * * B

The columns A+A=A and B+A=B indicate that A=zero. In math this is called the "additive identity property of zero"; it says that you add "0" to anything and it doesn't change, therefore it stays the same.

Now look at those same additions in the body of the cryptarithm:

* A * * * B * *

+ * A * * + * A * *

------- -------

* A * * * B * *

In these cases, we may have A=zero or A=9. It depends whether or not "carry 1" is received from the previous column. In other words, the "9" mimics zero every time it gets a carry-over of "1".

5. Search for "1" in additions or subtractions

Look for left hand digits. If single, they are probably "1".

Take the world's most famous cryptarithm:

S E N D

+ M O R E

---------

M O N E Y

"M" can only equal 1, because it is the "carry 1" from the column S+M=O (+10). In other words, every time an addition of "n" digits gives a total of "n+1" digits, the left hand digit of the total must be "1".

In this Madachy's subtraction problem, "C" stands for the digit "1":

C O U N T

- C O I N

---------

S N U B

EXAMPLES WORKED OUT IN DETAIL BY MASTER PUZZLISTS

1. Geoffrey Mott-Smith

S E N D

+ M O R E

------------

M O N E Y

We see at once that M in the total must be 1, since the total of the column SM cannot reach as high as 20. Now if M in this column is replaced by 1, how can we make this column total as much as 10 to provide the 1 carried over to the left below? Only by making S very large: 9 or 8. In either case the letter O must stand for zero: the summation of SM could produce only 10 or 11,but we cannot use 1 for letter O as we have already used it for M.

If letter O is zero, then in column EO we cannot reach a total as high as 10, so that there will be no 1 to carry over from this column to SM. Hence S must positively be 9.

Since the summation EO gives N, and letter O is zero, N must be 1 greater than E and the column NR must total over 10. To put it into an equation: E + 1 = N

From the NR column we can derive the equation: N + R + (+ 1) = E + 10

We have to insert the expression (+ 1) because we don’t know yet whether 1 is carried over from column DE. But we do know that 1 has to be carried over from column NR to EO.

Subtract the first equation from the second: R + (+1) = 9

We cannot let R equal 9, since we already have S equal to 9. Therefore we will have to make R equal to 8; hence we know that 1 has to be carried over from column DE.

Column DE must total at least 12, since Y cannot be 1 or zero. What values can we give D and E to reach this total? We have already used 9 and 8 elsewhere. The only digits left that are high enough are 7, 6 and 7, 5. But remember that one of these has to be E, and N is 1 greater than E. Hence E must be 5, N must be 6, while D is 7. Then Y turns out to be 2, and the puzzle is completely solved.

2. Steven Kahan

E A T

+ T H A T

------------

A P P L E

Since every four-digit number is less than 10,000 and every three-digit number is less than 1,000, the sum of two such numbers is necessarily less than 11,000. This sum, though, is a five-digit number, hence is greater than 10,000. Consequently, A must be 1 and P must be 0. Further, we can conclude that T = 9. Otherwise, we would be adding a number less than 1,000 to one less than 9,000, leaving us short of the requisite total. The units column then produces E = 8 while generating a carryover of 1 into the tens column. Together with the previously found value of A, we learn from the tens column that L = 3. Finally, the hundreds column yields the equation E + H = P + 10, where the "10" is required to accommodate the needed carryover into the thousands column. When the values of E and P are substituted into this relationship, we get 8 + H = 10, from which it follows that H = 2. Therefore, the unique solution of the puzzle turns out to be

819 + 9219 = 10038.

3. J. A. H. Hunter

N O

G U N

+ N O

----------

H U N T

Obviously H = 1.

From the NUNN column we must have "carry 1," so G = 9, U = zero.

Since we have "carry" zero or 1 or 2 from the ONOT column, correspondingly we have N + U = 10 or 9 or 8. But duplication is not allowed, so N = 8 with "carry 2" from ONOT.

Hence, O + O = T + 20 - 8 = T + 12. Testing for T = 2, 4 or 6, we find only T = 2 acceptable, O = 7.

So we have 87 + 908 + 87 = 1082.

Conclusion

To solve all type of AI & ES Problem use your mind with using the tricks and rules. Once surely you can solve it.

Who solve these problem own, is intelligent and he/she has sharp mind.

Honeynet Technology

AI & ES Famous Problems

Visitors