Monday 4 April 2011

Logfiles and Auditing

1 Introduction
Logfiles are one of the most useful tools in detecting and investigating problems with
computer systems and services. Logs can provide information about system faults and
misuse as well as early warnings of problems. This Technical Guide discusses the logs that
should be kept, the conditions under which they should be held, the types of questions they
may be required to answer and some of the uses to which they may be put. It is concerned
mainly with logging of activity on computers, whether they act as clients, servers or proxies.
Direct logging of activity on networks is not covered.

1.1 Motivation
Without collecting and analysing logfiles, it is impossible to know what is happening on
a computer system or service. There will be no indication of faults and misuse and when
they finally result in complaints from users, there will be no evidence to show the cause of
the problem or how it can be cured. Failure to keep logfiles therefore leads rapidly to an
unreliable system on which users will naturally be unwilling to rely for any critical function.
Reliable systems can only be achieved if their performance is recorded and action taken to
prevent or remedy problems. Logfiles also provide information about the usage of a service,
and allow upgrades or alternative provision to be planned and installed before the load on
the existing system becomes a critical problem.
As well as these internal pressures to deal with problems, there are also likely to be external
pressures. Wide Area Networks, such as JANET, are shared resources and a problem on
one computer or site can soon affect others. For example, a fault that causes excessive
network traffic is likely to cause congestion for others as their traffic competes for the finite
bandwidth and routing resources available. In cases of misuse it is common for an individual
at one site to attack systems or users at others, or for a compromised computer to be used
to send spam, participate in denial of service attacks or host unwanted services such as
phishing or copyright distribution sites. If reasonable requests to deal with problems are not
satisfied then the responsible site is likely, at best, to suffer a tarnished reputation in the eyes
of its peers. The JANET community, and the policies that support it, require its members
to behave responsibly and not to cause unnecessary problems for others or harm the good
reputation of the JANET network. More widely, many organisations and networks now
employ blacklists to restrict traffic from sites or networks considered to be a frequent source
of problems. An organisation that finds itself on one of these blacklists may have difficulty
sending e-mail or other traffic and may have to spend considerable time and effort to be
removed from the list.
In extreme cases failure to deal with problems, whether arising from the lack of logfiles
or inability to use them, may even lead to legal cases. Service providers have paid large
damages to individuals or companies harmed by the actions of their users. At present we
are not aware of any cases where educational organisations have been held liable for the
computing activities of their students, but solicitors have expressed their view that courts
might indeed find against the organisation. Another area where legal action might arise is
in negligence: it has been suggested that if an organisation had been warned of a problem
but did not deal with it then subsequent victims might have a valid claim against the
organisation.
In the case of faults, the best that logging can offer is the early detection and resolution of
problems. However, in cases of misuse there is good reason to believe that a publicised
practice of recording and analysing logfiles and dealing with those who misuse the system
may itself be an effective deterrent. Logfiles can therefore act as a preventive measure,
reducing the number of problems experienced by users and system owners.
Logfiles enable an organisation to improve its service to its own users and maintain a good
reputation with others. In the near future, it appears likely that logfiles and a process to
use them may be essential to defend against threats of legal action. It is important to note,however, that simply keeping logs is unlikely to be sufficient. It is also important to have
processes for checking them, analysing the information they contain, and dealing promptly
and effectively with problems.

2.1 Privacy and Legal Issues
Any comprehensive programme of logging will capture information about the activities
of individual users. In many cases this has the potential to intrude on the privacy of those
individuals. Users and system administrators must be clear that the sole purpose of logfiles
is to provide a better service to legitimate users, by providing computers and networks that
are fit for their intended purpose and which work as reliably as possible. Article 8 of the
European Convention on Human Rights states that ‘Everyone has the right to respect for his
private and family life, his home and his correspondence’: all users and administrators must
respect this.
Most logfiles contain personal data, so are also subject to the provisions of the Data
Protection Act 1998. Users must be informed what information will be recorded and what it
may be used for (as noted above, the very fact of notification may well discourage misuse of
the system). Logfiles must also be protected from unauthorised access, use or modification.
The Act requires that personal data should not be kept for longer than necessary; guidelines
for the interpretation of ‘necessary’ are discussed in Section 2.2, on data retention, below.
The Data Protection Act allows those whose personal data is kept (“data subjects”) to
request copies of information held about them. Without preparation, such Subject Access
Requests (SARs) can be one of the hardest and most costly parts of the Act to comply with:
however, the same processes that allow logfiles to be used to investigate problems should
provide most of the apparatus required to deal with SARs.
Various Acts of Parliament make a useful distinction between traffic data and content
data. Traffic data (sometimes referred to as communications data) is information about the
existence of communications. For example, records that a particular user logged on to a
workstation at a certain time, sent e-mails to a number of other, recorded, e-mail addresses
and then logged out, would comprise traffic data about that session of use. The texts of the
e-mails would, however, constitute content data. The law appears to treat traffic data as less
likely to involve a breach of privacy than content: for example the rules for police access to
traffic data require a lower level of authorisation than for content data.
Most logfiles will contain only traffic data. However, there are a number of electronic
systems where the distinction between traffic and content data is unclear. It is considered
that the subject lines of e-mail are content, rather than traffic; similarly in the case of web
requests the identity of the server from which a page was requested is traffic data, whereas
the rest of the URL, which may well specify exactly what the user saw, is considered to be
content.
The third type of data that is essential in identifying the individuals responsible for cases
of misuse is the identity of the real world individuals who own, and should be responsible
for, each login account or other online identity. This information is known by ISPs (Internet
Service Providers) as subscriber data, but in universities and colleges it is likely to form
part of student records and staff personnel files. As information relating to an identifiable
individual it is, of course, subject to the Data Protection Act.
2.2 Data Retention
Problems are rarely detected the instant they occur so, to be useful, logfiles must be kept for
some period of time. However, logfiles can grow very large, so shortage of storage space
may put an upper limit on what this time may be. Even if logs can be physically stored,
there is little point in keeping them for so long that the quantity of information prevents
convenient searching. Where logfiles contain personal data, the Data Protection Act’s
Fifth Principle also requires that they not be kept for longer than necessary for the specific
purpose for which they were collected. The European Directive 2002/58/EC on Privacy
Logfiles and Electronic Communications (amended by Directive 2009/136/EC), which applies
Data Protection law to electronic communications networks, states that traffic data must be
anonymised or destroyed once it is no longer required, but identifies the provision of value
added services and investigation of unauthorised use as legitimate reasons to collect and
retain this data. Collecting and keeping traffic data for as long as necessary to investigate
misuse of computers and the network is therefore acceptable.
A number of Codes of Practice have been written in an attempt to establish a reasonable
balance between usefulness of logs on the one hand and privacy and practicality on
the other. Following a recognised Code of Practice should be a good defence against
accusations of keeping either too few records or too many. For some time, the Code of
Practice most relevant to computer and network logging has been that produced by the
London Internet Exchange (LINX) in 1999, which is available online at:
https://www.linx.net/good/bcp/traceability-bcp-v1_0.html
The LINX document was prepared and is maintained by members of the Exchange, who
include many of the major ISPs in the UK. The document was also reviewed by the Data
Protection Commissioner, responsible for ensuring compliance with the Data Protection
Acts. The document recommends that traffic data should be retained for a minimum of
three months to allow misuse to be traced, but that to comply with the Data Protection
Act it should not be kept for more than six months except where it relates to a known case
of misuse. If an investigation is in progress then data relating to it may be kept until the
investigation is complete. The same minimum retention time is recommended for subscriber
data. However, users of university and college computers will usually be students or staff.
Both of these legal relationships involve much longer retention periods to comply with
education and employment law, so information about these users’ identities will normally
and legitimately be held for much longer than six months.
Following increased concern that terrorism and serious crime might be organised or
committed using electronic communications, European Directive 2006/24/EC (implemented
in the UK by the Data Retention (EC Directive) Regulations 2009) made it a legal
requirement for public data and telephone networks to retain information about the use of
their e-mail and telephony services for up to two years. However, as JANET and most of
its customers’ networks are classed as private networks, these requirements do not apply.
Retention of logs for these networks therefore remains a recommendation, for the purposes
of network management and dealing with misuse, rather than a requirement. In particular,
organisations running these networks should ensure they have a legitimate reason if they
wish to keep their logs for longer than required for the investigation of normal misuse.

2.2.1 Data Preservation
On a small number of occasions, following major terrorist or other criminal incidents, the
UK police have asked the providers of communications networks (including JANET sites)
to preserve logs and other relevant files in case they contain information relevant to the
investigation. The purpose of these requests is to prevent existing information from being
overwritten or deleted, not to cause additional information to be collected. There is no
requirement to comply, but such exercises have protected useful information for the police
in the past and are considered helpful. In practice, unless the police request contains more
specific instructions, the most usual response is to take a backup of main server logs and
to reserve this along with a recent set of backup tapes that are not reused until the police
investigation is completed.
Such data preservation is permitted, but not required, under the Data Protection Act
1998, where section 29 allows processing of data for the purposes of the prevention and
detection of crime and the apprehension or prosecution of offenders. Section 28 provides
similar permission where necessary for the purpose of safeguarding national security. Data
preserved under either section may be exempted from the normal subject access provisions
of the Act where disclosure might be harmful to the purpose.
Logfiles

The preserved data should be kept by the organisation in a secure place: if the police find
they need access to it then they will use one of the legal mechanisms described in the next
section. Organisations holding preserved data should seek periodic confirmation from the
police that the information is still needed for the section 29 or section 28 purpose.

2.3 Access by Others
Evidence from logfiles may be useful to the police and other investigating authorities in
cases where unlawful acts have been committed. A number of different Acts of Parliament
include provisions under which such authorities may request or require such evidence to be
provided to them. This section attempts to summarise the provisions likely to be encountered
by universities and colleges (for that reason, provisions that only apply to public networks
have been omitted). However, it does not constitute formal legal advice. The definitive
source of information is the original Acts and Codes of Practice: web addresses are listed in
Section 8.3. Since there may be a legal requirement to comply promptly with some of these
notices, organisations should consider instituting standard procedures for responding to
them. They may also wish to discuss these procedures and any requests with their lawyers.
In the case of the police or others investigating criminal offences, access to logfiles will
normally be obtained by a notice under the Regulation of Investigatory Powers Act 2000.
If the information sought is not communications data then a request under section 29 of the
Data Protection Act 1998 will normally be used. A Production Order under Schedule 1 of
the Police and Criminal Evidence Act 1984 will only be used where neither of these routes
has been successful, or where the voluntary request under the Data Protection Act 1998
would not be appropriate to the investigation. Civil courts may make Norwich Pharmacal
Orders if the court process requires an organisation operating a network or server to reveal
the identity of one of its users.

2.3.1 Regulation of Investigatory Powers Act 2000
Part I Chapter II, and in particular section 22, of this Act deals with the disclosure of
communications data to law enforcement and other public bodies. This came into force
in January 2004 and is now the normal process for all access to communications data,
replacing section 29(3) of the Data Protection Act 1998.
Communications data is information about traffic on a network, but not the contents of
that traffic. Section 21(4) of the Act provides a full definition of Communications Data,
separating it into three types:
(a) Information forming part of a communication, that is needed by the system to deliver the
communication from its source to its destination. For example, source and destination
addresses and routing information.
(b) Other information concerning the use of the system by individual users. For example,
times when individual users were logged on and the IP addresses they were allocated.
(c) Other information about the users of the system, referred to elsewhere as subscriber data.
For example, the identity of the owner of a login name or e-mail address.

Logfiles
Logfiles may contain any or all of these types of communications data. Some logfiles
will also contain information that is not communications data such as the subject lines of
e-mails or full URLs of web requests (only the identity of the web server is communications
data), which must not be disclosed under section 22. Responding to a section 22 notice
may therefore require making edited versions of logfiles with these unauthorised types of
information removed.
The Act permits any designated authority to issue a notice to a communications provider
requiring either that existing communications data be disclosed or that particular
communications data be collected. Communications providers are widely defined (not
only public networks are covered) and would certainly include any university or college
providing Internet access to its members. A provider receiving such a notice must act on it,
otherwise it may itself be committing an offence. The Regulation of Investigatory Powers
Act makes the authority that issues a notice responsible for ensuring that it is proportionate:
the communications provider releasing the information is not required, or entitled, to make
any judgement on this. The purposes for which a notice can be served include interests of
national security, detecting crime and preventing crime or disorder, national economic wellbeing,
public safety, protection of public health, assessment of taxes and duties, preventing
death, and preventing or mitigating injury to an individual’s physical or mental health.
To be allowed to issue notices under section 22, an authority must be designated by
the Home Secretary. The initial list of authorities was published as The Regulation of
Investigatory Powers (Communications Data) Order 2003 (Statutory Instrument 2003 No.
3172). To the law enforcement authorities included in the Act (listed in Schedule 1 of the
Regulations) this adds the emergency services, central and local government departments,
the NHS and others with powers to investigate compliance with particular laws (listed
in Schedules 2 and 3). Many of these authorities do not have powers to access the whole
range of communications data set out in section 21(4) and above – many are limited to the
subscriber data of type (c) and some are restricted to particular types of communication
services – and in some cases a more senior officer is required to authorise notices for the
more intrusive types of data. The Schedules to the Regulations set out these arrangements in
detail.
For some time, police forces have had designated Single Points of Contact (SPoCs)
for dealing with the communications industry. Officers staffing the SPoCs have been
specifically trained both in the legal requirements of handling data and in what is likely to
be practical for network operators to provide. SPoCs have been useful to ensure that the
law is used properly and that the evidence obtained is suitable for the investigation and
subsequent prosecution. The Home Office has therefore granted the new authorities powers
under section 22 to ensure that their staff have equivalent training and work in a similar way
as the police SPoCs. The Home Office is maintaining a register of individuals designated
to exercise the powers on behalf of each authority, and every section 22 notice must be
approved by one of these designated persons before it is served on a communications
provider.
The process of issuing notices is described by a Code of Practice. A standard form requiring
disclosure of communications data has been published and should be used for all notices.
Notices that are received by JANET sites should be checked to confirm that they come from
a designated authority, request data which that authority is entitled to receive, and have
been issued by the appropriate designated person or SPoC. JANET CSIRT has access to
the Home Office register and can confirm that notices have been approved by the correct
designated person. Notices that appear incorrect should not be acted upon. There have
been reports that individuals have attempted in the past to use other statutory powers (see
next section) to gain access to information they did not have authority to see and the Home
Office has asked for reports of any attempts to abuse the section 22 powers in this way.
It is strongly recommended that any organisation likely to receive statutory notices under
the Regulation of Investigatory Powers Act 2000 or other statutes (see below) should
designate and train a person or office to deal with the notices, and that all enquiries
Logfiles regarding notices should be directed to that person or office. Legal advice is likely to be
helpful when setting up these procedures.

2.3.2 Other Statutory Notices
The Regulation of Investigatory Powers Act 2000 (RIPA) is just one of a number of
pieces of legislation that create rights for designated authorities to obtain information for
particular purposes. These include the Consumer Protection Act 1987 (trading standards)
and the Social Security Fraud Act 2001 (benefits agency), as well as court orders and police
warrants. The Home Office intends that all access to communications data will eventually be
done under RIPA powers: however, it is likely to be some time before the other powers stop
being used.
When presented with a valid statutory notice by a person entitled to issue that notice, it will
normally be an offence not to provide the required information. However, anyone receiving
such a notice must check both that the notice is valid and that the person is entitled to use it.
This will normally involve checks with appropriate third parties.
The LINX has published a Best Current Practice document on privacy, which contains
useful guidelines on dealing with statutory notices.

2.3.3 Court Orders
Schedule 1 of the Police and Criminal Evidence Act 1984 (PACE) enables a police constable
to ask a judge to make an order requiring a person to either produce or give the constable
access to information that the person holds or has access to. An order will only be granted if
there are reasonable grounds for believing that the information will be of substantial value in
investigating a serious offence, and could be used as evidence. Furthermore all other means
of obtaining the material must have either been tried or found inappropriate. The judge will
then decide whether it is in the public interest to make the order that the information be
produced.
A production order may be served on an individual, a partnership or corporate body, and
may be delivered either by hand or post. The person or body on whom the order is served
must then either produce the information, or give access to it, within a fixed period, typically
seven days from the issue of the order. Failing to comply with an order, or tampering with
the information once the order has been served, is a contempt of court: a serious criminal
offence.
PACE production orders are used as a last resort, generally where information is not
accessible by any other power. They may also be used in place of Data Protection Act 1998
requests (see below) in cases where it is desirable to have a judge rule on the proportionality
of disclosure before it occurs, rather than after as is the case with the Data Protection Act
process. PACE production orders should be simple to deal with: the recipient must comply
with the order or commit a serious criminal offence.
PACE orders, RIPA notices and DPA requests all apply only to criminal offences. For civil
cases a more limited order was created following the case of Norwich Pharmacal Co. v
Customs and Excise Commissioners [1974] AC 133, and therefore known as a Norwich
Pharmacal Order. These may be issued by a court where there is evidence that a civil
wrong (such as defamation or copyright breach) has been committed but the identity of the
wrongdoer is not known to the victim. Online, the victim will often only know a nickname,
e-mail address or IP address for the wrongdoer. If the court considers that a third party (for
example a network provider) is able to reveal the real-world identity of the person associated
with this information then the court may order the third party to do so. The victim can then
begin a civil case against that person. As with PACE production orders, Norwich Pharmacal
Logfiles orders should be simple to deal with: the recipient must either disclose the identity as
ordered by the court or explain that they are unable to determine it.

2.3.4 Data Protection Act 1998
Various sections of the Data Protection Act permit data controllers to disclose personal data
without breaching their obligations under the Act. In particular section 29(3) permits this
where the information is required for the prevention, detection or prosecution of crime, and
section 28 applies where the disclosure is required in the interests of national security. In
all cases, disclosure is voluntary and the data controller must consider whether the breach
of privacy is proportionate to the stated reason of why the information is needed. The
Information Commissioner has a useful guide to how to make this decision at
Although these provisions are still in force, they have been superseded for communications
data by the powers under the Regulation of Investigatory Powers Act described previously
and should no longer be used for this type of information. Where other types of information
are concerned, the request for disclosure should be made in writing on a standard form,
giving enough information about the purpose for the assessment of proportionality to be
made. In the case of a request from the police, this form should always be authorised by the
force’s Single Point of Contact (see above).
As discussed in the previous section, a PACE production order may be preferable in
some circumstances to a Data Protection Act request as it allows this assessment of
proportionality to be made by a judge rather than the recipient of the request.
Logfiles

3 Tracing Misuse
3.1 Clients
Most cases of misuse will be reported as originating from one or more electronic identities,
for example e-mail or IP addresses. Such identities are public, and can be seen by anyone
on the Internet, but in most cases it will only be the local site that can relate these electronic
identities to the individual responsible person. Ensuring that the actions of these electronic
addresses can be assigned to responsible individuals is therefore fundamental to any attempt
to track down network misuse. Since most electronic identities can be forged with various
degrees of technical difficulty (e-mail addresses are easiest, IP addresses used for UDP
packets slightly more difficult and IP addresses used in TCP connections significantly
harder) it is also important to collect, as a matter of routine, sufficient reliable information to
be able to prove when a forgery has taken place and thereby remove blame from an innocent
individual.
Individuals are usually identified to computers by login name or e-mail address, which are
not usually the identifiers that are used in complaints, so a conversion process will usually
be needed to identify the individual who may be the subject of a complaint.
In the simplest case a reported IP address will be the fixed address of a workstation in a
private office. The owner of the office will normally be responsible for activity by that IP
address and only a record of the ownerships of allocated addresses will be needed
The situation is more complicated where a number of different users may use the
same computer, either because the computer is a system that supports multiple users
simultaneously, or because it is a public workstation that may be used by different people
at different times of day. In the latter case, it should be possible to identify a single login
account that was logged in on the workstation at a particular time. To achieve this it is
essential to have login records that can be searched by workstation address and time.
A record of the ownership of login accounts and e-mail addresses is, of course, a basic
management tool. It is recommended that all users be made formally responsible, and
accountable, for the activities of accounts allocated to them.
The next level of complication arises when different client computers use a single IP
address at different times. This occurs whenever a pool of IP addresses is shared between a
number of computers, for example in dial-up, mobile or fixed networks where addresses are
allocated temporarily to active computers (the DHCP protocol is commonly used to manage
addresses in these situations). Here it is essential to have logs of which client computer was
allocated each IP address and the times when the use of the addresses began and ended.
Once the workstation has been identified, records of logins and times can once again be used
to identify the responsible person.
3.1.1 Summary of Logs
Type of System Logs required
Single-owner workstation, fixed IP • IP -> owner
Shared workstation, fixed IP • IP + time -> login login -> owner
Dynamically configured workstation • IP + time -> workstation
• workstation+time -> login
• login -> owner

3.1.2 Federated Authentication Systems
Federated Authentication Systems allow an organisation providing a service to a user to rely
on another organisation (typically the user’s home organisation) to authenticate the user,
rather than maintaining its own local database of usernames and passwords. Examples of
federated authentication systems include the UK Access Management Federation (http://
www.ukfederation.org/), used by publishers to authenticate access to online resources, and
eduroam (http://www.eduroam.org), used by education organisations across Europe to
authenticate visitors from other organisations and provide them with network access.
Some federated systems are designed to preserve the anonymity of the user. In these
cases the organisation providing a service is merely told ‘yes, this is one of our users’,
accompanied, if necessary, by attributes that may be required by the service such as whether
the user is a student or member of staff. Such systems add an additional step to the process
of tracing access through logs, since the organisation that gives the authenticated user
access to its service may not itself be able to link the service provided to the user it was
provided to (the step resulting in a ‘login’ name in the previous table). Instead this requires
two steps: the service provider needs to identify the home organisation and the home
organisation needs to identify the user they authenticated.
Details of the logs that are needed will vary between different federated authentication
systems and should form part of the federation agreement. In general, service providers
need to record at least the home site that provided the authentication for a particular service
request, together with other identifying material such as the service that was requested
and any identifier that was provided by the home site. Home sites that are providing
authentication to external organisations need to record at least the source and time of the
authentication request, the same identifying information for the request, and the local user
who was authenticated.

3.2 Intermediaries
The types of client logging discussed above deal with the situation where a direct TCP/
IP connection exists between the client machine and the server. However there are also
a number of services and configurations where some other machine is involved as an
intermediary between the client and the server. Additional logs are needed in these cases as
the end server will see the activity as originating with the intermediary, while the client logs
will only show a connection to the intermediary and not to the end server where the alleged
misuse occurred. Intermediary logs are required to link these two sets of information.
The most obvious examples of intermediaries are proxies and caches; store and forward
systems such as e-mail servers also act as intermediaries. There are also other systems,
particularly those that act as gateways between different Internet services, that may act as
intermediaries and therefore need to record appropriate logs.
Some intermediary systems handle requests for very large numbers of clients and servers so
that a simple timestamp may not be sufficient to identify a communication uniquely. Logs
on these systems will often need to record additional details of each transaction, such as
URLs for web requests or subjects of e-mail messages, to allow a particular communication
to be identified. Complaints where these details are not included are likely to be very hard
to investigate.

3.2.1 Proxies and Caches
Proxies are systems explicitly designed to act as intermediaries. Clients make requests to
a proxy and the proxy may send that request to a server on behalf of the client. Proxies are
usually designed to support one or a small number of protocols, for example HTTP and
FTP, and, unlike gateways, use the same protocol for the requests they receive and send. A
caching proxy may respond directly to some requests as an alternative to passing them on. A
Logfiles filtering proxy may determine that either the request or response is not appropriate according
to a set of rules and may either block it or replace the response with a warning.
As the server’s records will show the request coming from the IP address of the proxy, the
proxy must itself retain a log of the client IP address or authenticated user on behalf of
which each request was made. A busy proxy will often make many requests each second to
a popular server so the time of the request and identity of the server will not be sufficient to
identify an individual client request uniquely. It is therefore normal for this type of proxy to
retain additional information about each request, for example the web URL, that will allow
the responsible client to be linked to the information recorded by the server. Where the
protocol and local policy permit, a great deal of investigation time can be saved if the proxy
includes the client address in each request it passes on. If this is visible to the end-user, or
recorded in the logs on the final server, then there will be no need to search through large
volumes of proxy logs.

3.2.2 E-mail and News
Store and forward systems, such as news and electronic mail, differ from proxies in that
the transaction with the client is completed before the message is passed to another server.
However they still act as intermediaries so should retain a log of the client from which each
message was received and the server or other destination to which it was passed. In theory
each mail or news message includes as part of its content a full record of its origin and
path: however, this information is relatively easy for a malicious user to forge. Trustworthy
logs kept by servers are important tools in detecting this kind of forgery. This may be
especially important if messages are forged so as to appear the responsibility of an innocent
party. As with proxies it is common to record the message subject or ID to ensure correct
identification of a particular message.

3.2.3 Network Address Translation
Network Address Translation (NAT) is another type of intermediary, but one that works
at the network rather than the application layer. Clients of an address translation system
usually have private IP addresses, as defined in RFC1918. Clients send any packets for
external destinations to the NAT system, which rewrites them with a public source IP
address and forwards them to their destination. The NAT system must of course remember
the state of each communication so that when it receives response packets it can rewrite
their destination addresses and send them to the correct client.
NAT systems can use a variety of strategies to allocate external addresses to internal clients.
Some create static mappings between external and internal addresses for each client;
more commonly the system will behave like a proxy, using one or a small pool of external
addresses for all communications.
Address translation systems have the same basic logging requirement as other
intermediaries: to be able to relate a request made by the translation system to the client
that invoked it. However the complexity of mapping used by some systems can make this
a challenge. Attention to traceability requirements must therefore be included at the design
and implementation stages of any address translation system.

3.2.4 Gateway Servers
The final class of intermediaries is gateway systems that take a request from a client in one
form and use it to generate a request in another form to a server. The most familiar gateway
at present is probably a webmail server, which takes an HTTP form submission and uses it
to generate an SMTP request. The command sent by the gateway will not necessarily contain
any information originating from the client, so it is particularly important that gateway
servers keep reliable records of their activities. Some gateways may add client information
Logfile to their output – for example webmail servers often include the client IP address as a header
in the SMTP mail – but it is important to know how much of this information can be relied
upon, and how much is under the control of a potentially malicious user. Of course if the
gateway itself is compromised then nothing about either its output or its logs can be trusted.
Some systems can be made to act as accidental gateways, for example a badly configured
web cache may allow e-mail forgery. Such systems, and others with inadequate logging, are
a hazard to the Internet as they provide abusers with complete anonymity for unauthorised
or illegal activity.

3.2.5 Outsourced Servers
It is increasingly common for organisations to outsource some of these services (for
example e-mail or filtering proxies) to third parties. Where this is done, it is in the interests
of both parties to ensure that adequate logs are kept and can be used to investigate
problems. As with federated access management, this is likely to involve collaboration,
since the logs of what happened and who was involved are likely to be held by different
organisations. Outsourcing agreements should ensure that the different logs contain the
information necessary to link them and that responsibilities for doing this are clearly
defined.
3.2.6 Summary of Logs
Type of System Logs required
Proxy server • destination IP address + time + request ->
client/user
Mail or news server • destination IP address + time + request ->
client/user
NAT/PAT server/SOCKS proxy • source & destination IP address (+port) +
time -> client/user
Gateway server • server IP address + time + request -> client/
user
Logfiles

4 Examples
The following examples show some of the types of information that are available to the
victims of computer misuse. Real examples have been used with names and addresses
modified to protect the sites involved. These are typical of the evidence that may be sent to
a site to complain about the activities of its users. In each case the receiving site will need to
use additional logs relating to its clients and intermediaries to understand and investigate the
origin of the misuse.

4.1 Attempted Break-in
The following entries were recorded by syslog on a UNIX® system called victim, whose
clock is known to be synchronised to a reliable UK time source (see section 6.2):
Jul 4 19:08:11 4Q: victim telnetd[338556]: connection from
attacker.example.ac.uk
Jul 4 19:08:12 0F: victim telnetd[338556]: ignored attempt to
setenv(_RLD, ^?D^X^\ ^?D^X^^
^D^P^?^?$^B^Cs#^?^B^T#d~^H#e~^P/d~^P/`~^T#`~^O^C^?^?L/bin/
sh%32614c%11$hn%86
000c%12$hn)
This shows an apparently normal telnet connection from the host attacker.example.
ac.uk, during which the attacker attempts to overflow a buffer in the telnet server program.
This is clearly an attack, whose intent is to obtain a command shell (/bin/sh) with root
privilege, so the victim site would expect Example to investigate. Provided attacker.
example.ac.uk is an end-user machine, this should simply be a case of identifying
the person who was logged on to that machine at 19:08 on 4 July. If attacker is an
intermediary, for example a network address translation (NAT) system, then its logs
will need to be used to identify the internal host that was the source of the activity. In
practice it is most common for this type of attack to come from a host that has itself been
compromised, often using the same attack, so the file system and logs on attacker will
also need to be checked for signs of malicious activity. This also means that times and
logs on the attacking machine may have been modified so user records from a central
authentication server (if available) may be a more reliable source.
It is worth noting that victim was one of a number of similar machines in its department
that were subject to this attack. victim had been patched, so the attack on it was
unsuccessful. The other machines were compromised, and no trace of the attack was left in
their syslog files.

4.2 Inappropriate E-mail
A complaint was received from a Microsoft network subscriber who had received an
offensive e-mail:
From: heidi32396@example.ac.uk <heidi32396@example.ac.uk>
To: user@msn.com <user@msn.com>
Subject: Teens & Hot Horny Housewives
This was already suspicious as the username given in the From field of the e-mail is not of
a form used by Example and there is no user of that name. The full headers from the e-mail
were obtained from the complainant, and showed clearly that the e-mail had not originated
from a JANET site:
Received: from cpimssmtpa02.msn.com - 207.46.181.107 by email.msn.
com with
Microsoft SMTPSVC;
Fri, 11 May 2001 13:09:34 -0700
Received: from njkkkkkkk.com ([38.31.27.7]) by cpimssmtpa02.msn.com
Logfiles
Page 18 GD/JANET/TECH/011 (10/10)
with
Microsoft SMTPSVC(5.0.2195.3225);
Fri, 11 May 2001 12:56:53 -0700
Message-ID: <NApRVdL7bndr-.YGar-xZdAAl8Fi2SQJtihM@njkkkkkkk.com>
From: heidi32396@example.ac.uk <heidi32396@example.ac.uk>
Bcc:
To: user@msn.com <user@msn.com>
Subject: Teens & Hot Horny Housewives
Date: Sun, 05 Mar 2000 20:34:33 -0400 (EDT)
MIME-Version: 1.0
Content-Type: text/plain; charset=”US-ASCII”
Content-Transfer-Encoding: 7bit
Return-Path: heidi32396@example.ac.uk
The Received headers created within the msn.com domain indicate that the message in
fact originated from a customer of an American ISP, and that references to Example had
been forged to conceal the true origin of the message. The logfiles of the mail server at
Example further confirmed that no message had been sent to the recipient e-mail address
from that site.
The message may have been forged by the ISP’s customer, or may have been inserted into
the Internet through a badly configured proxy or other system at the customer site. Both
techniques are all too common ways to generate volume advertising by abusing services
provided and paid for by others.

4.3 Abuse of Webmail Service
A customer of an international ISP received an offensive e-mail from an address at hotmail.
com. Hotmail is a webmail gateway, so is effectively an intermediary. Like many other
webmail systems, Hotmail adds headers to the e-mail it sends that include the IP address of
the host that submitted the web request to Hotmail that caused it to generate the message.
On inspection of the full headers of the offensive mail message the following information
was found.
Received: from mail pickup service by hotmail.com with Microsoft
SMTPSVC;
Thu, 12 Apr 2001 08:12:26 -0800
Received: from 192.251.0.8 by lw3fd.law3.hotmail.msn.com with HTTP;
Thu, 12 Apr 2001 16:12:21 GMT
X-Originating-IP: [192.251.0.8]
Date: Sun, 1 Apr 2001 10:00:00
The X-Originating-IP header and lines above it were written by Hotmail and are usually
reliable; note that the Date header has been forged by the creator of the offensive mail.
Including reliable information in the outgoing message means that in most cases it will not
be necessary to search through the logs on the Hotmail intermediary. However, these logs
should still be kept, as there have been attempts to forge or conceal this X-Originating-IP
information.
The IP address 192.251.0.8 belongs to a site cache, so the logs on this cache must
be checked to determine which local host was responsible for the request. This involved
searching for the Hotmail host named in the last of the received lines: lw3fd.law3.hotma
il.msn.com. The following entry was found, but note that there is a 10 second difference
in the time stamps. In this case the correct sender was identified but even this time
difference, which was due to a failure to synchronise the system clocks to an international
standard, could have prevented or cast doubt on the identification of the offender.
Thu Apr 12 16:12:31 2001 6913 babel.comp.example.ac.uk TCP_MISS/200
12339 POST http://lw3fd.law3.hotmail.msn.com/cgi-bin/premail/4284 -
DIRECT/209.185.240.250 text/html
babel.comp.example.ac.uk is a single-user workstation and its login records showed
the identity of the account that was logged in at the time of the offensive posting.
Logfiles
GD/JANET/TECH/011 (10/10) Page 19
badguy pts/2 babel.comp Thu Apr 12 12:08-17:23 (05:14)

4.4 Denial of Service Attack with a Web Server Intermediary
The site intermediary.ac.uk observed an unusually large traffic load on its link to
the JANET network. At the same time the web server of victim.com suffered a denial
of service attack, receiving a large number of packets from www.intermediary.ac.uk.
On further investigation, the following request was found in the web logfile on www.
intermediary.ac.uk.

This indicates that the web server received a request from a host called webcache.
attacker.ac.uk. It is a reasonable guess that this host is itself a proxy. The filename
requested contains a series of ‘../’ entries which attempt to move the program out of the
initial ‘/scripts’ directory and indeed out of the area normally containing web scripts
or files. This should not be valid and should be rejected as an illegal request, but the web
server program had a well known bug, known as a directory-traversal vulnerability, which
let it accept and service requests of this type rather than returning an error message. The
directory traversal is used to move to the Windows system directory and to run the ‘ping’
command with parameters to make it generate 2000 packets, each 64K bytes in size, as fast
as possible. These caused both the unusual traffic flow and the denial of service attack.
Because the web server logs are available it is possible to identify the system webcache.
attacker.ac.uk from which the request came. However, as this system is itself a proxy,
the attack must be traced back by checking the logs on that proxy for a request made to
www.intermediary.ac.uk at that time, and containing the same request string. The cache
logs should identify the client machine responsible and from its login records the offending
user can be found.
This final example illustrates the range of logs that can often be needed to trace activity
back to its source. Not all the computers through which an attack passes will themselves
be compromised; they may be performing quite correctly or just offering a service that has
been used in an unauthorised way. Indeed, even though the web server in this case could
have been broken in to using the same vulnerability, it was not necessary to do this to make
it participate in a denial of service attack that was disruptive both to the target and to the
intermediary site.

5 Identifying Attacks
The remaining group of systems whose logfiles are likely to be of interest is servers. Whilst
logs from clients and intermediaries will usually indicate attacks against other sites, logs
from servers will normally be used to detect attacks, or attempted attacks, either on the
servers themselves or on other local systems. Public servers such as web or mail systems
are likely to be the most exposed to hostile activity on the Internet so these should always
be configured to keep good and secure logs. Internal servers should also record logs as these
may be subject to attack from within the organisation, or may be used by malicious local
users to practice before attacking systems elsewhere. Detecting and preventing such activity
at an early stage by recording and monitoring server logs can save the organisation a great
deal of trouble.
Attacks against servers generally have one of two intentions. One is to gain access,
presumably unauthorised, to the information or services supplied by that particular server;
students might well be motivated to try to gain access to the server that contains their
examination results, for example. The second aim is to make a server perform some function
for which it is not intended, for example to make it act as an intermediary in another attack,
as shown in Section 4.4. Traces of these two types of attack will often appear in different
sets of logfiles. It is therefore essential to ensure that both types of logs are recorded
and checked regularly for suspicious activity. All logs should record both successful and
unsuccessful attempts to use the system. Repeated failures to log in may indicate an attack, a
configuration problem or a user problem.

5.1 Authentication Logs
Systems that require authentication should always keep a record of the users who
authenticated successfully, and also of failed attempts to authenticate. Where a number of
consecutive failures cause an account to be locked out, this must be recorded. In most cases
the system should take additional measures to alert the operator to this event. Authentication
failures may be due to mishaps – genuine users can mistype or forget their passwords – but
any patterns of failures should be investigated. Authentication logs should also be checked
for any unexpected periods of silence, as these may indicate that an intruder has been able
to tamper with the logs to conceal evidence of their activities. Entries in authentication
logs should always be associated with an accurate time; where a single authentication gives
access to a session, rather than a single transaction, it can also be helpful to record the time
when the session ended.

5.2 Service Logs
Servers that are accessible to untrusted users should also retain logs of the requests made to
them. For example, public web servers should usually record the URLs requested by their
clients. The time and the IP address from which the request came should also be retained.
As with authentication logs, unusual events are often a sign of problems. These may
include periods of unusually low or high activity, though web servers in particular can see
unexpected surges in legitimate requests. A common way to attack a server is to present it
with unexpected input: very long requests, or those containing unusual characters, should be
investigated, as should any request containing the name of a command interpreter, such as /
bin/csh or cmd.exe. Service logs often cannot show whether an attack was successful –
even a request that failed as far as the service is concerned may have achieved its malicious
purpose before it was rejected.

6 Implementation
The ways to enable and configure logging will vary from one computer and software system
to another, and should be covered in the system documentation. This section cannot deal
with such detailed instructions, but identifies a number of common topics that have been
found to be useful in many different circumstances.

6.1 Central Logging
One of the uses of logfiles is in the investigation of attacks on computer systems. However,
a successful attack will often give the intruder complete control of the system, including
the ability to delete or modify files. Tampering with logfiles or the programs used to access
them to conceal evidence of the break-in is normally one of the intruder’s first priorities.
The risk of finding deleted or corrupted evidence can be greatly reduced by holding logs
on a different computer. A successful intruder may still have the ability to add records to
the logfile but is much less likely to be able to rewrite history. Having logs on a dedicated
central system can also make it much easier to deal with logs from a large number of
different computers, as these will automatically be gathered into one place.
The most commonly used system for writing remote logfiles is the syslog service, which
was originally written for UNIX® but is now available for most other operating systems.
Syslog allows messages of a standard form to be both written to a local file and transmitted
over the network to one or more central logging hosts. Syslog sends each message
separately, so there should be little delay between the local and remote copies of the logfile
being updated. Messages are automatically timestamped by the service, but this still relies
on the system clocks being synchronised to some common standard (see the next section).
Two potential problems need to be borne in mind when using the syslog service over a
network. The first is that messages are sent over the network using the User Datagram
Protocol (UDP), so there is no guarantee that they will arrive at their destination. On a Local
Area Network with little congestion this is not normally a problem, but messages may be
lost if there is a high traffic load on the network. Syslog over UDP may not be sufficiently
reliable to be used over a Wide Area Network, especially as some types of attack may
themselves increase the likelihood of congestion and loss of UDP messages. Some syslog
implementations allow TCP to be used instead, to increase reliability, though this may be at
the cost of longer delays between the event and its being logged. For these reasons it is good
practice to store logs locally on the system that generates them, as well as sending them to
a central logging host. The other problem is that use of remote logging can itself lead to
network congestion if a very large number of error messages are generated. For example a
denial of service attack on a network will be made much more effective if the target systems
are trying to report each attack over the same network. Avoiding this problem requires
careful design of the logging system. One option is to summarise batches of repeated log
messages into a single message including the number of repeats in a particular period of
time.
Highly reliable central logging systems can be built by using separate network connections
to carry logging messages. In such a system, the critical computers generating logs will
have dedicated network links to the central logging host. These links should not be used
for normal traffic, nor connected to the production network. Such systems can also protect
against sophisticated attacks where a denial of service attack against the logging system is
used to conceal an intrusion into vital production services. The ultimate in tamper-resistance
is to write logs immediately to an unerasable form of storage such as a write-once, readmany
CD-R or DVD-R drive.

6.2 Timestamps
Most incidents involve more than one computer so it is common for an investigation to
have to deal with logfiles from many different systems, possibly at different sites or even
in different countries. Entries in logfiles that refer to the same event are most commonly
matched by comparing their timestamps. To ensure that the times from different logs can
be compared within a site, or with a complaint made from the other side of the world, it
is essential that the times of different computers making the logs be synchronised to an
international standard. The Network Time Protocol (NTP) is the common way to do this
across computer networks and countries. A central NTP service is provided, linked to a
number of atomic clocks keeping standard international time, which JANET sites can and
should join. For more details of this service, see:
http://www.ja.net/services/ntp/
International incidents often occur across different time zones so that the numeric values
of time may not be directly comparable. All systems should be set up to record the time
zone against which they are logging, and whether daylight saving has been applied.
Unfortunately there are a number of different formats for recording this information. Indeed
some time zone names are not unique, so correlating logs is often harder than it should
be. When reporting an incident, you should always include the time zone with respect to
Coordinated Universal Time (UTC, for example ‘16:19:00 +0100’ for British Summer
Time) and whether timestamps are synchronised to an international standard. Without this a
great deal of effort can be wasted.

6.3 Automated Processing of Logs
Logfiles can grow very large, and routinely scanning them by eye may not be possible.
There are a number of computer programs that can help to monitor logs and many sites
have written their own scripts for this purpose. Such programs aim to identify patterns
in the logfiles, and they can be very effective at identifying common, known problems.
However a program is unlikely ever to be as good as a human at spotting new or unusual
patterns. A compromise solution may be to use programs to filter out entries that are
known to be harmless (though even these should be checked occasionally) and well-known
problem patterns, and then to scan the remaining information by eye. As new patterns are
identified they can be added to the known-good and known-bad filters. This process of
tuning can be time-consuming, but is the most effective way to extract information from
the logs. Raw logs should still be kept, subject to the issues relating to Data Retention in
Section 2.2, as it can be useful to review them when new patterns are identified. It is very
common for hostile activity to take some time to be noticed, and reassuring at that stage to
be able to review older logs to determine when it actually began.

6.4 Graphing Activity
Humans are quite good at spotting patterns in textual information, but they are extremely
good at finding them in graphical representations. A highly efficient way to monitor the
health of any system is to graph some appropriate measures of its performance. Graphs of
network traffic levels such as those provided by the JANET Netsight system:
http://netsight.ja.net
are fairly commonly used. Less frequent, but very useful, are graphs of, for example,
number of requests to a web server, number of failed and successful logins to a network, or
system idle time or memory usage. Regular patterns in these graphs will quickly become
familiar to the operator. Once the system’s normal behaviour is well known, unexpected
changes will often be spotted without conscious thought. Detailed investigation can then be
done using the original logfiles.

This document has dealt only with the logs that can be recorded by individual computers
and other systems. A great deal of useful information and early warnings can also be
obtained by looking at computers and networks in combination. Any organisation that is
concerned to protect its own systems and reputation should also be developing systems to
monitor these systems. Two examples are given below of what can, and should, be done.

7.1 Network Flows
Networks can be characterised very effectively by knowing what flows of traffic are taking
place along them. Flows can be classified by their source and destination IP addresses, or
groups of addresses, and ports, along with the volume of traffic making up the flow. For
example it would be quite normal in an open-access workstation room to see large numbers
of packets coming from the HTTP ports of external machines into the workstation room –
web browsing is a legitimate use for these types of systems – but much less normal to see
traffic flowing out from the HTTP port on the workstations. The latter situation may indicate
that someone has set up a web server on a workstation, either with or without authority, or
that systems have been compromised and are being controlled by a remote intruder. Thus
simple flow information can be effective in detecting both security problems and breaches of
policy.
Network flows are often the only way to trace denial of service attacks, since these
commonly use forged addresses to conceal their origin. If addresses are forged then an
attack can be hard to trace at the IP level; instead information taken from routers and
switches will be needed to determine which ports or interfaces are carrying the traffic. Such
information can rarely be gathered from a single central point but needs effective, secure
reporting and management to be set up on the network devices when they are deployed.

7.2 Intrusion Detection
Network flow monitoring examines where packets are going to and from; network Intrusion
Detection Systems (IDS) examine the content of packets or search for patterns in time. For
example an IDS might be configured to check HTTP packets for the commands used by
well-known attack tools, or could detect that a particular IP address had sent packets to all
the addresses in a range. IDSes can be very effective at warning of known problems, though
they are less good at identifying new, suspicious activity. Some IDS packages can take
action to respond to a detected threat, either by blocking the hostile traffic or by targeting
a response at the apparent source. However these options, sometimes known as Intrusion
Prevention Systems (IPS), run the risk of denying service unnecessarily (any system,
computer or human, will occasionally make mistakes) or of taking reprisals against an
innocent party. In the worst case an IPS may even assist an attacker if it can be made to react
in a way that harms the organisation. For example if an attacker can persuade an IPS that it
is being attacked by a vital internal or external system, such as a DNS resolver, then the IPS
may effectively cut off the organisation’s access to that system or the Internet.

No comments:

Post a Comment