Monitoring is the process of identifying and preventing problems with deployed client/server or Web-based systems to ensure their continued functionality and performance. It involves using a tool that periodically checks a system's overall health and sends you alerts when it detects problems.

The key to effective monitoring is receiving an alert for every real problem while avoiding false-positive alerts for insignificant issues. The most reliable way to achieve this balance is to combine application monitoring with hardware monitoring, and to issue urgent alerts only for problems that affect the application's functionality and performance.

Application monitoring checks application functionality and performance by exercising the application as a user would. It exposes functionality and performance problems that are already affecting current users, and it also exposes emerging issues (such as subtle performance degradation caused by a memory leak) that might not yet be apparent to users but could eventually grow into serious problems. If you receive this type of "early warning" alert, you can start fixing a problem before it has a chance to affect how users perceive or use the application.
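
The "early warning" idea can be made concrete with a simple trend check. The following is a minimal Python sketch, not a feature of any particular monitoring tool. It assumes you already record the response time of one simulated user transaction on each monitoring run, fits a least-squares line to the recent samples, and raises an early warning when the slope exceeds a threshold even though every individual sample is still below the hard limit. The function names and both thresholds are hypothetical.

```python
from statistics import mean

def response_time_slope(samples_ms: list[float]) -> float:
    """Least-squares slope of the samples, in milliseconds per monitoring run."""
    n = len(samples_ms)
    if n < 2:
        return 0.0  # not enough data to estimate a trend
    x_mean = (n - 1) / 2  # mean of the run indices 0..n-1
    y_mean = mean(samples_ms)
    num = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(samples_ms))
    den = sum((x - x_mean) ** 2 for x in range(n))
    return num / den

def early_warning(samples_ms: list[float], hard_limit_ms: float, max_slope_ms: float) -> str:
    """Classify the latest samples as URGENT, EARLY_WARNING, or OK (hypothetical thresholds)."""
    if samples_ms[-1] > hard_limit_ms:
        return "URGENT"  # users are already seeing slow responses
    if response_time_slope(samples_ms) > max_slope_ms:
        return "EARLY_WARNING"  # still acceptable, but getting steadily slower
    return "OK"
```

For example, the samples 200, 205, 212, 220, 231, and 240 ms are all well under a 1000 ms limit, but their slope is about 8.2 ms per run, so `early_warning(samples, 1000, 5)` returns `"EARLY_WARNING"`.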

Hardware monitoring, the more traditional type of monitoring, should also be performed because the data it collects is essential for diagnosing the source of application-level problems. However, relying on hardware monitoring alone typically produces so many false-positive alerts that you eventually grow desensitized to all alerts. These false positives often occur because hardware failures are assumed to map directly to application failures, but the two are not always connected: a piece of hardware might fail without affecting the user experience, or users might experience functionality problems even when every piece of hardware is running perfectly.

The best way to obtain a reliable understanding of a system's health is to ensure that your monitoring covers all the pieces that come into play when a user exercises the application (the application logic, the data back end, the hardware, and so forth) and sends alerts only when the combined results indicate that a real problem has occurred or is emerging.
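
One way to express "alert only when the combined results indicate a real problem" is to let the application-level result decide the severity and to treat hardware data as diagnostic context. The minimal Python sketch below assumes that each monitoring run produces a pass/fail result for a simulated user transaction, a flag saying whether it finished within its time budget, and a list of hardware warnings. The `CheckResults` type, the `classify` function, and the severity names are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class CheckResults:
    transaction_passed: bool  # did the simulated user transaction behave correctly?
    transaction_within_budget: bool  # did it finish within its time budget?
    hardware_warnings: list[str] = field(default_factory=list)  # e.g. "disk 91% full"

def classify(results: CheckResults) -> tuple[str, str]:
    """Return (severity, message); severity is 'urgent', 'info', or 'none'."""
    context = "; ".join(results.hardware_warnings) or "hardware nominal"
    if not results.transaction_passed:
        # Users are affected: page someone and attach hardware data for diagnosis.
        return "urgent", f"user transaction failed [{context}]"
    if not results.transaction_within_budget:
        return "urgent", f"user transaction too slow [{context}]"
    if results.hardware_warnings:
        # Hardware anomaly with no user-visible impact: record it, but do not page.
        return "info", f"hardware anomaly, application unaffected [{context}]"
    return "none", "all checks passed"
```

With this rule, a nearly full disk that has no user-visible effect yields `"info"` rather than an urgent page, while a slow transaction is reported as urgent with the hardware warnings attached.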

For example, fully monitoring a Web-based enterprise system might involve verifying that:
- User click paths through critical transactions do not experience unexpected path flow changes or path content changes.
- User click paths through critical transactions execute within an acceptable period of time.
- Database transactions are completed within desired time limits and database operations function correctly, even as the amount of data in the database increases.
- A Web service or other third-party
content provider delivers a valid response in the expected
format.
- Local machine hardware statistics
(CPU utilization, memory space, disk space, buffer cache,
etc.) do not reach unacceptable levels.
- Client requests that travel through
a Web service proxy match the expected security patterns
and inappropriate requests are not forwarded to the
server.
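
Three of the checks in the list above are sketched below in Python (3.9 or later): content and timing checks for a click path, a format check for a Web service response, and a local disk-space threshold. This is an illustration under stated assumptions, not a complete monitor. Each click-path step is a hypothetical callable that returns page content (a real monitor would drive a browser or issue HTTP requests), the expected JSON fields `symbol` and `price` are invented for the example, and the 90% disk threshold is arbitrary. `shutil.disk_usage` is part of the Python standard library.

```python
import json
import shutil
import time
from typing import Callable

# A click-path step: (step name, function that fetches the page, text the page must contain).
# The fetch functions are hypothetical stand-ins for real browser or HTTP calls.
Step = tuple[str, Callable[[], str], str]

def check_click_path(steps: list[Step], budget_s: float) -> tuple[bool, str]:
    """Run the steps in order; fail on missing content or if the whole path exceeds budget_s."""
    start = time.perf_counter()
    for name, fetch, expected_text in steps:
        content = fetch()
        if expected_text not in content:
            return False, f"step '{name}': expected content missing"
    elapsed = time.perf_counter() - start
    if elapsed > budget_s:
        return False, f"path took {elapsed:.2f}s (budget {budget_s:.2f}s)"
    return True, f"path completed in {elapsed:.2f}s"

def check_service_response(body: str) -> bool:
    """Validate that a (hypothetical) quote service returned JSON with the expected fields."""
    try:
        data = json.loads(body)
    except json.JSONDecodeError:
        return False
    if not isinstance(data, dict):
        return False
    price = data.get("price")
    return (isinstance(data.get("symbol"), str)
            and isinstance(price, (int, float))
            and not isinstance(price, bool))  # bool is a subclass of int in Python

def check_disk_space(path: str = "/", max_used_fraction: float = 0.9) -> bool:
    """Return True while the file system holding `path` is below the usage threshold."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total <= max_used_fraction
```

In keeping with the balance described earlier, a failed click-path or service check reflects what users experience, while the disk check is mainly diagnostic data.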

Ideally, these tests are run from strategic locations both inside and outside the system to collect the data needed for rapid diagnosis.

Moreover, if you want to prevent emerging problems as well as identify existing ones, you can run a mixture of passive tests and active tests. Active tests simulate user actions with test drivers, virtual users, and similar tools to determine which problems could affect potential users' experiences. Passive tests unobtrusively monitor system and transaction details to identify major system problems (such as an offline server) and to collect data that helps you diagnose the source of application-level problems. If you frequently run a well-designed test suite that represents realistic user transactions and loads, your tests will typically expose emerging bottlenecks and functionality issues before your actual users have the opportunity to notice them. With this advance warning, you can start diagnosing and repairing problems before actual users are affected and before service level agreements are violated.
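
On the passive side, one common way to watch for service level agreement risk is to compute a percentile of the response times observed for real traffic and compare it with the agreed target. The Python sketch below assumes the latencies have already been extracted from a source such as a Web server access log, and that the agreement is stated as a 95th-percentile response time. Both that form of agreement and the 80% "at risk" margin are hypothetical choices, and the nearest-rank percentile definition is used for simplicity.

```python
import math

def percentile_nearest_rank(values: list[float], p: int) -> float:
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]

def sla_status(observed_ms: list[float], sla_p95_ms: float,
               warn_fraction: float = 0.8) -> tuple[str, float]:
    """Compare the observed 95th percentile with a (hypothetical) p95 target."""
    p95 = percentile_nearest_rank(observed_ms, 95)
    if p95 > sla_p95_ms:
        return "VIOLATED", p95
    if p95 > warn_fraction * sla_p95_ms:
        return "AT_RISK", p95  # still within the target, but close enough to investigate
    return "OK", p95
```

With observed latencies of 1 through 100 ms, the 95th percentile is 95 ms, so a 100 ms target reports `"AT_RISK"` and a 90 ms target reports `"VIOLATED"`.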