Checklists for Checklists on Checklists... Why Not Automate?

Introduction

Automation of IT systems takes many forms. The goal of computers taking care of themselves has been with us since commercial systems became available. Today, new needs are driving a continuing push to automation.

This white paper looks specifically at the advantages of automating one of the last bastions of manual, hands-on management: the operation and maintenance of IT platforms, systems, networks and architectures.

This white paper is not about getting rid of checklists. In fact, it is in awe of the power of a good checklist—but how can we make sure the best procedures are always in place, ready for action and available all the time? That’s where automation comes in.

Managing IT is difficult. Treading carefully is important and, for many tasks, missing a step or getting steps out of order can be disastrous.

This paper will explain the benefits of applying as much automation as possible, cementing in place a high quality set of actions, according to a proven plan.

This is an old topic given new impetus in the always changing world of IT. Applications become integrated, organisations become interconnected, and clients, customers, partners and employees can be online 24 hours a day, with little tolerance for downtime or delays.

These days, operating systems automate many tasks. Beyond that, most organisations need more sophisticated job scheduling, more intelligence in monitoring and alerting, and cross-platform abilities. Often, effective procedures are developed, used once or twice and then forgotten. It can be difficult to make sure you have the best procedures in place, at the right time, every time. Administrators move on, IT changes slightly, and procedures start to weaken and become error prone.

IT Is Changing

Today, we need automation more than ever before. Applications rarely sit in their own ‘silo’ any more. The world is ‘interconnected’, both within organisations, as well as businesses talking to businesses and consumers across the globe and across time zones. The ‘Internet of Things’, devices talking to devices, will soon be bigger than the ‘Internet of People’.

A new workforce is emerging, with BYOD (bring-your-own-device) becoming commonplace, and it needs access to business tools on demand, across constantly available networks. Business to business connections need to be open, with demands on reporting, data mining, compliance, and IT management.

More and more organisations are finding benefit in Cloud based solutions. This is not always an external Cloud and the Private or Hybrid Cloud is growing in popularity. A big feature of this model, whether external or internal, is the seamless delivery of services. Both availability and performance need to be high.

A more subtle driver toward automation is the increasing need to evaluate new technologies, new abilities, newly available services that will benefit the company. IT staff, with their depth of technical knowledge, are often the best placed to make sense of emerging technology. With day-to-day running automated, there can be more time available to focus on making the best use of new and existing technology.

The new demands give further weight to age-old benefits. Automation can save money by reducing the manual workload on the shoulders of IT staff. Risk is reduced by removing human error. Jobs can be scheduled and orchestrated across different platforms. Monitoring can be in place constantly, every hour of the day, not just once each hour or half-hour according to the checklist.

What's the problem?

Many IT budgets are getting smaller, rather than bigger. There is an expectation that IT can deliver services faster, securely, with no disruption and at a lower cost than ever before.

Knowledge management is a significant issue in many departments, with skilled staff coming and going, taking their silo of specialised knowledge with them.

The disciplines of ITIL (the IT Infrastructure Library) tell us that best practice includes management of incidents and problems, changes and new releases. Underlying this is a repository of knowledge about configuring the IT assets, as well as the processes and initiatives needed to keep everything up and running.

Automation across the range of disciplines needed for IT management can ‘lock-in’ the knowledge needed, and ensure best practice is always followed.

There can be many reasons why best practice management is not in place, or has slipped. For example:

  • Nothing is in place that provides a single view of the network, including servers and storage 
  • Management becomes isolated to individual platforms with no ‘enterprise’ view, and no coordination between systems
  • There are too many ‘fires’ to put out, with administrators constantly in break-fix mode
  • When automation is in place, no one takes responsibility for keeping it up to date in an ever-changing IT environment

The Historical Approach

Automation is a long-time aim. Commercial computers came along in the late 1950s and early 1960s. They needed hands-on support, and were vulnerable to human error.

In 1964 IBM introduced OS/360 and computing went through a revolution. Batch processing was now the norm. The use of disk drives and storage subsystems became standardised, which also needed management. OS/360 helped pioneer the idea of job scheduling, which was later carried forward by the Job Entry Subsystems (JES2 and JES3).

It is easy to forget these days just how much automation is already there. The database, communications, security, work management and monitoring are no longer separate entities, each needing its own control facilities.

IBM’s Mainframe, Pure, and Power with i, are good examples of machines that configure, run and tune themselves. In the rare event of a failure, they call ‘home’ so that support can come and fix the problem, and automatically role-swap to a backup if needed.

 

Image: automated robot self-replication

Cross-platform abilities have now become essential, as has a high level of application awareness. The automation story is still progressing and today there are many tools available to provide scheduling, backup and recovery, workload management, monitoring and alerting. Today’s focus is on tight integration of the management tools, with flexibility to cater for changing environments.

What’s our solution?

Providing a solution for automation in the changing world of IT needs a combination of software and people skills. The software product must be easy to configure and use. It should provide a single view across the enterprise and have the flexibility to deal with the variety of IT in use today.

A common feature of checklists for humans is to take action if something has not happened. The software needs to have this ability as well, yet many monitoring products focus only on detecting abnormal events and error messages.

The software should be able to start a script or call a program while raising an alert. Workflow is needed to escalate an alert to different levels of support if an error condition is not fixed quickly.
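As a rough illustration of that escalation workflow, here is a minimal Python sketch. The support levels, the time thresholds and the names `ESCALATION_LEVELS`, `Alert` and `who_to_notify` are all assumptions made up for this example; they do not describe any particular product.

```python
from dataclasses import dataclass

# Hypothetical escalation ladder: (minutes an alert has been open, who is told).
ESCALATION_LEVELS = [
    (0, "on-call operator"),
    (15, "senior administrator"),
    (60, "IT manager"),
]


@dataclass
class Alert:
    message: str
    raised_at_minute: int
    resolved: bool = False


def who_to_notify(alert: Alert, now_minute: int) -> list[str]:
    """Return every support level that should have been alerted by now."""
    if alert.resolved:
        return []
    age = now_minute - alert.raised_at_minute
    # Each level is added once the alert has stayed open past its threshold.
    return [contact for threshold, contact in ESCALATION_LEVELS if age >= threshold]
```

An alert that is fixed quickly never leaves the first level. One that stays open is stepped up automatically, without anyone having to remember to do it.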

The people skills needed include having intimate knowledge of the systems, or at least being able to learn and document this knowledge from the current administrators. This brings us back to the checklist. As stated at the beginning, this paper is in praise of checklists and the power they bring. With the right software in place, a set of solid and effective procedures translates into automation rules for scheduling, monitoring and management of the systems.

The ‘translation’, as well as developing good procedures, takes particular skills. The procedures and knowledge already in-house are important and can be effectively combined with a specialist consultancy service, to provide expert knowledge on automation, and on the software product selected.

Image: continuous improvement cycle

This brings five key benefits:

  • Specialised skills can be combined with in-house knowledge to evaluate current management and develop a comprehensive automation plan. The expert knowledge is ‘captured’ in the software rules.
  • Automation saves money. For example, routine tasks do not consume the valuable time of a skilled administrator. Responses to unexpected errors are more rapid, with the potential to avoid downtime altogether. Vital routines are always in place and running, such as job scheduling, backups, virus protection, patch management, security monitoring, error detection and other tasks specific to your business.
  • Automation makes money. The skills of the IT department can now focus on making the best use of available technology, for the needs of this particular business. For example, this can lead to innovations that boost the company’s marketing efforts; or perhaps improve the supply chain connections; or find new and better ways to ‘mine’ the company’s data.
  • Detection of unexpected events, as well as events that didn’t occur when they should have, is immediate, with instant action.
  • Services and applications become more available, and perform better. Workload orchestration across different platforms occurs without interruption and without relying on somebody to physically check everything is OK.

Here’s a simple example of automation at work. The email server has had lots of activity, and the log files are filling the disk and have triggered a space alert. If the issue isn’t fixed, emails could stop and the business is impacted. Using automation, we can take immediate action and follow the typical steps of a human operator (that is, a checklist): first compress all log files, then archive them and delete the originals. We only need human intervention if something goes wrong with that process.
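The log clean-up checklist could be sketched in Python roughly as follows. This is an illustration only: the function name `free_log_space`, the `*.log` pattern and the directory arguments are assumptions. It also assumes the log files have already been rotated, so the mail server is no longer writing to them.

```python
import gzip
import shutil
from pathlib import Path


def free_log_space(log_dir: Path, archive_dir: Path) -> list[Path]:
    """Compress each *.log file into archive_dir, then delete the original.

    Returns the archives created. Any exception stops the run, which is
    the point where a person would need to step in.
    """
    archive_dir.mkdir(parents=True, exist_ok=True)
    archived = []
    for log_file in sorted(log_dir.glob("*.log")):
        target = archive_dir / (log_file.name + ".gz")
        # Compress and archive in one pass, writing to the archive location.
        with log_file.open("rb") as src, gzip.open(target, "wb") as dst:
            shutil.copyfileobj(src, dst)
        # Delete the original only after the archive has been written.
        log_file.unlink()
        archived.append(target)
    return archived
```

An automation tool would call a routine like this in response to the space alert, and raise a further alert only if it fails.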

As another example, consider the common situation where a server needs a regular full backup, but the backup can only run when no work is active. On a critical production server, this usually needs hands-on attention. The application must stop and database replication (for example, to a DR site) must be suspended. Once the backup completes normally, everything must restart successfully.

Traditionally, this is a manual, after-hours task. At the same time, there is a set sequence of steps that can be adapted for management by an automation engine. Once automated, we can set up alerts to make sure the process has completed.
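A set sequence like this one can be pictured as an ordered list of steps handed to an automation engine. The Python sketch below is a simplified illustration under stated assumptions: `run_backup_window`, the `Step` type and the alert wording are hypothetical, and each step is assumed to be a callable that raises an exception when it fails.

```python
from typing import Callable

# A step is a (description, action) pair; the action raises on failure.
Step = tuple[str, Callable[[], None]]


def run_backup_window(
    quiesce: list[Step],
    backup: Step,
    resume: list[Step],
    alert: Callable[[str], None],
) -> bool:
    """Quiesce the server, back it up, then restart everything.

    Returns True only if every step succeeded. Restart steps always run,
    so they are assumed to be safe even if a quiesce step never happened.
    """
    ok = True
    try:
        # Checklist order matters: stop work first, then take the backup.
        for name, action in [*quiesce, backup]:
            action()
    except Exception as exc:
        ok = False
        alert(f"Backup window failed at '{name}': {exc}")

    # Bring services back whether or not the backup succeeded.
    for name, action in resume:
        try:
            action()
        except Exception as exc:
            ok = False
            alert(f"Restart step '{name}' failed: {exc}")

    if ok:
        alert("Backup window completed normally")
    return ok
```

In use, the quiesce list would hold "stop application" and "suspend replication", and the resume list would hold their counterparts in reverse order. Either the completion message or a failure alert is sent on every run, so a silent night means something is wrong.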

In each of these examples we gain valuable time. The IT team can concentrate on continuous improvement of the rules, as well as improvements that add value to the business.

The Top 10 Things to Look for in an Automation Project

  1. Find expertise in automation and systems management. For many organisations this means a combination of skills. In‐house knowledge is important and understanding the current IT environment, at a deep level, is obviously an important part of deciding an effective plan. Then, combine that knowledge with skills in the particular software package selected, as well as skills in managing systems and automation.
     
  2. Choose an effective software package. To begin with, cross-platform ability is a must-have for most organisations.
     
  3. The software must be comprehensive, including job scheduling, monitoring and alerting, security auditing, and SLA reporting. Scheduling needs to be both calendar and event driven.
     
  4. You need to be able to watch for what does not happen as well as what does happen. For example, a file hasn’t arrived by a specific time, the backup fails to start, or there has been no FTP transfer to a supplier for more than 4 hours. These ‘non-events’ warrant an alert.
     
  5. Select software with the flexibility to suit your way of working. In response to an alert, the software needs to be able both to send messages (console message, email, SMS) and to invoke a script, procedure or program. That in turn enables a wide range of possible responses. For example, automated analysis of log information could help to decide which intervention is best.
     
  6. There should be built-in workflow to escalate alerts for more rapid response to issues.
     
  7. Look for an integrated, ‘single pane of glass’ view across the enterprise.
     
  8. The product needs to be easy to use and easy to set up. Today, administrators do not have time to attend lengthy training courses.
     
  9. Look for strong technical support for the product, locally and globally.
     
  10. Also look for regular updates and new releases, showing the vendor is continuing to invest in their software.
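
Item 4 in the list above, watching for what does not happen, is worth a small illustration. The Python sketch below shows two such ‘non-event’ checks. The function names and the idea of passing in the current time are assumptions made for the example; the 4-hour gap comes from the FTP case mentioned in the list.

```python
from datetime import datetime, timedelta
from pathlib import Path
from typing import Optional


def file_is_overdue(path: Path, deadline: datetime, now: datetime) -> bool:
    """True when the deadline has passed and the expected file is still absent."""
    return now >= deadline and not path.exists()


def transfer_is_stale(
    last_transfer: Optional[datetime],
    now: datetime,
    max_gap: timedelta = timedelta(hours=4),
) -> bool:
    """True when no transfer has been recorded within max_gap."""
    return last_transfer is None or now - last_transfer > max_gap
```

A scheduler would run checks like these at regular intervals and raise an alert whenever one returns True. No error message is ever produced by the missing file or transfer itself, which is why such checks have to be set up deliberately.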

The most important thing to look for in people skills

Experience. There is no substitute; experience counts in IT Management. The ability to work closely with people who have specific knowledge of the current status is a key quality. A deep knowledge of the software product is also, obviously, valuable. The overriding skill, however, is in being able to take current management practices, recommend improvements if needed, and ‘translate’ these procedures into effective checklists and software rules.

Automation Objectives

Automation needs a framework and a good starting point is to set objectives. This could be done using ITIL standards, and modified more specifically for your site.

Once objectives are set, procedures can be put in place to make sure they are met. This will be an ongoing process and needs ongoing review to ensure the plan is working.

Incident management should include steps to make sure that as unexpected issues arise, they are analysed and the ‘fix’ built into the automation. This is where the flexibility of the software comes into play. There needs to be a variety of facilities, even down to being able to ‘mimic’ a real person’s keystrokes, as well as things like being able to drill into and analyse a log file.

Automation doesn’t mean we no longer need people. IT is a constantly changing world and we ‘evolve or die’. The automation rules should not stay static and responsibility needs to be assigned to someone for the ongoing maintenance and health of the software and processes.

 

Conclusion

This discussion is aimed at provoking thought around the advantages of automation. For a strong, cross-platform, highly functional and comprehensive solution I recommend Halcyon Software.

Clients of Halcyon Software are able to benefit from a wealth of experience, gathered over 25+ years of developing automation software. Halcyon understands the need for high quality products with high quality technical support and a dedication to customer service.

Halcyon is a global organisation, with centres in the United Kingdom, the United States and Australia, plus authorised partners across Europe, Africa and Asia. They have an impressive range of tools to help manage and automate workloads, including seamless integration into a ‘single-pane-of-glass’ view across the enterprise.

Software updates are released regularly throughout the year, and Halcyon works closely with its customers on a product roadmap so that new features and products reflect changes in market demand.

In Australia, ONCALL Group specialises in helping our clients use Halcyon to manage their systems to a high, ‘best-practice’ standard. We tailor solutions to the individual needs of an organisation.

 

Image: Enterprise Wide System and Application Monitoring with Remote Access Options

 

Call us at 800-328-1000 or email [email protected] to set up a personal consultation. We'll review your current setup and see how Halcyon can help you achieve your automation goals.