You know what they say about the best-laid plans. No matter how careful you are, your IT processes will occasionally run into problems that can seriously delay or damage your work. Luckily, a good enterprise scheduler can diagnose and deal with errors as they come along.
On the latest “No-Stress Job Scheduling,” Jared Dahl breaks down the ways an advanced job scheduling tool can handle workflow errors, whether they are at the command level, the job level, or system-wide.
Command Level Tools
The right enterprise job scheduler will have features to diagnose and respond to errors at any point in the process. At the most basic level are the tools that kick in if an error occurs in the execution of a command.
At this level, the first thing your workload automation tool should be able to monitor is the exit code, the number that comes back from a program call to indicate success or failure. Not all programs use the same codes, so an ideal job scheduler will let you map each program’s return codes to success or failure.
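As a rough illustration of exit code mapping, here is a minimal Python sketch. The `EXIT_CODE_MAPS` table and the `classify_exit` and `run_and_classify` helpers are hypothetical names invented for this example, not part of any particular scheduler. The `grep` entry reflects its documented behavior of returning 1 when no lines match, which many jobs would not want to treat as a failure.

```python
import subprocess
import sys

# Hypothetical per-program maps: exit code -> outcome.
# Codes that are not listed fall back to the usual convention (0 = success).
EXIT_CODE_MAPS = {
    "grep": {0: "success", 1: "success"},  # 1 only means "no lines matched"
}


def classify_exit(program: str, code: int) -> str:
    """Return "success" or "failure" for a program's exit code."""
    mapping = EXIT_CODE_MAPS.get(program, {})
    if code in mapping:
        return mapping[code]
    return "success" if code == 0 else "failure"


def run_and_classify(program: str, argv: list[str]) -> str:
    """Run a command and classify its exit code using the program's map."""
    completed = subprocess.run(argv)
    return classify_exit(program, completed.returncode)
```

With a table like this, a program with unusual codes gets its own entry, while everything else keeps the default rule that only 0 means success.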
Many programs also send data to the standard output stream, including messages that can tell you whether your command worked. In fact, some applications don’t use exit codes at all and only send an error message. The second major feature you want your enterprise job scheduler to have is text scanning, so it can interpret and respond to these messages.
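To make text scanning concrete, the sketch below captures a command's output and checks it against a list of patterns. The patterns and the `output_has_error` and `run_and_scan` names are assumptions made for illustration. Real patterns depend entirely on what each of your applications prints.

```python
import re
import subprocess
import sys

# Hypothetical patterns; adjust them to the messages your programs emit.
ERROR_PATTERNS = [re.compile(r"\bERROR\b"), re.compile(r"\bFATAL\b")]


def output_has_error(text: str) -> bool:
    """Return True if any known error pattern appears in the text."""
    return any(pattern.search(text) for pattern in ERROR_PATTERNS)


def run_and_scan(argv: list[str]) -> bool:
    """Run a command; return True only if it exits 0 and prints no error text."""
    completed = subprocess.run(argv, capture_output=True, text=True)
    combined = completed.stdout + completed.stderr
    return completed.returncode == 0 and not output_has_error(combined)
```

Note that the second check catches a program that exits with 0 but still prints an error message, which is the case exit codes alone would miss.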
Once your job scheduling tool has alerted you to the failure of a command, it should be able to deal with the failure in a few different ways. It could stop the process immediately to prevent the next steps of the job from trying to run when the prerequisite step hasn’t been completed, or it could repeat the step a given number of times, possibly with a specified wait in between. Finally, in certain cases errors are expected and the job scheduler could be set up to ignore them.
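The three responses described above (stop, retry with a wait, or ignore) could be sketched like this. The `run_step` and `run_job` functions and their parameters are hypothetical and only model the decision logic. A real scheduler would also handle notification and logging.

```python
import subprocess
import sys
import time


def run_step(argv, on_failure="stop", retries=0, wait_seconds=0.0):
    """Run one command; return True if the job may continue to the next step.

    on_failure: "stop"   -> halt the job on a nonzero exit code
                "retry"  -> rerun up to `retries` more times, then halt
                "ignore" -> treat the failure as expected and continue
    """
    attempts = 1 + (retries if on_failure == "retry" else 0)
    for attempt in range(attempts):
        if subprocess.run(argv).returncode == 0:
            return True
        if attempt < attempts - 1:
            time.sleep(wait_seconds)  # optional pause before the next try
    return on_failure == "ignore"


def run_job(steps):
    """Run steps in order; return the index of the step that halted the job.

    Returns None if every step either succeeded or was allowed to fail.
    """
    for index, step in enumerate(steps):
        if not run_step(**step):
            return index
    return None
```

Because `run_job` returns the index of the step that halted the job, later steps never run against an unmet prerequisite, and the failing command is easy to identify afterward.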
You’ve been notified that your job failed and your job scheduler has responded by halting the workflow. How do you find the problem to fix it? A critical part of the debug process is your job scheduler’s ability to tie back into the job log and determine which command caused the job to fail. This means it should keep a clear record indicating where in your job stream the problem occurred.
Job Level Tools
Your company probably runs a lot of the same jobs more than once. In order to help you debug your problem, it’s essential that your enterprise job scheduler keep a good job history. This will differentiate executions of the same job and tell you why the job scheduler ran the job, where it ran, when it ran, and how long it took. Your scheduler’s job log should have a record of the same data that is important at the command level, such as error messages, debug statements, or checkpoints.
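One way to picture a job history entry is as a small record per execution. The `JobRun` class, its field names, and the `runs_of` helper below are assumptions chosen to mirror the questions above (why, where, when, how long), not the schema of any specific product.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class JobRun:
    """One execution of a job (hypothetical record layout)."""
    job_name: str
    run_id: int            # distinguishes executions of the same job
    trigger: str           # why it ran, e.g. "schedule" or "manual"
    agent: str             # where it ran
    started_at: datetime   # when it ran
    ended_at: datetime
    status: str            # e.g. "completed" or "failed"
    log_lines: list[str] = field(default_factory=list)  # errors, debug output, checkpoints

    @property
    def duration(self) -> timedelta:
        """How long the run took."""
        return self.ended_at - self.started_at


def runs_of(history, job_name):
    """Return all runs of one job, most recent first."""
    matching = [run for run in history if run.job_name == job_name]
    return sorted(matching, key=lambda run: run.started_at, reverse=True)
```

Keeping a unique `run_id` alongside the job name is what lets you compare last night's failed run with last week's successful one.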
When a job fails, your workload automation tool needs to be able to first notify you of the failure and then react. You might want it to immediately hold the job until you have the chance to fix it. Another option would be to run the job again every ten minutes. In some cases, the enterprise job scheduler could be triggered to run another job to clean up the error condition.
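The job-level reactions described above can be modeled as a small policy lookup. Everything here (`FailurePolicy`, `Reaction`, `react_to_failure`, and the action names) is a hypothetical sketch of the decision, with the ten-minute interval taken from the example in the text.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class FailurePolicy:
    action: str                                      # "hold", "rerun", or "run_job"
    rerun_interval: timedelta = timedelta(minutes=10)
    recovery_job: Optional[str] = None               # job that cleans up the error condition


@dataclass
class Reaction:
    notify: bool                          # always notify first, then react
    hold: bool = False
    rerun_at: Optional[datetime] = None
    start_job: Optional[str] = None


def react_to_failure(policy: FailurePolicy, failed_at: datetime) -> Reaction:
    """Decide what should happen after a job fails, according to its policy."""
    if policy.action == "hold":
        return Reaction(notify=True, hold=True)
    if policy.action == "rerun":
        return Reaction(notify=True, rerun_at=failed_at + policy.rerun_interval)
    if policy.action == "run_job":
        if policy.recovery_job is None:
            raise ValueError("run_job policy needs a recovery_job")
        return Reaction(notify=True, start_job=policy.recovery_job)
    raise ValueError(f"unknown action: {policy.action}")
```

Separating the decision from the execution like this keeps each job's failure behavior a matter of configuration rather than custom scripting.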
System Level Tools
On the system level, your most important tools are the ones that notify you when there is a problem. Many products have rollups, dashboards, and reports to provide information on the general health of your system and answer questions such as how many of your jobs are running on time and how many are failing. Ideally, your job scheduler will have an easy-to-use dashboard to give you a quick glance at how the system is doing.
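As a simple picture of what a rollup computes, the sketch below summarizes a batch of run records. The record shape (a `status` string and a `late` flag) and the `health_rollup` function are assumptions for illustration. A real dashboard would draw these figures from the scheduler's own job history.

```python
from collections import Counter


def health_rollup(runs):
    """Summarize runs given as dicts with "status" and "late" keys.

    "status" is "completed" or "failed"; "late" is True if the run missed its window.
    """
    runs = list(runs)
    total = len(runs)
    statuses = Counter(run["status"] for run in runs)
    on_time = sum(1 for run in runs if run["status"] == "completed" and not run["late"])
    failed = statuses["failed"]
    return {
        "total": total,
        "on_time": on_time,
        "failed": failed,
        "failure_rate": failed / total if total else 0.0,
    }
```

A summary like this answers the two questions above directly: how many jobs are running on time, and how many are failing.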
Manage Workflow Errors with an Enterprise Job Scheduler
To keep your business running smoothly, you will probably need tools to diagnose and handle problems at all three of these levels. For an enterprise scheduler with sophisticated monitoring, notification, and reporting features, try a free trial of Automate Schedule.