The Cylc Suite Engine


Weather and related environmental forecasting using computer modelling is a complex process made up of many interdependent tasks. If any one of them breaks down, it can cause significant problems and delays. In answer to this, NIWA has developed cylc: world-leading software which helps users to better manage the dependencies between these tasks and improve overall efficiency.

The problem

In weather and related environmental forecasting, such as that done by NIWA's EcoConnect service, large amounts of data are processed by many tasks. These tasks are interdependent, meaning that the output from one task is required as input by others, and so on. Sets of interdependent tasks with a common purpose are known as suites.

Find out more about EcoConnect

Forecasting tasks repeat at regular intervals, or forecast cycles. Some tasks depend on the outputs of tasks in a previous cycle, as well as on the outputs of other tasks in their own cycle (Fig. 1). Before cylc, an entire cycle had to finish before the next one could be started (Fig. 2). 

However, this could be problematic. If something went wrong and the first cycle was delayed until it ran into the second, the requirement to run entire cycles in sequence meant it could take many hours to get the system back up to real-time operation, or longer if the backlog of forecast cycles caused tasks to fail.

Of course, in principle one could recover from delays much faster by running multiple cycles at once (Fig. 3). 

The problem was that the control system (a computer program) running the forecasts didn't understand the dependencies between tasks in successive cycles, restricting catch-up to simply running cycles closer together at best (Fig. 4).
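To make the cost of strict cycle sequencing concrete, here is a minimal Python sketch. The three task names, their run times, and their dependencies are invented for illustration and are not taken from EcoConnect. It also assumes a catch-up situation in which all external input data are already available and there are enough computing resources to run tasks in parallel.

```python
# Hypothetical three-task suite: run times in hours (illustrative values only).
DURATION = {"obs": 1, "model": 4, "post": 1}

# Each task waits for (task, cycle offset) pairs; offset -1 means the previous cycle.
DEPENDS = {
    "obs": [],
    "model": [("obs", 0), ("model", -1)],
    "post": [("model", 0)],
}
ORDER = ["obs", "model", "post"]  # a valid ordering of tasks within one cycle


def finish_time(n_cycles, whole_cycle_barrier):
    """Return the hour at which the last task of the last cycle finishes."""
    finish = {}
    for cycle in range(n_cycles):
        barrier = 0
        if whole_cycle_barrier and cycle > 0:
            # Old approach: nothing starts until the entire previous cycle is done.
            barrier = max(finish[(task, cycle - 1)] for task in ORDER)
        for task in ORDER:
            start = barrier
            for dep, offset in DEPENDS[task]:
                key = (dep, cycle + offset)
                if key in finish:  # the first cycle has no predecessor
                    start = max(start, finish[key])
            finish[(task, cycle)] = start + DURATION[task]
    return max(finish.values())


if __name__ == "__main__":
    print("whole cycles in sequence:", finish_time(3, True), "hours")
    print("interleaved cycles:      ", finish_time(3, False), "hours")
```

With these made-up numbers, three cycles take 18 hours when run strictly in sequence and 14 hours when each task starts as soon as its own inputs exist. The saving comes entirely from tasks that do not depend on the previous cycle's slowest task.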

The solution

One way to map the dependencies of each task is to generate a diagram showing, with arrows, what depends on what (Fig. 5). But when dependencies between cycles are taken into account this becomes extremely complicated. Additionally, there is no clear boundary between forecast cycles anymore, and the system needs some sort of bootstrapping in order to start. 

An alternative, novel way of handling all of these dependencies, the cylc suite engine, has been developed by a NIWA researcher. Cylc is software that allows the user to define and automatically control suites, and their scheduling, for applications such as forecasting.

It specializes in suites for climate, weather, and weather-driven environmental forecasting (e.g. river flow, sea state, storm surge) and related processing.

Cylc was inspired by some of the techniques used in modern gaming software. Rather than explicitly managing the complicated meshwork of dependencies seen in Fig. 5, cylc represents each task with an autonomous "agent" or "task proxy" inside the control program, and allows all the task proxies to interact with each other in order to determine which can run next (Fig. 6). Each task proxy knows only what its real task's inputs and outputs are. It's not interested in the entire meshwork of dependencies, only its own.

These task proxies all interact in the 'pool' of tasks controlled by cylc, looking at each other's outputs to see whether their own inputs are satisfied, at which point they can run. In effect, the suite is self-organising, with the task proxies figuring out their own scheduling (Fig. 7).

Once a task proxy is satisfied that its input requirements have been met, it 'tells' the task it represents to launch. A task proxy also tracks its task's progress using a messaging system. As the real task completes and reports its outputs, the task proxy is able to signal that, allowing other, dependent task proxies to use the output as their input, and so on.
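The self-organising idea can be sketched in a few lines of Python. This is an illustration of the concept only, not cylc's actual code. The `TaskProxy` class, the task names A, B and C, and the helper functions are all hypothetical, and real tasks are replaced by a stub that completes instantly instead of reporting back through a messaging system.

```python
from dataclasses import dataclass, field


@dataclass
class TaskProxy:
    """Hypothetical task proxy: it knows only its own inputs and outputs."""
    name: str
    cycle: int
    prerequisites: set = field(default_factory=set)  # outputs this task needs
    outputs: set = field(default_factory=set)        # outputs this task reports
    done: bool = False

    def ready(self, available):
        # A proxy checks only whether its own inputs are satisfied.
        return not self.done and self.prerequisites <= available


def build_example_pool(cycles=(1, 2)):
    """Per cycle: A feeds B, B feeds C, and A also needs A from the previous cycle."""
    pool = []
    for i, c in enumerate(cycles):
        # The first cycle is bootstrapped: A has no previous-cycle input.
        a_needs = set() if i == 0 else {f"A.{cycles[i - 1]}"}
        pool.append(TaskProxy("A", c, a_needs, {f"A.{c}"}))
        pool.append(TaskProxy("B", c, {f"A.{c}"}, {f"B.{c}"}))
        pool.append(TaskProxy("C", c, {f"B.{c}"}, {f"C.{c}"}))
    return pool


def run_pool(pool):
    """Launch every ready proxy, publish its outputs, and repeat until none is ready."""
    available = set()
    rounds = []
    while True:
        runnable = [p for p in pool if p.ready(available)]
        if not runnable:
            return rounds
        for proxy in runnable:
            proxy.done = True  # stub: the real task would run and report back here
        for proxy in runnable:
            available |= proxy.outputs
        rounds.append(sorted(f"{p.name}.{p.cycle}" for p in runnable))


if __name__ == "__main__":
    for number, launched in enumerate(run_pool(build_example_pool()), start=1):
        print(number, launched)
```

No part of this sketch holds a global dependency graph or a notion of cycle boundaries. In the second round, task A of cycle 2 launches alongside task B of cycle 1 simply because both find their inputs satisfied.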

Because the task proxies interact only with whichever other task proxies happen to be in the pool at the time, a positive side-effect of this novel approach is that, unlike previous systems, cylc adapts to tasks being added and removed on the fly. A suite of tasks will also flow around tasks which have been delayed or have failed, wherever possible.
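A small Python sketch, again using invented task names and a hypothetical `runnable` helper and not cylc's real interface, shows why this follows naturally. Each decision looks only at the current contents of the pool, so adding, removing or failing a task needs no recomputation of a global schedule.

```python
def runnable(pool, completed, failed):
    """Return the tasks that can launch now.

    pool maps each task name to the set of task names it depends on
    (a simplification: one output per task).
    """
    return sorted(
        name
        for name, prerequisites in pool.items()
        if name not in completed
        and name not in failed
        and prerequisites <= completed
    )


def demo():
    pool = {
        "obs": set(),
        "model": {"obs"},
        "products": {"model"},
        "archive_obs": {"obs"},
    }
    completed = {"obs"}
    failed = {"model"}
    results = []

    # "model" has failed, so "products" waits, but "archive_obs" flows around it.
    results.append(runnable(pool, completed, failed))

    # A task added on the fly is picked up at the next check.
    pool["plot_obs"] = {"obs"}
    results.append(runnable(pool, completed, failed))

    # A task removed on the fly simply stops being considered.
    del pool["archive_obs"]
    results.append(runnable(pool, completed, failed))
    return results


if __name__ == "__main__":
    for step in demo():
        print(step)
```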

Cylc is written in the Python programming language. Development commenced in late 2008, and cylc took control of NIWA's operational forecasting systems in 2010. Cylc was released under the GNU GPL v3 license in mid-2011 and it has since been adopted at a number of institutions around the world (see 'Results').

Find out more about the GNU GPL v3 

NIWA released cylc in the spirit of open scientific exchange and collaboration. Cylc itself leverages several other open source projects including the graphviz layout engine and the Jinja2 template processor.

More information about Jinja2 

The results

Recovery from operational data delays on NIWA's previous supercomputer sometimes required more than 24 hours of expert manual task management. With cylc, the system recovers automatically and very quickly, maintaining the timely delivery of forecast products and greatly reducing the need for after-hours staff callouts.

Cylc's target audience is primarily national weather forecasting centres, and meteorology, climate, and environmental forecasting researchers at universities and research institutions.

In late 2011 the UK Met Office, one of the foremost global forecasting centres, completed an extensive evaluation of available suite engines. The Met Office judged cylc to be the best and have officially adopted it to control their large operational and research systems.

The open source license is very important to the UK Met Office - they would not entrust their critical operational systems to an externally supported proprietary product. Open source allows them to make their own changes as they see fit, to recommend cylc to other collaborators, and to share in future developments made by others in the community.

Cylc is now also used by researchers at the Max Planck Institute for Meteorology in Germany, the Bureau of Meteorology in Australia, and at the Marine Meteorology Division of the Naval Research Laboratory in the US (for ensemble hurricane forecasting). Cylc is currently being trialled at the National Centre for Medium Range Weather Forecasting in India, and is likely to be adopted by other UK Met Office collaborators around the world in the next 12 months.

At NIWA, cylc controls the complex suites that underpin NIWA's EcoConnect forecasting business, providing a much higher level of automation and robustness than was previously possible.

The critical functionality needed by large institutions like the UK Met Office is already in place. However, cylc is by now sufficiently multi-faceted that there is no end in sight to the number of potential enhancements. Plans include fostering continued uptake to increase the self-help capacity of the user community, utilization of a new automated full-suite testing method to guard against bugs, supporting dependence between suites, and optimization for very large suites of perhaps ten thousand tasks.

Cylc suite engine homepage 
Cylc source repository 

Images:
• Suite control GUI screen shot
• Large suite visualization example 

Page last updated: 
26 February 2014

Cylc figures 1-4

Fig. 1: Two cycles of a small forecasting suite. The arrows show dependencies between tasks.
Fig. 2: A job schedule for two consecutive cycles of the small suite in Fig. 1. The length of a box represents task run time.
Fig. 3: Cycle catch-up, with the tasks of two cycles interleaved.
Fig. 4: Running cycles more closely together, the best that could be done before cylc.

Cylc figures 5-7

Fig. 5: The simple example suite job schedule with maximum cycle interleaving (the second cycle finishes much sooner).
Fig. 6: Cylc’s task proxy objects. Each task proxy is interested only in its required inputs and outputs, so no complicated overview of the system is needed.
Fig. 7: Cylc allows multi-cycle, optimal scheduling of tasks.

Cylc screenshot

A screenshot of the cylc control panel running the EcoConnect suite in graph mode.

Cylc - full EcoConnect suite

The full EcoConnect suite, visualized by cylc.