January 29, 2016
by Dmitri Zimine
Fellow automators, we are happy to announce StackStorm 1.3. In this “holiday release” (yes, most of the work took place around the holidays) we took a break from “big features” and focused on key learnings from extensive field usage, turning feedback from our expert users, and our own takeaways from internal StackStorm use, into practical product improvements. The highlight of the release is the long-awaited ability to restart a workflow from its point of failure. We have been pushing it through for quite a while, first in upstream OpenStack Mistral, then exposing it via StackStorm, and now it’s ready for prime time. With other highlights – making it easier to debug rules, track complex automation executions, and keep the history size under control – the 1.3 release is a major step in making StackStorm performant, operational, and convenient.
Read on to learn about release highlights and what is coming up next. To upgrade, follow this KB.
StackStorm is built to be transparent, so users trust the system to take powerful actions. However, we have learned that debugging a rule could be frustrating even for a StackStorm expert. You configure a sensor and set up a rule to call an action on an external event. The event fired, but the action did not execute. Where did it fail? Did the event reach the sensor? Did the trigger instance get fired? Did the rule match? Was the action scheduled? And where do you look for all of it? v1.3 brings the tools to find the answers.
Specifically, we put in place some missing links to track the whole pipeline, added extra options to CLI commands, and improved
st2-rule-tester to enable an end-to-end rule debugging workflow for a variety of scenarios. Traces and trace-tags come in handy here, too (see below). Check the rule troubleshooting docs for instructions; a blog post with details is coming shortly.
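For a flavor of what this looks like, here is a sketch of an offline rule-testing session. The rule, trigger payload, and file names are entirely made up, and the exact st2-rule-tester flags may differ in your version – check `st2-rule-tester --help` and the troubleshooting docs for authoritative usage:

```shell
# Hypothetical rule file: fire core.local when a webhook reports status "critical".
cat > /tmp/sample_rule.yaml <<'EOF'
name: sample_rule
trigger:
  type: core.st2.webhook
criteria:
  trigger.body.status:
    type: equals
    pattern: critical
action:
  ref: core.local
  parameters:
    cmd: echo "alert!"
EOF

# Hypothetical trigger instance, as a sensor would have emitted it.
cat > /tmp/trigger_instance.json <<'EOF'
{"trigger": "core.st2.webhook", "payload": {"body": {"status": "critical"}}}
EOF

# Ask st2-rule-tester whether the rule would match this trigger instance,
# without waiting for a live event (flag names are from the 1.3-era docs).
st2-rule-tester --rule=/tmp/sample_rule.yaml --trigger-instance-file=/tmp/trigger_instance.json
```

This lets you answer the “did the rule match?” question in isolation, before chasing the event through the rest of the pipeline.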
Firing multiple actions on external events, nesting workflows, and triggering more actions and workflows on action completions via rules and action-triggers give great power. But operating and troubleshooting complex automation requires good tooling. We have been steadily improving transparency: a few versions ago (0.13) we introduced traces and trace-tags to group everything comprising a full end-to-end automation execution. Now, based on field feedback from our largest users, we are bringing extra options to help ops get to the crux of a problem faster, with less noise.
Trace tags for an action execution can now be supplied in the WebUI. Viewing traces is still CLI-only; a convenient WebUI view of the whole chain of events is something we are considering next – contributions welcome!
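As an illustration, a trace-driven troubleshooting session from the CLI might look like the sketch below. The command and option names follow the 1.3 docs as we understand them, and the IDs and tag value are placeholders:

```shell
# Tag an execution at launch so everything it causes can be found later.
st2 run core.local cmd=date --trace-tag=deploy-2016-01-29

# List traces, or narrow them down to a particular execution.
st2 trace list
st2 trace list --execution <execution-id>

# Show everything in one trace; extra options cut the noise.
st2 trace get <trace-id>
st2 trace get <trace-id> --hide-noop-triggers
```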
Running StackStorm at scale produces hundreds of thousands of action executions. Over time, the ever-growing operational history begins to impact performance. To make it easier to keep the history size under control, we introduce a garbage collector service that auto-trims the DB per your desired configuration. Commands are also available to purge history manually by a variety of criteria.
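As a sketch, the garbage collector is driven from st2.conf, and manual purging is a one-liner. The option and tool names below follow the 1.3-era docs as we recall them, so treat them as an illustration and check the garbage collection docs before relying on them:

```shell
# Illustrative st2.conf section: auto-purge execution history older than 30 days.
cat >> /etc/st2/st2.conf <<'EOF'
[garbagecollector]
action_executions_ttl = 30
EOF

# Or purge manually: delete execution records older than a given timestamp.
st2-purge-executions --timestamp="2015-11-01T00:00:00.000000Z"
```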
“But what is happening with my year’s worth of execution records? I need the audit trail, and want to do some analytics on it!” Not to worry: all audit data, with the full details of each execution, is stored in structured
*.audit.log files. Save them, grok them into Logstash or Splunk, and slice and dice them for insights into your operations. A dedicated audit service is on the roadmap for Enterprise Edition; it will provide a native indexed, searchable view of years’ worth of history, with analytics and reporting on top (sign up for the “alpha” soon).
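Because the records are structured (one JSON document per line in a typical setup), even plain shell tools go a long way before you reach for Logstash or Splunk. A minimal sketch, with entirely fabricated records and field names:

```shell
# Fabricated sample of structured audit records, one JSON object per line.
cat > /tmp/st2.audit.log <<'EOF'
{"id": "1", "status": "succeeded", "action": "core.local"}
{"id": "2", "status": "failed", "action": "core.remote"}
{"id": "3", "status": "succeeded", "action": "core.http"}
EOF

# Quick analytics without any external tooling: count failed executions.
grep -c '"status": "failed"' /tmp/st2.audit.log   # prints 1
```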
With transparency of workflow executions you know exactly which task has failed; we commented elsewhere on the return of workflows as the backbone of event-driven automation – take a look if you are interested in the subject.
But what exactly are you supposed to do when a workflow fails? Even if the workflow tells you which task failed, now what? When, after a long preparation, a workflow creates 100 instances and fails on the 99th… even knowing the exact point of failure, it still sucks. What if it failed due to external conditions, e.g. network connectivity lost or a target service unavailable? Can you fix the conditions and just continue the workflow execution from where it failed? From 1.3, the answer is “yes, you can”.
Now you can re-run a failed workflow from its point of failure. Do
st2 re-run and point it at the failed task (or tasks!); StackStorm re-runs the failed task with the same input and continues the workflow execution. Read here how to do it.
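A sketch of the flow, with placeholder IDs and task names. In the CLI the re-run subcommand lives under st2 execution; check `st2 execution re-run --help` for the exact options in your version:

```shell
# Find the failed execution and see which task failed.
st2 execution list --status failed
st2 execution get <execution-id>

# Re-run from the point of failure: the named task is retried with its
# original input, and the rest of the workflow continues from there.
st2 execution re-run <execution-id> --tasks <failed-task-name>
```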
This ability to recover from failure, along with clarity of execution state, is a highlight of the 1.3 release, and one big reason why workflows are triumphing over “just scripts”.
As usual, there are a number of smaller improvements, each to make StackStorm one bit better and one notch easier to use. Check the CHANGELOG to appreciate the improvements.
We are especially happy with community contributions. Folks from Plexxi, SendGrid, TCP Cloud, Netflix, Move.com, and Dimension Data, along with individual contributors, brought in quite a few features and fixes. My personal favorites are the support for containers from Andrew Shaw, HipChat improvements for ChatOps from Charlotte St. John, and AWS SQS actions from Adam Hamsik and kuboj. Thank you ALL on behalf of all StackStorm users!
Please follow this KB for upgrading. We strongly recommend migrating if/when possible, but the in-place upgrade is tested and should generally work. Always keep the content separated so that you can deploy your full automation on a new instance of StackStorm.
These weeks we are heads-down improving StackStorm installation. The all-in-one installer is a great way to get a turn-key StackStorm installation for evaluation on a clean system. However, we understand the need for custom, package-based installations. Stay tuned for proper self-contained deb/rpm packages; they’re just around the corner. And a Docker image with StackStorm is in the works, as an alternative for quick evaluation.
Our immediate next focus is managing and operating automation content: a Forge for convenient sharing of integration and automation packs; an end-to-end user flow and tooling support for pack development, deployment, and updates; pack versioning and dependencies; UI improvements; and much more.
And, of course, ChatOps. We see it as a focal point of operations, bridging team work with automation in a magical way. StackStorm brings ChatOps as an essential part of an end-to-end solution; stay tuned for improvements (some hints here).
For a year of upcoming StackStorm functionality, see ROADMAP.
As always, your feedback is not welcome, it is required! Leave comments here, share and discuss ideas on stackstorm-community, and submit PRs on GitHub. We are excited to see StackStorm maturing, and together with our user community we will make it great.