Network Automation with StackStorm and Docker

May 24, 2016
by Matthew Stone

What is StackStorm going to do with network automation? Ever since we joined Brocade, it’s been everyone’s question, and we have been hand-waving some answers. But talk is cheap and code talks: let’s show something really working. Today’s example is Docker network automation, where StackStorm makes the physical network follow Docker containers as they get created.

Millennial kids should watch my video with detailed explanations on StackStorm’s YouTube channel.

True hackers might jump straight to the automation code on GitHub to see how it is built and try it out on their own StackStorm instance.

Or, just read on.

The evolution of Docker networking has been fun to watch, and with the latest additions to libnetwork, things are getting even better. Docker recently added Macvlan and Ipvlan to the list of drivers for libnetwork. (Note: at the time of writing, these drivers are still considered experimental.) These drivers allow Docker containers to speak directly to the physical network. Getting traffic in and out of overlay networks can be a challenge: you need to implement a VTEP on the physical switch or vSwitch to communicate with the rest of the network. Docker decided to solve the problem by letting you send traffic tagged with a VLAN ID, something every network engineer has done. This allows you to treat container networking much like you treat virtual machine networking. I’ll leave the detailed explanation to the writeup Docker did on the subject.
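
If you drive Docker from code rather than the CLI, the same kind of network can be declared programmatically. Here is a rough sketch using the Docker SDK for Python (the docker package, which is newer than the client library that was current when this was written); the name, subnet, and parent interface simply mirror the network we create from the CLI later in the post:

import docker

# Talk to whatever Docker/Swarm endpoint the environment points at.
client = docker.from_env()

# The parent interface encodes the VLAN: eth0.112 means traffic for this
# network leaves the host tagged with VLAN 112.
ipam = docker.types.IPAMConfig(
    pool_configs=[docker.types.IPAMPool(subnet='172.16.112.0/24')]
)
client.networks.create(
    'dockernetwork112',
    driver='macvlan',
    options={'parent': 'eth0.112'},
    ipam=ipam,
)

That parent option is the key bit: it is the VLAN tag the physical switch has to be told about, which is exactly the coordination problem the rest of this post automates.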

Tagging container traffic means you need to coordinate your physical network configuration with your Docker configuration, and do it automatically. We use StackStorm to trigger on the creation of a Docker network and fire a workflow that reconfigures the physical network through the Brocade VDX switch API. Here is how it’s done, step by step.

We started by creating a Swarm cluster, connected to a Brocade VDX switch.

> docker info
Containers: 2
Images: 4
Server Version: swarm/1.1.3
Role: primary
Strategy: spread
Filters: health, port, dependency, affinity, constraint
Nodes: 2
vagrant-ubuntu-trusty-64: 172.28.128.6:2375
└ Status: Healthy
└ Containers: 1
└ Reserved CPUs: 0 / 1
└ Reserved Memory: 0 B / 514.5 MiB
└ Labels: executiondriver=, kernelversion=3.13.0-83-generic, operatingsystem=Ubuntu 14.04.4 LTS, storagedriver=aufs
└ Error: (none)
└ UpdatedAt: 2016-04-29T16:14:55Z
vagrant-ubuntu-trusty-64: 172.28.128.5:2375
└ Status: Healthy
└ Containers: 1
└ Reserved CPUs: 0 / 1
└ Reserved Memory: 0 B / 514.5 MiB
└ Labels: executiondriver=, kernelversion=3.13.0-83-generic, operatingsystem=Ubuntu 14.04.4 LTS, storagedriver=aufs
└ Error: (none)
└ UpdatedAt: 2016-04-29T16:14:55Z
Kernel Version: 3.13.0-83-generic
Operating System: linux
CPUs: 2
Total Memory: 1.005 GiB
Name: 96a237d545b4
> docker network ls
NETWORK ID          NAME                                         DRIVER
45a14a2a8a5a        vagrant-ubuntu-trusty-64/host                host
d7cd20ce6a9b        vagrant-ubuntu-trusty-64/none                null
c5d83e381f1f        vagrant-ubuntu-trusty-64/host                host
70e091519778        vagrant-ubuntu-trusty-64/bridge              bridge
3208fb94f122        vagrant-ubuntu-trusty-64/bridge              bridge
21602e11db6e        vagrant-ubuntu-trusty-64/none                null

As you can see, this cluster has two healthy nodes and only the default networks created.

Before we get into the other details, let’s briefly look at the current Ve interfaces on the Brocade VDX switch and see how that output will change once the workflow runs.

Spine-198976# show ip int brief | inc Ve
Ve 10      10.1.1.21       default-vrf    up    up
Ve 20      20.1.1.21       default-vrf    up    up
Ve 30      30.1.1.21       default-vrf    up    up

These three existing Ve interfaces are for services outside of the Docker deployment we’ll use in this example.

With the Swarm cluster in place, I started exploring Docker’s events API. This is an HTTP-based streaming API that notifies subscribers of cluster-wide events: container creation, network creation, and so on all get pushed to it. I wrote a simple Sensor that subscribes to the events API and fires a Trigger when a new network is created:

from st2reactor.sensor.base import Sensor

import re
import json
import uuid
import requests
import ipaddress


# Class name here is illustrative; the full sensor lives in the pack on GitHub.
class DockerNetworkSensor(Sensor):
    def setup(self):
        pass

    def run(self):
        # Values hardcoded for brevity of the example; you should use
        # self._config and put parameters into config.yaml in the pack.
        swarm_api = 'http://172.28.128.4:3376'
        key = "REPLACE WITH SWARM KEY"  # placeholder, unused in this trimmed example

        # Subscribe to the cluster-wide event stream.
        r = requests.get('%s/events' % swarm_api, stream=True)
        for chunk in r.raw.read_chunked():
            event = json.loads(chunk)
            # Only act on network creation events.
            if event['Action'] != 'create' or event.get('Type', 'network') != 'network':
                continue

            # Look up the new network to get its parent interface and subnet.
            resp = requests.get('%s/networks/%s' % (swarm_api, event['Actor']['Attributes']['name']))
            netwk_data = json.loads(resp.content)

            # The VLAN ID is the sub-interface suffix of the parent (eth0.112 -> 112).
            vlan = re.findall(r'eth[0-9]+\.([0-9]+)', netwk_data['Options']['parent'])[0]
            # The first usable address in the subnet becomes the Ve interface address.
            network = ipaddress.ip_network(netwk_data['IPAM']['Config'][0]['Subnet'])

            data = dict(action=event['Action'],
                        rbridge="21",
                        subnet="%s/%s" % (network[1], network.prefixlen),
                        vlan=vlan,
                        channel="docker",
                        host="10.254.4.105",
                        username="admin",
                        password="password")

            self._sensor_service.dispatch(trigger='docker.NetworkEvent',
                                          payload=data,
                                          trace_tag=uuid.uuid4().hex)

    def cleanup(self):
        pass

    def add_trigger(self, trigger):
        pass

    def update_trigger(self, trigger):
        pass

    def remove_trigger(self, trigger):
        pass
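
To make the dictionary lookups in the sensor concrete, here is roughly what the two Docker API payloads look like for a macvlan network-create event. These are abridged and illustrative, trimmed to the fields the sensor actually reads:

# One chunk from GET /events when a network is created.
event = {
    "Type": "network",
    "Action": "create",
    "Actor": {"Attributes": {"name": "dockernetwork112", "type": "macvlan"}},
}

# GET /networks/dockernetwork112 for that network.
netwk_data = {
    "Name": "dockernetwork112",
    "Driver": "macvlan",
    "Options": {"parent": "eth0.112"},
    "IPAM": {"Config": [{"Subnet": "172.16.112.0/24"}]},
}

Everything the switch-side workflow needs (VLAN ID, subnet, and gateway address) can be derived from those two responses.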

Next, I set up a Rule that listens for the docker.NetworkEvent trigger and kicks off the network reconfiguration workflow (shown in the next step), passing in parameters taken from the trigger payload:

---
name: network-trigger
pack: docker
description: Triggered when a docker network event happens.
enabled: true
trigger:
  type: docker.NetworkEvent
criteria: {}
action:
  ref: docker.docker-network-tor
  parameters:
    channel: '{{trigger.channel}}'
    host: '{{trigger.host}}'
    password: '{{trigger.password}}'
    rbridge_id: '{{trigger.rbridge}}'
    subnet: '{{trigger.subnet}}'
    username: '{{trigger.username}}'
    vlan: '{{trigger.vlan}}'
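
The {{trigger.*}} expressions are Jinja templates that StackStorm renders against the payload the sensor dispatched before it invokes the action. As a rough standalone illustration of that substitution (using the jinja2 library directly; StackStorm does this for you):

from jinja2 import Template

# Payload as dispatched by the sensor above (abridged).
payload = {"vlan": "112", "subnet": "172.16.112.1/24", "channel": "docker"}

# Each rule parameter is rendered with the payload bound to the name "trigger".
print(Template("{{trigger.vlan}}").render(trigger=payload))    # 112
print(Template("{{trigger.subnet}}").render(trigger=payload))  # 172.16.112.1/24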

Lastly, I created the workflow itself: the action that pushes the needed configuration changes to the VDX. Its building blocks come from a VDX pack that I auto-generated from the VDX YANG model, an interesting topic that I’ll save for a separate blog. The docker.docker-network-tor workflow chains those building blocks together and looks like this:

version: '2.0'

docker.docker-network-tor:
  description: Workflow to add TOR VLAN interfaces for docker MACVLAN Networks.
  input:
    - rbridge_id
    - subnet
    - vlan
    - channel
    - host
    - username
    - password
  task-defaults:
    on-error:
      - notify_fail
  tasks:
    add_ve_interface:
      action: vdx.interface_vlan_interface_vlan_vlan_name
      input:
        name: <% $.vlan %>
        vlan_name: "Docker Network"
        host: <% $.host %>
        username: <% $.username %>
        password: <% $.password %>
      publish:
        status_message: "Successfully added VE Interface"
      on-success:
        - add_global_ve_int
    add_global_ve_int:
      action: vdx.interface_vlan_interface_ve_gve_name
      input:
        gve_name: <% $.vlan %>
        host: <% $.host %>
        username: <% $.username %>
        password: <% $.password %>
      publish:
        status_message: "Successfully added global VE Interface"
      on-success:
        - set_ve_ip
    set_ve_ip:
      action: vdx.rbridge_id_interface_ve_ip_ip_config_address_address
      input:
        rbridge_id: <% $.rbridge_id %>
        name: <% $.vlan %>
        address: <% $.subnet %>
        host: <% $.host %>
        username: <% $.username %>
        password: <% $.password %>
      publish:
        status_message: "Successfully set VE IP Address"
      on-success:
        - no_shut_ve
    no_shut_ve:
      action: vdx.rbridge_id_interface_ve_shutdown
      input:
        rbridge_id: <% $.rbridge_id %>
        name: <% $.vlan %>
        delete_shutdown: True
        host: <% $.host %>
        username: <% $.username %>
        password: <% $.password %>
      publish:
        status_message: "Successfully noshut VE Interface"
      on-success:
        - no_shut_ve_global
    no_shut_ve_global:
      action: vdx.interface_vlan_interface_ve_global_ve_shutdown
      input:
        gve_name: <% $.vlan %>
        delete_global_ve_shutdown: True
        host: <% $.host %>
        username: <% $.username %>
        password: <% $.password %>
      publish:
        status_message: "Successfully noshut VE Interface"
      on-success:
        - notify_success
    notify_success:
      action: chatops.post_message
      input:
        channel: <% $.channel %>
        message: "Docker network has been created on TOR."
      publish:
        status_message: "Sent success message to chatops."
    notify_fail:
      action: chatops.post_message
      input:
        channel: <% $.channel %>
        message: "Failed to create docker network on TOR."
      publish:
        status_message: "Sent fail message to chatops."

So let’s create a Docker network and see what happens in StackStorm and Docker. Again, watch the video to see it in action. Or, follow along:

> docker network create --driver=macvlan --subnet=172.16.112.0/24 -o parent=eth0.112 dockernetwork112
1d2154903275c95437506e0b714fbff826835c8da63bbfa08cfca8f8d0a8c188

The previous command tells the swarm cluster to create a new network named dockernetwork112 with the subnet of 172.16.112.0/24 and pinned to the host interface eth0.112.
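
This is exactly the event the sensor is listening for. For this particular command, the sensor’s VLAN and gateway extraction works out as follows (a standalone sketch of those two lines from the sensor):

import re
import ipaddress

parent = "eth0.112"            # from the network's Options
subnet = "172.16.112.0/24"     # from the network's IPAM config

# The VLAN ID is the sub-interface suffix of the parent interface.
vlan = re.findall(r"eth[0-9]+\.([0-9]+)", parent)[0]        # '112'

# The first usable address in the subnet becomes the Ve interface address.
network = ipaddress.ip_network(subnet)
print(vlan, "%s/%s" % (network[1], network.prefixlen))      # 112 172.16.112.1/24

Those two values are exactly what show up as the vlan and subnet parameters in the execution output further down.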

Let’s now list the networks in the Swarm cluster.

> docker network ls
NETWORK ID          NAME                                         DRIVER
45a14a2a8a5a        vagrant-ubuntu-trusty-64/host                host
1d2154903275        vagrant-ubuntu-trusty-64/dockernetwork112    macvlan
70e091519778        vagrant-ubuntu-trusty-64/bridge              bridge
d7cd20ce6a9b        vagrant-ubuntu-trusty-64/none                null
c5d83e381f1f        vagrant-ubuntu-trusty-64/host                host
3208fb94f122        vagrant-ubuntu-trusty-64/bridge              bridge
21602e11db6e        vagrant-ubuntu-trusty-64/none                null

Unsurprisingly, our new network is now listed and available to attach containers to. In a normal cluster, this is where you would log into the physical network and create the related configuration for the network to work correctly. Since we’ve attached to the events API and have a StackStorm workflow waiting to configure the switch, let’s see what happened in the background once the network was created.

Let’s begin with the execution that ran when the network-create event came through the events API. (I got the execution ID by running st2 execution list.)

> st2 execution get 57238dae3520fd02f700565e
id: 57238dae3520fd02f700565e
action.ref: docker.docker-network-tor
parameters:
  channel: docker
  host: 10.254.4.105
  password: password
  rbridge_id: '21'
  subnet: 172.16.112.1/24
  username: admin
  vlan: '112'
status: succeeded
start_timestamp: 2016-04-29T16:37:02.056217Z
end_timestamp: 2016-04-29T16:37:40.404525Z
+--------------------------+-----------+-------------------+------------------------------+-------------------------------+
| id | status | task | action | start_timestamp |
+--------------------------+-----------+-------------------+------------------------------+-------------------------------+
| 57238db03520fd062929960b | succeeded | add_ve_interface | vdx.interface_vlan_interface | Fri, 29 Apr 2016 16:37:04 UTC |
| | | | _vlan_vlan_name | |
| 57238db63520fd062929960d | succeeded | add_global_ve_int | vdx.interface_vlan_interface | Fri, 29 Apr 2016 16:37:10 UTC |
| | | | _ve_gve_name | |
| 57238dbc3520fd062929960f | succeeded | set_ve_ip | vdx.rbridge_id_interface_ve_ | Fri, 29 Apr 2016 16:37:16 UTC |
| | | | ip_ip_config_address_address | |
| 57238dc23520fd0629299611 | succeeded | no_shut_ve | vdx.rbridge_id_interface_ve_ | Fri, 29 Apr 2016 16:37:22 UTC |
| | | | shutdown | |
| 57238dc93520fd0629299613 | succeeded | no_shut_ve_global | vdx.interface_vlan_interface | Fri, 29 Apr 2016 16:37:29 UTC |
| | | | _ve_global_ve_shutdown | |
| 57238dcf3520fd0629299615 | succeeded | notify_success | chatops.post_message | Fri, 29 Apr 2016 16:37:35 UTC |
+--------------------------+-----------+-------------------+------------------------------+-------------------------------+

From the output you can see that this workflow executed several actions: creating the VE interface, creating the global VE interface, setting the IP address, un-shutting the interfaces, and notify_success. We’ll come back to notify_success later. The other actions all send NETCONF-formatted XML to the VDX to create our configuration. If we log into the VDX again and run a few show commands, we can see the additional configuration.

Spine-198976# show ip int brief | inc Ve
Ve 10      10.1.1.21       default-vrf    up    up
Ve 20      20.1.1.21       default-vrf    up    up
Ve 30      30.1.1.21       default-vrf    up    up
Ve 112     172.16.112.1    default-vrf    up    up
Spine-198976# show running-config rbridge-id 21 interface Ve 112
rbridge-id 21
 interface Ve 112
  ip proxy-arp
  ip address 172.16.112.1/24
  no shutdown
 !
!

We can see here how StackStorm used the VDX action pack to push the needed configuration to the device, reducing the time it takes to deliver the network and freeing the network engineer to focus on more pressing issues.

The last piece of the workflow was notify_success. This action sends a message into a Slack channel, which can notify network or operational staff of the new network’s availability.

This is a simple but powerful example of network automation. Enabling operational staff to capture their repetitive tasks as workflows is an incredibly powerful concept, for network operators and devops alike. I’m personally excited to see Brocade doubling down on this event-driven, workflow-centric, community approach to operations. Expect to see us deliver more network automation capabilities like this over the next few months. Don’t stop there, though: use your imagination and think about how you can tie the network into your workflows!