Using New Ansible Utilities for Operational State Management and Remediation
Comparing the current operational state of your IT infrastructure to your desired state is a common use case for IT automation. It allows automation users to identify drift or problem scenarios and take corrective action, and even to identify and solve problems proactively. This blog post walks through an automation workflow for validating operational state and automatically remediating issues.
We will demonstrate how the Red Hat supported and certified Ansible content can be used to:
- Collect the current operational state from the remote host and convert it into normalised, structured data.
- Define the desired state criteria in a standards-based format that can be used across enterprise infrastructure teams.
- Validate the current state data against the pre-defined criteria to identify if there is any deviation.
- Take corrective remediation action as required.
- Validate input data as per the data model schema.
Gathering state data from a remote host
The recently released ansible.utils Collection, version 1.0.0, adds the ansible.utils.cli_parse module, which converts text data into structured JSON format. The module can either execute a command on the remote endpoint and fetch the text response, or read text from a file on the control node, and then convert it into structured data. It works with traditional Linux servers as well as vendor appliances, such as network devices that cannot execute Python, and it relies on well-known text parser libraries for the conversion. The currently supported CLI parser sub-plugin engines are:
- ansible.utils.textfsm: uses the textfsm Python library
- ansible.utils.ttp: uses the ttp Python library
- ansible.netcommon.native: uses the netcommon built-in parser engine
- ansible.netcommon.ntc_templates: uses the ntc_templates Python library
- ansible.netcommon.pyats: uses the pyats Python library
- ansible.utils.xml: uses the xmltodict Python library
- ansible.utils.json
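To make the text-to-structured-data step concrete, here is a minimal Python sketch of what a CLI parser does conceptually. It is not how any of the parser engines listed above is implemented. It uses only the standard library's re module, and the sample text, the regular expressions, and the oper_status key are assumptions made up for illustration, not real device output or real parser templates.

```python
import json
import re

# Illustrative text only; this was NOT captured from a real device.
RAW_OUTPUT = """\
Ethernet2/1 is down (Administratively down)
  admin state is down
Ethernet2/2 is up
  admin state is up
"""

# Hypothetical patterns for the illustrative text above.
HEADER = re.compile(r"^(?P<name>\S+) is (?P<oper>up|down)")
ADMIN = re.compile(r"^\s+admin state is (?P<admin>\w+)")


def parse_interfaces(text):
    """Turn the raw text into a dict keyed by interface name."""
    parsed = {}
    current = None
    for line in text.splitlines():
        header = HEADER.match(line)
        if header:
            # A non-indented line starts a new interface section.
            current = header.group("name")
            parsed[current] = {"oper_status": header.group("oper")}
            continue
        admin = ADMIN.match(line)
        if admin and current is not None:
            parsed[current]["admin_state"] = admin.group("admin")
    return parsed


parsed = parse_interfaces(RAW_OUTPUT)
print(json.dumps(parsed, indent=2))
```

The parser engines do the same kind of work with maintained templates and far more fields, which is why using one of them is preferable to hand-written expressions like these.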
The examples in this blog use a Cisco network switch running NX-OS version 7.3(0)D1(1) as the remote endpoint and Ansible version 2.9.15 on the control node. They require the ansible.utils, ansible.netcommon and cisco.nxos Collections to be installed on the control node.
The Ansible snippet below fetches the operational state of the interfaces and translates it into structured data using the ansible.netcommon.pyats parser. This parser requires the pyats library to be installed on the control node.
```yaml
---
- hosts: nxos
  connection: ansible.netcommon.network_cli
  gather_facts: false
  vars:
    ansible_network_os: cisco.nxos.nxos
    ansible_user: "changeme"
    ansible_password: "changeme"
  tasks:
    - name: "Fetch interface state and parse with pyats"
      ansible.utils.cli_parse:
        command: show interface
        parser:
          name: ansible.netcommon.pyats
      register: nxos_pyats_show_interface

    - name: print structured interface state data
      ansible.builtin.debug:
        msg: "{{ nxos_pyats_show_interface['parsed'] }}"
```
The value of the command option in the ansible.utils.cli_parse task is the command that should be executed on the remote host. Alternatively, the task accepts a text option that takes the value directly as a string, which is useful when the response of the command has already been fetched. The name option under the parser parent option can be any one of the parser sub-plugins discussed above.
After running the playbook, the output of the ansible.utils.cli_parse task for the given host is shown below for reference:
```
ok: [nxos] => {
    "changed": false,
    "parsed": {
        "Ethernet2/1": {
            "admin_state": "down",
            "auto_mdix": "off",
            "auto_negotiate": false,
            "bandwidth": 1000000,
            "beacon": "off"
            <--snip-->
        },
        "Ethernet2/10": {
            "admin_state": "down",
            "auto_mdix": "off",
            "auto_negotiate": false,
            "bandwidth": 1000000,
            "beacon": "off",
            <--snip-->
        }
    }
}
```
Notice that the value of the admin_state key is down for some of the interfaces. For the complete output, refer here.
Defining state criteria and validation
The ansible.utils Collection also adds the ansible.utils.validate module, which validates input JSON data against the provided criteria using a validation engine. The currently supported validation engine is jsonschema; support for additional validation engines will be added in the future as the need arises.
In the section above, we fetched the interface state and converted it to structured JSON data. Suppose we want the admin state of all interfaces to always be up. The jsonschema criteria would then look like:
```
$ cat criterias/nxos_show_interface_admin_criteria.json
{
    "type": "object",
    "patternProperties": {
        "^.*": {
            "type": "object",
            "properties": {
                "admin_state": {
                    "type": "string",
                    "pattern": "up"
                }
            }
        }
    }
}
```
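A detail worth keeping in mind when writing criteria like the one above: JSON Schema does not implicitly anchor the pattern keyword, so "up" matches any string that contains up. The sketch below mimics that rule with Python's re.search and does not call the jsonschema library itself. The state value "startup-hold" is hypothetical and exists only to show the difference between "up" and the anchored "^up$".

```python
import re


def matches_pattern(pattern, value):
    """Mimic the JSON Schema 'pattern' keyword: an unanchored regex search."""
    return re.search(pattern, value) is not None


# "up" and "down" appear in this post; "startup-hold" is a made-up value.
states = ["up", "down", "startup-hold"]

loose = {state: matches_pattern("up", state) for state in states}
strict = {state: matches_pattern("^up$", state) for state in states}

print("pattern 'up':  ", loose)
print("pattern '^up$':", strict)
```

For the two admin states seen in this post the unanchored pattern is sufficient; anchoring only matters if other values containing "up" can appear.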
After the criteria for the desired state of the resource is defined, it can be used with the ansible.utils.validate module to check if the current state of the resource matches with the desired state as shown in the below task.
```yaml
- name: validate interface for admin state
  ansible.utils.validate:
    data: "{{ nxos_pyats_show_interface['parsed'] }}"
    criteria:
      - "{{ lookup('file', './criterias/nxos_show_interface_admin_criteria.json') | from_json }}"
    engine: ansible.utils.jsonschema
  ignore_errors: true
  register: result

- name: print the interface names that does not satisfy the desired state
  ansible.builtin.debug:
    msg: "{{ item['data_path'].split('.')[0] }}"
  loop: "{{ result['errors'] }}"
  when: "'errors' in result"
```
The data option of the ansible.utils.validate task accepts a JSON value; in this case, it is the parsed output of the ansible.utils.cli_parse module discussed above. The value of the engine option is the name of the validate sub-plugin, here ansible.utils.jsonschema, and it identifies the underlying validation library to be used, in this case the jsonschema library. The value of the criteria option can be a list and should be in the format defined by the validation engine used. Running the above task requires the jsonschema library to be installed on the control node. The output of the task run is a list of errors indicating the interfaces whose admin state is not up. The reference output can be seen here (note: the output will vary based on the state of the interfaces on the remote host).
```
TASK [validate interface for admin state] ***********************************************************************************************************
fatal: [nxos02]: FAILED! => {"changed": false, "errors": [{"data_path": "Ethernet2/1.admin_state", "expected": "up", "found": "down", "json_path": "$.Ethernet2/1.admin_state", "message": "'down' does not match 'up'", "relative_schema": {"pattern": "up", "type": "string"}, "schema_path": "patternProperties.^.*.properties.admin_state.pattern", "validator": "pattern"}, {"data_path": "Ethernet2/10.admin_state", "expected": "up", "found": "down", "json_path": "$.Ethernet2/10.admin_state", "message": "'down' does not match 'up'", "relative_schema": {"pattern": "up", "type": "string"}, "schema_path": "patternProperties.^.*.properties.admin_state.pattern", "validator": "pattern"}], "msg": "Validation errors were found.\nAt 'patternProperties.^.*.properties.admin_state.pattern' 'down' does not match 'up'. \nAt 'patternProperties.^.*.properties.admin_state.pattern' 'down' does not match 'up'. \nAt 'patternProperties.^.*.properties.admin_state.pattern' 'down' does not match 'up'. "}
...ignoring

TASK [print the interface names that does not satisfy the desired state] ****************************************************************************
Monday 14 December 2020 11:05:38 +0530 (0:00:01.661) 0:00:28.676 *******
ok: [nxos] => {
    "msg": "Ethernet2/1"
}
ok: [nxos] => {
    "msg": "Ethernet2/10"
}

PLAY RECAP ******************************************************************************************************************************************
nxos : ok=4 changed=0 unreachable=0 failed=0 skipped=0 rescued=0 ignored=1
```
As seen from the above output, interfaces Ethernet2/1 and Ethernet2/10 are not in the desired state as per the defined criteria.
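The debug task above derives each interface name from the data_path of an error entry. The Python sketch below reproduces that string handling on the two data_path values from the output above. It also shows a case to watch for: if an interface name itself contains a dot, assumed here for a hypothetical sub-interface Ethernet2/1.100, then split('.')[0] cuts the name short, while splitting once from the right keeps it whole. How the validate module would render such a name in data_path is an assumption and is not shown in this post.

```python
# Error entries trimmed to the one field used here (values from the output above).
errors = [
    {"data_path": "Ethernet2/1.admin_state"},
    {"data_path": "Ethernet2/10.admin_state"},
]

# Same expression as the playbook: everything before the first dot.
names = [error["data_path"].split(".")[0] for error in errors]
print(names)

# Hypothetical data_path for a sub-interface whose name contains a dot.
sub_path = "Ethernet2/1.100.admin_state"
first_dot = sub_path.split(".")[0]     # stops at the dot inside the name
last_dot = sub_path.rsplit(".", 1)[0]  # strips only the trailing key
print(first_dot, last_dot)
```

Because Jinja2 expressions can call Python string methods, as the playbook already does with split, the rsplit form could be used in the task in the same way. That only holds for criteria like this one, where the failing key sits exactly one level below the interface name.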
Remediation
Based on the output of the ansible.utils.validate task, a number of remediation actions can be taken using Ansible modules for configuration management and/or reporting. In our case, we will be using the cisco.nxos.nxos_interfaces resource module to configure the given interfaces in admin up state as shown in the below snippet.
```yaml
- name: Configure interface with drift in admin up state
  cisco.nxos.nxos_interfaces:
    config:
      - name: "{{ item['data_path'].split('.')[0] }}"
        enabled: true
  loop: "{{ result['errors'] }}"
  when: "'errors' in result"
```
This remediation task will be executed only when the validation from the earlier task fails and will run only for those interfaces whose admin state is not up.
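If the criteria are later extended to check more than one property per interface, the errors list could contain several entries for the same interface, and the loop would then run once per entry. The Python sketch below collapses such a list into one item per interface, shaped like the name/enabled items passed to config in the task above. The second error path (Ethernet2/1.oper_status) is hypothetical and is included only to create a duplicate.

```python
errors = [
    {"data_path": "Ethernet2/1.admin_state"},
    {"data_path": "Ethernet2/1.oper_status"},  # hypothetical second failure
    {"data_path": "Ethernet2/10.admin_state"},
]

# dict.fromkeys drops duplicates while keeping first-seen order.
unique_names = list(
    dict.fromkeys(error["data_path"].split(".")[0] for error in errors)
)

# One item per interface, in the same shape as the config list in the task.
config = [{"name": name, "enabled": True} for name in unique_names]
print(config)
```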
Data validation
It is often necessary to validate data before passing it as input to a task, to ensure that the input data structure conforms to the expected data model. This allows us to check data model adherence before pushing configuration to the network device. This use case is explained in the data validation blog from Ivan Pepelnjak.
That blog uses command-line tools to validate the input data. With the ansible.utils.validate module, this functionality can now be added to the Ansible Playbook itself, as shown in the snippet below.
```yaml
- name: validate bgp data data with jsonschema bgp model criteria
  ansible.utils.validate:
    data: "{{ hostvars }}"
    criteria:
      - "{{ lookup('file', './bgp_data_model_criteria.json') | from_json }}"
    engine: ansible.utils.jsonschema
  register: result
```
The criteria structure stored locally in the bgp_data_model_criteria.json file can be found here (modified example from the original blog post), and the sample host_vars file is shown below:
```
$ cat host_vars/nxos.yaml
---
bgp_as: 0
description: Unexpected
```
The output of the above task run is shown below:
```
TASK [validate bgp data data with jsonschema bgp model criteria] *******************************************************************************************
fatal: [nxos]: FAILED! => {"changed": false, "errors": [{"data_path": "nxos.bgp_as", "expected": 1, "found": 0, "json_path": "$.nxos.bgp_as", "message": "0 is less than the minimum of 1", "relative_schema": {"maximum": 65535, "minimum": 1, "type": "number"}, "schema_path": "patternProperties..*.properties.bgp_as.minimum", "validator": "minimum"}], "msg": "Validation errors were found.\nAt 'patternProperties..*.properties.bgp_as.minimum' 0 is less than the minimum of 1. "}
```
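The relative_schema in the error above shows the constraint that failed: bgp_as must be a number between 1 and 65535. As a rough illustration of that check, and not a substitute for the jsonschema engine, the Python sketch below applies the same bounds to the sample host_vars by hand. The function name check_bgp_as and the messages other than the minimum one are assumptions; only the minimum message is taken from the output above.

```python
def check_bgp_as(value, minimum=1, maximum=65535):
    """Return an error message, or None when the value is within bounds."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return f"{value!r} is not a number"
    if value < minimum:
        return f"{value} is less than the minimum of {minimum}"
    if value > maximum:
        return f"{value} is greater than the maximum of {maximum}"
    return None


# Sample data keyed by host name, mirroring the hostvars passed to the task.
hostvars = {"nxos": {"bgp_as": 0, "description": "Unexpected"}}

errors = {}
for host, data in hostvars.items():
    message = check_bgp_as(data["bgp_as"])
    if message is not None:
        errors[host] = message

print(errors)
```

Keeping such rules in a schema file rather than in code is what lets the same criteria be reused across playbooks and teams, which is the point of using the validate module here.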