How to avoid unnoticed failed Flows
March 15, 2024
Has it ever happened to you that a published flow failed in production, but nobody noticed until an end user opened an incident? I saw exactly this after a small update to a flow. The developer said the change was trivial: no extensive testing needed (yes, automated tests would have helped here!), just deploy it. Two days later, an incident came in reporting that requests were not being fulfilled. Only then did it turn out that the Flows had been ending in an error and nobody knew about it.
Fortunately, ServiceNow provides a “Flow error handler” that lets developers catch errors in a flow and act on them. That is cool! But how do you ensure that flow developers do not forget to use it?
The answer might be Instance Scan and a new Scan Check!
Let’s build a new Scan Check that can find all Flows without an error handler.
Attention! The following part is quite technical. Scroll down if you want to download the solution without reading the implementation details.
First, we need to reverse-engineer where the data is stored. Flow Designer is visual, but our Scan Check has to work with real system tables. Here is what I found:
- A Flow with an error handler has a “Flow Logic Instance” [sys_hub_flow_logic] record whose logic definition is “Top level catch”.
- If error handling is enabled, there is also a “Variable Value” [sys_variable_value] record that points to that “Flow Logic Instance” and to the variable “enabled”, with the value 1.
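To make the two points above concrete, here is a minimal standalone sketch that applies the same rule to in-memory data. It is plain JavaScript that runs outside ServiceNow (for example in Node.js): the arrays stand in for the sys_hub_flow_logic and sys_variable_value tables, and the sys_ids in them (flowA, logic1, and so on) are hypothetical. The two constant sys_ids and the field names (flow, logic_definition, variable, document_key, value) are taken from the check script in this post. The value "0" for a disabled handler is an assumption; the rule only relies on the enabled value being 1.

```javascript
// Standalone sketch: plain arrays stand in for the sys_hub_flow_logic and
// sys_variable_value tables. Constant sys_ids are copied from the check script.
var TOP_LEVEL_CATCH_FLOW_LOGIC_ID = "35d60003e6022010a5e40cdd1254dd23";
var CATCH_ENABLED_VARIABLE_ID = "dd70187cc36e20105553b740ad40dddc";

// True only if the Flow has a "Top level catch" flow logic record AND a
// variable value for that record's "enabled" variable equal to "1".
function hasEnabledErrorHandler(flowSysId, flowLogicRows, variableValueRows) {
    return flowLogicRows.some(function (logic) {
        if (logic.flow !== flowSysId ||
                logic.logic_definition !== TOP_LEVEL_CATCH_FLOW_LOGIC_ID) {
            return false;
        }
        return variableValueRows.some(function (v) {
            return v.document_key === logic.sys_id &&
                v.variable === CATCH_ENABLED_VARIABLE_ID &&
                v.value === "1";
        });
    });
}

// Hypothetical sample data: flowA has an enabled catch, flowB has a catch
// whose "enabled" value is not 1 ("0" is an assumed value), flowC has none.
var flowLogic = [
    { sys_id: "logic1", flow: "flowA", logic_definition: TOP_LEVEL_CATCH_FLOW_LOGIC_ID },
    { sys_id: "logic2", flow: "flowB", logic_definition: TOP_LEVEL_CATCH_FLOW_LOGIC_ID }
];
var variableValues = [
    { document_key: "logic1", variable: CATCH_ENABLED_VARIABLE_ID, value: "1" },
    { document_key: "logic2", variable: CATCH_ENABLED_VARIABLE_ID, value: "0" }
];

["flowA", "flowB", "flowC"].forEach(function (id) {
    var ok = hasEnabledErrorHandler(id, flowLogic, variableValues);
    console.log(id + ": " + (ok ? "error handler enabled" : "finding"));
});
```

Both "no catch record" (flowC) and "catch record present but not enabled" (flowB) count as a finding, which is exactly the distinction the check script makes with its two queries.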
Based on that, we can write a script in a Table Check on the Flow table [sys_hub_flow]. Running the check on the Flow table means that each finding record references the “Flow” itself.
Let’s set the “Conditions” so that the check runs only against active Flows. I have also decided to check only Flows of type “Flow”, not “Subflow”. You might want to change this if you also want to enforce error handling in subflows, but catching errors in Flows is a good starting point.
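The scope chosen above can be expressed as a small filter. This is a hypothetical in-memory sketch, not how the Table Check is configured (that happens in the condition builder). It assumes each sys_hub_flow record exposes an active flag and a type field with the values "flow" and "subflow"; the sample records are made up. The includeSubflows parameter is the switch you would flip to enforce error handling in subflows as well.

```javascript
// Hypothetical sketch of the Table Check scope. Assumes each flow record has an
// "active" flag and a "type" of "flow" or "subflow" (assumed values).
function isInScope(flow, includeSubflows) {
    if (!flow.active) {
        return false; // inactive Flows are never checked
    }
    if (flow.type === "flow") {
        return true;
    }
    // Subflows are checked only when explicitly opted in
    return includeSubflows === true && flow.type === "subflow";
}

// Hypothetical sample records
var flows = [
    { name: "Request fulfillment", active: true, type: "flow" },
    { name: "Send notification", active: true, type: "subflow" },
    { name: "Old approval", active: false, type: "flow" }
];

function namesInScope(records, includeSubflows) {
    return records
        .filter(function (flow) { return isInScope(flow, includeSubflows); })
        .map(function (flow) { return flow.name; });
}

console.log(namesInScope(flows, false).join(", ")); // Request fulfillment
console.log(namesInScope(flows, true).join(", "));  // Request fulfillment, Send notification
```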
Here is the script I wrote for the check (tested in the Washington DC release):
```javascript
(function(engine) {
    // Look for a "Top level catch" flow logic record on this Flow
    var catchFlowLogic = getCatchFlowLogic(current);
    if (catchFlowLogic) {
        // A catch flow logic exists: check whether its "enabled" value is 1 (true)
        if (!isCatchVariableEnabled(catchFlowLogic)) {
            // The value is not 1, so error handling is disabled: this is a finding
            createFinding(finding, current);
        }
    } else {
        // There is no catch flow logic at all: this is a finding
        createFinding(finding, current);
    }
})(engine);

/**
 * Points the finding at the scanned Flow record and raises it.
 *
 * @param {Object} finding Instance Scan finding object
 * @param {GlideRecord} current Flow record being scanned
 */
function createFinding(finding, current) {
    finding.setCurrentSource(current);
    finding.increment();
}

/**
 * Returns the "Top level catch" flow logic record of the Flow.
 *
 * @param {GlideRecord} flowGR Flow whose catch flow logic is returned
 * @returns {GlideRecord|boolean} Catch flow logic, or false if not found
 */
function getCatchFlowLogic(flowGR) {
    var TOP_LEVEL_CATCH_FLOW_LOGIC_ID = "35d60003e6022010a5e40cdd1254dd23";
    var flowLogic = new GlideRecord("sys_hub_flow_logic");
    flowLogic.addQuery("flow", flowGR.getUniqueValue());
    flowLogic.addQuery("logic_definition", TOP_LEVEL_CATCH_FLOW_LOGIC_ID);
    flowLogic.setLimit(1);
    flowLogic.query();
    if (flowLogic.next()) {
        return flowLogic;
    }
    return false;
}

/**
 * Checks whether the catch flow logic is enabled.
 *
 * @param {GlideRecord} flowLogicGR Catch flow logic to check
 * @returns {boolean} true if the catch logic is enabled
 */
function isCatchVariableEnabled(flowLogicGR) {
    var CATCH_ENABLED_VARIABLE_ID = "dd70187cc36e20105553b740ad40dddc";
    var variableValue = new GlideRecord("sys_variable_value");
    variableValue.addQuery("variable", CATCH_ENABLED_VARIABLE_ID);
    variableValue.addQuery("document_key", flowLogicGR.getUniqueValue());
    variableValue.addQuery("value", "1");
    variableValue.setLimit(1);
    variableValue.query();
    return variableValue.hasNext();
}
```
The script checks only whether the error handling (catch) block exists and is enabled. Be aware that it does not validate what is inside the block.
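Because both branches of the check end in the same createFinding call, the decision reduces to one condition: raise a finding when there is no catch logic, or when the catch logic is not enabled. The standalone sketch below replays that decision with stub objects so it can be run outside an instance. The stubs (makeFindingStub, the catchByFlow lookup, and the sample flows) are hypothetical stand-ins for the GlideRecord queries and the Instance Scan finding object; only setCurrentSource and increment are taken from the check script. For simplicity, one stub collects every finding.

```javascript
// Hypothetical stand-in for the Instance Scan finding object: records which
// Flow each finding points to and how many findings were raised.
function makeFindingStub() {
    return {
        sources: [],
        count: 0,
        setCurrentSource: function (record) { this.sources.push(record.name); },
        increment: function () { this.count++; }
    };
}

function createFinding(finding, current) {
    finding.setCurrentSource(current);
    finding.increment();
}

// Same decision as the check script, with the two lookups injected so they
// can be replaced by in-memory stubs.
function runCheck(current, finding, getCatchFlowLogic, isCatchVariableEnabled) {
    var catchFlowLogic = getCatchFlowLogic(current);
    if (!catchFlowLogic || !isCatchVariableEnabled(catchFlowLogic)) {
        createFinding(finding, current);
    }
}

// Hypothetical data: A has an enabled catch, B a disabled one, C none at all.
var catchByFlow = { flowA: { enabled: true }, flowB: { enabled: false } };
var sampleFlows = [
    { sys_id: "flowA", name: "A" },
    { sys_id: "flowB", name: "B" },
    { sys_id: "flowC", name: "C" }
];

var finding = makeFindingStub();
sampleFlows.forEach(function (flow) {
    runCheck(flow, finding,
        function (f) { return catchByFlow[f.sys_id] || false; },
        function (logic) { return logic.enabled; });
});

console.log(finding.count + " finding(s): " + finding.sources.join(", "));
// 2 finding(s): B, C
```

The short-circuiting `||` matters here: isCatchVariableEnabled is only called when a catch logic record was found, just as in the nested if/else of the original script.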
How to use it?
After you import the Scan Check into Instance Scan, your developers and admins will receive a Scan Finding when they scan an update set that contains a Flow without error handling.
Also, new findings will be created when you run a full instance scan. You can find them in the Instance Scan results.
What else would I recommend?
- Make your Scans mandatory when completing update sets, so that developers notice the findings in non-prod instances!
- Standardize your flow error handler approach. Create a subflow that you can reuse as a standard in your Flows. Decide if you prefer to receive an email notification, create an incident, or do something else when a Flow fails.
The resulting Scan Check can be downloaded for free here (test it in a non-production instance first; use at your own risk). Tested in the Washington release:
- XML - use import XML to bring it to your instance.
- Update Set - import it as an update set.
Additional resources: