Validating Path Expressions In Step Functions
I created asl-path-validator to define the grammar for JSONPath expressions supported by AWS Step Functions. The library validates Path, Reference Path, and Payload Template expressions in Step Functions. The Amazon States Language (ASL) uses JSONPath for its data mapping expressions and flow control but doesn’t provide a grammar for this language. The spec refers to a Java library for the syntax, but that library provides additional functions and operators that Step Functions does not support.
The grammar codifies the rules from the spec in a format we can leverage for a validating parser.
Expression Types
Type | Description | Rules |
---|---|---|
Path | Selects one or more nodes. | Will illustrate with examples below |
Reference Path | A valid Path expression that MUST select a single node. | Operators selecting multiple nodes are not permitted |
Payload Template | A JSON object or array where all keys ending with .$ are evaluated as Path expressions or an Intrinsic Function | See table below for a list of Intrinsic Functions |
Expression Features
Feature | Path | Reference Path | Payload Template |
---|---|---|---|
Simple dot notation or single predicate notation, e.g. $.library.movies | | | |
Use of operators that select multiple nodes via descent, wildcard, or a filter: .. @ , : ? * | | | |
Intrinsic Functions, e.g. States.JsonToString($.foo). See below for the supported functions | | | |
Examples
The spec contains examples for Reference Paths but few useful examples for the other types. Part of the exercise of writing the grammar was figuring out what works in the AWS Data Flow Simulator and in deployed Step Functions.
Expression | Path | Reference Path | Payload Template |
---|---|---|---|
$.store.book | | | |
$.store\.book | | | |
$.\stor\e.boo\k | | | |
$.store.book.title | | | |
$.foo.\.bar | | | |
$.foo\@bar.baz\[\[.\?pretty | | | |
$.&Ж中.\uD800\uDF46 | | | |
$.ledgers.branch[0].pending.count | | | |
$.ledgers.branch[0] | | | |
$.ledgers[0][22][315 ].foo | | | |
$['store']['book'] | | | |
$['store'][0]['book'] | | | |
States.Format('Welcome to {} {}s playlist.', $$, $.lastName) | | | |
$[(@.length-1)].bar | | | |
$.library.movies[?(@.genre)] | | | |
$.library.movies[?(@.year == 1992)] | | | |
$.library.movies[0:2] | | | |
$.library.movies[0,1,2,3] | | | |
$..director | | | |
$.fooList[1:] | | | |
$.store.book[*].author | | | |
$.store.* | | | |
$..* | | | |
$.book[-2] | | | |
$.book[-2:] | | | |
$.book[?(@.price <= $['expensive'])] | | | |
$.book[?(@.author =~ /.*REES/i)] | | | |
..book.length() | | | |
Expressions that don’t work in any context
Note that the table above contains a number of examples that don’t work in any context. This testing was done with the AWS Data Flow Simulator and ad hoc Step Functions. These expressions come from the Java library referenced by both the Amazon States Language and AWS Step Functions documentation. If an expression syntax isn’t supported in any context, I opted not to support it in the grammar. For example, the relational operators >, >=, ==, < and <= only work with numeric values, so the parser emits an error for a non-numeric operand.
Context Expressions
A Context Expression is a Path expression that starts with $$. This uses the process’s Context Object as the document to evaluate against, as opposed to the state’s data.
Reference from the spec:
When a Path begins with “$$”, two dollar signs, this signals that it is intended to identify content within the Context Object. The first dollar sign is stripped, and the remaining text, which begins with a dollar sign, is interpreted as the JSONPath applying to the Context Object.
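The stripping rule quoted above can be sketched in a few lines. This is only an illustration: the helper name resolveTarget and its return shape are made up for this example and are not part of asl-path-validator.

```typescript
// Hypothetical helper: decide which document a Path is evaluated against.
interface PathTarget {
  document: "context" | "state";
  jsonPath: string;
}

function resolveTarget(path: string): PathTarget {
  // A leading "$$" selects the Context Object; only the first "$" is stripped.
  if (path.slice(0, 2) === "$$") {
    return { document: "context", jsonPath: path.slice(1) };
  }
  // Anything else is evaluated against the state's data, unchanged.
  return { document: "state", jsonPath: path };
}
```

With this sketch, a Path such as $$.Execution.Id would be evaluated as $.Execution.Id against the Context Object, while $.store.book is left untouched and evaluated against the state's data.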
Intrinsic Functions
Name | Arguments | Comments |
---|---|---|
States.Array | 0+ | arguments MAY contain one or more Path values |
States.Format | 1+ | arguments MAY contain one or more Path values |
States.JsonToString | 1 | argument MUST be a Path |
States.StringToJson | 1 | argument MAY be a Path |
The grammar currently limits the Intrinsic Functions to those listed above. The ASL spec allows for extension functions but doesn’t describe how they are made known to the system.
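As a rough illustration, the argument counts from the table could be checked with a lookup like the one below. The names INTRINSIC_ARITY and checkIntrinsicCall are assumptions for this sketch, and it only covers the number of arguments, not whether each argument is a Path.

```typescript
// Argument-count limits copied from the table above; max === null means no upper bound.
interface Arity {
  min: number;
  max: number | null;
}

const INTRINSIC_ARITY: { [name: string]: Arity } = {
  "States.Array": { min: 0, max: null },
  "States.Format": { min: 1, max: null },
  "States.JsonToString": { min: 1, max: 1 },
  "States.StringToJson": { min: 1, max: 1 },
};

// Hypothetical check: returns an error message, or null when the call is acceptable.
function checkIntrinsicCall(name: string, argCount: number): string | null {
  // hasOwnProperty guards against inherited keys such as "toString".
  if (!Object.prototype.hasOwnProperty.call(INTRINSIC_ARITY, name)) {
    return "unknown intrinsic function: " + name;
  }
  const arity = INTRINSIC_ARITY[name] as Arity;
  if (argCount < arity.min) {
    return name + " needs at least " + arity.min + " argument(s)";
  }
  if (arity.max !== null && argCount > arity.max) {
    return name + " accepts at most " + arity.max + " argument(s)";
  }
  return null;
}
```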
How this is done
- Update schema definitions to use format for expression fields.
- Use patternProperties to validate fields ending in .$
- Generate a parser from a PEG grammar to parse the expression
- Include additional validation for rules not encoded in the schema
Adding AJV Formats to the schemas
JSON Schema provides a format field for string types to provide additional semantics for validation. There are a few built-in formats like date-time or uuid, and AJV permits registering new names and validator functions. The asl-path-validator registers three new formats with the following names:
Format | Description |
---|---|
asl_path | Field must be a Path expression |
asl_ref_path | Field must be a Reference Path expression |
asl_payload_template | Field must be a Payload Template expression |
The above formats are added to the relevant types in the schema. For example:
Before
{
  "OutputPath": {
    "type": "string"
  }
}
After
{
  "OutputPath": {
    "type": "string",
    "format": "asl_path"
  }
}
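Registering the three format names might look roughly like the sketch below. To keep it self-contained it does not import AJV; instead it accepts any object with an addFormat(name, validator) method, which is the shape of the method AJV instances expose. The looksLikePath and looksLikeIntrinsic functions are crude placeholders invented for this example; the real validators would run the generated parser.

```typescript
// Minimal shape of the registry this sketch needs. An AJV instance could be passed in here.
interface FormatRegistry {
  addFormat(name: string, validate: (data: string) => boolean): unknown;
}

// Placeholder validators (hypothetical): stand-ins for the real parser-based checks.
function looksLikePath(data: string): boolean {
  return data.charAt(0) === "$";
}

function looksLikeIntrinsic(data: string): boolean {
  return /^States\.[A-Za-z]+\(.*\)$/.test(data);
}

// Register one validator per format name used in the schemas.
function registerAslFormats(registry: FormatRegistry): void {
  registry.addFormat("asl_path", looksLikePath);
  registry.addFormat("asl_ref_path", looksLikePath);
  registry.addFormat("asl_payload_template", function (data: string): boolean {
    return looksLikePath(data) || looksLikeIntrinsic(data);
  });
}
```

Once the names are registered, any string field in a schema that carries one of these format values is handed to the matching function during normal validation.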
Recursive type for Payload Template
There are three rules for a Payload Template:
- if the JSON field ends in .$ then it MUST be a Path or Intrinsic Function
- if the JSON field doesn’t end in .$ then it MUST be a scalar type OR another Payload Template
- if the field is an array, all items MUST be valid Payload Templates
Encoding these rules in JSON Schema delegates the traversal logic to AJV and avoids having to make multiple calls to look for nodes to validate.
The three rules above are implemented using oneOf and patternProperties.
{
  "asl_payload_template": {
    "oneOf": [
      {
        "type": "object",
        "patternProperties": {
          "^.+\\.\\$$": {
            "$comment": "matches fields ending in .$",
            "type": "string",
            "nullable": true,
            "format": "asl_payload_template"
          },
          "^.+(([^.][^$])|([^.][$]))$": {
            "$comment": "matches fields NOT ending in .$",
            "oneOf": [
              {
                "type": [
                  "number",
                  "boolean",
                  "string",
                  "null"
                ]
              },
              {
                "type": "array",
                "items": {
                  "$ref": "#/definitions/asl_payload_template"
                }
              },
              {
                "$ref": "#/definitions/asl_payload_template"
              }
            ]
          }
        }
      },
      {
        "type": "array",
        "items": {
          "$ref": "#/definitions/asl_payload_template"
        }
      }
    ]
  }
}
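For comparison, here is a hand-written version of the traversal that the schema lets AJV perform. It is a sketch that follows the three rules as listed rather than the library's code: isExpression is a hypothetical stand-in for the real parser, and this version does not treat null as valid for fields ending in .$ even though the schema marks them nullable.

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Hypothetical stand-in for the real Path / Intrinsic Function parser.
function isExpression(value: string): boolean {
  return value.charAt(0) === "$" || value.slice(0, 7) === "States.";
}

// True for keys such as "title.$" (at least one character before ".$").
function endsWithDotDollar(key: string): boolean {
  return key.length > 2 && key.slice(-2) === ".$";
}

function isPayloadTemplate(value: Json): boolean {
  if (Array.isArray(value)) {
    // Rule 3: every item of an array must itself be a Payload Template.
    return value.every(function (item) {
      return isPayloadTemplate(item);
    });
  }
  if (value === null || typeof value !== "object") {
    return false; // a bare scalar is not a Payload Template on its own
  }
  const obj = value;
  return Object.keys(obj).every(function (key) {
    const field = obj[key];
    if (endsWithDotDollar(key)) {
      // Rule 1: the value must be a Path or Intrinsic Function.
      return typeof field === "string" && isExpression(field);
    }
    // Rule 2: a scalar is fine, anything else must be a nested Payload Template.
    if (field === undefined || field === null || typeof field !== "object") {
      return true;
    }
    return isPayloadTemplate(field);
  });
}
```

The recursion here is the part the schema expresses with $ref, which is why a single AJV call is enough to reach every nested field.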
Writing the grammar and generating a parser
Peggy provides a simple language and a nice online sandbox to test the grammar against inputs. A single grammar is defined to parse all valid expression types. The parser emits an Abstract Syntax Tree (AST) if the input is valid. Additional traversals of the AST are performed to enforce any rules specific to the context (e.g. Reference Path operator limits or Intrinsic Function use).
Since we’re not evaluating the expressions, the AST only needs to capture a small amount of information about the expression. We only need to record use of specific operators or functions.
For example, the CURRENT_VALUE operator @, which is used for filtering nodes, is not allowed in Reference Paths.
In the example below, the parser matches on the CURRENT_VALUE token followed by an optional subscript. The AST for this match records the node as @ and whatever the subscript value was.
jsonpath_
  = CONTEXT_ROOT_VALUE sub:subscript? {return {node: "$$", sub}}
  / ROOT_VALUE sub:subscript? {return {node: "$", sub}}
  / CURRENT_VALUE sub:subscript? {return {node: "@", sub}}
  / intrinsic_function
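One detail worth noting in the rule above is the order of the alternatives. PEG choice is ordered, so $$ has to be tried before $, otherwise a Context Expression would match as a plain root followed by a stray dollar sign. The hand-written sketch below mirrors the first three alternatives to show the effect. It is not what Peggy generates, and the matchHead name is made up for this example.

```typescript
type Head = "$$" | "$" | "@";

interface HeadMatch {
  node: Head;
  rest: string; // the input left over for the subscript rule
}

// Try each alternative in order and stop at the first one that matches.
function matchHead(input: string): HeadMatch | null {
  const heads: Head[] = ["$$", "$", "@"];
  for (let i = 0; i < heads.length; i++) {
    const head = heads[i] as Head;
    if (input.slice(0, head.length) === head) {
      return { node: head, rest: input.slice(head.length) };
    }
  }
  return null; // the real rule would go on to try intrinsic_function
}
```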
Additional validation
If the parser is able to produce an AST without any errors, then the AST is checked to see if it contains any invalid operators or functions.
The AST nodes include hints to describe the nature of the operation. All the checks could be done in a single expression, but they’re split for better error reporting.
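A minimal sketch of such checks for a Reference Path is shown below. The node and sub fields come from the grammar excerpt in this article, but the hints field, the hint names "multi" and "function", and the function names are all assumptions made for illustration.

```typescript
// Hypothetical AST shape: `hints` and its values are invented for this sketch.
interface AstNode {
  node: string;
  hints?: string[];
  sub?: AstNode | null;
}

// Walk the chain of subscripts and report whether any node satisfies `test`.
function anyNode(ast: AstNode, test: (n: AstNode) => boolean): boolean {
  let cur: AstNode | null | undefined = ast;
  while (cur) {
    if (test(cur)) {
      return true;
    }
    cur = cur.sub;
  }
  return false;
}

function hasHint(ast: AstNode, hint: string): boolean {
  return anyNode(ast, function (n) {
    return n.hints !== undefined && n.hints.indexOf(hint) !== -1;
  });
}

// Each rule is checked separately so every violation gets its own message.
function checkReferencePath(ast: AstNode): string[] {
  const errors: string[] = [];
  if (anyNode(ast, function (n) { return n.node === "@"; })) {
    errors.push("the @ operator is not allowed in a Reference Path");
  }
  if (hasHint(ast, "multi")) {
    errors.push("a Reference Path must select a single node");
  }
  if (hasHint(ast, "function")) {
    errors.push("Intrinsic Functions are not allowed in a Reference Path");
  }
  return errors;
}
```

Folding the three conditions into one boolean would be shorter, but the caller would then only learn that the expression was rejected, not why.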
Conclusion
- The grammar describes the syntax for the subset of JSONPath expressions allowed in Step Functions.
- The JSON schemas in asl-validator use a custom format to identify the type of expression required.
- The parser produces an AST if the expression is valid.
- Regular schema validation through AJV will call our validator on each string with one of our known format values and report back if the string is invalid.