Node Pattern
Node pattern is a DSL to help find specific nodes in the Abstract Syntax Tree using a simple string.
It reminds the simplicity of regular expressions but used to find specific nodes of Ruby code.
History
The Node Pattern was introduced by Alex Dowad and solves a problem that RuboCop contributors were facing for a long time:
-
Ability to declaratively define rules for node search, matching, and capture.
The code below belongs to Style/ArrayJoin
cop and it’s in favor of Array#join
over Array#*
. Then it tries to find
code like %w(one two three) * ", "
and suggest to use #join
instead.
It can also be an array of integers, and the code doesn’t check it. However, it checks if the argument sent is a string.
def on_send(node)
receiver_node, method_name, *arg_nodes = *node
return unless receiver_node && receiver_node.array_type? &&
method_name == :* && arg_nodes.first.str_type?
add_offense(node, location: :selector)
end
This code was replaced in the cop defining a new matcher that does the same as the code above:
def_node_matcher :join_candidate?, '(send $array :* $str)'
And the on_send
method is simplified to a method usage:
def on_send(node)
join_candidate?(node) { add_offense(node, location: :selector) }
end
Ruby Abstract Syntax Tree (AST)
Parser translates Ruby source code to a tree structure represented in text.
A simple integer literal like 1
is represented by (int 1)
in the AST.
A method call with two integer literals:
foo(1, 2)
is represented with:
(send nil :foo
(int 1)
(int 2)
)
Every node is represented with a sequence. The first element is the node type. Other elements are the children. They are optionally present and depend on the node type. E.g.:
-
nil
is just(nil)
-
1
is(int 1)
-
[1]
is(array (int 1))
-
[1, 2]
is(array (int 1) (int 2))
-
foo
is(send nil :foo)
-
foo(1)
is(send nil :foo (int 1))
Getting the AST representation
From the command-line with ruby-parse
$ ruby-parse --legacy -e 'foo(1)'
(send nil :foo
(int 1))
Use the --legacy ruby-parse flag to get the same AST that RuboCop AST returns.
There are several differences, e.g. without --legacy , foo(a: 1) would return kwargs , and with --legacy it returns hash .
|
Basic Node Pattern Structure
The simplest Node Pattern would match just the node type.
E.g. the int
node pattern would match the (int 1)
AST (literal 1
in Ruby code).
More sophisticated node patterns match more than one child.
(
and )
to Match Elements
Several matchers surrounded by parentheses would match a node with elements each matching a corresponding matcher, order-dependently.
Ruby code with an array with two integer literals, [1, 2]
represented in AST as (array (int 1) (int 2))
could be matched with (array int int)
node pattern.
For a literal integer, e.g. 1
Ruby code represented by (int 1)
in AST:
-
int
node pattern will match exactly the node, looking only the node type -
(int 1)
node pattern will match precisely the node -
(int 2)
node pattern will not match
(
and )
for Nested Matching
Ruby code with a method call with two integer literals as arguments, foo(1, 2)
represented in AST as (send nil :foo (int 1) (int 2))
could be matched with (send nil? :foo int int)
node pattern.
To match just those method calls where the first argument is a literal 1
, use (send nil? :foo (int 1) int)
.
Any child that is a node can be a target for nested matching.
_
for any single node
_
will check if there’s something present in the specific position, no matter the
value:
-
(int _)
will match any number -
(int _ _)
will not match becauseint
types have just one child that contains the value.
...
for several subsequent nodes
Where _
matches any single node, ...
matches any number of nodes.
Say for example you want to find instances of calls to the method sum
with any
number of arguments, be it sum(1, 2)
or sum(1, 2, 3, n)
.
First, let’s check how it looks like in the AST:
$ ruby-parse -e 'sum(1, 2)'
(send nil :sum
(int 1)
(int 2))
Or with more children:
$ ruby-parse -e 'sum(1, 2, 3, n)'
(send nil :sum
(int 1)
(int 2)
(int 3)
(send nil :n))
The following expression would only match a call with 2 arguments:
(send nil? :sum _ _)
Instead, the following expression will any number of arguments (and thus both examples above):
(send nil? :sum ...)
Note that ...
can be appear anywhere in a sequence, for example (send nil? :sum ... int)
would no longer match the second example, as the last argument is not an integer.
Nesting ...
is also supported; the only limitation is that ...
and
other "variable length" patterns can only appear once within a sequence.
For example (send ... :sum ...)
is not supported.
*
, +
, ?
for repetitions
Another way to handle a variable number of nodes is by using *
, +
, ?
to signify
a particular pattern should match any number of times, at least once and at most once respectively.
Following on the previous example, to find sums of integer literals, we could use:
(send nil? :sum int*)
This would match our first example sum(1, 2)
but not the other sum(1, 2, 3, n)
This pattern would also match a call to sum
without any argument, which might not be desirable.
Using +
would insure that only sums with at least one argument would be matched.
(send nil? :sum int+)
The ?
can limit the match only 0 or 1 nodes.
The following example would match any sum of three integer literals
optionally followed by a method call:
(send nil? :sum int int int send ?)
Note that we have to put a space between send
and ?
,
since send?
would be considered as a predicate (described below).
<>
for match in any order
You may not care about the exact order of the nodes you want to match. In this case you can put the nodes without brackets:
(send nil? :sum <(int 2) int>)
This will match our first example (sum(1, 2)
).
It won’t match our second example though, as it specifies that there must be
exactly two arguments to the method call sum
.
You can add ...
before the closing bracket to allow for additional parameters:
(send nil? :sum <(int 2) int ...>)
This will match both our examples, but not sum(1.0, 2)
or sum(2)
,
since the first node in the brackets is found, but not the second (int
).
{}
for "OR" (union)
Lets make it a bit more complex and introduce floats:
$ ruby-parse -e '1'
(int 1)
$ ruby-parse -e '1.0'
(float 1.0)
-
({int | float} _)
- int or float types, no matter the value
Branches of the union can contain more than one term:
-
(array {int int | range})
- matches an array with two integers or a single range element
If all the branches have a single term, you can omit the |
, so {int | float}
can be
simplified to {int float}
.
When checking for symbols or string, you can use regexp literals for a similar effect:
(send _ /to_s|inspect/) # => matches calls to `to_s` or `inspect`
[]
for "AND"
Imagine you want to check if the number is odd?
and also positive numbers:
(int [odd? positive?])
- is an int and the value should be odd and positive.
Refer to Predicate methods to see how odd? works.
|
!
for Negation
Node pattern (send nil? :sum !int _)
would match a sum
call where the first argument is not a literal integer.
E.g.:
-
it will match
sum(2.0, 3)
, as the first argument is of afloat
type -
it will not match
sum(2, 3)
, as the first argument is of anint
type
Negation operator works with other node pattern syntax elements, {} , [] , () , $ , but only with those that target a single element. E.g. $!(int 1) , !{false nil} , ![#positive? #even?] will work, while !{int int | sym} , !{int int | sym sym} , and any use of <> won’t.
|
$
for captures
You can capture elements or nodes along with your search, prefixing the expression
with $
. For example, in a tuple like (int 1)
, you can capture the value using (int $_)
.
You can also capture multiple things like:
(${int float} $_)
The tuple can be entirely captured using the $
before the open parens:
$({int float} _)
Or remove the parens and match directly from node head:
${int float}
All variable length patterns (...
, *
, +
, ?
, <>
) are captured as arrays.
The following pattern will have two captures, both arrays:
(send nil? $int+ (send $...))
^
for parent
One may use the ^
character to check against a parent.
For example, the following pattern would find any node with two children and with a parent that is a hash:
(^hash _key $_value)
It is possible to use ^
somewhere else than the head of a sequence; in that
case it is relative to that child (i.e. the current node). One case also use
multiple ^
to go up multiple levels.
For example, the previous example is basically the same as:
(pair ^^hash $_value)
`
for descendants
The `
character can be used to search a node and all its descendants.
For example if looking for a return
statement anywhere within a method definition,
we can write:
(def _method_name _args `return)
This would match both of these methods foo
and bar
, even though
these return
for foo
and bar
are not at the same level.
def foo # (def :foo return 42 # (args) end # (return # (int 42))) def bar # (def :bar return 42 if foo # (args) nil # (begin end # (if # (send nil :foo) # (return # (int 42)) nil) # (nil)))
Predicate methods
Words which end with a ?
are predicate methods, are called on the target
to see if it matches any Ruby method which the matched object supports can be
used.
Example:
-
int_type?
can be used herein replacement of(int _)
.
And refactoring the expression to allow both int or float types:
-
{int_type? float_type?}
can be used herein replacement of({int float} _)
You can also use it at the node level, asking for each child:
-
(int odd?)
will match only with odd numbers, asking it to the current number.
#
to call functions
Sometimes, we want to add extra logic. Let’s imagine we’re searching for prime numbers, so we have a method to detect it:
def prime?(n)
if n <= 1
false
elsif n == 2
true
else
(2..n/2).none? { |i| n % i == 0 }
end
end
We can use the #prime?
function directly in the expression:
(int #prime?)
You may call a method on a constant too. Let’s say you define:
module Util
def self.palindrome?(str)
str == str.reverse
end
end
You can refer to it like this:
(str #Util.palindrome?)
Arguments for predicate and function calls
Arguments can be passed to predicates and function calls, like literals, parameters:
def divisible_by?(value, divisor)
value % divisor == 0
end
Example patterns using this function:
(int #divisible_by?(42)) (send (int _value) :+ (int #divisible_by?(_value))
The arguments can be pattern themselves, in which case a matcher responding to ===
will be passed. This makes patterns composable:
def_node_matcher :global_const?, '(const {nil? cbase} %1)'
def_node_matcher :class_creator, '(send #global_const?({:Class :Module}) :new ...)'
Using node matcher macros
The RuboCop base includes two useful methods to use the node pattern with Ruby in a simple way. You can use the macros to define methods. The basics are def_node_matcher and def_node_search.
When you define a pattern, it creates a method that accepts a node and tries to match.
Lets create an example where we’re trying to find the symbols user
and
current_user
in expressions like: user: current_user
or
current_user: User.first
, so the objective here is pick all keys:
$ ruby-parse -e ':current_user'
(sym :current_user)
$ ruby-parse -e ':user'
(sym :user)
$ ruby-parse -e '{ user: current_user }'
(hash
(pair
(sym :user)
(send nil :current_user)))
Our minimal matcher can get it in the simple node sym
:
def_node_matcher :user_symbol?, '(sym {:current_user :user})'
Composing complex expressions with multiple matchers
Now let’s go deeply combining the previous expression and also match if the current symbol is being called from an initialization method, like:
$ ruby-parse --legacy -e 'Comment.new(user: current_user)'
(send
(const nil :Comment) :new
(hash
(pair
(sym :user)
(send nil :current_user))))
And we can also reuse this and check if it’s a constructor:
def_node_matcher :initializing_with_user?, <<~PATTERN
(send _ :new (hash (pair #user_symbol?)))
PATTERN
%
for arguments
Arguments can be passed to matchers, either as external method arguments, or to be used to compare elements. An example of method argument:
def multiple_of?(n, factor)
n % factor == 0
end
def_node_matcher :int_node_multiple?, '(int #multiple_of?(%1))'
# ...
int_node_multiple?(node, 10) # => true if node is an 'int' node with a multiple of 10
Arguments can be used to match nodes directly:
def_node_matcher :has_sensitive_data?, '(hash <(pair (_ %1) $_) ...>)'
# ...
has_sensitive_data?(node, :password) # => true if node is a hash with a key +:password+
# matching uses ===, so to match strings or symbols, 'pass' or 'password' one can:
has_sensitive_data?(node, /^pass(word)?$/i)
# one can also pass lambdas...
has_sensitive_data?(node, ->(key) { # return true or false depending on key })
Array#=== will never match a single node element (so don’t pass arrays),
but Set#=== is an alias to Set#include? (Ruby 2.5+ only), and so can be
very useful to match within many possible literals / Nodes.
|
%param_name
for named parameters
Arguments can be passed as named parameters. They will be matched using ===
(see %
above).
Contrary to positional arguments, defaults values can be passed to
def_node_matcher
and def_node_search
:
def_node_matcher :interesting_call?, '(send _ %method ...)',
method: Set[:transform_values, :transform_keys,
:transform_values!, :transform_keys!,
:to_h].freeze
# Usage:
interesting_call?(node) # use the default methods
interesting_call?(node, method: /^transform/) # match anything starting with 'transform'
Named parameters as arguments to custom methods are also supported.
CONST
or %CONST
for constants
Constants can be included in patterns. They will be matched using ===
, so
Regexp / Set / Proc can be used in addition to literals and Nodes:
SOME_CALLS = Set[:transform_values, :transform_keys,
:transform_values!, :transform_keys!,
:to_h].freeze
def_node_matcher :interesting_call?, '(send _ SOME_CALLS ...)'
Constants as arguments to custom methods are also supported.
Comments
You may have comments in node patterns at the end of lines
by preceding them with '# '
:
def_node_matcher :complex_stuff, <<~PATTERN
(send
{#global_const?(:Kernel) nil?} # check for explicit call like Kernel.p too
{:p :pp} # let's consider `pp` also
$... # capture all arguments
)
PATTERN
nil
or nil?
Take a special attention to nil behavior:
$ ruby-parse -e 'nil'
(nil)
In this case, the nil
implicit matches with expressions like: nil
, (nil)
, or nil_type?
.
But, nil is also used to represent a call from nothing
from a simple method call:
$ ruby-parse -e 'method'
(send nil :method)
Then, for such case you can use the predicate nil?
. And the code can be
matched with an expression like:
(send nil? :method)
More resources
Curious about how it works?
Check more details in the documentation or browse the source code directly. It’s easy to read and hack on.
The specs are also very useful to comprehend each feature.