Getting Started

Step 1: Identify the Pattern to Match

First, identify the code that needs replacing. For example, consider a Python print statement that should become a logging call:

Original Code
print("Error occurred in the application")

Step 2: Write a Match Template Using Recognized Syntax

Create a match template to capture the relevant code segment. Use a hole (:[hole_name]) to represent the dynamic part of the code you wish to match:

Match Rule
print(:[message])

In this template, :[message] captures the argument passed to the print function. The name message is arbitrary and can be replaced with any valid identifier.

Step 3: Create a Rewrite Template

Construct a rewrite template to specify how the matched code should be transformed. For example, to replace a print statement with a structured logging call:

Rewrite Rule
logger.error("Error: %s", :[message])

:[message] will be replaced by the content matched in the match template.

Step 4: Test the Rule

Apply the match and rewrite templates to transform the code.

Original Code
print("Error occurred in the application")
Resulting Code
logger.error("Error: %s", "Error occurred in the application")
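To make the effect of the two templates concrete, this single case can be approximated with Python's re module. This is only a sketch. The regex is a hypothetical stand-in for print(:[message]) and handles only single-line calls whose argument contains no parentheses. The template itself matches balanced, nested arguments structurally.

```python
import re

# Hypothetical regex stand-in for the match template print(:[message]).
# Unlike a real hole, [^()\n]* cannot span nested parentheses or newlines.
PRINT_CALL = re.compile(r"print\((?P<message>[^()\n]*)\)")

# Rewrite template, with the captured hole substituted back in.
REWRITE = r'logger.error("Error: %s", \g<message>)'


def rewrite(source: str) -> str:
    """Replace every simple print(...) call with a logger.error(...) call."""
    return PRINT_CALL.sub(REWRITE, source)


print(rewrite('print("Error occurred in the application")'))
# logger.error("Error: %s", "Error occurred in the application")
```

The rewritten code expects a logger object to already be in scope. You could create one with the standard library, for example logger = logging.getLogger(__name__).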

This demonstrates the process of refactoring code using match and rewrite templates.

Go Pro with Matching

Combining Regular Expressions with Structural Matching

Embed a regular expression inside a hole with :[hole~regex] to restrict what the hole matches, while the rest of the template is still matched structurally.

For example, the template :[fn~\w+](:[arg~\d+]) matches foo(404) but not bar(not_a_number), because :[arg~\d+] only matches a sequence of digits.
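The regex constraints in that template can be reproduced with Python's re module. The sketch below is a rough equivalent for flat, single-line calls only. Python's regex dialect is not PCRE, but \w and \d behave the same way on ASCII input like this.

```python
import re

# Rough regex equivalent of :[fn~\w+](:[arg~\d+]) for flat, single-line calls.
CALL_WITH_NUMBER = re.compile(r"(?P<fn>\w+)\((?P<arg>\d+)\)")

for code in ["foo(404)", "bar(not_a_number)"]:
    m = CALL_WITH_NUMBER.fullmatch(code)
    print(code, "->", m.groupdict() if m else "no match")
# foo(404) -> {'fn': 'foo', 'arg': '404'}
# bar(not_a_number) -> no match
```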

Advanced Syntax Reference

The syntax below has special meaning in match templates. Named Match syntax binds the matched contents to an identifier such as hole. Names are useful when substituting contents in a rewrite template or when writing rules. To match a pattern without binding it to a name, use the corresponding Just Match syntax.

Named Match | Just Match | Description
:[var] | ..., :[_] | Match zero or more characters in a lazy fashion. When used inside delimiters, as in {:[v1], :[v2]} or (:[v]), holes match within that group or code block, including newlines. Holes outside of delimiters stop matching at a newline or at the start of a code block, whichever comes first.
:[var~regex] | :[~regex] | Match an arbitrary PCRE regular expression regex. Avoid regular expressions that match special syntax like ) or .*; otherwise your pattern may fail to match balanced blocks.
:[[var]] | :[~\w+], :[[_]] | Match one or more alphanumeric characters and _.
:[var:e] | :[_:e] | Expression-like syntax matches contiguous non-whitespace characters like foo or foo.bar, as well as contiguous character sequences that include valid code block structures like the balanced parentheses in function(foo, bar) (notice how whitespace is allowed inside the parentheses). Language-dependent.
:[var.] | :[_.] | Match one or more alphanumeric characters and punctuation like ., ;, and - that do not affect balanced syntax. Language-dependent.
:[var\n] | :[~.*\n], :[_\n] | Match zero or more characters up to a newline, including the newline.
:[ var] | :[var~[ \t]+], :[ ] | Match only whitespace characters, excluding newlines.
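Several Just Match forms in the table have a regular-expression equivalent listed next to them. The sketch below runs those regexes with Python's re module on small inputs to show what each one captures. One caveat: Python's \w also matches non-ASCII letters by default, so results can differ from PCRE on non-ASCII input.

```python
import re

# Regexes the table lists as equivalents of some Just Match forms.
WORD = re.compile(r"\w+")            # :[[_]]  ~  :[~\w+]
REST_OF_LINE = re.compile(r".*\n")   # :[_\n]  ~  :[~.*\n]
BLANKS = re.compile(r"[ \t]+")       # :[ ]    ~  [ \t]+

print(WORD.match("foo_bar1(x)").group())                  # foo_bar1
print(repr(REST_OF_LINE.match("a = 1\nb = 2").group()))   # 'a = 1\n'
print(repr(BLANKS.match(" \t x").group()))                # ' \t '
```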

Rewrite properties

Let’s Go!

We have convenient built-in properties to transform and substitute matched values for certain use cases that commonly crop up when rewriting code.

:[hole].Capitalize will capitalize a string matched by hole.

Match Rule: :[[x]]
Rewrite Rule: :[[x]].Capitalize
Test Code: these are words 123
Result: These Are Words 123
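The example above can be mirrored in Python as a sketch. It assumes that \w+ is a fair stand-in for :[[x]] on this input and applies the Capitalize behavior described in the table below to each match. Python's str.capitalize is deliberately not used, because it also lowercases the remaining characters.

```python
import re


def capitalize(value: str) -> str:
    """Uppercase the first character only if it is a letter; leave the rest as-is."""
    if value[:1].isalpha():
        return value[0].upper() + value[1:]
    return value


def rewrite_words(text: str) -> str:
    # Approximate the :[[x]] hole with \w+ and apply Capitalize to each match.
    return re.sub(r"\w+", lambda m: capitalize(m.group(0)), text)


print(rewrite_words("these are words 123"))  # These Are Words 123
```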

Properties are recognized in the rewrite template and substituted according to the predefined behavior. Property accesses cannot be chained. Below are the current built-in properties.

Built-in Properties

String converters

Property | Behavior
.lowercase | Convert letters to lowercase.
.UPPERCASE | Convert letters to uppercase.
.Capitalize | Capitalize the first character if it is a letter.
.uncapitalize | Lowercase the first character if it is a letter.
.UPPER_SNAKE_CASE | Convert camelCase to snake_case (each capital letter in camelCase gets a _ prepended). Then uppercase letters.
.lower_snake_case | Convert camelCase to snake_case (each capital letter in camelCase gets a _ prepended). Then lowercase letters.
.UpperCamelCase | Convert snake_case to CamelCase (each letter after _ in snake_case is capitalized, and the _ removed). Then capitalize the first character.
.lowerCamelCase | Convert snake_case to CamelCase (each letter after _ in snake_case is capitalized, and the _ removed). Then lowercase the first character.
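Each converter in the table is spelled out as a small sequence of steps. The Python sketch below follows those steps literally, so you can try them on your own identifiers. It is an illustration of the table, not the tool's implementation. Edge cases such as a leading capital letter (which gains a leading _ here) or consecutive underscores may be handled differently by the real tool.

```python
def capitalize(s: str) -> str:
    # Uppercase the first character if it is a letter.
    return s[0].upper() + s[1:] if s[:1].isalpha() else s


def uncapitalize(s: str) -> str:
    # Lowercase the first character if it is a letter.
    return s[0].lower() + s[1:] if s[:1].isalpha() else s


def camel_to_snake(s: str) -> str:
    # Prepend "_" to every capital letter, as the table describes.
    return "".join("_" + c if c.isupper() else c for c in s)


def snake_to_camel(s: str) -> str:
    # Drop each "_" and capitalize the character that follows it.
    out = []
    upper_next = False
    for c in s:
        if c == "_":
            upper_next = True
        else:
            out.append(c.upper() if upper_next else c)
            upper_next = False
    return "".join(out)


CONVERTERS = {
    "lowercase": str.lower,
    "UPPERCASE": str.upper,
    "Capitalize": capitalize,
    "uncapitalize": uncapitalize,
    "UPPER_SNAKE_CASE": lambda s: camel_to_snake(s).upper(),
    "lower_snake_case": lambda s: camel_to_snake(s).lower(),
    "UpperCamelCase": lambda s: capitalize(snake_to_camel(s)),
    "lowerCamelCase": lambda s: uncapitalize(snake_to_camel(s)),
}

print(CONVERTERS["UPPER_SNAKE_CASE"]("parseHttpRequest"))  # PARSE_HTTP_REQUEST
print(CONVERTERS["lowerCamelCase"]("parse_http_request"))  # parseHttpRequest
```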

Sizes

Property | Behavior
.length | Substitute the number of characters of the hole value.
.lines | Substitute the number of lines of the hole value.

Positions

Property | Behavior
.line | Substitute the starting line number of this hole.
.line.start | Alias of .line.
.line.end | Substitute the ending line number of this hole.
.column | Substitute the starting column number of this hole (also known as character).
.column.start | Alias of .column.
.column.end | Substitute the ending column number of this hole.
.offset | Substitute the starting byte offset of this hole in the file.
.offset.start | Alias of .offset.
.offset.end | Substitute the ending byte offset of this hole in the file.
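The size and position properties all derive from where a hole's match starts and ends in the file. The Python sketch below computes values analogous to these properties for a hole spanning source[start:end]. The numbering conventions are assumptions, not taken from this page: lines and columns are 1-based, offsets are 0-based, end positions are exclusive, and the text is ASCII so character and byte offsets coincide. The lines count also assumes a value without a trailing newline.

```python
def hole_properties(source: str, start: int, end: int) -> dict:
    """Compute size and position values for a hole spanning source[start:end].

    Assumptions: 1-based lines and columns, 0-based offsets, exclusive end,
    and ASCII text so that character and byte offsets are the same.
    """
    value = source[start:end]

    def line_col(offset: int) -> tuple:
        before = source[:offset]
        line = before.count("\n") + 1
        # rfind returns -1 on the first line, giving column offset + 1.
        column = offset - before.rfind("\n")
        return line, column

    line_start, col_start = line_col(start)
    line_end, col_end = line_col(end)
    return {
        "length": len(value),
        "lines": value.count("\n") + 1,
        "line.start": line_start,
        "line.end": line_end,
        "column.start": col_start,
        "column.end": col_end,
        "offset.start": start,
        "offset.end": end,
    }


src = "x = 1\nfoo(404)\n"
start = src.index("404")
print(hole_properties(src, start, start + 3))
```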

File context

Property | Behavior
.file | Substitute the absolute file path of the file where this hole matched.
.file.path | Alias of .file.
.file.name | Substitute the file name of the file where this hole matched (basename).
.file.directory | Substitute the file directory of the file where the hole matched (dirname).
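The basename and dirname wording corresponds directly to Python's os.path helpers. The sketch below derives the three file-context values from a hypothetical relative path. Resolving the path against the current working directory is an assumption made for the example.

```python
import os

# Hypothetical path of the file in which a hole matched.
matched_file = "src/app/main.py"

file_path = os.path.abspath(matched_file)     # analogous to .file / .file.path
file_name = os.path.basename(file_path)       # analogous to .file.name
file_directory = os.path.dirname(file_path)   # analogous to .file.directory

print(file_path, file_name, file_directory, sep="\n")
```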

Identity (for escaping property names)

Property | Behavior
.value | Substitute the text value of this hole (for escaping, see below).

Resolving clashes with property names

Let’s say you want to insert the literal text .length after a hole. We can’t use :[hole].length, because that reserved syntax substitutes the length of the match instead of inserting the text .length. To resolve a clash like this, use :[hole].value instead of :[hole] to substitute the value of :[hole], then append .length to the template. The .length is then interpreted literally:

Match Rule: :[[x]]
Rewrite Rule: :[x].value.length is :[x].length
Test Code: a word
Result: a.length is 1 word.length is 4
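The escape works because a template parser consumes at most one property after each hole. The Python sketch below is a hypothetical mini substitution engine that supports only .value and .length. It reproduces the example above by running the rewrite once per matched word.

```python
import re

# Hypothetical mini engine: a hole optionally followed by ONE property.
HOLE = re.compile(r":\[(\w+)\](?:\.(value|length))?")


def substitute(template: str, env: dict) -> str:
    def replace(m):
        value = env[m.group(1)]
        if m.group(2) == "length":
            return str(len(value))
        # No property, or the .value escape: insert the text itself.
        return value

    return HOLE.sub(replace, template)


template = ":[x].value.length is :[x].length"
print(" ".join(substitute(template, {"x": w}) for w in "a word".split()))
# a.length is 1 word.length is 4
```

After :[x].value is consumed, the following .length is no longer attached to a hole, so it passes through as plain text.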

The way this works is that :[hole].value acts as an escape sequence so that any conflicting .<property> can be interpreted literally by simply appending it to :[hole].value.

FAQs

What does lazy evaluation in hole matching mean?

A: Lazy evaluation means holes (:[hole_name] or ... for unnamed holes) capture the smallest string that fits the pattern for efficient and precise matching.

Example:
In if (width <= 1280 && height <= 800) { return 1;}, with the template if (:[var] <= :[rest]), :[var] stops at the first <= it reaches and so matches width. :[rest] matches the rest of the condition: 1280 && height <= 800.

One way to refine matching is to add concrete context around the holes you care about. For example, either of these templates binds height to :[height]:

  • if (... && :[height] ...) or
  • if (... :[height] <= 800)

In if (x < 10 && y > 20) {...}, the template if (:[condition] && ...) {...} matches x < 10 as :[condition].
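The lazy behavior in the first example resembles non-greedy quantifiers in regular expressions. In the Python sketch below, .*? stands in for each hole. Unlike a real hole, it does not respect balanced delimiters, so this analogy holds only for flat input like this.

```python
import re

# Rough regex analogue of the template: if (:[var] <= :[rest])
# Non-greedy .*? mirrors lazy holes but ignores balanced delimiters.
TEMPLATE = re.compile(r"if \((?P<var>.*?) <= (?P<rest>.*?)\)")

code = "if (width <= 1280 && height <= 800) { return 1;}"
m = TEMPLATE.match(code)
print(m.group("var"))   # width
print(m.group("rest"))  # 1280 && height <= 800
```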

What is structural matching in the context of code patterns?

A: Structural matching accurately handles code by recognizing balanced delimiters ((), [], {}), enabling precise manipulation of nested structures.

Example: For calculate(sum(add(2, 3), multiply(4, 5))), the template add(:[args]) matches add(2, 3).
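Balanced-delimiter matching can be illustrated with a depth counter. The Python sketch below finds the first call to a given name and returns its argument text, tracking nested parentheses so that the inner ) of add(2, 3) does not end the sum(...) group early. It is a simplification: the plain substring search could also hit longer identifiers ending in the same name, and only parentheses are tracked.

```python
from typing import Optional


def match_call_args(source: str, name: str) -> Optional[str]:
    """Return the argument text of the first call to `name`, honoring nesting."""
    start = source.find(name + "(")
    if start == -1:
        return None
    args_start = start + len(name) + 1
    depth = 1  # already inside the opening "("
    for i in range(args_start, len(source)):
        if source[i] == "(":
            depth += 1
        elif source[i] == ")":
            depth -= 1
            if depth == 0:
                return source[args_start:i]
    return None  # unbalanced input


code = "calculate(sum(add(2, 3), multiply(4, 5)))"
print(match_call_args(code, "add"))  # 2, 3
print(match_call_args(code, "sum"))  # add(2, 3), multiply(4, 5)
```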

How does understanding language constructs affect code matching?

A: It’s essential for accurately matching patterns within complex language constructs like comments and string literals.

Example: Matching foo(bar(5 /* includes ) tax */)) with the template foo(bar(:[arg])) binds 5 /* includes ) tax */ to :[arg], because the ) inside the comment is not treated as a closing parenthesis.
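Extending a depth-counting scanner so that it skips comments is enough to reproduce this example. The Python sketch below assumes C-style /* ... */ block comments. The real matcher is language-aware and also handles constructs such as string literals, which this sketch ignores.

```python
from typing import Optional


def balanced_group(source: str, open_index: int) -> Optional[str]:
    """Return the text between the "(" at open_index and its matching ")".

    Parentheses inside /* ... */ comments are skipped, so they cannot
    close the group early.
    """
    depth = 0
    i = open_index
    while i < len(source):
        if source.startswith("/*", i):
            end = source.find("*/", i + 2)
            if end == -1:
                return None  # unterminated comment
            i = end + 2
            continue
        if source[i] == "(":
            depth += 1
        elif source[i] == ")":
            depth -= 1
            if depth == 0:
                return source[open_index + 1:i]
        i += 1
    return None


code = "foo(bar(5 /* includes ) tax */))"
print(balanced_group(code, code.index("bar(") + len("bar")))  # 5 /* includes ) tax */
```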

Does whitespace affect the matching process?

A: No, variations in whitespace do not impact matching, allowing templates to adapt to different code formatting styles effectively.

Example: the match template if (... && :[height] ...) works with each of the following:

  • Single-line: if (width <= 1280 && height <= 800) { return 1; }
  • Multi-line with standard spacing:
if (width <= 1280
 && height <= 800) {
 return 1;
}
  • Multi-line with extra spacing:
if (width     <= 1280
    && height <= 800) {
    return 1;
}
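One way to picture whitespace-insensitive matching is to let every run of whitespace in a literal template match any run of whitespace in the source. The Python sketch below builds such a regex and checks it against the three formatting variants above. It approximates the idea only. It requires at least one whitespace character wherever the template has some, and the tool's actual whitespace rules may be more lenient.

```python
import re


def whitespace_insensitive(literal: str) -> "re.Pattern":
    """Compile a pattern in which each whitespace run in `literal`
    matches any run of spaces, tabs, or newlines in the source."""
    return re.compile(r"\s+".join(re.escape(chunk) for chunk in literal.split()))


pattern = whitespace_insensitive("if (width <= 1280 && height <= 800) {")

variants = [
    "if (width <= 1280 && height <= 800) { return 1; }",
    "if (width <= 1280\n && height <= 800) {\n return 1;\n}",
    "if (width     <= 1280\n    && height <= 800) {\n    return 1;\n}",
]
print([bool(pattern.search(v)) for v in variants])  # [True, True, True]
```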