NOTUVY regex is a wrapper around the standard Java java.util.regex package. It adds no new functionality, but it makes sophisticated regex logic simpler and cleaner to write than with the standard package alone.

Basic Concept

A regex evaluation consists of two parts -- the String input, and the regular expression Pattern. We say a PatternMatcher performs a *match*. This is an operation *on* an input string *using* a Pattern.

Terminology

Evaluation
The result of processing a pattern on a string, when there was a match.
ResultStrategy
How to transform the input string into a result string. This is enumerated into 4 specific strategies:
ReplaceFirst
Replace the first match occurrence in the input string.
ReplaceAll
Replace all match occurrences in the input string.
IndexExtraction
Extracting several subgroups from the input string and returning them as a list of String.
NamedExtraction
Extracting the subgroups from the input string and placing the values in a map using given names.
MatchAttempt
One possible evaluation on an input string. It is composed of a Pattern and a ResultStrategy, which is applied if the pattern matches.
AttemptSequence
A list of MatchAttempts. Each is tried until one successfully matches (after which the rest are ignored).
StringTransformable
An interface which declares that a class can transform an input string into a different output value.
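
To make the AttemptSequence idea concrete, here is a minimal sketch of "try each attempt until one matches" written against plain java.util.regex. It is not the NOTUVY API: the Attempt class, evaluate(), and unitAttempts() are hypothetical names invented for this illustration, and replaceFirst stands in for the ResultStrategy.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AttemptSequenceSketch {
    // Hypothetical stand-in for a MatchAttempt: a pattern plus the
    // replacement to apply when that pattern matches.
    static final class Attempt {
        final Pattern pattern;
        final String replacement;

        Attempt(String regex, String replacement) {
            this.pattern = Pattern.compile(regex);
            this.replacement = replacement;
        }
    }

    // Tries each attempt in order. The first attempt whose pattern matches
    // produces the result; the remaining attempts are ignored.
    // Returns null when no attempt matches.
    static String evaluate(List<Attempt> attempts, String input) {
        for (Attempt attempt : attempts) {
            Matcher m = attempt.pattern.matcher(input);
            if (m.find()) {
                return m.replaceFirst(attempt.replacement);
            }
        }
        return null;
    }

    // Example sequence (hypothetical): expand a unit suffix.
    static List<Attempt> unitAttempts() {
        return Arrays.asList(
            new Attempt("^(\\d+)cm$", "$1 centimeters"),
            new Attempt("^(\\d+)in$", "$1 inches"));
    }

    public static void main(String[] args) {
        System.out.println(evaluate(unitAttempts(), "12in"));  // 12 inches
        System.out.println(evaluate(unitAttempts(), "12cm"));  // 12 centimeters
        System.out.println(evaluate(unitAttempts(), "12kg"));  // null
    }
}
```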

Replacement, Extraction, and Composition

There are several operations for which patterns are used. The standard framework supports two.

One is replacement, where the entire original string is transformed into a new result string. The other is extraction, where individual string values are parsed from the original string and returned as separate values.

These different operations are encapsulated in the ResultStrategy, where there are two replacement operations and two extraction operations.
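
For reference, the two extraction styles can be sketched with java.util.regex alone. This is only an illustration of the concepts, not NOTUVY code: the class and method names are hypothetical, and the names are supplied by the caller rather than taken from named groups in the pattern.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ExtractionSketch {
    // Index-based extraction: return every capture group, in order, as a list.
    // The list is empty when the pattern does not match.
    static List<String> extractByIndex(Pattern pattern, String input) {
        List<String> groups = new ArrayList<>();
        Matcher m = pattern.matcher(input);
        if (m.find()) {
            for (int i = 1; i <= m.groupCount(); i++) {
                groups.add(m.group(i));
            }
        }
        return groups;
    }

    // Name-based extraction: store capture group i+1 under names[i].
    // The map is empty when the pattern does not match.
    static Map<String, String> extractByName(Pattern pattern, String input, String... names) {
        Map<String, String> values = new LinkedHashMap<>();
        Matcher m = pattern.matcher(input);
        if (m.find()) {
            for (int i = 0; i < names.length && i < m.groupCount(); i++) {
                values.put(names[i], m.group(i + 1));
            }
        }
        return values;
    }

    public static void main(String[] args) {
        Pattern date = Pattern.compile("(\\d{4})-(\\d{2})-(\\d{2})");
        System.out.println(extractByIndex(date, "released 2020-03-15"));
        System.out.println(extractByName(date, "released 2020-03-15", "year", "month", "day"));
    }
}
```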

The NOTUVY regex framework adds a third operation, composition, which is almost a hybrid of the other two. This takes the original string, extracts individual parts from it, and combines them into a single result string.
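
The difference between replacement and composition can be seen in a small plain-Java sketch. The code below uses only java.util.regex; the class name, the pattern, and the compose() helper are assumptions made for illustration and say nothing about how NOTUVY regex implements composition.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CompositionSketch {
    // Matches "Last, First" anywhere in the input.
    private static final Pattern LAST_FIRST = Pattern.compile("(\\w+),\\s*(\\w+)");

    // Replacement: the text around the match is kept in the result.
    static String replace(String input) {
        return LAST_FIRST.matcher(input).replaceFirst("$2 $1");
    }

    // Composition: the result is built from the captured parts only,
    // so text outside the groups is dropped. Returns null on no match.
    static String compose(String input) {
        Matcher m = LAST_FIRST.matcher(input);
        if (!m.find()) {
            return null;
        }
        return m.group(2) + " " + m.group(1);
    }

    public static void main(String[] args) {
        String input = "name: Doe, John (staff)";
        System.out.println(replace(input));  // name: John Doe (staff)
        System.out.println(compose(input));  // John Doe
    }
}
```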

With NOTUVY regex, for both replacement and composition, the final result string is computed and stored, and can be retrieved with the result() method in PatternMatcher.

Replacement

In the standard framework, there are two replacement operations: replaceFirst and replaceAll. These are implemented on the Matcher class (and also in String).
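
As a reminder of the standard behavior, the following short example shows both operations on Matcher. Only the class and method names are made up for the example.

```java
import java.util.regex.Pattern;

class ReplacementSketch {
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    // Replaces only the first run of digits.
    static String replaceFirstDigits(String input) {
        return DIGITS.matcher(input).replaceFirst("#");
    }

    // Replaces every run of digits.
    static String replaceAllDigits(String input) {
        return DIGITS.matcher(input).replaceAll("#");
    }

    public static void main(String[] args) {
        System.out.println(replaceFirstDigits("a1b22c333"));  // a#b22c333
        System.out.println(replaceAllDigits("a1b22c333"));    // a#b#c#
    }
}
```

The same results come from "a1b22c333".replaceFirst("\\d+", "#") and "a1b22c333".replaceAll("\\d+", "#") on String, which recompile the pattern on every call.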

In NOTUVY regex, the same two operations are implemented in the Strategy class with the first() and all() methods.

Extraction

Composition

Exceptions

All potential exceptions generated within this framework are unchecked exceptions. The majority of these come directly from java.util.regex. However, a few originate from NOTUVY regex in the form of a PatternMatcherException:

  1. Attempting to access the underlying java.util.regex.Matcher before the evaluation.
  2. A match attempt is performed with no pattern.
  3. A match attempt is performed with no input string.
  4. Creating a sequence of Extractions with inconsistent result sizes.

Avoid Exceptions at Runtime with Static Definition

The following code shows problematic error handling:

PatternMatcher pm = PatternMatcher.createOn("input String");
if (pm.using("([A-Z]").found()) {
  ...
}

The problem is that the pattern is malformed (no closing parenthesis matching the opening one). Because the pattern is not compiled until runtime, the resulting PatternSyntaxException will not be thrown until that line of code is executed. If the code is located in an infrequently executed branch, it may remain hidden until an inopportune time.

This can be avoided by using static compilation of the patterns. This way, the error is discovered immediately at class load time. There are two ways to achieve this. First, keep the logic the same but use a pre-compiled pattern:

private static final Pattern PAT = Pattern.compile("([A-Z]");

PatternMatcher pm = PatternMatcher.createOn("input String");
if (pm.using(PAT).found()) {
  ...
}

Alternatively, the PatternMatcher itself can be made static:

private static final PatternMatcher PM = PatternMatcher.createUsing("([A-Z]");

if (PM.on("input String").found()) {
  ...
}

In both of these cases the pattern syntax error still exists. The advantage is that it will be reported immediately, rather than remaining buried.
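
The "reported immediately" behavior can be observed with java.util.regex alone. In the sketch below (class names are hypothetical), the malformed pattern sits in a static field, so the first use of the holder class fails during class initialization with an ExceptionInInitializerError whose cause is the PatternSyntaxException.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class EagerFailureSketch {
    // Holder whose static initializer compiles a malformed pattern
    // (the opening parenthesis is never closed).
    static class BadPatterns {
        static final Pattern UPPER = Pattern.compile("([A-Z]");
    }

    // Touches the holder class for the first time and reports whether
    // initialization failed because of the pattern syntax error.
    static boolean failsAtFirstUse() {
        try {
            BadPatterns.UPPER.pattern();
            return false;
        } catch (ExceptionInInitializerError e) {
            return e.getCause() instanceof PatternSyntaxException;
        }
    }

    public static void main(String[] args) {
        System.out.println(failsAtFirstUse());  // true
    }
}
```

In a real application the error would normally not be caught; it would surface the first time the class is initialized, regardless of which branch of the code eventually uses the pattern.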

Thread Safety with Static Declarations

Statically declared PatternMatcher instances can be problematic because they are not thread-safe.

private static final PatternMatcher PM = PatternMatcher.createUsing("([A-Z])");

if (PM.on("input String").found()) {
  ...
}

The problem with this logic is that PatternMatcher has internal state, so if multiple threads try to access this variable, they will interfere with each other. The correct way to do this is to make the variable immutable:

private static final PatternMatcher PM_FACTORY = PatternMatcher.createUsing("([A-Z])").immutable();

PatternMatcher pattern = PM_FACTORY.on("input String");
if (pattern.found()) {
  ...
}

This makes the logic thread-safe. It works because the call to on() now returns a clone of the original object, so each thread creates its own copy to operate on.
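
The same split exists in the standard package, where Pattern is immutable and safe to share while Matcher holds per-match state and is not. The sketch below shows that pattern in plain java.util.regex; the class and method names are hypothetical and the code does not use the NOTUVY API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PerThreadMatcherSketch {
    // Shared, immutable, thread-safe.
    private static final Pattern UPPER = Pattern.compile("([A-Z])");

    // Each call creates its own Matcher, so no mutable state is shared.
    // Returns the first uppercase letter, or null if there is none.
    static String firstUpper(String input) {
        Matcher m = UPPER.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    // Runs firstUpper on every input from a small thread pool and
    // returns the results in input order.
    static List<String> runConcurrently(List<String> inputs) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String input : inputs) {
                Callable<String> task = () -> firstUpper(input);
                futures.add(pool.submit(task));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> future : futures) {
                results.add(future.get());
            }
            return results;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> inputs = new ArrayList<>();
        inputs.add("input String");
        inputs.add("abc");
        inputs.add("xY");
        System.out.println(runConcurrently(inputs));  // [S, null, Y]
    }
}
```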

Be aware that the new object should be captured as a separate variable if results are to be extracted from it in a later step. Note that the following code is erroneous because it attempts to extract the result value from the immutable instance.

private static final PatternMatcher PM = PatternMatcher.createUsing("([A-Z])", Strategy.extractGroup(1)).immutable();

if (PM.on("input String").found()) {
  System.out.println(PM.result());
}

String Transformation

In standard Java string pattern replacement, it is simple to perform extraction and replacement. Consider the following example where the pattern searches for the first complete word that does not contain the letter "s", and places square brackets around that word in the returned string. The character class also excludes whitespace, because otherwise the first match would be the single space after "this".

"this is the test".replaceFirst("\\b[^sS\\s]+\\b", "[$0]");

This produces the result:

"this is [the] test"

This extracts a string value and uses it unchanged in the result. However, if we want to transform the extracted value (change or convert it somehow), the replacement string of the standard framework cannot express that. The NOTUVY regex framework supports it through StringTransformable.
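
To illustrate what such a transformation involves, the sketch below does it by hand with a Matcher loop in plain java.util.regex. It does not use or reproduce the NOTUVY StringTransformable API, whose signature is not shown here; java.util.function.Function stands in for it, and the class and method names are hypothetical. The pattern excludes whitespace so that it can only match whole words.

```java
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TransformSketch {
    // First complete word that contains no "s" or "S".
    static final Pattern NO_S_WORD = Pattern.compile("\\b[^sS\\s]+\\b");

    // Replaces the first match with "[" + transform(match) + "]".
    // Returns the input unchanged when there is no match.
    static String bracketFirst(Pattern pattern, String input, Function<String, String> transform) {
        Matcher m = pattern.matcher(input);
        if (!m.find()) {
            return input;
        }
        StringBuffer out = new StringBuffer();
        String replacement = "[" + transform.apply(m.group()) + "]";
        // quoteReplacement keeps "$" and "\" in the transformed text literal.
        m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // prints: this is [THE] test
        System.out.println(bracketFirst(NO_S_WORD, "this is the test", String::toUpperCase));
    }
}
```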