NOTUVY regex is a wrapper around the standard Java java.util.regex package. It adds no new functionality, but it makes writing sophisticated regex logic simpler and cleaner than with the standard package alone.

  1. Simple Usage
  2. Multiple Comparisons
  3. Rewrite Strategy
  4. Sequence of Matches
  5. Binding of Matched Names to Values
  6. Using a Built-In Formatter
  7. Extracting Values to a List
  8. Using a String Transform

Simple Usage

Here is how the standard package is used for a single comparison:

// See if the date string matches "dd/mm/yyyy"
static Pattern PATTERN = Pattern.compile("(\\d+)/(\\d+)/(\\d+)");

public void compare(String input) {
    Matcher matcher = PATTERN.matcher(input);
    if (matcher.find()) {
        // Output as "yyyy, mm"
        System.out.println(matcher.group(3) + ", " + matcher.group(2));
    }
}

Here is how it is replaced with NOTUVY regex:

// See if the date string matches "dd/mm/yyyy"
static Pattern PATTERN = Pattern.compile("(\\d+)/(\\d+)/(\\d+)");

public void compare(String input) {
    PatternMatcher pm = PatternMatcher.createUsing(PATTERN);
    if (pm.on(input).found()) {
        // Output as "yyyy, mm"
        System.out.println(pm.getMatcher().group(3) + ", " + pm.getMatcher().group(2));
    }
}

Notice that the overall logic is the same. PatternMatcher is merely a container for an underlying Pattern and Matcher. These are still accessed and used as with the standard framework.
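To make the "container" idea concrete, here is a minimal sketch of how a class could hold a Pattern and a Matcher behind a chained interface. It is not the NOTUVY source: the class name MatcherHolder is hypothetical, and its methods only imitate the calls shown above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; this is not the NOTUVY PatternMatcher source.
final class MatcherHolder {
    private final Pattern pattern;
    private Matcher matcher; // state that would otherwise live in a local variable

    private MatcherHolder(Pattern pattern) {
        this.pattern = pattern;
    }

    static MatcherHolder createUsing(Pattern pattern) {
        return new MatcherHolder(pattern);
    }

    // Binds the input and returns this holder so that calls can be chained.
    MatcherHolder on(String input) {
        this.matcher = pattern.matcher(input);
        return this;
    }

    // Delegates to Matcher.find() on the underlying matcher.
    boolean found() {
        return matcher != null && matcher.find();
    }

    Matcher getMatcher() {
        return matcher;
    }

    public static void main(String[] args) {
        MatcherHolder holder = MatcherHolder.createUsing(Pattern.compile("(\\d+)/(\\d+)/(\\d+)"));
        if (holder.on("25/12/2023").found()) {
            // prints "2023, 12"
            System.out.println(holder.getMatcher().group(3) + ", " + holder.getMatcher().group(2));
        }
    }
}
```

Because found() delegates to find(), calling it a second time in this sketch would advance to the next match, just as with a plain Matcher.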

Multiple Comparisons

Now let us extend the example to perform multiple comparisons:

// See if the date string matches "dd/mm/yyyy"
static Pattern PATTERN1 = Pattern.compile("(\\d+)/(\\d+)/(\\d+)");

// See if the date string matches "yyyy-mm-dd"
static Pattern PATTERN2 = Pattern.compile("(\\d+)-(\\d+)-(\\d+)");

public void compare(String input) {
    Matcher matcher1 = PATTERN1.matcher(input);
    Matcher matcher2 = PATTERN2.matcher(input);

    // Output as "yyyy, mm"
    if (matcher1.find()) {
        System.out.println(matcher1.group(3) + ", " + matcher1.group(2));
    } else if (matcher2.find()) {
        System.out.println(matcher1.group(1) + ", " + matcher1.group(2));
    }
}

Notice the critical typographical error: the second branch matches using matcher2 but reads its output from matcher1. The code compiles without complaint and only fails at run time, when that branch is reached and matcher1.group() throws an IllegalStateException because matcher1 has no successful match. Such a mistake can be difficult to spot and debug. This is the danger of juggling multiple local variables, which is hard to avoid with the standard Java regex framework.

Here is how it is replaced with NOTUVY regex:

// See if the date string matches "dd/mm/yyyy"
static Pattern PATTERN1 = Pattern.compile("(\\d+)/(\\d+)/(\\d+)");

// See if the date string matches "yyyy-mm-dd"
static Pattern PATTERN2 = Pattern.compile("(\\d+)-(\\d+)-(\\d+)");

public void compare(String input) {
    PatternMatcher pm = PatternMatcher.createOn(input);

    // Output as "yyyy, mm"
    if (pm.using(PATTERN1).found()) {
        System.out.println(pm.getMatcher().group(3) + ", " + pm.getMatcher().group(2));
    } else if (pm.using(PATTERN2).found()) {
        System.out.println(pm.getMatcher().group(1) + ", " + pm.getMatcher().group(2));
    }
}

Notice that the flow of the logic is simpler and cleaner. It is obvious that the intention is to match on a single string using multiple patterns.

Here we start to see the benefit of NOTUVY regex. Using the standard regex package in a cascaded if statement is messy, and it gets messier as the number of cases increases. Having fewer variables simplifies and streamlines the logic. Again, the basic logic is still the same, because the state formerly held in local variables is now internal state in PatternMatcher.

Rewrite Strategy

Now let us refine the example above. Notice that after the match is made, the underlying Matcher object is referenced and the groups are indexed. This can be streamlined so that once a successful match is made the result is composed from the parts.

public void compare(String input) {
    PatternMatcher pattern = PatternMatcher.createOn(input);

    // Output as "yyyy, mm"
    if (pattern.using(PATTERN1, Strategy.first("$3, $2")).found()) {
        System.out.println(pattern.result());
    } else if (pattern.using(PATTERN2, Strategy.first("$1, $2")).found()) {
        System.out.println(pattern.result());
    }
}

Now when we use a pattern, we also supply a strategy for processing the result. In this case, we use the "replace first" strategy, and provide the rewrite string that uses variables to represent the matched groups.
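The "$3, $2" notation resembles the group references that java.util.regex accepts in replacement strings. The following sketch shows the standard-library equivalent using Matcher.replaceFirst. Whether Strategy.first delegates to it is an assumption, and the class and method names here are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Standard-library sketch; ReplaceFirstSketch and rewriteFirst are illustrative names.
final class ReplaceFirstSketch {
    static final Pattern SLASH_DATE = Pattern.compile("(\\d+)/(\\d+)/(\\d+)");

    // Returns the input with its first match rewritten, or null when nothing matches.
    static String rewriteFirst(Pattern pattern, String input, String replacement) {
        Matcher matcher = pattern.matcher(input);
        if (!matcher.find()) {
            return null;
        }
        // replaceFirst resets the matcher itself; $n refers to capturing group n.
        return matcher.replaceFirst(replacement);
    }

    public static void main(String[] args) {
        // prints "2023, 12"
        System.out.println(rewriteFirst(SLASH_DATE, "25/12/2023", "$3, $2"));
    }
}
```

Note that replaceFirst keeps any text outside the match, so the input "on 25/12/2023" yields "on 2023, 12". The document does not say whether Strategy.first behaves the same way.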

Sequence of Matches

In the last example, notice that an if/then/else is used, but that the action is the same in each branch. The reason for multiple if statements is to compare against each pattern, even though the result is the same for each.

The class AttemptSequence eliminates the need for multiple if/then/else branches. The list of patterns is declared, and each one is attempted until one matches.

public void compare(String input) {
    AttemptSequence sequence = new AttemptSequence()
            .add(PATTERN1, Strategy.first("$3, $2"))  // transform to "yyyy, mm"
            .add(PATTERN2, Strategy.first("$1, $2"))  // transform to "yyyy, mm"
            ;
    PatternMatcher pattern = sequence.firstMatchOn(input);

    if (pattern.found()) {
        System.out.println(pattern.result());
    }
}
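The "try each pattern until one matches" idea can be sketched with the standard library alone. This is only an illustration of the technique, not the AttemptSequence implementation: the class FirstMatchSketch is hypothetical, and it returns an Optional rather than a PatternMatcher.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in that tries pattern/replacement pairs in insertion order.
final class FirstMatchSketch {
    // One attempt: a pattern plus the replacement applied when it matches.
    private static final class Attempt {
        final Pattern pattern;
        final String replacement;

        Attempt(Pattern pattern, String replacement) {
            this.pattern = pattern;
            this.replacement = replacement;
        }
    }

    private final List<Attempt> attempts = new ArrayList<>();

    FirstMatchSketch add(String regex, String replacement) {
        attempts.add(new Attempt(Pattern.compile(regex), replacement));
        return this;
    }

    // Returns the rewritten input for the first pattern that matches, if any.
    Optional<String> firstMatchOn(String input) {
        for (Attempt attempt : attempts) {
            Matcher matcher = attempt.pattern.matcher(input);
            if (matcher.find()) {
                return Optional.of(matcher.replaceFirst(attempt.replacement));
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        FirstMatchSketch sequence = new FirstMatchSketch()
                .add("(\\d+)/(\\d+)/(\\d+)", "$3, $2")
                .add("(\\d+)-(\\d+)-(\\d+)", "$1, $2");
        sequence.firstMatchOn("2023-12-25").ifPresent(System.out::println); // prints "2023, 12"
    }
}
```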

Binding of Matched Names to Values

Using numerical indices can be problematic and error-prone. Sometimes a cleaner approach is to use named values rather than indices.

public void compare(String input) {
    AttemptSequence sequence = new AttemptSequence()
            .add("(\\d+)/(\\d+)/(\\d+)", Strategy.namedResult(2, "month").and(3, "year"))
            .add("(\\d+)-(\\d+)-(\\d+)", Strategy.namedResult(1, "year").and(2, "month"))
            ;
    PatternMatcher pattern = sequence.firstMatchOn(input);

    if (pattern.found()) {
        System.out.println(String.format("%s, %s", pattern.resultNamed("year"), pattern.resultNamed("month")));
    }
}

Here the pattern is specified and the strategy for extraction is immediately given using names to specify each grouping. Once the match is found, the values are extracted by name.
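As a rough picture of what binding names to group indices involves, the sketch below copies selected groups of a match into a name-to-value map using only java.util.regex. The class NamedBindingSketch and its bind method are hypothetical. How NOTUVY stores its bindings internally is not described in this document.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration of mapping group indices to names.
final class NamedBindingSketch {
    // Copies the selected groups of the first match into a name -> value map.
    // Returns an empty map when the pattern does not match.
    static Map<String, String> bind(Pattern pattern, String input, Map<String, Integer> groupNames) {
        Map<String, String> values = new LinkedHashMap<>();
        Matcher matcher = pattern.matcher(input);
        if (matcher.find()) {
            for (Map.Entry<String, Integer> entry : groupNames.entrySet()) {
                values.put(entry.getKey(), matcher.group(entry.getValue()));
            }
        }
        return values;
    }

    public static void main(String[] args) {
        Map<String, Integer> names = new HashMap<>();
        names.put("month", 2);
        names.put("year", 3);
        Map<String, String> values = bind(Pattern.compile("(\\d+)/(\\d+)/(\\d+)"), "25/12/2023", names);
        System.out.println(values.get("year") + ", " + values.get("month")); // prints "2023, 12"
    }
}
```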

Using a Built-In Formatter

The above example can be changed to move the string formatting logic into the regex framework. This makes the client logic (the code in the if branch) trivial.

public void compare(String input) {
    AttemptSequence sequence = new AttemptSequence()
            .add(PATTERN1, Strategy.namedResult(2, "month").and(3, "year").formatted("$year, $month"))
            .add(PATTERN2, Strategy.namedResult(1, "year").and(2, "month").and(3, "day").formatted("$year, $month"))
            ;
    PatternMatcher pattern = sequence.firstMatchOn(input);

    if (pattern.found()) {
        System.out.println(pattern.result());
    }
}

Notice here that a format pattern is given to the Strategy, so the result is composed when the match is found. Upon confirming the match, the result is simply extracted.

For named value extraction, the format string defines substitution by using the value names prepended with "$".
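The "$name" substitution described above can be sketched as a small template expander over a map of named values. This is an assumption about the general technique, not NOTUVY's formatter. The class NamedFormatSketch is hypothetical, and leaving unknown names untouched is a choice made for this sketch only.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical expander for "$name" tokens; not the NOTUVY implementation.
final class NamedFormatSketch {
    private static final Pattern TOKEN = Pattern.compile("\\$(\\w+)");

    static String format(String template, Map<String, String> values) {
        Matcher matcher = TOKEN.matcher(template);
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            String value = values.get(matcher.group(1));
            // Unknown names are left as written rather than replaced with a guess.
            String replacement = (value != null) ? value : matcher.group();
            // quoteReplacement keeps '$' and '\' in the value from being interpreted.
            matcher.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        matcher.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> values = new HashMap<>();
        values.put("year", "2023");
        values.put("month", "12");
        System.out.println(format("$year, $month", values)); // prints "2023, 12"
    }
}
```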

Extracting Values to a List

An alternative to using a replacement pattern is to use an extraction with a format.

public void compare(String input) {
    AttemptSequence sequence = new AttemptSequence()
            .add(PATTERN1, Strategy.extractGroup(3).and(2).formatted("%s, %s"))  // transform to "yyyy, mm"
            .add(PATTERN2, Strategy.extractGroup(1).and(2).formatted("%s, %s"))  // transform to "yyyy, mm"
            ;
    PatternMatcher pattern = sequence.firstMatchOn(input);

    if (pattern.found()) {
        System.out.println(pattern.result());
    }
}

Notice that this is effectively the same as section 4, but it sets up the logic to use one of the powerful facets of the framework demonstrated in the next example.

For list extraction, the format string uses the same syntax as the Java String.format() method.
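In standard-library terms, list extraction amounts to collecting the chosen groups in order and handing them to String.format. The sketch below illustrates that idea. The class ExtractListSketch and its method are hypothetical and only imitate what extractGroup(...).and(...).formatted(...) appears to do.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration of "extract groups to a list, then format".
final class ExtractListSketch {
    // Collects the given groups in the order listed and formats them.
    // Returns null when the pattern does not match.
    static String extractAndFormat(Pattern pattern, String input, String format, int... groups) {
        Matcher matcher = pattern.matcher(input);
        if (!matcher.find()) {
            return null;
        }
        List<String> values = new ArrayList<>();
        for (int group : groups) {
            values.add(matcher.group(group));
        }
        // The list becomes the argument array for the format specifiers.
        return String.format(format, values.toArray());
    }

    public static void main(String[] args) {
        Pattern slashDate = Pattern.compile("(\\d+)/(\\d+)/(\\d+)");
        System.out.println(extractAndFormat(slashDate, "25/12/2023", "%s, %s", 3, 2)); // prints "2023, 12"
    }
}
```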

Using a String Transform

The framework allows the specification of a string transform for an individual extracted value. This allows the value to be rewritten to a new value and then used.

For example, let us assume that we want to convert the numeric month to its alphabetic name (and do nothing extra with the year string). This is achieved with some extra logic.

static String[] MONTHS = {"Jan","Feb","Mar","Apr","May","Jun","Jul","Aug","Sep","Oct","Nov","Dec"};

static StringTransformable to_month_name = new StringTransformable() {
    public String transform(String pInput) {
        return MONTHS[Integer.parseInt(pInput) - 1];
    }
};

public void compare(String input) {
    AttemptSequence sequence = new AttemptSequence()
            .add(PATTERN1, Strategy.extractGroup(3).and(2, to_month_name).formatted("%s, %s"))  // transform to "yyyy, Mon"
            .add(PATTERN2, Strategy.extractGroup(1).and(2, to_month_name).formatted("%s, %s"))  // transform to "yyyy, Mon"
            ;
    PatternMatcher pattern = sequence.firstMatchOn(input);

    if (pattern.found()) {
        System.out.println(pattern.result());
    }
}

Notice here that we define a transform class that converts the numeric month string to an integer. That value is used to index into a String array holding the month names.

The power here is that the client logic does not need to do any string manipulation. It simply extracts the result value from the PatternMatcher. Each pattern in the sequence carries its own conversion logic, and the user of the pattern lets the framework do all the work and then extracts the result. Because the conversion logic is tied directly to the pattern definition, the code is easier to read.
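The transform above indexes the month array directly, so a value such as "13" would throw an ArrayIndexOutOfBoundsException. The sketch below shows one defensive variant using only the standard library. java.util.function.UnaryOperator stands in for StringTransformable, and the class MonthTransformSketch and its fallback behaviour (returning the input unchanged) are assumptions made for illustration.

```java
import java.util.function.UnaryOperator;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical, defensive variant of the month-name transform.
final class MonthTransformSketch {
    private static final String[] MONTHS =
            {"Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"};

    // Returns the month name, or the input unchanged when it is not a month from 1 to 12.
    static final UnaryOperator<String> TO_MONTH_NAME = text -> {
        try {
            int month = Integer.parseInt(text);
            return (month >= 1 && month <= 12) ? MONTHS[month - 1] : text;
        } catch (NumberFormatException e) {
            return text; // for example, a digit run too long to fit in an int
        }
    };

    // Formats the year group and the transformed month group of the first match.
    static String yearAndMonthName(Pattern pattern, String input, int yearGroup, int monthGroup) {
        Matcher matcher = pattern.matcher(input);
        if (!matcher.find()) {
            return null;
        }
        return String.format("%s, %s",
                matcher.group(yearGroup), TO_MONTH_NAME.apply(matcher.group(monthGroup)));
    }

    public static void main(String[] args) {
        Pattern slashDate = Pattern.compile("(\\d+)/(\\d+)/(\\d+)");
        System.out.println(yearAndMonthName(slashDate, "25/12/2023", 3, 2)); // prints "2023, Dec"
    }
}
```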