Regular Expressions And Template Literals

Setup

Somewhere along the line, I heard a comment about template literals being a great tool for making regular expressions a bit easier to read. I started this article with the idea that I wanted to see if that was true and come up with some examples of this type of use.

Given the glimmer of an idea, I started a new project. This is an exercise ... plain and simple. This pattern "could" be used in a production environment, but I am in no way recommending that.

There are probably some vetted tools out there that can do this for the front-end. Please list some of these in the comments, if you know of them; if only for the sake of my readers.

Previous Work With Regular Expressions

Having worked on a project for a client where I had to recreate a script parser and engine for a 30-year-old, mainframe-driven client language, I had a lot of respect for Regular Expressions. I learned a lot (translate that into ... a lot of poor code was written and refactored). After two major refactors, I had a working set of code ... and HUNDREDS of Regular Expressions to make things work.

I used every trick I knew to make the Parser Regular Expression Service more readable. I abstracted and combined together all sorts of interesting patterns, knowing that someday this code would be managed by someone else.

Having struggled with this, I thought using Template Literals this way sounded very efficient and clean. It was certainly something that deserved some research.

What I Want To Do ...

First, I found a regular expression; something like this. I want to take this ...

Matches text avoiding additional spaces

// ^[\s]*(.*?)[\s]*$

And, generate it from something more legible, like this ...

const code0001 = `
  /* Matches text avoiding additional spaces
  */
  ^       // Beginning of line
  [\\s]*  // Zero or more whitespace
  (.*?)   // Any characters, zero to unlimited,
          //   lazy (as few times as possible, expanding as needed)
  [\\s]*  // Zero or more whitespace
  $       // End of line
`;

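NOTE here that the \s still needs to be escaped as \\s. A template literal processes backslash escapes like any other string, so an unescaped \s would lose its backslash before the pattern ever reaches RegExp.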
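To make the escaping point concrete, here is a minimal sketch comparing the three ways of writing the same fragment. The variable names are mine, and String.raw is a standard alternative that the rest of this article does not use, so treat it as an aside rather than part of the pattern.

```javascript
// Three spellings of the same fragment inside a template literal.
const unescaped = `[\s]*`;      // "\s" is not a string escape, so the backslash is dropped
const escaped = `[\\s]*`;       // "\\" yields one literal backslash
const raw = String.raw`[\s]*`;  // String.raw leaves backslashes untouched

console.log(unescaped); // [s]*
console.log(escaped);   // [\s]*
console.log(raw);       // [\s]*

// Only the escaped and raw versions still mean "whitespace" to RegExp.
console.log(new RegExp(`^${raw}$`).test('  '));       // true
console.log(new RegExp(`^${unescaped}$`).test('  ')); // false
```

String.raw would remove the doubled backslashes, but backticks and ${ sequences inside the pattern would still need care, so I have stayed with plain template literals here.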

Beginning

First, I needed to get rid of comments ...

// Borrowed function: stripComments uses the regex from
// https://stackoverflow.com/a/47312708
function stripComments(stringLiteral) {
  // '$1' restores the character captured just before a "//" comment
  return stringLiteral
    .replace(/\/\*[\s\S]*?\*\/|([^:]|^)\/\/.*$/gm, '$1');
}

The function above takes the template and essentially reduces it to ...

"

  ^    
  [\s]*
  (.*?)
  [\s]*
  $    
"

Now I need to get rid of line breaks and spaces (yes, I know there can be a space in a regex pattern, but I'm choosing to ignore that for simplicity's sake in this exercise). To remove the unneeded characters ...

// Starting Demo Code Here
function createRegex(stringLiteral) {
  // \s already covers \r and \n, so one class removes all whitespace
  return stripComments(stringLiteral)
    .replace(/\s+/g, '');
}

Which then gives me the ability to do this ...

const code0001regex = new RegExp(createRegex(code0001));

//           ORIGINAL FROM ABOVE: /^[\s]*(.*?)[\s]*$/
// GENERATED code0001regex value: /^[\s]*(.*?)[\s]*$/
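As a sanity check, the whole round trip can be sketched in one self-contained snippet. The helpers are restated so it runs on its own, with the comment replacement written as '$1' so the character captured before a // comment is kept (my assumption about the intent of the borrowed regex). The \\x20 workaround for a literal space is also my own suggestion, not something the exercise relies on.

```javascript
// Helpers restated so the snippet runs alone.
function stripComments(stringLiteral) {
  // '$1' puts back the character captured just before a "//" comment
  return stringLiteral
    .replace(/\/\*[\s\S]*?\*\/|([^:]|^)\/\/.*$/gm, '$1');
}

function createRegex(stringLiteral) {
  return stripComments(stringLiteral).replace(/\s+/g, '');
}

const trimTemplate = `
  ^       // Beginning of line
  [\\s]*  // Zero or more whitespace
  (.*?)   // Any characters, lazy
  [\\s]*  // Zero or more whitespace
  $       // End of line
`;
const trimRegex = new RegExp(createRegex(trimTemplate));

console.log(trimRegex.source === '^[\\s]*(.*?)[\\s]*$'); // true
console.log('   hello world  '.match(trimRegex)[1]);     // hello world

// A literal space has to be written as an escape to survive the stripping.
const spacedRegex = new RegExp(createRegex(`
  a       // Literal a
  \\x20   // Literal space (hypothetical workaround, not from the article)
  b       // Literal b
`));
console.log(spacedRegex.test('a b')); // true
```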

Let's Take A Look ...

The code0001 I defined above has been reworked for legibility (it is now much easier to home in on what this regex pattern is going to do) ...

// /^[\s]*(.*?)[\s]*$/
const code0001 = `
  ^       // Beginning of line
  [\\s]*  // Zero or more whitespace

  (.*?)   // Any characters, zero to unlimited,
          //  lazy (as few times as possible, expanding as needed)

  [\\s]*  // Zero or more whitespace
  $       // End of line
`;

code0002 Matches any valid HTML tag and the corresponding closing tag ... here, I've tried to show a bit more advanced indenting (both in the code and in the supporting comments).

// <([a-z]+)([^<]+)*(?:>(.*)<\/\1>|\s+\/>)
const code0002 = `
  <               // Literal
  ([a-z]+)        // Group: First Tag (one or more)
  (               // Group
    [^<]+           // Match (one or more) NOT <
  )*              // Group-END: Zero or more times
  (?:             // Group-NON-CAPTURE
    >               // Literal
    (.*)<\\/\\1>    // Up to and including SLASH and First Tag group above
    |\\s+\\/>       // OR spaces and close tag
  )               // Group-END
`;
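Here is a rough sketch of this pattern in use, mainly to show that the \\1 backreference survives the round trip. The createRegex below is a condensed restatement of the two helpers, and the sample strings are made up for illustration.

```javascript
// Condensed restatement of stripComments + createRegex, so the snippet runs alone.
function createRegex(stringLiteral) {
  return stringLiteral
    .replace(/\/\*[\s\S]*?\*\/|([^:]|^)\/\/.*$/gm, '$1')
    .replace(/\s+/g, '');
}

const tagTemplate = `
  <               // Literal
  ([a-z]+)        // Group: First Tag (one or more)
  (               // Group
    [^<]+           // Match (one or more) NOT <
  )*              // Group-END: Zero or more times
  (?:             // Group-NON-CAPTURE
    >               // Literal
    (.*)<\\/\\1>    // Up to and including SLASH and First Tag group above
    |\\s+\\/>       // OR spaces and close tag
  )               // Group-END
`;
const tagRegex = new RegExp(createRegex(tagTemplate));

const tagMatch = '<p class="x">Hello</p>'.match(tagRegex);
console.log(tagMatch[1]); // p
console.log(tagMatch[3]); // Hello

console.log(tagRegex.test('<br />'));         // true
console.log(tagRegex.test('<p>Hello</div>')); // false
```

The nested quantifier in ([^<]+)* can backtrack heavily on long inputs, so I would only try this against short strings.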

code0003 Matches any valid hex color inside text.

// \B#(?:[a-fA-F0-9]{6}|[a-fA-F0-9]{3})\b
const code0003 = `
  \\B#              // Non-word boundary, Literal #
  (?:               // Group-NON-CAPTURE
    [a-fA-F0-9]{6}    // 1st alternative
    |[a-fA-F0-9]{3}   // 2nd alternative
  )                 // Group-END
  \\b               // Word boundary
`;
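Since this pattern is meant to find colors inside text, a quick sketch with the g flag shows it picking out more than one. The flag, the sample sentence, and the condensed createRegex are my additions for illustration.

```javascript
// Condensed restatement of stripComments + createRegex, so the snippet runs alone.
function createRegex(stringLiteral) {
  return stringLiteral
    .replace(/\/\*[\s\S]*?\*\/|([^:]|^)\/\/.*$/gm, '$1')
    .replace(/\s+/g, '');
}

const hexTemplate = `
  \\B#              // Non-word boundary, Literal #
  (?:               // Group-NON-CAPTURE
    [a-fA-F0-9]{6}    // 1st alternative
    |[a-fA-F0-9]{3}   // 2nd alternative
  )                 // Group-END
  \\b               // Word boundary
`;

// Flags are passed to RegExp separately; the template only holds the pattern.
const hexRegex = new RegExp(createRegex(hexTemplate), 'g');

const sampleText = 'Use #ff0000 for errors and #0af for links.';
const colors = sampleText.match(hexRegex);
console.log(colors); // [ '#ff0000', '#0af' ]
```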

code0004 Matches any valid email inside text.

// \b[\w.!#$%&'*+\/=?^`{|}~-]+@[\w-]+(?:\.[\w-]+)*\b
const code0004 = `
  \\b                           // Word boundary
  [\\w.!#$%&'*+\\/=?^\`{|}~-]+  // Character in this list (and word), one to unlimited
  @                             // Literal
  [\\w-]+                       // One to unlimited word and character "-"
  (?:                           // Group-NON-CAPTURE
    \\.[\\w-]+                    // Literal ".", one to unlimited word and character "-"
  )*                            // Group-END (zero or more)
  \\b                           // Word boundary
`;

code0005 Strong password: Minimum length of 6, at least one uppercase letter, at least one lowercase letter, at least one number, at least one special character.

// (?=^.{6,}$)((?=.*\w)(?=.*[A-Z])(?=.*[a-z])
// ... (?=.*[0-9])(?=.*[|!"$%&\/\(\)\?\^\'\\\+\-\*]))^.*
const code0005 = `
  (?=           // Group-POSITIVE-LOOKAHEAD
    ^             // BOL
    .{6,}         // Six or more characters, excluding line terminators
    $             // EOL
  )             // Group-POSITIVE-LOOKAHEAD-END
  (             // Group
    (?=.*\\w)     // Group-POSITIVE-LOOKAHEAD
                  // Any Characters, zero to unlimited
                  // Any word character

    (?=.*[A-Z])   // Group-POSITIVE-LOOKAHEAD
                  // Any Characters, zero to unlimited
                  // Any Character (A-Z)

    (?=.*[a-z])   // Group-POSITIVE-LOOKAHEAD
                  // Any Characters, zero to unlimited
                  // Any Character (a-z)

    (?=.*[0-9])   // Group-POSITIVE-LOOKAHEAD
                  // Any Characters, zero to unlimited
                  // Any Character (0-9)

    (?=.*[|!"$%&\\/\\(\\)\\?\\^\\'\\\\\\+\\-\\*])
                  // Group-POSITIVE-LOOKAHEAD
                  // Any Characters, zero to unlimited
                  // Any Character in the list
  )             // Group-END
  ^             // BOL
  .*            // Match Any Characters, zero to unlimited
`;
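A short sketch of the password pattern being exercised, with the per-lookahead comments trimmed to keep it brief. The sample passwords and the condensed createRegex are mine, purely for illustration.

```javascript
// Condensed restatement of stripComments + createRegex, so the snippet runs alone.
function createRegex(stringLiteral) {
  return stringLiteral
    .replace(/\/\*[\s\S]*?\*\/|([^:]|^)\/\/.*$/gm, '$1')
    .replace(/\s+/g, '');
}

const passwordTemplate = `
  (?=^.{6,}$)     // Six or more characters in total
  (               // Group
    (?=.*\\w)       // At least one word character
    (?=.*[A-Z])     // At least one uppercase letter
    (?=.*[a-z])     // At least one lowercase letter
    (?=.*[0-9])     // At least one digit
    (?=.*[|!"$%&\\/\\(\\)\\?\\^\\'\\\\\\+\\-\\*])
                    // At least one character from the special list
  )               // Group-END
  ^.*             // BOL, then any characters
`;
const passwordRegex = new RegExp(createRegex(passwordTemplate));

console.log(passwordRegex.test('Passw0rd!')); // true
console.log(passwordRegex.test('password'));  // false (no uppercase, digit, or special)
console.log(passwordRegex.test('Ab1!'));      // false (too short)
```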

code0006 SSN — Social Security Number (simple)

// ^((?<area>[\d]{3})[-][\d]{2}[-][\d]{4})$
const code0006 = `
  ^                   // BOL
  (                   // Group
    (?<area>            // Group-NAMED area
      [\\d]{3}            // 3-Digits
    )                   // Group-NAMED-END
    [-]                 // Literal, Dash
    [\\d]{2}            // 2-Digits
    [-]                 // Literal, Dash
    [\\d]{4}            // 4-Digits
  )                   // Group-END
  $                   // EOL
`;
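The named group is the interesting part of this one, so here is a minimal sketch that reads it back through match().groups. The sample numbers and the condensed createRegex are illustrative assumptions on my part.

```javascript
// Condensed restatement of stripComments + createRegex, so the snippet runs alone.
function createRegex(stringLiteral) {
  return stringLiteral
    .replace(/\/\*[\s\S]*?\*\/|([^:]|^)\/\/.*$/gm, '$1')
    .replace(/\s+/g, '');
}

const ssnTemplate = `
  ^                   // BOL
  (                   // Group
    (?<area>            // Group-NAMED area
      [\\d]{3}            // 3-Digits
    )                   // Group-NAMED-END
    [-]                 // Literal, Dash
    [\\d]{2}            // 2-Digits
    [-]                 // Literal, Dash
    [\\d]{4}            // 4-Digits
  )                   // Group-END
  $                   // EOL
`;
const ssnRegex = new RegExp(createRegex(ssnTemplate));

const ssnMatch = '123-45-6789'.match(ssnRegex);
console.log(ssnMatch.groups.area);        // 123
console.log(ssnRegex.test('123456789'));  // false (dashes are required)
```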

Conclusions

This whole article is a different take on generating Regular Expressions using JavaScript's template literals. This was an experiment. A successful one, I believe.

This exercise also points out that writing tests against the regex can become much easier as the pattern becomes more understandable.
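To give a flavor of what such tests might look like, here is a small table-driven sketch against the email pattern. The cases, the loop, and the condensed createRegex are hypothetical; a real project would use its test framework of choice.

```javascript
// Condensed restatement of stripComments + createRegex, so the snippet runs alone.
function createRegex(stringLiteral) {
  return stringLiteral
    .replace(/\/\*[\s\S]*?\*\/|([^:]|^)\/\/.*$/gm, '$1')
    .replace(/\s+/g, '');
}

const emailTemplate = `
  \\b                           // Word boundary
  [\\w.!#$%&'*+\\/=?^\`{|}~-]+  // Listed characters or word characters, one or more
  @                             // Literal
  [\\w-]+                       // Word characters or "-", one or more
  (?:                           // Group-NON-CAPTURE
    \\.[\\w-]+                    // Literal ".", then word characters or "-"
  )*                            // Group-END (zero or more)
  \\b                           // Word boundary
`;
const emailRegex = new RegExp(createRegex(emailTemplate));

// Each case pairs an input with the text expected to match (or null for no match).
const cases = [
  { input: 'jane.doe@example.com', expected: 'jane.doe@example.com' },
  { input: 'Contact: bob@mail.example.org.', expected: 'bob@mail.example.org' },
  { input: 'no at sign here', expected: null },
];

const results = cases.map(({ input, expected }) => {
  const match = input.match(emailRegex);
  const actual = match ? match[0] : null;
  return { input, passed: actual === expected };
});

for (const { input, passed } of results) {
  console.log(passed ? 'pass' : 'FAIL', JSON.stringify(input));
}
```

Because each line of the template carries its own comment, a failing case is easier to trace back to the part of the pattern responsible.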

The regex generated here is much easier to read and reason about, which was the goal. This is a pattern I could get behind if there was a need for a number of regex templates within a project.