
Understanding multi-line mode for JS REGEX

posted: 30 Jun '11 21:37 tags: REGEXP, Javascript, look-behind, simulation

I wanted to do a post on regular expressions' multi-line mode, since from looking around the net there appears to be a common misconception about what this does.

That's kind of understandable; you might expect something called multi-line mode to enable your REGEX pattern to match within strings that contain line breaks.

But that happens anyway, with or without multi-line mode turned on. See:

var myStr = "This is a \n multi-line \n string";
// the global flag returns every match, not just the first
var words = myStr.match(/\w+/g);
if (words) alert(words.join('\n'));

That will find and alert out all the words (notice I pass the global flag, as I want all the words, not just the first) - even though I ran the REGEX on a multi-line string and didn't stipulate multi-line mode.

No; what multi-line mode is about is changing the behaviour of the ^ and $ anchors.

Normally, these match the start and end of the string, respectively. In multi-line mode, though - which is turned on by passing an 'm' flag after the final forward slash of your pattern - their meanings are extended: $ also matches the position just before a line-break, and ^ also matches the position just after one.
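As a quick illustration of the anchors changing meaning, here's a minimal sketch. The sample string and variable names are my own, made up purely for the example:

```javascript
// Hypothetical three-line sample string.
var text = "alpha one\nbeta two\ngamma three";

// Without the m flag, ^ only matches at the very start of the string.
var firstOnly = text.match(/^\w+/g);   // ["alpha"]

// With the m flag, ^ also matches just after each line-break.
var lineStarts = text.match(/^\w+/gm); // ["alpha", "beta", "gamma"]

// Likewise, $ only matches at the very end without the m flag...
var lastOnly = text.match(/\w+$/g);    // ["three"]

// ...and also just before each line-break with it.
var lineEnds = text.match(/\w+$/gm);   // ["one", "two", "three"]
```

Note that the \w+ part behaves identically in all four patterns; only the places where ^ and $ are allowed to match have changed.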

So imagine I have a pattern which tries to match as many characters as it can, in succession, of any kind (that's what the [\s\S] does - matches any character at all, including whitespace characters such as line breaks). Let's try it without multi-line mode first:

var myStr = "This is a \n multi-line \n string";
alert(myStr.match(/^[\s\S]+$/g));

There, I simply get back the whole string (as a one-item array). The ^ matches the start of the string, the [\s\S]+ then matches all the characters in succession, and finally the $ matches the end of the string. But in multi-line mode:

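var myStr = "This is a \n multi-line \n string";
alert(myStr.match(/^[\s\S]+?$/gm));

Note the one other change to the pattern: the quantifier is now lazy (+? rather than +). Left greedy, [\s\S]+ would swallow the line breaks too and run straight to the end of the string, where $ still matches, so I'd get the whole string back as a single match even in multi-line mode.

This time we get back an array of 3 items - one for each sequence of words delimited by the line breaks.

So what's happening there is the ^ matches the start of the string, and the pattern matches "This is a " but then reaches the leading edge of a line-break. In multi-line mode, the $ matches there, so that's the end of a match. And since I'm in global mode, matching continues after the line-break, where the ^ matches again.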
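For comparison, here's a rough sketch of the same line-by-line matching using the dot instead of [\s\S]. This is my own variation rather than one of the patterns above; it relies on the fact that . never matches a line-break in JavaScript, so a match cannot run past the end of a line:

```javascript
var myStr = "This is a \n multi-line \n string";

// In multi-line mode, each match runs from just after one line-break
// to just before the next.
var lines = myStr.match(/^.+$/gm);
// ["This is a ", " multi-line ", " string"]

// Without the m flag nothing matches at all: . cannot cross the
// line-breaks, and $ only matches at the very end of the string.
var noFlag = myStr.match(/^.+$/g);
// null
```

The second result is a good reminder of why the m flag matters here: the pattern itself is unchanged, only the anchors behave differently.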

So in conclusion, not the most indicative of names, but it can be useful. Not often, mind...
