
\r  The CR control character 0x0D (carriage return). This is part of the DOS/Windows end of line sequence CR-LF, and was the EOL character on Mac 9 and earlier. The regular end of line under Unix systems is LF (line feed) alone.
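
The three end of line conventions mentioned above can be handled by one pattern. The sketch below uses Python's re module purely as a stand-in engine (an assumption; it is not the engine this text describes), and split_lines is a hypothetical helper name.

```python
import re

def split_lines(text):
    # Try CR-LF first so the DOS/Windows pair counts as one line ending,
    # then a lone CR (old Mac), then a lone LF (Unix).
    return re.split(r"\r\n|\r|\n", text)

sample = "dos line\r\nunix line\nold mac line\rlast line"
print(split_lines(sample))
```

Putting \r\n first in the alternation matters: with \r listed first, a CR-LF pair would be split twice and produce an empty line.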

\n  The LF control character 0x0A (line feed).
\f  The FF control character 0x0C (form feed).
\b  The BS control character 0x08 (backspace). This is only allowed inside a character class definition. Outside a character class, \b means a word boundary.
\xnn  A single character whose code is the hexadecimal value nn. What this stands for depends on the text encoding. For instance, \xE9 may match an é or a θ depending on the code page in an ANSI encoded document.
\x{nnnn}  Like above, but matches a full 16-bit Unicode character. If the document is ANSI encoded, this construct is invalid.
Octal escape  A single byte character given by its code in octal.
[[.collating sequence.]]  The character the collating sequence stands for. For instance, in Spanish, "ch" is a single letter, though it is written using two characters. This trick also works with symbolic names of control characters. See also the discussion on character ranges.
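
To illustrate why a byte code such as 0xE9 "depends on the text encoding", here is a minimal sketch in Python (used only as a stand-in; the engine described here may behave differently). The sample bytes, the two code pages, and the helper name shown_as are assumptions chosen for the example.

```python
import re

# Four bytes; the last one has code 0xE9.
raw = b"caf\xe9"

# In a bytes pattern, \xe9 matches the single byte 0xE9, whatever it "means".
match = re.search(rb"\xe9", raw)

def shown_as(code_page):
    # Decode the matched byte under the given code page to see
    # which character a reader of that encoding would be looking at.
    return match.group().decode(code_page)

for code_page in ("cp1252", "cp437"):
    print(code_page, shown_as(code_page))
```

The match itself is identical in both cases; only the character displayed for that byte changes with the code page.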

\X  Matches a single non-combining character followed by any number of combining characters. This is useful if you have a Unicode encoded text with accents as separate, combining characters.
\ before a special character  Lets you use a character that would otherwise have a special meaning. For example, \[ is interpreted as a literal [ and not as the start of a character set. Adding the backslash (this is called escaping) also works the other way round, as it makes special a character that otherwise isn't. For instance, \d stands for "a digit", while "d" is just an ordinary letter.
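
Both directions of escaping, and the idea behind matching a base character together with its combining marks, can be sketched in Python. Python's re module is a stand-in here and has no \X of its own, so the hypothetical helper clusters only approximates that behaviour using unicodedata; treat it as an illustration, not as the engine's algorithm.

```python
import re
import unicodedata

text = "a[0] adds 42"

# Escaping removes a special meaning: \[ is a literal bracket.
literal_brackets = re.findall(r"\[", text)

# Escaping can also add one: \d is "a digit", plain d is the letter d.
digits = re.findall(r"\d", text)
letter_d = re.findall(r"d", text)

def clusters(s):
    # Group each non-combining character with the combining
    # characters that follow it (a rough approximation of \X).
    groups = []
    for ch in s:
        if groups and unicodedata.combining(ch):
            groups[-1] += ch
        else:
            groups.append(ch)
    return groups

# "e" followed by U+0301 (combining acute accent), then "a".
print(literal_brackets, digits, letter_d)
print(clusters("e\u0301a"))
```

The accented letter built from two code points comes back as one group, which is the unit a search would want to treat as a single character.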

In a regular expression (shortened into regex throughout), the following special characters are interpreted.
Single-character matches
.  Matches any character. By default, the dot will only match characters within a line, and not the line ending characters (\r and \n). If you check the box which says ". matches newline", the dot will indeed do that, enabling the "any" character to run over multiple lines.
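
The effect of a ". matches newline" option can be tried out in Python, where the comparable switch is the re.DOTALL flag. This is only an analogy: Python's default dot excludes \n but not \r, so the sample text below uses LF line endings only.

```python
import re

text = "first line\nsecond line"

# Default: the dot stops at the line ending, so each line matches separately.
per_line = re.findall(r".+", text)

# With DOTALL the dot also matches the newline, so one match spans both lines.
across_lines = re.findall(r".+", text, flags=re.DOTALL)

print(per_line)
print(across_lines)
```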
