    Regex Cheat Sheet

    Simple Examples

    • hello: Matches the string “hello” exactly.
    • 123: Matches the string “123” exactly.
    • .: Matches any single character except a newline (unless the s flag is set).
    • \d: Matches any digit character (0-9).
    • \w: Matches any word character (a-z, A-Z, 0-9, _).
    • \s: Matches any whitespace character (space, tab, newline).
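
    The patterns above can be tried out directly. The following is a minimal sketch in JavaScript (chosen because the flag syntax used later in this sheet, such as /hello/i, is JavaScript's); the sample strings and the name `simple` are made up for illustration.

```javascript
// Each entry applies one pattern from the list to a made-up sample string.
const simple = {
  literal: /hello/.test("say hello"), // true: "hello" appears in the input
  anyChar: /h.llo/.test("hallo"),     // true: "." stands in for the "a"
  digit: /\d/.test("room 7"),         // true: "7" is a digit
  words: "a_b-c".match(/\w/g),        // ["a", "_", "b", "c"]; "-" is not a word character
  pieces: "a b\tc".split(/\s/),       // ["a", "b", "c"]; split on the space and the tab
};

console.log(simple);
```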

    Examples

    • hello world: Matches the string “hello world” exactly.
    • hello|world: Matches either “hello” or “world”.
    • hello.*: Matches “hello” followed by any run of characters (possibly none) up to the end of the line.
    • hello\s: Matches “hello” followed by a whitespace character.
    • hello\d: Matches “hello” followed by a digit character.
    • hello\w: Matches “hello” followed by a word character.
    • hello\s\d: Matches “hello” followed by a whitespace character and a digit character.
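
    As a rough illustration of how these combined patterns behave, here is a small JavaScript sketch. The input string and variable names are assumptions made for the example.

```javascript
const text = "hello 5 world";

const either = text.match(/hello|world/g);     // ["hello", "world"]
const rest = text.match(/hello.*/)[0];         // "hello 5 world": .* runs to the end of the line
const spaceDigit = text.match(/hello\s\d/)[0]; // "hello 5"
const digitNext = /hello\d/.test(text);        // false: a space, not a digit, follows "hello"

console.log(either, rest, spaceDigit, digitNext);
```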

    Quantifiers

    • a+: Matches one or more “a” characters.
    • a*: Matches zero or more “a” characters.
    • a?: Matches zero or one “a” character.
    • a{3}: Matches exactly three “a” characters.
    • a{3,}: Matches three or more “a” characters.
    • a{3,5}: Matches between three and five “a” characters.
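
    The quantifiers above can be compared side by side. This is a hedged JavaScript sketch; the \b word boundaries are added here only so that each run of “a” is tested as a whole, and the sample strings are invented.

```javascript
const runs = "a aa aaa aaaa aaaaaa";

const exactlyThree = runs.match(/\ba{3}\b/g);   // ["aaa"]
const threeOrMore = runs.match(/\ba{3,}\b/g);   // ["aaa", "aaaa", "aaaaaa"]
const threeToFive = runs.match(/\ba{3,5}\b/g);  // ["aaa", "aaaa"]

// + needs at least one "a"; * is satisfied by none at all.
const plus = "caaandy".match(/a+/)[0];  // "aaa"
const star = "caaandy".match(/a*/)[0];  // "": zero "a" characters match at index 0, before the "c"

console.log(exactlyThree, threeOrMore, threeToFive, plus, star);
```

    The last line shows a common surprise: because a* accepts an empty match, it succeeds immediately at the start of the input.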

    Character Classes

    • [abc]: Matches any of the characters “a”, “b”, or “c”.
    • [^abc]: Matches any character that is not “a”, “b”, or “c”.
    • [a-z]: Matches any lowercase letter.
    • [A-Z]: Matches any uppercase letter.
    • [0-9]: Matches any digit character.
    • [\w]: Matches any word character.
    • [\W]: Matches any non-word character.
    • [\s]: Matches any whitespace character.
    • [\S]: Matches any non-whitespace character.
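
    A short JavaScript sketch of the character classes above, run against an invented sample string:

```javascript
const sample = "Ab3 _x!";

const lower = sample.match(/[a-z]/g);      // ["b", "x"]
const upper = sample.match(/[A-Z]/g);      // ["A"]
const digits = sample.match(/[0-9]/g);     // ["3"]
const nonWord = sample.match(/[\W]/g);     // [" ", "!"]; "_" counts as a word character
const notAbc = "abcd".match(/[^abc]/g);    // ["d"]

console.log(lower, upper, digits, nonWord, notAbc);
```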

    Anchors

    • ^hello: Matches “hello” at the beginning of the input (or of any line when the m flag is set).
    • world$: Matches “world” at the end of the input (or of any line when the m flag is set).
    • \bhello\b: Matches “hello” as a whole word.
    • \Bhello\B: Matches “hello” only inside a longer word, with a word character on both sides (e.g. within “othellos”).
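
    Anchors match positions rather than characters, which is easiest to see by testing the same pattern against different inputs. A minimal JavaScript sketch with made-up inputs:

```javascript
const atStart = /^hello/.test("hello world");        // true
const notAtStart = /^hello/.test("say hello");       // false: "hello" is not at the beginning
const atEnd = /world$/.test("hello world");          // true

const wholeWord = /\bhello\b/.test("oh hello there"); // true
const embedded = /\bhello\b/.test("othellos");        // false: letters touch both sides
const insideWord = /\Bhello\B/.test("othellos");      // true: no word boundary on either side

console.log(atStart, notAtStart, atEnd, wholeWord, embedded, insideWord);
```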

    Groups

    • (hello): Matches “hello” and captures it as a group.
    • (hello|world): Matches either “hello” or “world” and captures whichever one matched.
    • (?:hello): Matches “hello” but does not capture it as a group.
    • (?=hello): Positive lookahead. Succeeds at a position that is followed by “hello”, without consuming it.
    • (?!hello): Negative lookahead. Succeeds at a position that is not followed by “hello”.
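
    The difference between capturing and non-capturing groups shows up in the match result. The sketch below is JavaScript, with invented inputs and variable names:

```javascript
const match = "hello world".match(/(hello) (?:world)/);
const whole = match[0];              // "hello world"
const firstGroup = match[1];         // "hello"
const groupCount = match.length - 1; // 1: (?:world) is matched but not captured

const alternative = "world".match(/(hello|world)/)[1];             // "world"
// Captured groups can be referred to as $1, $2, ... in a replacement string.
const swapped = "hello world".replace(/(hello) (world)/, "$2 $1"); // "world hello"

console.log(whole, firstGroup, groupCount, alternative, swapped);
```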

    Flags

    • /hello/i: Matches “hello” case-insensitively.
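    • /hello/g: Finds all occurrences of “hello” rather than stopping after the first.
    • /hello/m: Multiline mode. Has no effect on this pattern by itself; it makes ^ and $ match at the start and end of each line.
    • /hello/s: Dot-all mode. Has no effect on this pattern by itself; it allows “.” to match newline characters.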
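
    The effect of each flag is clearest when the same input is matched with and without it. A small JavaScript sketch, using a made-up two-line string:

```javascript
const text = "Hello\nhello";

const ignoreCase = /hello/i.test("HELLO");               // true
const firstOnly = text.match(/hello/i)[0];               // "Hello": without g, only the first match
const allMatches = text.match(/hello/gi);                // ["Hello", "hello"]

const lineStartsNoM = text.match(/^hello/gi).length;     // 1: ^ matches only at the start of the input
const lineStartsWithM = text.match(/^hello/gim).length;  // 2: ^ also matches right after the newline

const dotWithoutS = /Hello.hello/.test(text);            // false: "." does not match "\n"
const dotWithS = /Hello.hello/s.test(text);              // true

console.log(ignoreCase, firstOnly, allMatches, lineStartsNoM, lineStartsWithM, dotWithoutS, dotWithS);
```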

    Special Characters

    • \: Escapes a special character.
    • .: Matches any single character except newline characters.
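    • ^: Matches the beginning of the input, or of each line when the m flag is set. As the first character inside [ ], it negates the class.
    • $: Matches the end of the input, or of each line when the m flag is set.
    • *: Matches zero or more of the preceding element (a character, class, or group).
    • +: Matches one or more of the preceding element.
    • ?: Matches zero or one of the preceding element. Placed after another quantifier, it makes that quantifier lazy (e.g. a+?).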
    • (: Begins a capturing group.
    • ): Ends a capturing group.
    • [: Begins a character class.
    • ]: Ends a character class.
    • {: Begins a counted quantifier, such as {3} or {3,5}.
    • }: Ends a counted quantifier.
    • |: Matches either the expression before or after the operator.
    • /: Delimits a regular expression literal (e.g. in JavaScript); flags follow the closing /.
    • i: Flag. Makes the regex case-insensitive.
    • g: Flag. Finds all occurrences of the pattern.
    • m: Flag. Makes the ^ and $ anchors match the beginning and end of each line.
    • s: Flag. Allows . to match newline characters.
    • \n: Matches a newline character.
    • \r: Matches a carriage return character.
    • \t: Matches a tab character.
    • \v: Matches a vertical tab character.
    • \0: Matches a null character (U+0000 NULL).
    • \xhh: Matches a character with the given hex code (e.g. \x0A matches a newline character).
    • \uhhhh: Matches a character with the given Unicode code point (e.g. \u0009 matches a tab character).
    • \cX: Matches a control character using caret notation (e.g. \cJ matches a newline character).
    • \u{hhhh}: With the u flag, matches the character with the given Unicode code point, including code points above U+FFFF (e.g. \u{0009} matches a tab character).
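
    Because so many characters are special, text taken from user input usually needs escaping before it is used as a pattern. The sketch below is JavaScript; `escapeRegExp` is a hypothetical helper written for this example, not a built-in, and the sample input is invented. The second half checks that the escape sequences listed above name the same characters.

```javascript
// Hypothetical helper: backslash-escape every character that has a
// special meaning in a pattern so the text is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

const raw = "1+1=2";
const unescaped = new RegExp(raw).test(raw);             // false: "+" acts as a quantifier
const escaped = new RegExp(escapeRegExp(raw)).test(raw); // true: "+" is matched literally

// Different escape sequences can name the same character.
const newlineForms = [/\n/, /\x0A/, /\u000A/, /\cJ/].every((re) => re.test("\n"));   // true
const tabForms = [/\t/, /\x09/, /\u0009/, /\u{0009}/u].every((re) => re.test("\t")); // true

console.log(unescaped, escaped, newlineForms, tabForms);
```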

    Lookarounds

    • (?=hello): Positive lookahead. Succeeds at a position that is followed by “hello”. Like all lookarounds, it is zero-width: it checks the surrounding text without including it in the match.
    • (?!hello): Negative lookahead. Succeeds at a position that is not followed by “hello”.
    • (?<=hello): Positive lookbehind. Succeeds at a position that is preceded by “hello”.
    • (?<!hello): Negative lookbehind. Succeeds at a position that is not preceded by “hello”.
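
    Lookarounds are easiest to understand by noting what they leave out of the match. The following JavaScript sketch uses invented strings; it assumes an engine with lookbehind support (for example a current Node.js release).

```javascript
const greeting = "hello world, hello there";
// The lookahead checks for " world" but does not replace it.
const beforeWorld = greeting.replace(/hello(?= world)/g, "HI");    // "HI world, hello there"
const notBeforeWorld = greeting.replace(/hello(?! world)/g, "HI"); // "hello world, HI there"

const prices = "cost: $42, tax: $7, qty: 3";
// The lookbehind requires a "$" before the digits but leaves it out of the result.
const afterDollar = prices.match(/(?<=\$)\d+/g);      // ["42", "7"]
const notAfterDollar = prices.match(/(?<!\$)\b\d+/g); // ["3"]

console.log(beforeWorld, notBeforeWorld, afterDollar, notAfterDollar);
```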

    Unicode Categories

    • \p{L}: Matches any letter character.
    • \p{M}: Matches any combining mark character.
    • \p{Z}: Matches any separator character.
    • \p{S}: Matches any symbol character.
    • \p{N}: Matches any number character.
    • \p{P}: Matches any punctuation character.
    • \p{C}: Matches any character in the “Other” category (control, format, surrogate, private-use, or unassigned).
    • \p{Ll}: Matches any lowercase letter.
    • \p{Lu}: Matches any uppercase letter.
    • \p{Lt}: Matches any titlecase letter.
    • \p{L&}: Matches any cased letter (Ll, Lu, or Lt); this shorthand is specific to Perl-style engines such as PCRE.
    • \p{Lm}: Matches any modifier letter.
    • \p{Lo}: Matches any other letter character.
    • \p{Mn}: Matches any non-spacing mark character.
    • \p{Mc}: Matches any spacing combining mark character.
    • \p{Me}: Matches any enclosing mark character.
    • \p{Zs}: Matches any space separator character.
    • \p{Zl}: Matches any line separator character.
    • \p{Zp}: Matches any paragraph separator character.
    • \p{Sm}: Matches any mathematical symbol character.
    • \p{Sc}: Matches any currency symbol character.
    • \p{Sk}: Matches any modifier symbol character.
    • \p{So}: Matches any other symbol character.
    • \p{Nd}: Matches any decimal digit character.
    • \p{Nl}: Matches any letter number character.
    • \p{No}: Matches any other number character.
    • \p{Pc}: Matches any connector punctuation character.
    • \p{Pd}: Matches any dash punctuation character.
    • \p{Ps}: Matches any open punctuation character.
    • \p{Pe}: Matches any close punctuation character.
    • \p{Pi}: Matches any initial punctuation character.
    • \p{Pf}: Matches any final punctuation character.
    • \p{Po}: Matches any other punctuation character.
    • \p{Cc}: Matches any control character.
    • \p{Cf}: Matches any format character.
    • \p{Cs}: Matches any surrogate character.
    • \p{Co}: Matches any private-use character.
    • \p{Cn}: Matches any unassigned code point.
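
    A few of these categories in use, as a hedged JavaScript sketch. In JavaScript, \p{...} is only recognized when the u flag is set; the sample string is invented and is written with \u escapes so that the exact code points are unambiguous.

```javascript
// "\u00DCber 42 \u20AC!" is the text "Über 42 €!".
const sample = "\u00DCber 42 \u20AC!";

const letters = sample.match(/\p{L}/gu).join("");        // "Über"
const uppercase = sample.match(/\p{Lu}/gu);              // ["Ü"]
const lowercase = sample.match(/\p{Ll}/gu).join("");     // "ber"
const decimalDigits = sample.match(/\p{Nd}/gu).join(""); // "42"
const currency = sample.match(/\p{Sc}/gu);               // ["€"]
const punctuation = sample.match(/\p{Po}/gu);            // ["!"]
const spaceCount = sample.match(/\p{Zs}/gu).length;      // 2

console.log(letters, uppercase, lowercase, decimalDigits, currency, punctuation, spaceCount);
```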

    POSIX Classes

    • [:alnum:]: Matches any alphanumeric character. Like every class in this list, it is written inside a bracket expression, e.g. [[:alnum:]].
    • [:alpha:]: Matches any alphabetic character.
    • [:ascii:]: Matches any ASCII character (a non-standard extension supported by some engines, e.g. PCRE).
    • [:blank:]: Matches a space or a tab.
    • [:cntrl:]: Matches any control character.
    • [:digit:]: Matches any digit character.
    • [:graph:]: Matches any visible character.
    • [:lower:]: Matches any lowercase character.
    • [:print:]: Matches any printable character.
    • [:punct:]: Matches any punctuation character.
    • [:space:]: Matches any whitespace character.
    • [:upper:]: Matches any uppercase character.
    • [:word:]: Matches any word character (a non-standard extension supported by some engines, e.g. PCRE).
    • [:xdigit:]: Matches any hexadecimal digit character.
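
    JavaScript, the language used for the other examples in this sheet, does not support POSIX classes, so the sketch below spells out ASCII-only stand-ins for a few of them. The table name `posixAscii` and its contents are assumptions made for illustration; engines that do support POSIX classes may also match non-ASCII characters depending on locale.

```javascript
// Hypothetical lookup table: ASCII-only stand-ins for a few POSIX classes.
const posixAscii = {
  alnum: /[A-Za-z0-9]/,
  alpha: /[A-Za-z]/,
  blank: /[ \t]/,
  digit: /[0-9]/,
  lower: /[a-z]/,
  upper: /[A-Z]/,
  space: /[ \t\n\r\f\v]/,
  xdigit: /[A-Fa-f0-9]/,
};

const tabIsBlank = posixAscii.blank.test("\t");     // true
const newlineIsBlank = posixAscii.blank.test("\n"); // false: blank covers space and tab only
const newlineIsSpace = posixAscii.space.test("\n"); // true

// Pitfall: JavaScript reads [[:digit:]] as the class "[[:digit:" followed by a literal "]".
const looksPosix = /[[:digit:]]/;
const matchesDigit = looksPosix.test("7");  // false
const matchesOdd = looksPosix.test("d]");   // true

console.log(tabIsBlank, newlineIsBlank, newlineIsSpace, matchesDigit, matchesOdd);
```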