Perl Compatible Regular Expressions (PCRE)


PCRE is also available as a C library (pcre, now succeeded by PCRE2), so these regular expressions can be used from any C program.

Cheat Sheet 1

Summarised from: https://perlmaven.com/regex-cheat-sheet

Character Classes

   [bgh.]      One of the listed characters: b, g, h or . (a dot is literal inside a class).
   [b-h]       The same as [bcdefgh].
   [a-z]       Lower case Latin letters.
   [bc-]       The characters b, c or - (dash).
   [^bx]       Complementary character class. Anything except b or x.
   \w          Word characters: [a-zA-Z0-9_].
   \d          Digits: [0-9]
   \s          [\f\t\n\r ] form-feed, tab, newline, carriage return and SPACE
   \W          The complement of \w: [^\w]
   \D          [^\d]
   \S          [^\s]
   [[:class:]] POSIX character classes (alpha, alnum, ...), only valid inside brackets
   \p{...}     Unicode properties (IsAlpha, IsLower, IsHebrew, ...)
   \P{...}     Negated Unicode properties.
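Most rows of this table can be tried out with Python's re module, used here as a stand-in: it follows the Perl syntax for these classes but it is not PCRE, and in particular it supports neither [[:alpha:]]-style POSIX classes nor \p{...} properties. A small sketch on a made-up string:

```python
import re

TEXT = "big-box 42\tend."

# Rows from the character-class table above.
PATTERNS = [
    r"[bgh.]",   # one of b, g, h or a literal '.'
    r"[b-h]",    # range: same as [bcdefgh]
    r"[bc-]",    # a trailing '-' is literal
    r"[^bx\s]",  # complement: anything except b, x or whitespace
    r"\d",       # digits
    r"\s",       # whitespace (here a space and a tab)
    r"\W",       # complement of \w
]

# findall returns every non-overlapping match, left to right.
RESULTS = {pattern: re.findall(pattern, TEXT) for pattern in PATTERNS}

for pattern, found in RESULTS.items():
    print(f"{pattern:10} {found!r}")
```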

Quantifiers

Greedy
   a?          0-1         'a' characters
   a+          1-infinite  'a' characters
   a*          0-infinite  'a' characters
   a{n,m}      n-m         'a' characters
   a{n,}       n-infinite  'a' characters
   a{n}        n           'a' characters
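A quick way to see how many characters each greedy form consumes, again using Python's re as a Perl-style stand-in on an arbitrary test string:

```python
import re

TEXT = "caaaaat"  # 'c' followed by five 'a's

# The leading 'c' pins the match in place, so a? and a* cannot
# succeed with an empty match at position 0.
PATTERNS = [r"ca?", r"ca+", r"ca*", r"ca{2,3}", r"ca{2,}", r"ca{2}"]

# Each greedy quantifier takes as many 'a's as its range allows.
MATCHES = {pattern: re.search(pattern, TEXT).group() for pattern in PATTERNS}

for pattern, matched in MATCHES.items():
    print(f"{pattern:8} matched {matched!r} ({matched.count('a')} 'a's)")
```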

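Minimal (same ranges, but match as few as possible)
   a??         0-1         'a' characters
   a+?         1-infinite  'a' characters
   a*?         0-infinite  'a' characters
   a{n,m}?     n-m         'a' characters
   a{n,}?      n-infinite  'a' characters
   a{n}?       n           'a' characters (same as a{n})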
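The difference between greedy and minimal forms shows up clearly on markup-like text. A Python sketch (Python's re uses the same trailing ? for minimal quantifiers; the string is made up, and a regex is not a real HTML parser):

```python
import re

HTML = "<b>bold</b> and <i>italic</i>"

# Greedy: .+ runs on to the last '>' it can find, giving one big match.
greedy = re.findall(r"<.+>", HTML)
# Minimal: .+? stops at the first '>' that lets the match succeed.
minimal = re.findall(r"<.+?>", HTML)

print(greedy)
print(minimal)

# a{2,}? settles for the minimum of two 'a's,
# and a?? prefers to match nothing at all.
at_least_two = re.match(r"a{2,}?", "aaaa").group()
optional = re.match(r"a??", "aaaa").group()
print(repr(at_least_two), repr(optional))
```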

Other

   |                        Alternation

Grouping and Capturing

   (...)                Grouping and capturing
   \1, \2, \3, \4 ...   Capture buffers during regex matching
   $1, $2, $3, $4 ...   Capture variables after successful matching

   (?:...)              Group without capturing (sets neither \1 nor $1)
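A short sketch of capturing, backreferences and non-capturing alternation, with Python's re standing in for Perl (the strings are made up). In Python, match.group(1) plays the role of $1, and a replacement string writes \1 where Perl's s/// would use $1:

```python
import re

# \1 inside the pattern refers back to what group 1 captured: a doubled word.
doubled = re.search(r"\b(\w+) \1\b", "this is is a test")
print(doubled.group(1))

# (?:cat|dog) groups the alternation without capturing it,
# so the only captured group is the word after it.
m = re.match(r"(?:cat|dog)s? (\w+)", "dogs bark")
print(m.groups())

# Swap two words, using \1 and \2 in the replacement string.
swapped = re.sub(r"(\w+) (\w+)", r"\2 \1", "hello world")
print(swapped)
```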

Anchors

   ^           Beginning of string (or beginning of line if /m enabled)
   $           End of string (or end of line if /m enabled)
   \A          Beginning of string
   \Z          End of string, or before a newline at the end of the string
   \z          End of string
   \b          Word boundary (start-of-word or end-of-word)
   \G          Match only at pos(), i.e. where the previous m//g match ended
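The anchors can be compared in a Python sketch, again with re as a stand-in. Two differences from the table matter here: Python's \Z matches only at the very end of the string (like Perl's \z), and Python has no \G.

```python
import re

TEXT = "first line\nsecond line\n"

# Without MULTILINE (Perl without /m), ^ only matches at the start of the string.
start_only = re.findall(r"^\w+", TEXT)
# With re.MULTILINE (Perl /m), ^ also matches right after each newline.
every_line = re.findall(r"^\w+", TEXT, re.MULTILINE)
# \A always means start of string, even in MULTILINE mode.
absolute_start = re.findall(r"\A\w+", TEXT, re.MULTILINE)
# $ without MULTILINE matches at the end, or just before a final newline.
last_word = re.findall(r"\w+$", TEXT)
# \b: 'line' as a whole word only, not inside 'lines' or 'inline'.
whole_word = re.findall(r"\bline\b", "line lines inline")

print(start_only, every_line, absolute_start, last_word, whole_word)
```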

Modifiers

  /m           Change ^ and $ to match beginning and end of line respectively
  /s           Change . to match new-line as well
  /i           Case insensitive pattern matching
  /x           Extended pattern (disregard white-space, allow comments starting with #)
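Python exposes the same modifiers as flag constants (re.MULTILINE, re.IGNORECASE, re.DOTALL, re.VERBOSE) and also accepts the embedded (?i) form. A sketch on made-up strings:

```python
import re

TEXT = "Start\nEND"

# /i -> re.IGNORECASE
print(bool(re.search(r"end", TEXT, re.IGNORECASE)))    # True
# /s -> re.DOTALL: without it '.' does not match the newline
print(bool(re.search(r"Start.END", TEXT)))              # False
print(bool(re.search(r"Start.END", TEXT, re.DOTALL)))   # True

# /x -> re.VERBOSE: whitespace in the pattern is ignored, '#' starts a comment
DATE = re.compile(r"""
    (\d{4})   # year
    -
    (\d{2})   # month
""", re.VERBOSE)
print(DATE.search("date: 2024-05").groups())            # ('2024', '05')

# Embedded modifier at the start of the pattern, like Perl's (?i)
print(re.findall(r"(?i)end", "End end END"))            # ['End', 'end', 'END']
```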

Extended

  (?#text)             Embedded comment
  (?adlupimsx-imsx)    One or more embedded pattern-match modifiers, to be turned on or off.
  (?:pattern)          Non-capturing group.
  (?|pattern)          Branch reset: capture groups in each alternative are numbered from the same start.
  (?=pattern)          A zero-width positive look-ahead assertion.
  (?!pattern)          A zero-width negative look-ahead assertion.
  (?<=pattern)         A zero-width positive look-behind assertion.
  (?<!pattern)         A zero-width negative look-behind assertion.
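The four lookaround assertions can be sketched in Python on a made-up string. Python's re requires a look-behind pattern to have a fixed width, so something like (?<=\$+) is rejected there.

```python
import re

TEXT = "apple $3, pear $12, 7 plums"

# (?<=\$) positive look-behind: digits preceded by '$' ('$' is not part of the match)
priced = re.findall(r"(?<=\$)\d+", TEXT)
# (?<![$\d]) negative look-behind: digits not preceded by '$' or another digit
unpriced = re.findall(r"(?<![$\d])\d+", TEXT)
# (?= \$) positive look-ahead: words followed by " $"
with_price = re.findall(r"[a-z]+(?= \$)", TEXT)
# (?! \$) negative look-ahead: whole words not followed by " $"
without_price = re.findall(r"\b[a-z]+\b(?! \$)", TEXT)

print(priced, unpriced, with_price, without_price)
```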

  (?'NAME'pattern)
  (?<NAME>pattern)     A named capture group.
  \k<NAME>
  \k'NAME'             Named backreference.
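Python's re module spells named groups as (?P<NAME>...) and named backreferences as (?P=NAME); the idea is the same. A sketch, where year, month and word are arbitrary example names:

```python
import re

# Named capture groups: read results by name instead of by number.
DATE = re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})")
m = DATE.search("released 2021-09")
print(m.group("year"), m.groupdict())

# Named backreference: the same word twice in a row.
doubled = re.search(r"\b(?P<word>\w+) (?P=word)\b", "it was was fine")
print(doubled.group("word"))
```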

  (?{ code })          Zero-width assertion with code execution (Perl only; not supported by PCRE).
  (??{ code })         A "postponed" regular subexpression with code execution (Perl only).

Examples

ffprobe language list

This lists the audio languages present in a set of .mp4 files. The -P option tells grep to treat the pattern as a Perl-compatible regular expression.

ffp *.mp4 | grep -P -o '(?<=c0a2c89ef6b0d680c770b20f7b90f7043712b71816a4c4c38ddfc619bf935661e29c9c180c6279b0b02abd6a1801c7c04082cf486ec027aa13515e4f3884bb6b).*Audio:' | sort | uniq

To break the regex down:

(?<=c0a2c89ef6b0d680c770b20f7b90f7043712b71816a4c4c38ddfc619bf935661c6f3ac57944a531490cd39902d0f777715fd005efac9a30622d5f5205e7f6894)     # ensures that closing paren \) follows what is matched

The -o option then makes grep print only the matched text, not the whole line containing it.
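The same lookaround technique can be sketched in Python. The sample lines below are made up, and both they and the pattern are assumptions about the shape of ffprobe's stream lines (something like Stream #0:1(eng): Audio: ...), not the pattern used in the command above. The look-behind (?<=\() and look-ahead (?=\): Audio:) keep the surrounding text out of the match, which is what lets grep -o print just the language code; sorted(set(...)) plays the role of sort | uniq.

```python
import re

# Hypothetical ffprobe-style output, invented for illustration.
SAMPLE = """\
  Stream #0:0(und): Video: h264 (High) ...
  Stream #0:1(eng): Audio: aac (LC) ...
  Stream #0:2(fra): Audio: aac (LC) ...
  Stream #0:1(eng): Audio: ac3 ...
"""

# Three lower-case letters preceded by '(' and followed by '): Audio:'.
LANG_RE = re.compile(r"(?<=\()[a-z]{3}(?=\): Audio:)")

def audio_languages(text):
    """Return the distinct audio language codes, sorted (like sort | uniq)."""
    return sorted(set(LANG_RE.findall(text)))

print(audio_languages(SAMPLE))
```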

Perl as an alternative to sed

Instead of

cat files | sed 's/pattern/replacement/g'

we can do

cat a.php | perl -pe 's/(e(\w))/$1$2$2$1<$2,$1>/'

and this gives us access to the full power of Perl's regular expressions, amongst other things. Also

cat a.php | perl -lpe 'tr/a-z/A-Z/; s/\W/_x_/;'
cat a.php | perl -e 'while (<STDIN>) { chomp; tr/a-z/A-Z/; s/\W/_/g; print "$_\n"; }'
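For comparison, the same kinds of substitution can be sketched with Python's re.sub on a single made-up line, as a stand-in for the Perl one-liners. Python writes $1 as \1 in the replacement, and count=1 reproduces the missing /g.

```python
import re

LINE = "hello there"

# s/(e(\w))/$1$2$2$1<$2,$1>/ : no /g, so only the first match is replaced.
swapped = re.sub(r"(e(\w))", r"\1\2\2\1<\2,\1>", LINE, count=1)
print(swapped)   # helllel<l,el>lo there

# tr/a-z/A-Z/; s/\W/_/g : upper-case, then every non-word character becomes '_'.
# (str.upper() also upper-cases non-ASCII letters, unlike tr/a-z/A-Z/.)
shouted = re.sub(r"\W", "_", LINE.upper())
print(shouted)   # HELLO_THERE
```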

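In perl -pe 'YOUR_CODE', the -e option passes the code in on the command line, and -p wraps that code in a loop that reads each input line into $_ and prints it afterwards, roughly:

while (<>) { YOUR_CODE; print; }

Here <> reads the files named on the command line, or STDIN if there are none. Adding -l (as in perl -lpe) also chomps each line and adds the newline back when printing, so a substitution on \W does not swallow the line ending.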
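The loop that -p builds can be modelled in Python as a rough sketch. It includes the chomp-and-re-add handling of Perl's -l switch, which keeps a substitution on \W from replacing the newline. perl_lp and shout are hypothetical names chosen for illustration, not part of any library.

```python
import re

def perl_lp(code, lines):
    """Rough model of `perl -lpe CODE`.

    `code` is a hypothetical stand-in: a Python function that takes the
    current line (Perl's $_) and returns the modified line.
    """
    out = []
    for line in lines:
        line = line.rstrip("\n")   # -l: chomp the line ending
        line = code(line)          # YOUR_CODE runs on $_
        out.append(line + "\n")    # print; -l puts the newline back
    return out

def shout(line):
    """Python version of: tr/a-z/A-Z/; s/\\W/_/g;"""
    return re.sub(r"\W", "_", line.upper())

print(perl_lp(shout, ["a b\n", "c-d\n"]))   # ['A_B\n', 'C_D\n']

# Without the chomp, \W would also match the trailing "\n":
print(re.sub(r"\W", "_", "a b\n".upper()))  # A_B_
```

To filter real input the way the one-liner does, sys.stdout.writelines(perl_lp(shout, sys.stdin)) would work after an import sys.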