PerlCompatibleRegularExpressions
There is a library called pcre that lets you embed these regexes in any C program.
Cheat Sheet 1
Summarised from: https://perlmaven.com/regex-cheat-sheet
Character Classes
[bgh.] One of the characters listed in the character class b,g,h or . in this case.
[b-h] The same as [bcdefgh].
[a-z] Lower case Latin letters.
[bc-] The characters b, c or - (dash).
[^bx] Complementary character class. Anything except b or x.
\w Word characters: [a-zA-Z0-9_].
\d Digits: [0-9]
\s [\f\t\n\r ] form-feed, tab, newline, carriage return and SPACE
\W The complementary of \w: [^\w]
\D [^\d]
\S [^\s]
[:class:] POSIX character classes (alpha, alnum...), used inside a bracketed class, e.g. [[:alpha:]]
\p{...} Unicode definitions (IsAlpha, IsLower, IsHebrew, ...)
\P{...} Complementary Unicode character classes.
Quantifiers
Greedy
a? 0-1 'a' characters
a+ 1-infinite 'a' characters
a* 0-infinite 'a' characters
a{n,m} n-m 'a' characters
a{n,} n-infinite 'a' characters
a{n} n 'a' characters
Minimal
a+?
a*?
a{n,m}?
a{n,}?
a??
a{n}?
Other
| Alternation
Grouping and Capturing
(...) Grouping and capturing
\1, \2, \3, \4 ... Capture buffers during regex matching
$1, $2, $3, $4 ... Capture variables after successful matching
(?:...) Group without capturing (don't set \1 nor $1)
Anchors
^ Beginning of string (or beginning of line if /m enabled)
$ End of string (or end of line if /m enabled)
\A Beginning of string
\Z End of string (or before a final new-line)
\z End of string
\b Word boundary (start-of-word or end-of-word)
\G Match only at pos(): at the end-of-match position of prior m//g
Modifiers
/m Change ^ and $ to match beginning and end of line respectively
/s Change . to match new-line as well
/i Case insensitive pattern matching
/x Extended pattern (disregard white-space, allow comments starting with #)
Extended
(?#text) Embedded comment
(?adlupimsx-imsx) One or more embedded pattern-match modifiers, to be turned on or off.
(?:pattern) Non-capturing group.
(?|pattern) Branch reset: capture groups in each alternative are numbered from the same starting point.
(?=pattern) A zero-width positive look-ahead assertion.
(?!pattern) A zero-width negative look-ahead assertion.
(?<=pattern) A zero-width positive look-behind assertion.
(?<!pattern) A zero-width negative look-behind assertion.
(?'NAME'pattern)
(?<NAME>pattern) A named capture group.
\k<NAME>
\k'NAME' Named backreference.
(?{ code }) Zero-width assertion with code execution.
(??{ code }) A "postponed" regular subexpression with code execution.
Examples
ffprobe language list
This lists the languages available in a set of .mp4 files.
The -P option tells grep to treat the pattern as a PCRE.
ffp *.mp4 | grep -P -o '(?<=e6aa2d50c9f1e4bca01ff451a29b76f98ab82ec36e8606033374fe60fbea77b2e29c9c180c6279b0b02abd6a1801c7c04082cf486ec027aa13515e4f3884bb6b).*Audio:' | sort | uniq
To break the regex down:
(?<=e6aa2d50c9f1e4bca01ff451a29b76f98ab82ec36e8606033374fe60fbea77b2c6f3ac57944a531490cd39902d0f777715fd005efac9a30622d5f5205e7f6894) # ensures that closing paren \) follows what is matched
The -o option prints only the text that matched, not the rest of each matching line.
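The same combination of -P, -o and look-around can be tried on made-up input. The sample lines below only imitate the general shape of ffprobe stream lines, and the pattern is an illustrative assumption, not the one used in the command above. It needs a grep built with PCRE support, such as GNU grep.

```shell
# Made-up lines imitating ffprobe stream output (an assumption for illustration).
sample='Stream #0:1(eng): Audio: aac
Stream #0:2(fra): Audio: aac
Stream #0:3(eng): Audio: ac3'

# (?<=\() requires an opening paren before the match,
# (?=\): Audio:) requires "): Audio:" after it; neither is printed by -o.
langs=$(printf '%s\n' "$sample" | grep -P -o '(?<=\()\w+(?=\): Audio:)' | sort | uniq)

printf '%s\n' "$langs"
```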
Perl as alternative to sed
Instead of
cat files | sed 's/pattern/replacement/g'
we can do
cat a.php | perl -pe 's/(e(\w))/$1$2$2$1<$2,$1>/'
and this gives us access to the full power of Perl's regular expressions, amongst other things. Also
cat a.php | perl -pe 'tr/a-z/A-Z/;s/\W/_x_/;'
cat a.php | perl -e 'for(<STDIN>) { tr/a-z/A-Z/;s/\W/_/g;print; }'
Note that in -pe 'expression', the -e option allows code to be passed in on the command line, and -p causes that code to be wrapped in a loop roughly like
while (<>) { YOUR_CODE; print; }
which is what perl -pe 'YOUR_CODE' runs.