Glob vs Regex
Globbing (wildcard patterns) comes naturally to most of us when we search for files on the command line of a Linux or Windows box. While it’s not as powerful as regular expressions, it’s less to type and extremely simple.
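To make the "less to type" point concrete, here is a small Python sketch that selects the same files once with a glob and once with a regular expression. The file names are made up for illustration, and fnmatchcase is used only so the comparison is case-sensitive on every platform.

```python
import fnmatch
import re

# Hypothetical file names, used only for illustration.
filenames = ["report.txt", "notes.txt", "image.png", "txt"]

# Glob: "*" matches any run of characters and "." is a literal dot.
glob_hits = [name for name in filenames if fnmatch.fnmatchcase(name, "*.txt")]

# Regex: the dot has to be escaped, and "any run of characters" is spelled ".*".
regex_hits = [name for name in filenames if re.fullmatch(r".*\.txt", name)]

print(glob_hits)
print(regex_hits)
```

Both lists contain the same two .txt files; the glob simply needs fewer characters and no escaping.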
On Linux, most command-line tools don’t process globs themselves; they rely on the shell to expand the pattern before the tool runs. Bash 4 also introduces globstar (**), which matches recursively within subdirectories, so the match can span path separators. It needs to be enabled with shopt -s globstar; use shopt | grep globstar to check whether it is set.
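The difference between a plain wildcard and a recursive one can be sketched in Python, whose glob module supports a similar ** pattern when recursive=True is passed. This is an analogy rather than Bash itself, and the helper find_recursive and the temporary directory layout are assumptions made up for the example.

```python
import glob
import os
import tempfile

def find_recursive(root, pattern):
    """Return sorted matches for pattern under root, relative to root, using '/' separators."""
    # glob.escape protects any special characters in the root path itself.
    matches = glob.glob(os.path.join(glob.escape(root), pattern), recursive=True)
    return sorted(os.path.relpath(m, root).replace(os.sep, "/") for m in matches)

with tempfile.TemporaryDirectory() as root:
    # Build a small, hypothetical directory tree to search.
    os.makedirs(os.path.join(root, "src", "pkg"))
    for rel in ("a.py", "src/b.py", "src/pkg/c.py", "src/readme.md"):
        with open(os.path.join(root, *rel.split("/")), "w"):
            pass

    # "*" stays within one directory level.
    shallow = find_recursive(root, "*.py")
    # "**" matches zero or more directories, so it crosses path separators.
    deep = find_recursive(root, "**/*.py")

print(shallow)
print(deep)
```

Here shallow finds only a.py at the top level, while deep also picks up the files under src and src/pkg.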
Regular expressions can become extremely complex. See, for example, the positive look-ahead (?=\S), which asserts that the next character is not whitespace, and the positive look-behind (?<=\S), which asserts that the previous character is not whitespace; neither one consumes any input. And then there are the different flavors: PCRE (Perl Compatible Regular Expressions), PCRE2, and POSIX (Portable Operating System Interface) regular expressions. Although regular expressions are not the best choice for web scraping (DOM parsing is arguably better), they can get the job done. They are also well suited to writing automated test cases.
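As a minimal sketch of the two lookarounds mentioned above, the following Python snippet finds hyphens that have a non-whitespace character on both sides. The sample text and the name inner_hyphen are made up for the example; Python's re module is used here, which supports both constructs.

```python
import re

# Hypothetical sample text with hyphens in different contexts.
text = "a-b - c -d e-f"

# (?<=\S) : the character before the hyphen must be non-whitespace.
# (?=\S)  : the character after the hyphen must be non-whitespace.
# Both are zero-width assertions, so only the hyphen itself is matched.
inner_hyphen = re.compile(r"(?<=\S)-(?=\S)")

positions = [m.start() for m in inner_hyphen.finditer(text)]
joined = inner_hyphen.sub("_", text)

print(positions)
print(joined)
```

Only the hyphens in a-b and e-f are replaced; the free-standing hyphen and the one in -d fail one of the assertions and are left alone.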