A regular expression is typically used to search for a particular string pattern, a task known as pattern matching.
Its operators are =~ (read "match") and !~ (read "not match").
Syntax: $string =~ /regular expression/expression modifier
Ex: $sentence =~ /Hello/
(a) Modifiers : Modifiers are optional; they adjust how the whole expression is applied.
g | Match globally, i.e. find all occurrences. |
i | Makes the search case-insensitive. |
m | Treats the string as multiple lines: ^ and $ also match at the start and end of each line inside the string, not only at the start and end of the whole string. |
o | Only compile pattern once. |
s | The character . matches any character except a new line. This modifier treats the string as a single line, which allows . to match a new-line character. |
x | Allows white space and comments in the expression. |
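The modifiers can be tried out with a short script. This is only a minimal sketch; the sample text and the variable names are made up for illustration.

```perl
use strict;
use warnings;

my $text = "Hello world\nhello Perl\n";

# /i ignores case; /g in list context returns every match.
my @hellos = $text =~ /hello/gi;

# /m lets ^ match at the start of each line, not only at the start of the string.
my @line_starts = $text =~ /^(\w+)/gm;

# Without /s the . stops at the new line; with /s it can cross it.
my $without_s = ($text =~ /world.hello/)  ? 1 : 0;
my $with_s    = ($text =~ /world.hello/s) ? 1 : 0;

# /x ignores white space and comments inside the pattern.
my ($second) = $text =~ / Hello \s+ (\w+)   # the word after "Hello"
                        /x;

print scalar(@hellos), " ", scalar(@line_starts), " $without_s $with_s $second\n";
```

Note that a comment inside a /x pattern must not contain the pattern delimiter.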
(b) Metacharacters : The following characters have special meanings and let you build more complex search patterns.
\ | Tells Perl to treat the following metacharacter as a regular character; this removes its special meaning. |
^ | Matches the beginning of the string; with /m it also matches at the beginning of each line. |
. | Matches any character except a new line character, unless /s is used. |
$ | Matches the end of the string (or just before a final new line); with /m it also matches at the end of each line. |
| | Expresses alternation: the pattern matches if any one of the alternatives matches. |
() | Groups expressions to assist in alternation and back referencing. |
[] | Looks for a set of characters. |
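A few one-line checks may make the metacharacters in this table more concrete. The strings and variable names below are assumptions chosen for the sketch.

```perl
use strict;
use warnings;

my $starts_abc  = "abcdef" =~ /^abc/ ? 1 : 0;   # ^ anchors at the beginning
my $ends_abc    = "abcdef" =~ /abc$/ ? 1 : 0;   # $ anchors at the end
my $any_char    = "a-c" =~ /a.c/  ? 1 : 0;      # . matches the "-"
my $literal_dot = "a-c" =~ /a\.c/ ? 1 : 0;      # \. matches only a real "."

# () groups the alternatives and captures whichever one matched.
my ($pet) = "hot dog" =~ /(cat|dog)/;

# [] matches one character from the set; this finds the first vowel.
my ($vowel) = "rhythm and blues" =~ /([aeiou])/;

print "$starts_abc $ends_abc $any_char $literal_dot $pet $vowel\n";
```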
(c) Pattern Quantifiers : These specify how many times the preceding item may occur.
* | Matches 0 or more times. |
+ | Matches 1 or more times. |
? | Matches 0 or 1 times. |
{n} | Matches exactly n times. |
{n,} | Matches at least n times. |
{n,m} | Matches at least n times but no more than m times. |
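To see how the quantifiers differ, the sketch below tests several strings against each of them. The helper name matching_quantifiers is hypothetical and exists only for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: lists which quantifiers accept the string
# when applied to the b in "a b... c".
sub matching_quantifiers {
    my ($s) = @_;
    my @hits;
    push @hits, "*"     if $s =~ /^ab*c$/;
    push @hits, "+"     if $s =~ /^ab+c$/;
    push @hits, "?"     if $s =~ /^ab?c$/;
    push @hits, "{2}"   if $s =~ /^ab{2}c$/;
    push @hits, "{2,}"  if $s =~ /^ab{2,}c$/;
    push @hits, "{2,4}" if $s =~ /^ab{2,4}c$/;
    return join " ", @hits;
}

for my $s ("ac", "abc", "abbc", "abbbbbc") {
    print $s, " => ", matching_quantifiers($s), "\n";
}
```

The patterns are anchored with ^ and $ so that each quantifier has to account for every b in the string.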
(d) Character Patterns : The following sequences match particular kinds of characters:
\r | Carriage return (CR), ASCII 13 (decimal) |
\n | New line: ASCII 10 (decimal) on UNIX; on DOS (Windows) a line ending in a text file is ASCII 13 + ASCII 10 (decimal) |
\t | Tab, ASCII 9 (decimal) |
\w | Matches an alphanumeric character. Alphanumeric also includes _, i.e. [A-Za-z0-9_]. |
\W | Matches a nonalphanumeric character, i.e. [^A-Za-z0-9_]. |
\s | Matches a white space character. This includes space, tab, form feed, CR and LF, i.e. [\ \t\f\r\n]. |
\S | Matches a non-white space character, i.e. [^\ \t\f\r\n]. |
\d | Matches a digit, i.e. [0-9]. |
\D | Matches a nondigit character, i.e. [^0-9]. |
\b | Matches a word boundary. |
\B | Matches a nonword boundary. |
\033 | octal char |
\x1B | hex char |
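The sequences above can be combined with /g to pull pieces out of a line. The following is a small sketch; the sample line is invented for the purpose.

```perl
use strict;
use warnings;

my $line = "Order 66:\t3 cats, 1 cat, 12 cat_toys";

my @numbers = $line =~ /\d+/g;          # runs of digits
my @words   = $line =~ /\w+/g;          # runs of word characters (_ included)

# \b requires a word boundary, so "cats" and "cat_toys" do not count.
my $whole_cat = () = $line =~ /\bcat\b/g;

# \033 (octal) and \x1B (hex) name the same character, ESC (decimal 27).
my $same_esc = "\033" eq "\x1B" ? 1 : 0;

print "@numbers | ", scalar(@words), " | $whole_cat | $same_esc\n";
```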
(e) Examples :
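/abc/ => matches a string that contains abc
/^abc/ => matches a string that begins with abc
/abc$/ => matches a string that ends with abc
/a|b/ => matches a string that contains a or b; the alternatives can also be whole words
/ab{2,4}c/ => matches a followed by 2 to 4 b's and then c; /ab{2,}c/ matches two or more b's
/ab*c/ => matches a followed by zero or more b's and then c, same as /ab{0,}c/
/ab+c/ => matches a followed by one or more b's and then c, same as /ab{1,}c/
/a.c/ => . stands for any character except the new line character (\n).
/[abc]/ => matches a string that contains any one of these three characters.
/\d/ => matches a string that contains a digit, same as /[0-9]/
/\w/ => matches a string that contains a word character (letter, digit or _), same as /[a-zA-Z0-9_]/
/\s/ => matches a string that contains white space, same as /[ \t\r\n\f]/
/[^abc]/ => matches a string that contains at least one character other than a, b and c
/\*/ => matches a string that contains the character *; Perl treats a punctuation character that follows a backslash "\" as a regular character.
If you are not sure whether a punctuation symbol is special, you can put a \ in front of it to be safe. Do not do this with letters or digits, because sequences such as \d or \b have special meanings of their own.
/abc/i => matches abc regardless of case
/(\d+)\.(\d+)\.(\d+)\.(\d+)/
=> matches a string that looks like an IP address and stores its four numbers in the special variables $1, $2, $3 and $4 for later use.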
Ex:
if($x =~ /(\d+)\.(\d+)\.(\d+)\.(\d+)/)
{
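# $1 holds only the first number, so the first two fields are compared together
print "National Cheng Kung University" if ("$1.$2" eq "140.116");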
}
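The same idea can be written without relying on $1 to $4 directly, by taking the captures in list context. This is a sketch only; ip_fields is a hypothetical helper name and the input string is made up.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the four captured numbers,
# or an empty list when the text contains nothing IP-like.
sub ip_fields {
    my ($text) = @_;
    return $text =~ /(\d+)\.(\d+)\.(\d+)\.(\d+)/;
}

my @f = ip_fields("host 140.116.2.3 is up");
print "fields: @f\n";
print "prefix 140.116 matches\n" if @f && "$f[0].$f[1]" eq "140.116";
```

The pattern only checks the shape of the string; it does not verify that each number is in the range 0 to 255.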
m//gimosx => The m operator lets you choose your own pattern delimiter, and gimosx are its modifiers; see (a) Modifiers.
Ex:
$url="http://my.machine.tw:8080/noname/test.pl";
($host, $port, $file)=($url=~m|http://([^/:]+):{0,1}(\d*)(\S*)$|);
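This regular expression is fairly complex. Its purpose is to parse the given URL and extract the host name, the port number and the corresponding file.
Here it is piece by piece:
$url=~m||
The character after m is the delimiter, and the pattern sits between the two | characters.
([^/:]+)
Matches a run of characters that contains neither / nor :. The matched text is stored in $1.
:{0,1}(\d*)
Matches 0 or 1 :, followed by a run of digits or nothing. The matched text is stored in $2; if there are no digits, $2 is empty.
(\S*)$
Matches a run of non-white space characters that extends to the end of the string. The matched text is stored in $3.
()=()
($host, $port, $file)=($1,$2,$3)
That is, $host="my.machine.tw" $port="8080" $file="/noname/test.pl"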
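The pattern can be wrapped in a small subroutine to see how it behaves on different inputs. This is a sketch under assumptions: parse_url is a hypothetical name, and the second and third URLs are invented test cases. Note that the pattern requires the text to contain "http://".

```perl
use strict;
use warnings;

# Hypothetical helper: returns (host, port, file), or an empty list on failure.
sub parse_url {
    my ($url) = @_;
    return $url =~ m|http://([^/:]+):{0,1}(\d*)(\S*)$|;
}

my ($host, $port, $file) = parse_url("http://my.machine.tw:8080/noname/test.pl");
print "$host $port $file\n";

# No port given: the second capture is an empty string.
my ($h2, $p2, $f2) = parse_url("http://my.machine.tw/index.html");
print "port is empty\n" if $p2 eq "";

# Without the http:// prefix the match fails and the list is empty.
my @none = parse_url("my.machine.tw:8080/noname/test.pl");
print "no match\n" unless @none;
```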
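s/PATTERN/REPLACEMENT/egimosx
This is the substitution operator. It looks for text that matches PATTERN and replaces it with REPLACEMENT.
Its modifiers add the e option; the rest are the same as above. They are listed below: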
e | Evaluate the right side as an expression. |
g | Replace globally, i.e. all occurrences. |
i | Do case-insensitive pattern matching. |
m | Treat string as multiple lines. |
o | Only compile pattern once. |
s | Treat string as single line. |
x | Use extended regular expressions. |
Ex:
$x =~ s/\s+//g => removes all white space
$x =~ s/([^:]*):([^:]*)/$2:$1/ => swaps two fields separated by ":"
$path =~ s|/usr/bin|/usr/local/bin| => you can choose your own delimiter
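The substitutions above, plus the e modifier, can be tried as follows. This is a minimal sketch with made-up sample strings; each one is copied first so that the original stays intact.

```perl
use strict;
use warnings;

my $x = "  a b\tc \n";
(my $squeezed = $x) =~ s/\s+//g;                       # remove all white space

my $pair = "key:value";
(my $swapped = $pair) =~ s/([^:]*):([^:]*)/$2:$1/;     # swap the two fields

my $path = "/usr/bin/perl";
(my $local = $path) =~ s|/usr/bin|/usr/local/bin|;     # custom delimiter

# /e evaluates the replacement as Perl code; /g repeats it for every match.
my $prices = "3 apples, 10 pears";
(my $doubled = $prices) =~ s/(\d+)/$1 * 2/ge;

print "$squeezed\n$swapped\n$local\n$doubled\n";
```

The (my $copy = $original) =~ s/.../.../ idiom is used here so the results can be compared with the inputs; applying s/// directly to $x would modify $x in place.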
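tr/SEARCHLIST/REPLACEMENTLIST/cds
This is also a replacement operator. Unlike the previous one, SEARCHLIST and REPLACEMENTLIST are plain lists of characters rather than regular expressions: each character in SEARCHLIST is replaced by the character at the same position in REPLACEMENTLIST,
so it runs faster. It also has fewer modifiers: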
c | Complement the SEARCHLIST |
d | Delete found but unreplaced characters. |
s | Squash duplicate replaced characters. |
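Ex:
$x =~ tr/this/that/ => character by character: replaces every "i" with "a" and every "s" with "t" ("t" and "h" map to themselves); it does not replace the word "this" with "that"
$x =~ tr/a-z/A-Z/ => converts all lowercase letters to uppercase
$count = $x =~ tr/*/*/ => counts the "*" characters in $x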
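Because tr works on single characters, its results can be surprising at first. The sketch below runs the examples above and one case for each of the c, d and s modifiers; all sample strings are invented.

```perl
use strict;
use warnings;

my $x = "this is a *test* **";

(my $mapped = $x) =~ tr/this/that/;    # i -> a, s -> t, everywhere in the string
(my $upper  = $x) =~ tr/a-z/A-Z/;      # lowercase to uppercase
my $stars = ($x =~ tr/*/*/);           # counts "*" and leaves $x unchanged

(my $squashed  = "aaabbbccc") =~ tr/a-z//s;   # s: squash repeated characters
(my $no_digits = "a1b22c")    =~ tr/0-9//d;   # d: delete characters with no replacement
(my $masked    = "a1-b2")     =~ tr/a-z/#/c;  # c: act on everything NOT in a-z

print "$mapped\n$upper\n$stars\n$squashed $no_digits $masked\n";
```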