Perl Regular Expressions @ Totui

A regular expression is normally used to search for a specific string pattern. This is what is called pattern matching.

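The operators are =~ (read "match") and !~ (read "not match"): $string =~ /pattern/ is true if the string matches the pattern, and $string !~ /pattern/ is true if it does not.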

Syntax: $string =~ /regular expression/modifiers

Ex: $sentence =~ /Hello/
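A minimal runnable sketch of the two operators, assuming a made-up sample string in $sentence. It only records whether each test is true.

```perl
use strict;
use warnings;

my $sentence = "Hello, world";   # made-up sample string

# =~ is true when the pattern matches somewhere in the string
my $has_hello = ($sentence =~ /Hello/) ? 1 : 0;

# !~ is the negation: true when the pattern does NOT match
my $no_bye = ($sentence !~ /Bye/) ? 1 : 0;

print "has Hello\n" if $has_hello;   # prints "has Hello"
print "no Bye\n"    if $no_bye;      # prints "no Bye"
```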

(a) Modifiers: Modifiers are optional; they change how the whole pattern is applied.

g	Match globally, i.e. find all occurrences.
i	Makes the search case-insensitive.
m	Treats the string as multiple lines: ^ and $ match at the start and end of every line (next to each embedded newline), not only at the start and end of the whole string.
o	Only compiles the pattern once.
s	Treats the string as a single line: . then also matches a newline character, which it normally does not.
x	Allows white space (and # comments) in the pattern; they are ignored, which makes long patterns easier to read.
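To see several of these modifiers side by side, here is a small sketch on a made-up three-line string. It relies on list context for /g, which returns every match (or every captured group) at once.

```perl
use strict;
use warnings;

my $text = "Apple\nbanana\napple pie";   # made-up sample text

# i: ignore case; g in list context: return every match
my @apples = $text =~ /apple/gi;               # ("Apple", "apple")

# m: ^ matches at the start of every line, not just of the string
my @line_starts = $text =~ /^(\w)/mg;          # ("A", "b", "a")

# s: . may also match the newline character
my $across = ($text =~ /Apple.banana/s) ? 1 : 0;   # 1
my $plain  = ($text =~ /Apple.banana/)  ? 1 : 0;   # 0: . stops at \n

# x: white space and # comments inside the pattern are ignored
my ($year, $month) = "2024-05" =~ /
    (\d{4})   # year
    -
    (\d{2})   # month
/x;
```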

(b) Metacharacters: The following characters have special meanings and let you build more complex search patterns.

\	Tells Perl to treat the following character as an ordinary character; this removes the special meaning of any metacharacter.
^	Matches the *beginning* of the string, or of each line if /m is used.
.	Matches any character except a newline character, unless /s is used.
$	Matches the *end* of the string, or of each line if /m is used.
|	Expresses alternation: the pattern matches if any one of the alternatives matches, so one expression can search for several patterns.
()	Groups expressions for alternation and captures the matched text for back references ($1, $2, ...).
[]	Defines a character class: matches any single character from the listed set.
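A short sketch that exercises ^, $, ., |, () and [] together; the file names and words are invented sample data.

```perl
use strict;
use warnings;

# Hypothetical file names used only as sample input
my @files = ("report.txt", "notes.md", "report_txt", "draft.TXT");

# ^ and $ anchor both ends, \. is a literal dot, (...) groups the | alternatives
my @text_files = grep { /^\w+\.(txt|md)$/ } @files;     # ("report.txt", "notes.md")

# [] is a character class: the first character must be a, e, i, o or u
my @vowel_start = grep { /^[aeiou]/ } ("apple", "pear", "orange");   # ("apple", "orange")

# An unescaped . matches any character, so "report_txt" slips through
my $loose   = ("report_txt" =~ /^report.txt$/)  ? 1 : 0;   # 1
my $escaped = ("report_txt" =~ /^report\.txt$/) ? 1 : 0;   # 0
```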

(c) Quantifiers: The following specify how many times the preceding character or group must occur.

*	Matches 0 or more times.
+	Matches 1 or more times.
?	Matches 0 or 1 times.
{n}	Matches exactly n times.
{n,}	Matches at least n times.
{n,m}	Matches at least n times but no more than m times.
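The quantifiers can be compared on a handful of invented words. The ^ and $ anchors are an addition of this sketch, so that each quantifier has to account for the whole word.

```perl
use strict;
use warnings;

# Invented sample words with 0, 1, 2 and 5 b's
my @words = ("ac", "abc", "abbc", "abbbbbc");

my @star  = grep { /^ab*c$/     } @words;  # zero or more b: all four words
my @plus  = grep { /^ab+c$/     } @words;  # one or more b: "abc", "abbc", "abbbbbc"
my @opt   = grep { /^ab?c$/     } @words;  # zero or one b: "ac", "abc"
my @exact = grep { /^ab{2}c$/   } @words;  # exactly two b: "abbc"
my @least = grep { /^ab{2,}c$/  } @words;  # at least two b: "abbc", "abbbbbc"
my @range = grep { /^ab{2,4}c$/ } @words;  # two to four b: "abbc"

print "@range\n";   # prints "abbc"
```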

(d) Character Patterns: The following sequences match specific kinds of characters:

\r	Carriage return (CR), ASCII 13 (decimal)
\n	Newline (LF), ASCII 10 (decimal). Windows (DOS) text files end lines with CR + LF (ASCII 13 + ASCII 10); when Perl reads such a file in text mode on Windows, each CR LF pair arrives as a single \n.
\t	Tab, ASCII 9 (decimal)
\w	Matches a *word* character: an alphanumeric character or _, i.e. [A-Za-z0-9_].
\W	Matches a non-word character, i.e. [^A-Za-z0-9_].
\s	Matches a white space character: space, tab, form feed, CR or LF, i.e. [\ \t\f\r\n].
\S	Matches a non-white space character, i.e. [^\ \t\f\r\n].
\d	Matches a *digit*, i.e. [0-9].
\D	Matches a nondigit character, i.e. [^0-9].
\b	Matches a word boundary.
\B	Matches a nonword boundary.
\033	A character given in octal (033 is ESC, ASCII 27)
\x1B	A character given in hexadecimal (1B is also ESC)
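A small sketch applying several of these sequences to one made-up line of text. The `() = ... /g` idiom counts matches by assigning them to an empty list in scalar context.

```perl
use strict;
use warnings;

my $line = "id_42\tcost 17 USD\n";   # made-up sample line

my @numbers = $line =~ /(\d+)/g;     # ("42", "17")
my @words   = $line =~ /(\w+)/g;     # ("id_42", "cost", "17", "USD"): _ counts as \w
my $spaces  = () = $line =~ /\s/g;   # tab, space, space, newline: 4

# \b: "cat" inside "concatenate" is not a whole word
my $has_cat = ("concatenate" =~ /\bcat\b/) ? 1 : 0;   # 0

# Octal \033 and hex \x1B denote the same ESC character
my $esc = ("\x1B" =~ /\033/) ? 1 : 0;                 # 1
```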

(e) Examples:

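/abc/ => matches strings that contain abc

/^abc/ => matches strings that start with abc

/abc$/ => matches strings that end with abc

/a|b/ => matches strings that contain a or b; the alternatives can also be whole words, e.g. /cat|dog/

/ab{2,4}c/ => matches an a followed by 2 to 4 b's and then a c; /ab{2,}c/ would accept two or more b's

/ab*c/ => matches an a followed by zero or more b's and then a c, same as /ab{0,}c/

/ab+c/ => matches an a followed by one or more b's and then a c, same as /ab{1,}c/

/a.c/ => . stands for any single character except the newline character (\n), so this matches an a, any character, then a c.

/[abc]/ => matches strings that contain any one of these three characters.

/\d/ => matches strings that contain a digit, same as /[0-9]/

/\w/ => matches strings that contain a word character (letter, digit or _), same as /[a-zA-Z0-9_]/

/\s/ => matches strings that contain white space, same as /[ \t\r\n\f]/

/[^abc]/ => matches strings that contain at least one character other than a, b and c (it does not mean "contains none of a, b, c")

/\*/ => matches strings that contain the character *; Perl treats the character after a backslash "\" as an ordinary character.

If you are not sure whether a punctuation character is special, escaping it with \ is always safe. Do not do this with letters or digits, though: \d, \w, \s and similar sequences have special meanings.

/abc/i => matches abc, ignoring case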
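The examples above can be checked mechanically. This sketch stores each pattern with qr// alongside an invented test string and the expected result, then counts mismatches; the two [^abc] cases show that the class matches as soon as one character lies outside the set.

```perl
use strict;
use warnings;

# Each entry: [pattern, made-up test string, expected result]
my @cases = (
    [ qr/^abc/,   "abcdef", 1 ],
    [ qr/abc$/,   "xyzabc", 1 ],
    [ qr/a.c/,    "a\nc",   0 ],   # . does not match \n without /s
    [ qr/[^abc]/, "abcabc", 0 ],   # every character is a, b or c
    [ qr/[^abc]/, "abcd",   1 ],   # "d" is outside the set
    [ qr/\*/,     "2*3",    1 ],   # escaped: a literal asterisk
    [ qr/abc/i,   "ABC",    1 ],   # case is ignored
);

my $failures = 0;
for my $case (@cases) {
    my ($re, $str, $want) = @$case;
    my $got = ($str =~ $re) ? 1 : 0;
    $failures++ if $got != $want;
}
print "$failures failures\n";   # prints "0 failures"
```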

/(\d+)\.(\d+)\.(\d+)\.(\d+)/

=> matches an IP-address-like string and stores its four numbers in the special variables $1, $2, $3 and $4, so they can be used afterwards.

Ex:

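if ($x =~ /(\d+)\.(\d+)\.(\d+)\.(\d+)/) {
    # 140.116.x.x is National Cheng Kung University's address block.
    # $1 holds only the first number, so compare the first two numbers separately.
    print "National Cheng Kung University\n" if ($1 eq "140" && $2 eq "116");
}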
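The same check can be written without $1 to $4 by taking the captures in list context. In this sketch the address is made up, and the ^ and $ anchors (not in the original pattern) are an added assumption so that the whole string must look like an address.

```perl
use strict;
use warnings;

my $x = "140.116.1.2";   # made-up sample address

# In list context the match returns the captured groups ($1, $2, $3, $4)
my @octets = $x =~ /^(\d+)\.(\d+)\.(\d+)\.(\d+)$/;

# @octets is empty when the match fails, so check it before indexing
my $is_ncku = (@octets && $octets[0] eq "140" && $octets[1] eq "116") ? 1 : 0;
print "National Cheng Kung University\n" if $is_ncku;
```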

m//gimosx => The m operator lets you choose your own pattern delimiters; gimosx are its modifiers, see (a) Modifiers.
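A minimal sketch of why a custom delimiter helps: with the default slashes, every / inside the pattern has to be escaped. The path is a made-up sample.

```perl
use strict;
use warnings;

my $path = "/usr/local/bin/perl";   # made-up sample path

# Default delimiters: each / in the pattern needs a backslash
my $slashes = ($path =~ /^\/usr\/local\//) ? 1 : 0;

# With m, other delimiters can be used and the slashes stay as-is
my $braces = ($path =~ m{^/usr/local/}) ? 1 : 0;
my $bars   = ($path =~ m|^/usr/local/|) ? 1 : 0;
```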

Ex:

$url="http://my.machine.tw:8080/noname/test.pl";

($host, $port, $file)=($url=~m|http://([^/:]+):{0,1}(\d*)(\S*)$|);

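This regular expression is fairly complex. Its purpose is to parse the given URL and extract the host name, the port number and the file path.

Let me explain it piece by piece: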

$url=~m||

The character right after m is the delimiter; the pattern goes between the two | characters.

([^/:]+)

After the literal http://, matches a run of characters that contains no / or :. The matched text is stored in $1.

:{0,1}(\d*)

Matches zero or one :, followed by a run of digits or nothing. The matched text is stored in $2; if there are no digits, $2 is empty.

(\S*)$

Matches a run of non-white space characters that reaches the end of the string. The matched text is stored in $3.

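()=()

In list context, a successful match returns the list of captured groups, so the assignment is equivalent to:

($host, $port, $file)=($1,$2,$3)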

That is, $host="my.machine.tw", $port="8080", $file="/noname/test.pl"
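Here is the URL example as a runnable script, with http:// included in $url so that the pattern can match. The second URL, example.tw, is a hypothetical sample without a port, showing that $2 is then the empty string.

```perl
use strict;
use warnings;

my $url = "http://my.machine.tw:8080/noname/test.pl";

# List context: the three capture groups are returned and assigned in order
my ($host, $port, $file) = ($url =~ m|http://([^/:]+):{0,1}(\d*)(\S*)$|);

print "host=$host port=$port file=$file\n";
# prints "host=my.machine.tw port=8080 file=/noname/test.pl"

# Hypothetical URL without a port: (\d*) matches the empty string
my ($h2, $p2, $f2) = ("http://example.tw/index.html" =~ m|http://([^/:]+):{0,1}(\d*)(\S*)$|);
```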

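s/PATTERN/REPLACEMENT/egimosx

This is the substitution operator. It searches for strings that match PATTERN and replaces them with the REPLACEMENT string.

It adds the e modifier; the others are the same as above: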

e	Evaluate the right side as an expression.
g	Replace globally, i.e. all occurrences.
i	Do case-insensitive pattern matching.
m	Treat string as multiple lines.
o	Only compile pattern once.
s	Treat string as single line.
x	Use extended regular expressions.
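A small sketch of the e and g modifiers, plus i without g; the input strings are made up. The (my $copy = $original) =~ s/// idiom modifies a copy and leaves the original alone.

```perl
use strict;
use warnings;

my $prices = "apple 3, pear 5, plum 10";   # made-up sample text

# g: replace every match; e: evaluate the replacement as Perl code
(my $doubled = $prices) =~ s/(\d+)/$1 * 2/ge;   # "apple 6, pear 10, plum 20"

# i without g: match case-insensitively, but replace only the first match
(my $first = "Cat cat CAT") =~ s/cat/dog/i;      # "dog cat CAT"
```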

Ex:

$x =~ s/\s+//g => removes all white space

$x =~ s/([^:]*):([^:]*)/$2:$1/ => swaps two fields separated by ":"

$path =~ s|/usr/bin|/usr/local/bin| => you can choose your own delimiters
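The three substitutions above as one runnable sketch, with invented input strings. The swap uses [^:]* so that neither field can swallow the colon.

```perl
use strict;
use warnings;

my $x = " a b\tc\n";                         # made-up sample text
$x =~ s/\s+//g;                              # "abc": all white space removed

my $pair = "key:value";                      # made-up sample field pair
$pair =~ s/([^:]*):([^:]*)/$2:$1/;           # "value:key"

my $path = "/usr/bin/perl";                  # made-up sample path
$path =~ s|/usr/bin|/usr/local/bin|;         # "/usr/local/bin/perl"
```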

tr/SEARCHLIST/REPLACEMENTLIST/cds

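This is also a replacement operator. Unlike s///, SEARCHLIST and REPLACEMENTLIST are plain lists of characters, not regular expressions: each character in SEARCHLIST is replaced by the character at the same position in REPLACEMENTLIST.

This makes it faster, and it has fewer modifiers: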

c	Complement the SEARCHLIST
d	Delete found but unreplaced characters.
s	Squash duplicate replaced characters.
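A sketch of the c, d and s modifiers on made-up strings; the phone-style number is an invented sample.

```perl
use strict;
use warnings;

# d: characters in SEARCHLIST with no partner in REPLACEMENTLIST are deleted
(my $no_vowels = "regular expression") =~ tr/aeiou//d;   # "rglr xprssn"

# s: runs of the same replaced character are squeezed to one
(my $squeezed = "a    b  c") =~ tr/ / /s;                 # "a b c"

# c + d: act on every character NOT in SEARCHLIST and delete it
(my $digits_only = "tel: 12-345-678") =~ tr/0-9//cd;     # "12345678"
```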

Ex:

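$x =~ tr/this/that/ => maps characters one by one (t to t, h to h, i to a, s to t); it does not replace the word "this" with "that" (use s/this/that/g for that)

$x =~ tr/a-z/A-Z/ => converts all lowercase letters to uppercase

$count = $x =~ tr/*/*/ => counts how many "*" characters $x contains (tr returns the number of matched characters, and $x is left unchanged)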
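A runnable sketch of these tr examples on made-up strings, including the s/// alternative for replacing the whole word "this".

```perl
use strict;
use warnings;

# tr maps characters one by one: t->t, h->h, i->a, s->t
(my $mapped = "this is") =~ tr/this/that/;         # "that at", not "that is"

# To replace the word "this" itself, use s/// instead
(my $replaced = "this is") =~ s/\bthis\b/that/g;   # "that is"

(my $upper = "Hello Perl") =~ tr/a-z/A-Z/;         # "HELLO PERL"

# tr returns the number of characters it matched; $stars is unchanged
my $stars = "a*b**c";
my $count = ($stars =~ tr/*/*/);                   # 3
```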

Posted by Totui on PIXNET
