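A regular expression (regex) is a powerful text-processing tool for matching, searching, replacing, and validating strings. This tutorial walks through the basic syntax and common uses with examples.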
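1. Basic matching
A regular expression is made up of ordinary characters (letters, digits, and so on) and special characters called metacharacters. The simplest regular expression consists of ordinary characters only. For example:
- The regular expression `hello` matches the text "hello"; a search returns its first occurrence in the string.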
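As a minimal sketch of literal matching, the snippet below uses Python's standard `re` module. The sample string and the variable names are assumptions made for illustration.

```python
import re

text = "say hello, then hello again"

# re.search scans the string and returns the first match, or None.
match = re.search(r"hello", text)
first_index = match.start() if match else -1

# re.findall returns every non-overlapping match.
all_matches = re.findall(r"hello", text)

print(first_index)   # 4
print(all_matches)   # ['hello', 'hello']
```

`re.search` stops at the first occurrence, while `re.findall` keeps going, which is the usual choice when every occurrence is needed.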
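2. Metacharacters
Metacharacters are characters that have a special meaning in a regular expression. Common metacharacters include:
- `.` matches any single character (except a newline)
- `^` matches the start of the string
- `$` matches the end of the string
- `*` matches the preceding subexpression zero or more times
- `+` matches the preceding subexpression one or more times
- `?` matches the preceding subexpression zero or one time
- `{n}` matches the preceding subexpression exactly n times
- `{n,}` matches the preceding subexpression at least n times
- `{n,m}` matches the preceding subexpression at least n and at most m times
- `[]` defines a character set and matches any one of the enclosed characters
- `|` is alternation and matches either of two alternatives
- `()` groups several characters into a single unit and captures the matched text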
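The metacharacters above can be tried out with a small helper. This is only a sketch: `fully_matches` is a hypothetical name introduced here, and it uses `re.fullmatch` so that each pattern has to cover the whole input.

```python
import re

def fully_matches(pattern: str, text: str) -> bool:
    """Return True when the whole of text matches pattern."""
    return re.fullmatch(pattern, text) is not None

# . matches any single character except a newline
print(fully_matches(r"a.c", "abc"))          # True
print(fully_matches(r"a.c", "a\nc"))         # False

# * + ? control how often the preceding item may repeat
print(fully_matches(r"ab*c", "ac"))          # True
print(fully_matches(r"ab+c", "ac"))          # False
print(fully_matches(r"colou?r", "color"))    # True

# {n,m} bounds the repetition count
print(fully_matches(r"a{2,3}", "aaaa"))      # False

# | chooses between alternatives, () groups them
print(fully_matches(r"(cat|dog)s", "dogs"))  # True
```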
3. Escaping
To match a metacharacter itself, escape it with a backslash `\`. For example, to match a literal dot `.`, write `\.`.
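A short sketch of the difference escaping makes, again with Python's `re` module. The sample strings are assumptions; `re.escape` is the standard-library helper that escapes an arbitrary literal for you.

```python
import re

# Unescaped, the dot is a wildcard, so "3x14" matches too.
wildcard_hit = bool(re.fullmatch(r"3.14", "3x14"))
# Escaped, only a literal dot is accepted.
escaped_miss = bool(re.fullmatch(r"3\.14", "3x14"))
escaped_hit = bool(re.fullmatch(r"3\.14", "3.14"))
print(wildcard_hit, escaped_miss, escaped_hit)  # True False True

# re.escape builds the escaped form of an arbitrary literal string.
literal = "a.b*c"
pattern = re.escape(literal)
print(pattern)                                  # a\.b\*c
print(bool(re.fullmatch(pattern, literal)))     # True
```

Using `re.escape` is usually safer than escaping by hand when the literal text comes from user input.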
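4. Character classes
A character class is written in square brackets `[]` and matches any one of the characters inside the brackets.
- `[abc]` matches a, b, or c
- `[a-z]` matches any lowercase letter
- `[0-9]` matches any digit
- `[^abc]` matches any character other than a, b, or c (inside a character class, a leading `^` means negation)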
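To see these classes at work, here is a small sketch that collects matching characters from a sample string. The string and variable names are assumptions chosen for illustration.

```python
import re

text = "Regex 101: a-z!"

vowels = re.findall(r"[aeiou]", text)          # lowercase vowels only
digits = re.findall(r"[0-9]", text)            # ASCII digits
others = re.findall(r"[^a-zA-Z0-9]", text)     # anything that is not a letter or digit

print(vowels)   # ['e', 'e', 'a']
print(digits)   # ['1', '0', '1']
print(others)   # [' ', ':', ' ', '-', '!']
```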
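5. Predefined character classes
- `\d` matches a digit; equivalent to `[0-9]` when matching is ASCII-only (some engines also accept other Unicode digits by default)
- `\D` matches a non-digit; equivalent to `[^0-9]` under the same condition
- `\w` matches a letter, digit, or underscore; equivalent to `[A-Za-z0-9_]` when matching is ASCII-only
- `\W` matches any character that is not a letter, digit, or underscore
- `\s` matches a whitespace character (space, tab, newline, and so on)
- `\S` matches a non-whitespace character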
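The sketch below exercises the predefined classes in Python. The sample text is an assumption. The last two lines illustrate a Python-specific detail: for `str` patterns, `\d` also accepts non-ASCII decimal digits unless the `re.ASCII` flag is set.

```python
import re

text = "Order 42 shipped\ton 2023-12-25"

numbers = re.findall(r"\d+", text)
words = re.findall(r"\w+", text)
fields = re.split(r"\s+", text)

print(numbers)  # ['42', '2023', '12', '25']
print(words)    # ['Order', '42', 'shipped', 'on', '2023', '12', '25']
print(fields)   # ['Order', '42', 'shipped', 'on', '2023-12-25']

# "\u0663" is the Arabic-Indic digit three.
unicode_digit = bool(re.fullmatch(r"\d", "\u0663"))
ascii_only = bool(re.fullmatch(r"\d", "\u0663", re.ASCII))
print(unicode_digit, ascii_only)  # True False
```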
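6. Examples
Example 1: Matching an email address
Suppose we want to match a simple email address of the form username@domain.suffix.
- User name: may contain letters, digits, dots, underscores, and hyphens; it must not start or end with a dot and must not contain two consecutive dots.
- Domain: may contain letters, digits, and hyphens; it must not start or end with a hyphen.
- Suffix: usually 2 to 5 letters.
Regular expression: `^[a-zA-Z0-9]+([._-]?[a-zA-Z0-9]+)*@[a-zA-Z0-9]+(-?[a-zA-Z0-9]+)*(\.[a-zA-Z]{2,5})+$`
Explanation:
- `^` marks the start of the string.
- `[a-zA-Z0-9]+` requires the user name to begin with at least one letter or digit.
- `([._-]?[a-zA-Z0-9]+)*` allows zero or more repetitions of an optional dot, underscore, or hyphen followed by at least one letter or digit. Because `?` makes the separator optional and every separator must be followed by a letter or digit, separators can appear inside the user name but never twice in a row and never at the end.
- `@` matches a literal @.
- `[a-zA-Z0-9]+` requires the domain to begin with at least one letter or digit.
- `(-?[a-zA-Z0-9]+)*` allows zero or more repetitions of an optional hyphen followed by at least one letter or digit, so the domain may contain hyphens but not consecutive hyphens, and may not start or end with one.
- `(\.[a-zA-Z]{2,5})+` matches a dot followed by 2 to 5 letters; the group may occur one or more times (for example .com or .co.uk).
- `$` marks the end of the string.
Note: This is a simplified example. Real email addresses are more complex, and this expression does not cover every case. For instance, it rejects user@mail.example.com, because every label after the first dot in the domain is limited to 2 to 5 letters.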
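As a rough check of the rules described above, the following sketch compiles the same expression and runs it against a few hand-picked addresses. `EMAIL_RE`, `is_simple_email`, and the sample addresses are names and data assumed for illustration.

```python
import re

# The expression from the text, split across lines for readability.
EMAIL_RE = re.compile(
    r"^[a-zA-Z0-9]+([._-]?[a-zA-Z0-9]+)*"
    r"@[a-zA-Z0-9]+(-?[a-zA-Z0-9]+)*"
    r"(\.[a-zA-Z]{2,5})+$"
)

def is_simple_email(address: str) -> bool:
    """Return True when address fits the simplified pattern."""
    return EMAIL_RE.match(address) is not None

print(is_simple_email("user.name@example.com"))   # True
print(is_simple_email("a_b-c@mail.co.uk"))        # True
print(is_simple_email(".user@example.com"))       # False: leading dot
print(is_simple_email("user..name@example.com"))  # False: consecutive dots
print(is_simple_email("user@-example.com"))       # False: domain starts with a hyphen
```

One caveat: nested repetition such as `([._-]?[a-zA-Z0-9]+)*` can backtrack heavily on long inputs that almost match, so this form is better suited to short, length-limited strings.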
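Example 2: Matching a URL
Suppose we want to match HTTP and HTTPS URLs.
Regular expression: `^https?://[^\s/$.?#].[^\s]*$`
Explanation:
- `^https?` matches http or https at the start of the string; the s is optional.
- `://` matches the literal characters.
- `[^\s/$.?#]` matches one character that is not whitespace, /, $, ., ?, or #, which keeps the host from starting with any of these.
- `.` matches any single character (except a newline), so at least two characters must follow `://`.
- `[^\s]*` matches any run of non-whitespace characters.
- `$` marks the end of the string.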
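A quick sketch of how this URL pattern behaves on a few inputs. `URL_RE`, `looks_like_url`, and the sample URLs are assumptions made for the example.

```python
import re

URL_RE = re.compile(r"^https?://[^\s/$.?#].[^\s]*$")

def looks_like_url(candidate: str) -> bool:
    """Return True when candidate fits the HTTP/HTTPS pattern."""
    return URL_RE.match(candidate) is not None

print(looks_like_url("https://example.com"))          # True
print(looks_like_url("http://example.com/path?q=1"))  # True
print(looks_like_url("ftp://example.com"))            # False: wrong scheme
print(looks_like_url("https://.example.com"))         # False: host starts with a dot
print(looks_like_url("https://example.com/a b"))      # False: contains whitespace
print(looks_like_url("http://a"))                     # False: the lone . needs a second character
```

The pattern only checks the overall shape; when the parts of a URL are needed, a dedicated parser such as Python's `urllib.parse` is usually a better fit.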
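Example 3: Matching a date (YYYY-MM-DD)
Regular expression: `^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12][0-9]|3[01])$`
Explanation:
- `^\d{4}` matches a four-digit year.
- `-` matches a literal hyphen.
- `(0[1-9]|1[0-2])` matches the month: 01-09 or 10-12.
- `-` matches a literal hyphen.
- `(0[1-9]|[12][0-9]|3[01])` matches the day: 01-09, 10-29, or 30-31.
The pattern checks only the shape of the date; it still accepts impossible dates such as 2023-02-31.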
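One way to close the gap between "looks like a date" and "is a real date" is to let the regex check the shape and `datetime.date` check the calendar. The sketch below assumes a hypothetical helper `parse_iso_date` and adds a capture group around the year so all three parts can be read back.

```python
import re
from datetime import date

# Same pattern as in the text, with the year wrapped in a group (an addition for this sketch).
DATE_RE = re.compile(r"^(\d{4})-(0[1-9]|1[0-2])-(0[1-9]|[12][0-9]|3[01])$")

def parse_iso_date(text: str):
    """Return a date for a real calendar day in YYYY-MM-DD form, else None."""
    m = DATE_RE.match(text)
    if m is None:
        return None
    year, month, day = (int(part) for part in m.groups())
    try:
        return date(year, month, day)   # rejects days such as February 30
    except ValueError:
        return None

print(parse_iso_date("2023-12-25"))  # 2023-12-25
print(parse_iso_date("2023-13-01"))  # None (month rejected by the regex)
print(parse_iso_date("2023-02-30"))  # None (rejected by the calendar check)
```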
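Example 4: Matching a mobile phone number (mainland China)
A Chinese mobile number starts with 1, has 3-9 as its second digit, and is followed by 9 more digits.
Regular expression: `^1[3-9]\d{9}$`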
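A minimal sketch of this check in Python. `MOBILE_RE` and `is_cn_mobile` are hypothetical names, and the `re.ASCII` flag is an assumption added here so that `\d` accepts only the digits 0-9.

```python
import re

# re.ASCII restricts \d to 0-9 (an addition for this sketch, not part of the pattern in the text).
MOBILE_RE = re.compile(r"^1[3-9]\d{9}$", re.ASCII)

def is_cn_mobile(number: str) -> bool:
    """Return True for an 11-digit number starting with 1 and a second digit of 3-9."""
    return MOBILE_RE.match(number) is not None

print(is_cn_mobile("13812345678"))   # True
print(is_cn_mobile("12812345678"))   # False: second digit is 2
print(is_cn_mobile("1381234567"))    # False: only 10 digits
print(is_cn_mobile("138123456789"))  # False: 12 digits
```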
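Example 5: Removing HTML tags
To remove every HTML tag from a string, use a replace operation that substitutes an empty string for each matched tag.
Regular expression: `<[^>]+>`
Explanation: the pattern matches a `<`, followed by one or more characters that are not `>`, followed by a closing `>`.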
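The replacement described above might look like this in Python, using `re.sub`. `TAG_RE`, `strip_tags`, and the sample markup are assumptions for illustration.

```python
import re

TAG_RE = re.compile(r"<[^>]+>")

def strip_tags(html: str) -> str:
    """Replace every tag-shaped substring with an empty string."""
    return TAG_RE.sub("", html)

cleaned = strip_tags('<p class="text">Hello, <b>world</b>!</p>')
print(cleaned)  # Hello, world!
```

This is a sketch for simple markup only. Input with comments, scripts, or a `>` inside an attribute value can defeat the pattern, and a real HTML parser is the more reliable choice there.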
Basic syntax
1. Literal matching
    hello      # matches "hello"
    test123    # matches "test123"
2. Escaping special characters
    \.    # matches a literal dot
    \\    # matches a backslash
    \$    # matches a dollar sign
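Common metacharacters
| Metacharacter | Description | Example |
|---|---|---|
| `.` | Matches any single character | `a.c` matches "abc", "a c", "a-c" |
| `^` | Matches the start of the string | `^Hello` matches strings that start with "Hello" |
| `$` | Matches the end of the string | `end$` matches strings that end with "end" |
| `\d` | Matches a digit | `\d\d` matches "12", "45", and so on |
| `\w` | Matches a letter, digit, or underscore | `\w+` matches a word |
| `\s` | Matches a whitespace character | `\s+` matches spaces, tabs, and so on |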
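Quantifiers
| Quantifier | Description | Example |
|---|---|---|
| `*` | Zero or more times | `ab*c` matches "ac", "abc", "abbc" |
| `+` | One or more times | `ab+c` matches "abc", "abbc" |
| `?` | Zero or one time | `colou?r` matches "color" or "colour" |
| `{n}` | Exactly n times | `a{3}` matches "aaa" |
| `{n,}` | At least n times | `a{2,}` matches "aa", "aaa", and so on |
| `{n,m}` | Between n and m times | `a{2,4}` matches "aa", "aaa", "aaaa" |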
Character classes
1. Custom character classes
    [aeiou]     # matches any lowercase vowel
    [0-9]       # matches any digit
    [a-zA-Z]    # matches any letter
2. Negated character classes
    [^aeiou]    # matches any character that is not a lowercase vowel
    [^0-9]      # matches any non-digit character
Grouping and capturing
1. Basic grouping
    (abc)+    # matches "abc", "abcabc", and so on
2. Non-capturing groups
    (?:abc)+    # groups for matching without capturing
3. Named capture groups
    (?<year>\d{4})-(?<month>\d{2})    # captures the year and month by name
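To show how the three kinds of group differ in practice, here is a small Python sketch. Note that Python's `re` module spells named groups `(?P<name>...)`, so the pattern is adapted accordingly; `YEAR_MONTH_RE` and the sample text are assumptions.

```python
import re

# Python's spelling of a named group is (?P<name>...).
YEAR_MONTH_RE = re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})")

m = YEAR_MONTH_RE.search("Released 2023-12")
print(m.group("year"))    # 2023
print(m.group("month"))   # 12
print(m.groupdict())      # {'year': '2023', 'month': '12'}

# A capturing group records what it matched; a non-capturing group does not.
capturing = re.fullmatch(r"(abc)+", "abcabc").groups()
non_capturing = re.fullmatch(r"(?:abc)+", "abcabc").groups()
print(capturing)       # ('abc',)
print(non_capturing)   # ()
```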
Practical examples
1. Email validation
    ^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$
Example match:
- user@example.com
2. Mobile phone number validation
    ^1[3-9]\d{9}$
Example matches:
- 13812345678
- 15987654321
3. URL extraction
    https?://[^\s/$.?#].[^\s]*
Example match:
- https://example.com
4. Date matching
    \d{4}-\d{2}-\d{2}
Example matches:
- 2023-12-25
- 1990-01-01
5. HTML tag extraction
    <([a-z][a-z0-9]*)[^>]*>(.*?)</\1>
Example matches:
- <div>content</div>
- <p class="text">paragraph</p>
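The backreference `\1` is what ties the closing tag to the opening one. A minimal sketch with `re.findall` follows; `ELEMENT_RE` and the sample markup, including the deliberately mismatched last element, are assumptions.

```python
import re

ELEMENT_RE = re.compile(r"<([a-z][a-z0-9]*)[^>]*>(.*?)</\1>")

html = '<div>content</div> <p class="text">paragraph</p> <b>mismatch</i>'

# With two groups, findall returns (tag name, inner text) tuples.
pairs = ELEMENT_RE.findall(html)
print(pairs)  # [('div', 'content'), ('p', 'paragraph')]
```

The last element is skipped because `</i>` does not repeat the captured tag name `b`. Nested elements with the same tag name are beyond what this pattern can pair up correctly.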
6. Password strength validation
    ^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[@$!%*?&])[A-Za-z\d@$!%*?&]{8,}$
Requirements:
- At least 8 characters
- Contains both lowercase and uppercase letters
- Contains a digit
- Contains a special character (one of @$!%*?&)
- Contains only letters, digits, and those special characters
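Each `(?=...)` in the pattern is a lookahead that checks one requirement without consuming any characters. The sketch below runs the pattern against a few sample passwords; `PASSWORD_RE`, `is_strong_password`, and the samples are assumptions.

```python
import re

PASSWORD_RE = re.compile(
    r"^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[@$!%*?&])[A-Za-z\d@$!%*?&]{8,}$"
)

def is_strong_password(candidate: str) -> bool:
    """Return True when candidate satisfies every requirement in the pattern."""
    return PASSWORD_RE.match(candidate) is not None

print(is_strong_password("Passw0rd!"))    # True
print(is_strong_password("password1!"))   # False: no uppercase letter
print(is_strong_password("PASSWORD1!"))   # False: no lowercase letter
print(is_strong_password("Pw0!"))         # False: shorter than 8 characters
print(is_strong_password("Passw0rd#"))    # False: # is not an allowed special character
```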
Using regular expressions in different languages
Python example
    import re

    # Match an email address
    pattern = r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$'
    email = "user@example.com"
    if re.match(pattern, email):
        print("Email format is valid")

    # Extract all numbers
    text = "The price is 123 yuan and the weight is 45 kg"
    numbers = re.findall(r'\d+', text)
    print(numbers)  # ['123', '45']
JavaScript example
    // Validate a mobile phone number
    const phonePattern = /^1[3-9]\d{9}$/;
    const phone = "13812345678";
    if (phonePattern.test(phone)) {
        console.log("Phone number format is valid");
    }

    // Replace text: rewrite an ISO date in Chinese year/month/day notation
    const text = "今天是2023-12-25";
    const newText = text.replace(/(\d{4})-(\d{2})-(\d{2})/, "$1年$2月$3日");
    console.log(newText); // "今天是2023年12月25日"
Java example
    import java.util.regex.*;

    public class RegexExample {
        public static void main(String[] args) {
            // Validate an email address
            String emailPattern = "^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}$";
            String email = "user@example.com";
            if (email.matches(emailPattern)) {
                System.out.println("Email format is valid");
            }

            // Extract URLs
            String text = "Visit https://example.com for more information";
            Pattern urlPattern = Pattern.compile("https?://[^\\s/$.?#].[^\\s]*");
            Matcher matcher = urlPattern.matcher(text);
            while (matcher.find()) {
                System.out.println("Found URL: " + matcher.group());
            }
        }
    }
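Practical tips
- Use an online tester: tools such as regex101.com help you test and debug regular expressions.
- Comment complex expressions: use `(?#comment)` or the `x` (extended) mode to add comments, where the engine supports them.
- Optimize for performance: avoid patterns that cause excessive backtracking, and prefer specific character classes over broad ones such as `.`.
- Test edge cases: make sure the expression handles boundary inputs correctly.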
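As an illustration of the commenting tip, the sketch below rewrites the date pattern from earlier in Python's verbose mode (`re.VERBOSE`, the `x` flag), where whitespace and `#` comments inside the pattern are ignored. The group names and the variable names are assumptions added for this example.

```python
import re

DATE_RE = re.compile(
    r"""
    ^
    (?P<year>  \d{4} )                       # four-digit year
    -
    (?P<month> 0[1-9] | 1[0-2] )             # month 01-12
    -
    (?P<day>   0[1-9] | [12][0-9] | 3[01] )  # day 01-31
    $
    """,
    re.VERBOSE,
)

parts = DATE_RE.match("2023-12-25").groupdict()
print(parts)  # {'year': '2023', 'month': '12', 'day': '25'}

# Inline (?#...) comments work without the verbose flag.
inline = re.fullmatch(r"\d{4}(?#year)-\d{2}(?#month)", "2023-12")
print(inline is not None)  # True
```

Verbose mode tends to pay off once a pattern grows past a single line, since each piece can carry its own explanation.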
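This tutorial covers the fundamentals of regular expressions and their common applications. Practice on real tasks to build fluency with this powerful text-processing tool.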
Notice: This is an original article; copyright belongs to 旷野小站. You are welcome to share it; please credit the source when reposting.