ky818sm  2025-10-08 11:11  旷野小站

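A regular expression (regex) is a powerful text-processing tool for matching, searching, replacing, and validating strings. This tutorial walks through the basic syntax and the most common uses, with examples.

1. Basic matching

A regular expression is built from ordinary characters (such as letters and digits) and special characters (called metacharacters). The simplest regular expression consists of ordinary characters only, for example:

  • The regular expression hello matches the literal text "hello"; when searching, the engine reports the first occurrence in the string.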
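A minimal Python sketch of literal matching, using the standard re module. The sample sentence is made up for illustration.

```python
import re

text = "say hello, then hello again"

# re.search scans left to right and stops at the first occurrence.
first = re.search(r"hello", text)
print(first.group(), first.start())  # hello 4

# re.findall returns every non-overlapping occurrence.
print(re.findall(r"hello", text))  # ['hello', 'hello']
```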

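2. Metacharacters

Metacharacters are characters that carry a special meaning inside a regular expression. Common metacharacters include:

  • . matches any single character (except a newline, by default)

  • ^ matches the start of the string

  • $ matches the end of the string

  • * matches the preceding subexpression zero or more times

  • + matches the preceding subexpression one or more times

  • ? matches the preceding subexpression zero or one time

  • {n} matches the preceding subexpression exactly n times

  • {n,} matches the preceding subexpression at least n times

  • {n,m} matches the preceding subexpression at least n and at most m times

  • [] character set; matches any one of the enclosed characters

  • | alternation ("or"); matches either of the two alternatives

  • () grouping; combines several characters into one unit and captures the matched text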
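The metacharacters listed above can be tried out one at a time. The following is a small sketch in Python; the pattern and text pairs are illustrative and not taken from the original article.

```python
import re

# Each pattern exercises one metacharacter against a short sample text.
samples = {
    r"a.c": "a-c",          # . stands for any one character
    r"^cat": "catalog",     # ^ anchors the match at the start
    r"end$": "the end",     # $ anchors the match at the end
    r"gr[ae]y": "grey",     # [] picks one character from the set
    r"cat|dog": "hot dog",  # | tries either alternative
}
for pattern, text in samples.items():
    print(pattern, "->", re.search(pattern, text).group())

# () groups and captures: group(1) holds the text matched by the last
# repetition of the parenthesized part.
m = re.search(r"(ab)+c", "xxababc")
print(m.group(0), m.group(1))  # ababc ab
```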

3. Escaping

To match a metacharacter literally, escape it with a backslash \. For example, to match a literal dot ., write \.
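A short Python sketch of why escaping matters. It also shows re.escape, a standard-library helper that escapes arbitrary text for you; the sample strings are made up.

```python
import re

# Unescaped, the dot matches any character, so "3x14" is accepted too.
print(bool(re.fullmatch(r"3.14", "3x14")))   # True
# Escaped, only a literal dot is accepted.
print(bool(re.fullmatch(r"3\.14", "3x14")))  # False
print(bool(re.fullmatch(r"3\.14", "3.14")))  # True

# re.escape backslash-escapes the metacharacters in a piece of text,
# which is handy when the text comes from user input.
literal = "price: $5.00 (approx.)"
pattern = re.escape(literal)
print(bool(re.fullmatch(pattern, literal)))  # True
```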

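4. Character classes

A character class is written with square brackets [] and matches any one of the characters inside the brackets.

  • [abc] matches a, b, or c

  • [a-z] matches any lowercase letter

  • [0-9] matches any digit

  • [^abc] matches any character other than a, b, or c (a ^ at the start of a character class means negation)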
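A quick Python illustration of character classes, run against a made-up sample string:

```python
import re

text = "Room B12, floor 3"

print(re.findall(r"[0-9]", text))        # ['1', '2', '3']
print(re.findall(r"[A-Z]", text))        # ['R', 'B']
# Negated class: anything that is not a letter, a space, or a comma.
print(re.findall(r"[^a-zA-Z ,]", text))  # ['1', '2', '3']
```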

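5. Predefined character classes

  • \d matches a digit; equivalent to [0-9] in ASCII mode (some engines also accept other Unicode digits)

  • \D matches a non-digit; equivalent to [^0-9]

  • \w matches a letter, digit, or underscore; equivalent to [A-Za-z0-9_] in ASCII mode

  • \W matches any character that is not a letter, digit, or underscore

  • \s matches a whitespace character (space, tab, newline, and so on)

  • \S matches a non-whitespace character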
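The shorthand classes behave as follows in Python. One caveat worth knowing: with str patterns, Python's \w and \d are Unicode-aware unless the re.ASCII flag is set, so the "equivalent to" forms above hold exactly only in ASCII mode. The sample strings are illustrative.

```python
import re

text = "id_42\tok 7"

print(re.findall(r"\d", text))       # ['4', '2', '7']
print(re.findall(r"\w+", text))      # ['id_42', 'ok', '7']
print(re.findall(r"\s", text))       # ['\t', ' ']
print(re.findall(r"\D+", "a1b22c"))  # ['a', 'b', 'c']

# \u00e9 is the letter e with an acute accent. It counts as a word
# character by default, but not once re.ASCII is set.
word = "caf\u00e9"
unicode_words = re.findall(r"\w+", word)
ascii_words = re.findall(r"\w+", word, re.ASCII)
print(unicode_words, ascii_words)
```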

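6. Examples

Example 1: Matching an email address

Suppose we want to match a simple email address of the form username@domain.suffix

  • Username: may contain letters, digits, dots, underscores, and hyphens; it must not start or end with a dot, and must not contain two consecutive dots.

  • Domain: may contain letters, digits, and hyphens; it must not start or end with a hyphen.

  • Suffix: usually 2 to 5 letters.

Regular expression: ^[a-zA-Z0-9]+([._-]?[a-zA-Z0-9]+)*@[a-zA-Z0-9]+(-?[a-zA-Z0-9]+)*(\.[a-zA-Z]{2,5})+$

Explanation:

  • ^ marks the start of the string

  • [a-zA-Z0-9]+ the username starts with at least one letter or digit

  • ([._-]?[a-zA-Z0-9]+)* followed by zero or more repetitions of this unit: an optional dot, underscore, or hyphen, then at least one letter or digit. The ? makes the separator optional, so a separator may appear in the middle of the username but never twice in a row and never at the end.

  • @ matches a literal @

  • [a-zA-Z0-9]+ the domain starts with at least one letter or digit

  • (-?[a-zA-Z0-9]+)* followed by zero or more repetitions of: an optional hyphen, then at least one letter or digit. This allows hyphens inside the domain but rules out consecutive hyphens and hyphens at the start or end.

  • (\.[a-zA-Z]{2,5})+ matches a dot followed by 2 to 5 letters; the whole unit may occur one or more times (for example .com or .co.uk)

  • $ marks the end of the string

Note: this is only a simple example. Real email addresses are more complex, and this expression does not cover every case.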
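To see which inputs the email pattern above accepts and rejects, here is a small Python sketch. The names EMAIL and is_email and the sample addresses are illustrative choices, not part of the original article.

```python
import re

# The pattern from the example, split across two string literals for readability.
EMAIL = re.compile(
    r"^[a-zA-Z0-9]+([._-]?[a-zA-Z0-9]+)*"
    r"@[a-zA-Z0-9]+(-?[a-zA-Z0-9]+)*(\.[a-zA-Z]{2,5})+$"
)

def is_email(candidate):
    """Return True if the candidate matches the simplified email pattern."""
    return EMAIL.match(candidate) is not None

for candidate in [
    "user@example.com",          # plain address
    "first.last@my-site.co.uk",  # separators and a two-part suffix
    ".user@example.com",         # rejected: starts with a dot
    "a..b@example.com",          # rejected: two consecutive dots
    "user@-example.com",         # rejected: domain starts with a hyphen
]:
    print(candidate, is_email(candidate))
```

The nested quantifier in ([._-]?[a-zA-Z0-9]+)* can backtrack heavily on long inputs that almost match, so it may be worth limiting the input length before applying a pattern like this one.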

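Example 2: Matching a URL

Suppose we want to match HTTP/HTTPS URLs.

Regular expression: ^https?://[^\s/$.?#].[^\s]*$

Explanation:

  • ^https? matches http or https; the s is optional

  • :// matches the literal characters ://

  • [^\s/$.?#] matches one character that is not whitespace, /, $, ., ?, or #, so the host cannot start with any of these

  • . matches any single character, so at least two characters must follow ://

  • [^\s]* matches any run of non-whitespace characters up to the end of the string

  • $ marks the end of the string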
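A brief Python check of the URL pattern above. URL and is_url are illustrative names, and the test URLs are made up. The pattern checks only the rough shape of a URL; it does not verify that the host is valid.

```python
import re

URL = re.compile(r"^https?://[^\s/$.?#].[^\s]*$")

def is_url(candidate):
    """Return True if the candidate has the shape of an HTTP/HTTPS URL."""
    return URL.match(candidate) is not None

print(is_url("https://example.com/path?q=1"))  # True
print(is_url("http://example.com"))            # True
print(is_url("ftp://example.com"))             # False: scheme is not http(s)
print(is_url("https://.example.com"))          # False: host starts with a dot
print(is_url("https://example.com/a b"))       # False: contains whitespace
```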

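Example 3: Matching a date (YYYY-MM-DD)

Regular expression: ^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12][0-9]|3[01])$

Explanation:

  • ^\d{4} matches a four-digit year

  • - matches a literal hyphen

  • (0[1-9]|1[0-2]) matches the month: 01-09 or 10-12

  • - matches a literal hyphen

  • (0[1-9]|[12][0-9]|3[01]) matches the day: 01-09, 10-29, or 30-31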
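The date pattern checks the shape of the string only: 2023-02-31 still matches, because the expression knows nothing about month lengths. One possible way to close that gap, sketched in Python, is to let the regex do the format check and datetime.date do the calendar check. The function name parse_date is hypothetical, and the parentheses around the year are an addition so that all three parts can be captured.

```python
import re
from datetime import date

# Same pattern as above, with the year wrapped in a capturing group.
DATE = re.compile(r"^(\d{4})-(0[1-9]|1[0-2])-(0[1-9]|[12][0-9]|3[01])$")

def parse_date(text):
    """Return a date for a well-formed, real calendar date, else None."""
    m = DATE.match(text)
    if m is None:
        return None
    year, month, day = map(int, m.groups())
    try:
        return date(year, month, day)
    except ValueError:  # shape is right but the day does not exist
        return None

print(parse_date("2023-12-25"))  # 2023-12-25
print(parse_date("2023-13-01"))  # None (rejected by the regex)
print(parse_date("2023-02-31"))  # None (rejected by the calendar check)
```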

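Example 4: Matching a mobile phone number (mainland China)

A mainland China mobile number starts with 1, has 3-9 as its second digit, and is followed by nine more digits (11 digits in total).

Regular expression: ^1[3-9]\d{9}$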
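A Python sketch of the phone check. Because Python's \d also accepts non-ASCII digits by default, this version passes re.ASCII so that only 0-9 are allowed; that flag, the use of fullmatch instead of ^ and $, and the name is_mobile are choices made for this sketch.

```python
import re

# re.ASCII restricts \d to the ten ASCII digits 0-9.
PHONE = re.compile(r"1[3-9]\d{9}", re.ASCII)

def is_mobile(number):
    """Return True if number is 11 ASCII digits in the 1[3-9]xxxxxxxxx shape."""
    return PHONE.fullmatch(number) is not None

print(is_mobile("13812345678"))  # True
print(is_mobile("12812345678"))  # False: second digit must be 3-9
print(is_mobile("1381234567"))   # False: only 10 digits

# \uff18 is a full-width digit eight; it is rejected in ASCII mode
# but accepted by the default Unicode-aware \d.
wide = "1381234567\uff18"
print(is_mobile(wide))                             # False
print(bool(re.fullmatch(r"1[3-9]\d{9}", wide)))    # True
```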

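Example 5: Removing HTML tags

To remove every HTML tag from a string, use a replace operation that substitutes an empty string for each matched tag.

Regular expression: <[^>]+>

Explanation: matches a <, followed by one or more characters that are not >, followed by a closing >.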
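In Python the replacement can be done with re.sub. This is a minimal sketch; strip_tags is an illustrative name, and the approach is only suitable for simple markup, since a > inside an attribute value, comments, and script blocks will confuse it. A real HTML parser is the safer choice for arbitrary pages.

```python
import re

TAG = re.compile(r"<[^>]+>")

def strip_tags(html):
    """Replace every tag-shaped substring with the empty string."""
    return TAG.sub("", html)

print(strip_tags('<p class="text">Hello, <b>world</b>!</p>'))  # Hello, world!
```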

Basic syntax

1. Literal matching

hello        # matches "hello"
test123      # matches "test123"

2. Escaping special characters

\.           # matches a literal dot
\\           # matches a backslash
\$           # matches a dollar sign

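Common metacharacters

Metacharacter  Description                     Example
.              Any single character            a.c matches "abc", "a c", "a-c"
^              Start of the string             ^Hello matches strings that start with "Hello"
$              End of the string               end$ matches strings that end with "end"
\d             A digit                         \d\d matches "12", "45", etc.
\w             A letter, digit, or underscore  \w+ matches a word
\s             A whitespace character          \s+ matches spaces, tabs, etc.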

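Quantifiers

Quantifier  Description        Example
*           0 or more times    ab*c matches "ac", "abc", "abbc"
+           1 or more times    ab+c matches "abc", "abbc"
?           0 or 1 time        colou?r matches "color" or "colour"
{n}         Exactly n times    a{3} matches "aaa"
{n,}        At least n times   a{2,} matches "aa", "aaa", etc.
{n,m}       From n to m times  a{2,4} matches "aa", "aaa", "aaaa"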
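The quantifier rows above can be verified mechanically. In this Python sketch, the helper accepted (a hypothetical name) keeps the candidates that a pattern matches in full; the rejected candidates are extra inputs added for contrast.

```python
import re

def accepted(pattern, candidates):
    """Return the candidates that the pattern matches from start to end."""
    return [s for s in candidates if re.fullmatch(pattern, s)]

cases = [
    (r"ab*c", ["ac", "abc", "abbc"]),
    (r"ab+c", ["ac", "abc", "abbc"]),
    (r"colou?r", ["color", "colour", "colouur"]),
    (r"a{3}", ["aa", "aaa", "aaaa"]),
    (r"a{2,}", ["a", "aa", "aaaaa"]),
    (r"a{2,4}", ["a", "aa", "aaaa", "aaaaa"]),
]
for pattern, candidates in cases:
    print(pattern, accepted(pattern, candidates))
```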

Character classes

1. Custom character classes

[aeiou]      # matches any one lowercase vowel
[0-9]        # matches any one digit
[a-zA-Z]     # matches any one letter

2. Negated character classes

[^aeiou]     # matches any character that is not a lowercase vowel
[^0-9]       # matches any non-digit character

Grouping and capturing

1. Basic group

(abc)+       # matches "abc", "abcabc", etc.

2. Non-capturing group

(?:abc)+     # groups without capturing

3. Named capture groups

(?<year>\d{4})-(?<month>\d{2})  # captures the year and month by name
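Named-group syntax differs between engines: the (?<name>...) form is what JavaScript and Java use, while Python's re module spells it (?P<name>...). A short Python sketch of the three kinds of group, with a made-up input string:

```python
import re

# Named groups: look the captures up by name instead of by number.
m = re.search(r"(?P<year>\d{4})-(?P<month>\d{2})", "released 2023-12")
print(m.group("year"), m.group("month"))  # 2023 12
print(m.groupdict())  # {'year': '2023', 'month': '12'}

# A capturing group records its last repetition...
capturing = re.fullmatch(r"(abc)+", "abcabc")
print(capturing.groups())  # ('abc',)

# ...while a non-capturing group only groups, leaving no capture behind.
non_capturing = re.fullmatch(r"(?:abc)+", "abcabc")
print(non_capturing.groups())  # ()
```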

Practical examples

1. Email validation

^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$

Example matches:

  • user@example.com

2. Mobile phone number validation

^1[3-9]\d{9}$

Example matches:

  • 13812345678

  • 15987654321

3. URL extraction

https?://[^\s/$.?#].[^\s]*

Example matches:

  • https://example.com

4. Date matching

\d{4}-\d{2}-\d{2}

Example matches:

  • 2023-12-25

  • 1990-01-01

5. HTML tag extraction

<([a-z][a-z0-9]*)[^>]*>(.*?)</\1>

Example matches:

  • <div>content</div>

  • <p class="text">paragraph</p>
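Two features do the work in the tag-extraction pattern: the backreference \1, which requires the closing tag to repeat the captured tag name, and the lazy quantifier .*?, which stops at the first possible closing tag. The Python sketch below contrasts the lazy version with a greedy variant; GREEDY is a modified pattern introduced here only for comparison.

```python
import re

ELEMENT = re.compile(r"<([a-z][a-z0-9]*)[^>]*>(.*?)</\1>")

html = '<div>content</div><p class="text">paragraph</p>'
# Each result is (tag name, inner text).
print(ELEMENT.findall(html))  # [('div', 'content'), ('p', 'paragraph')]

# Same pattern with a greedy .* instead of the lazy .*?
GREEDY = re.compile(r"<([a-z][a-z0-9]*)[^>]*>(.*)</\1>")
snippet = "<b>one</b> and <b>two</b>"
print(ELEMENT.findall(snippet))  # [('b', 'one'), ('b', 'two')]
print(GREEDY.findall(snippet))   # [('b', 'one</b> and <b>two')]
```

Neither version handles nested elements of the same name, so this remains a tool for simple, flat markup.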

6. Password strength validation

^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[@$!%*?&])[A-Za-z\d@$!%*?&]{8,}$

Requirements:

  • At least 8 characters

  • Contains both uppercase and lowercase letters

  • Contains at least one digit

  • Contains at least one special character from @$!%*?&
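Each (?=...) in the password pattern is a lookahead: it checks that something occurs further along without consuming any text, so the four requirements can be tested independently from the same starting position. The final character class then also restricts which characters may appear at all. A Python sketch, where is_strong and the sample passwords are illustrative:

```python
import re

STRONG = re.compile(
    r"^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[@$!%*?&])[A-Za-z\d@$!%*?&]{8,}$"
)

def is_strong(password):
    """Return True if the password satisfies every rule in the pattern."""
    return STRONG.match(password) is not None

print(is_strong("Passw0rd!"))   # True
print(is_strong("password1!"))  # False: no uppercase letter
print(is_strong("Passw0rd"))    # False: no special character
print(is_strong("Pw0rd!"))      # False: shorter than 8 characters
print(is_strong("Passw0rd!#"))  # False: '#' is outside the allowed set
```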

Using regular expressions in different languages

Python example

import re

# Match an email address
pattern = r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$'
email = "user@example.com"

if re.match(pattern, email):
    print("Email format is valid")

# Extract all numbers
text = "The price is 123 yuan and the weight is 45 kg"
numbers = re.findall(r'\d+', text)
print(numbers)  # ['123', '45']

JavaScript example

// Validate a mobile phone number
const phonePattern = /^1[3-9]\d{9}$/;
const phone = "13812345678";

if (phonePattern.test(phone)) {
    console.log("Phone number format is valid");
}

// Replace text: rewrite the date in Chinese date format
const text = "Today is 2023-12-25";
const newText = text.replace(/(\d{4})-(\d{2})-(\d{2})/, "$1年$2月$3日");
console.log(newText);  // "Today is 2023年12月25日"

Java example

import java.util.regex.*;

public class RegexExample {
    public static void main(String[] args) {
        // Validate an email address
        String emailPattern = "^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}$";
        String email = "user@example.com";

        if (email.matches(emailPattern)) {
            System.out.println("Email format is valid");
        }

        // Extract URLs
        String text = "Visit https://example.com for more information";
        Pattern urlPattern = Pattern.compile("https?://[^\\s/$.?#].[^\\s]*");
        Matcher matcher = urlPattern.matcher(text);

        while (matcher.find()) {
            System.out.println("Found URL: " + matcher.group());
        }
    }
}

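Practical tips

  1. Use an online tester: tools such as regex101.com let you test and debug regular expressions

  2. Comment complex expressions: use (?#comment) or x (extended) mode to add comments, where the engine supports them

  3. Optimize for performance: avoid patterns that cause excessive backtracking, and prefer specific character classes over broad ones such as .

  4. Test edge cases: make sure the expression handles boundary inputs such as empty strings, overly long input, and near misses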
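As an illustration of the commenting tip, Python exposes the x mode as the re.VERBOSE flag: whitespace in the pattern is ignored and # starts a comment. The sketch below rewrites the date pattern from Example 3 in that style; the group names are added here for readability.

```python
import re

DATE = re.compile(
    r"""
    ^(?P<year>\d{4})                    # four-digit year
    -(?P<month>0[1-9]|1[0-2])           # month: 01-12
    -(?P<day>0[1-9]|[12][0-9]|3[01])    # day: 01-31
    $
    """,
    re.VERBOSE,
)

m = DATE.match("2023-12-25")
print(m.groupdict())  # {'year': '2023', 'month': '12', 'day': '25'}
print(DATE.match("2023-00-10"))  # None
```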

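This tutorial covers the basics of regular expressions and their common use cases. Regular practice on real tasks is the best way to master this powerful text-processing tool.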
