Skip to content

010 regular expression matching

010. Regular Expression Matching

题目: https://leetcode.com/problems/regular-expression-matching/

难度:

Hard

先尝试暴力解法,难点就在 * 身上, * 不会单独出现,它一定是和前面一个字母或"."配成一对。看成一对后"X*",它的性质就是:要不匹配0个,要不匹配连续的“X”.所以尝试暴力解法的时候一个trick是从后往前匹配.

暴力解法居然也能AC?

是这样来分情况看得:

  • 如果s[i] = p[j] 或者 p[j]= . : 往前匹配一位
  • 如果p[j] = ' * ', 检查一下,如果这个时候p[j-1] = . 或者p[j-1] = s[i] ,那么就往前匹配,如果这样能匹配过,就return True, 否者我们忽略 ' X* ',这里注意里面的递推关系
  • 再处理一下边界状况:
    • s已经匹配完了, 如果此时p还有,那么如果剩下的是 X* 这种可以过,所以检查
    • p匹配完毕,如果s还有那么报错
class Solution(object):
    def isMatch(self, s, p):
        """
        :type s: str
        :type p: str
        :rtype: bool
        """
        def helper(s, i, p, j):
            if j == -1:
                return i == -1
            if i == -1:
                if p[j] != '*':
                    return False
                return helper(s, i, p, j-2)
            if p[j] == '*':
                if p[j-1] == '.' or p[j-1] == s[i]:
                    if helper(s, i-1, p, j):
                        return True
                return helper(s, i, p, j-2)
            if p[j] == '.' or p[j] == s[i]:
                return helper(s, i-1, p, j-1)
            return False

        return helper(s, len(s)-1, p, len(p)-1)

dp优化,感觉和edit distance很像。 DP优化待代码化,感觉学DP的一个重点除了递归学好以外,另一点是一定要会画表格。

画一个表格来看一下状况

            c   *   a   *   b
        0   1   2   3   4   5
    0   1   0   1   0   1   0       
a   1   0   0   0   1   1   0                   
a   2   0   0   0   0   1   0                   
b   3   0   0   0   0   0   1           

这里有几个取巧/容易出问题的敌方,这里画的表用的是1-based string。一上来,做的事包括:

  • 初始化,空字符匹配:dp[0][0] =1
  • 第一行,c 可以匹配空字符,c a* 可以匹配空字符,p[j-1] != s[i],匹配空字符
  • 然后进入第二行再来看,实际上我们可以看到,如果没有碰到 * 匹配还是很朴素的,但是碰到 * :
    • 1这个匹配可以从左侧传来,dp[i][j] = dp[i][j-1],that is 匹配 1个
    • 1 也可以有上方传来,这种情况是p[j-1] = s[i],匹配多个 dp[i][j] = dp[i-1][j]
    • 1 这个匹配也可以从间隔一个的左侧传来,that is也可以有个性的匹配0个,如同匹配空字符一样dp[i][j] = dp[i][j-2],但是注意匹配0个实际上有两种状况,如果p[j-1]!=s[i],强制匹配0个,即使p[j-1] == s[i],我们也可以傲娇的用它来匹配0个。

再代码化一点:

  • s[i] == p[j] 或者 p[j] == '.' : dp[i][j] = dp[i-1][j-1]
  • p[j] == '*': 然后分几种情况
    • p[j-1] != s[i] : dp[i][j] = dp[i][j-2] 匹配0个的状况
    • p[j-1] == s[i] or p[i-1] == '.':
      • dp[i][j] = dp[i-1][j] 匹配多个s[i]
      • dp[i][j] = dp[i][j-2] 匹配0个

AC代码,注意一下,因为上表为了表达方便,用的是1-based string系统,实际写代码的时候我们心里还是清楚这个string还是从0开始的,不过也可以尝试往前面添东西来方便。

AC代码

class Solution(object):
    def isMatch(self, s, p):
        """
        :type s: str
        :type p: str
        :rtype: bool
        """
        m, n = len(s), len(p)
        dp = [ [0 for i in range(n+1)] for j in range(m+1)]

        dp[0][0] = 1

        # init the first line
        for i in range(2,n+1):
            if p[i-1] == '*':
                dp[0][i] = dp[0][i-2]

        for i in range(1,m+1):
            for j in range(1,n+1):
                if p[j-1] == '*':
                    if p[j-2] != s[i-1] and p[j-2] != '.':
                        dp[i][j] = dp[i][j-2]
                    elif p[j-2] == s[i-1] or p[j-2] == '.':
                        dp[i][j] = dp[i-1][j] or dp[i][j-2]

                elif s[i-1] == p[j-1] or p[j-1] == '.':
                    dp[i][j] = dp[i-1][j-1]

        return dp[m][n] == 1 

写个测试案例

import unittest
class Solution(object):
    def isMatch(self, s, p):
        """
        :type s: str
        :type p: str
        :rtype: bool
        """
        m, n = len(s), len(p)
        dp = [ [0 for i in range(n+1)] for j in range(m+1)]

        dp[0][0] = 1

        # init the first line
        for i in range(2,n+1):
            if p[i-1] == '*':
                dp[0][i] = dp[0][i-2]

        for i in range(1,m+1):
            for j in range(1,n+1):
                if p[j-1] == '*':
                    if p[j-2] != s[i-1] and p[j-2] != '.':
                        dp[i][j] = dp[i][j-2]
                    elif p[j-2] == s[i-1] or p[j-2] == '.':
                        dp[i][j] = dp[i-1][j] or dp[i][j-2]

                elif s[i-1] == p[j-1] or p[j-1] == '.':
                    dp[i][j] = dp[i-1][j-1]

        return dp[m][n] == 1


class TestSolution(unittest.TestCase):
    def test_none_0(self):
        s = ""
        p = ""
        self.assertTrue(Solution().isMatch(s, p))

    def test_none_1(self):
        s = ""
        p = "a"
        self.assertFalse(Solution().isMatch(s, p))

    def test_no_symbol_equal(self):
        s = "abcd"
        p = "abcd"
        self.assertTrue(Solution().isMatch(s, p))

    def test_no_symbol_not_equal_0(self):
        s = "abcd"
        p = "efgh"
        self.assertFalse(Solution().isMatch(s, p))

    def test_no_symbol_not_equal_1(self):
        s = "ab"
        p = "abb"
        self.assertFalse(Solution().isMatch(s, p))

    def test_symbol_0(self):
        s = ""
        p = "a*"
        self.assertTrue(Solution().isMatch(s, p))

    def test_symbol_1(self):
        s = "a"
        p = "ab*"
        self.assertTrue(Solution().isMatch(s, p))

    def test_symbol_2(self):
        # E.g.
        #   s a b b
        # p 1 0 0 0
        # a 0 1 0 0
        # b 0 0 1 0
        # * 0 1 1 1
        s = "abb"
        p = "ab*"
        self.assertTrue(Solution().isMatch(s, p))


if __name__ == "__main__":
    unittest.main()



输出:
........

Ran 8 tests in 0.001s

OK

参考:

动态规划



回到顶部