287.Find the Duplicate Number 找到重复的数

难度：Medium

刷题内容

原题链接

中文：https://leetcode-cn.com/problems/find-the-duplicate-number
英文：https://leetcode.com/problems/find-the-duplicate-number

内容描述

给定一个包含 n + 1 个整数的数组 nums，其数字都在 1 到 n 之间（包括 1 和 n），可知至少存在一个重复的整数。假设只有一个重复的整数，找出这个重复的数。

示例 1:
输入: [1,3,4,2,2]
输出: 2

示例 2:
输入: [3,1,3,4,2]
输出: 3

说明：
不能更改原数组（假设数组是只读的）。
只能使用额外的 O(1) 的空间。
时间复杂度小于 O(n2) 。
数组中只有一个重复的数字，但它可能不止重复出现一次。

解题方案

思路 1

实际上，我们在阅读完题目的时候，直观感觉，题目不是很难。但是，在我们看到下面的说明，也就是对我们题目的限制条件的时候，会猛然发现，题目的难度瞬间提升了好多倍。

下面我列出咱们需要注意的点： - 包含 n+1 个整数的数组 - 数组中的数字都在 1 到 n 之间（包括 1 和 n ，但是可以不连续，比如 [1,4,4,3,4]） - 重复的数字，可以重复很多次，并不限于重复 1 次，2次，3次.. - 不能更改原数组（这条是最要命的，原本我的想法是，我们可以排序完成之后，再进行二分查找法，碍于这个规定，无法进行） - 只能用额外的 O(1) 的空间。（O(1) 空间的意思是不会因数组的长度改变而改变，对应本题就是不管我们的数组多长，我们能用的额外控制只能是 m 这个m 是一个确定的数，不会随着 n 的改变而改变。） - 时间的复杂度小于 \(O(n^2)\)

注意的点差不多就是上面所说的了。

这个思路我们使用 二分查找（binary search）+ 鸽笼原理（Pigeonhole Principle）
参考维基百科关于鸽笼原理的词条链接：https://en.wikipedia.org/wiki/Pigeonhole_principle

"不允许修改数组" 与 "常数空间复杂度" 这两个限制条件意味着：禁止排序，并且不能使用 Map 等数据结构

小于 O(n^2) 的运行时间复杂度可以联想到使用二分将其中的一个 n 化简为 log n
可以参考 LeetCode Discuss:https://leetcode.com/discuss/60830/python-solution-explanation-without-changing-input-array

二分枚举答案范围，使用鸽笼原理进行检验

根据鸽笼原理，给定 n+1 个范围为 [1, n]的整数，其中一定存在数字出现至少两次。
假设枚举的数字为 n / 2 ：
遍历数组，若数组中不大于 n / 2 的数字个数超过 n / 2 ，则可以确定 [1, n/2] 范围内一定有解，否则可以确定解落在 (n/2, n]范围内。

也可以这样分析一下：


如果n 是5，那么就会有1 2 3 4 5 一共5个数字的可能，而array size 是6，那么其中一个数字肯定会至少出现两次。

如果没有重复的数字，小于等于1的数字 出现的次数 等于 1；

小于等于2的数字 出现的次数 等于 2；

... 同理3；4；5。

如果有重复的数字，如果重复的是1，那么 小于等于1的数字 出现的次数 肯定大于1；

基于这个理论，我们可以在1 2 3 4 5 选出一个 mid， 遍历array来count 小于等于mid 的数字个数 小于等于 它自己mid 还是 大于 mid？

如果count 小于等于mid， 说明 1 到 mid 这些数字 没有重复项， 重复项在 右半边 mid 到n， 所以缩小到右半边继续搜索；

如果count 大于mid， 说明  1 到 mid 这些数字中 有重复项，缩小到 左半边继续搜索。

class Solution(object):
    def findDuplicate(self, nums):
        """
        :type nums: List[int]
        :rtype: int
        """
        low, high = 1, len(nums) - 1
        while low <= high:
            mid = (low + high) >> 1
            cnt = sum(x <= mid for x in nums)
            if cnt > mid:
                high = mid - 1
            else:
                low = mid + 1
        return low

s = Solution()
nums = [1,2,3,3]
print(s.findDuplicate(nums))

思路 1 的解法是时间复杂度为 \(O(nlogn)\) 的解法，思路 2 则是时间复杂度为 \(O(n)\) ，但是相对来说，是投机取巧了些。

思路 2

一次遍历统计，另一次遍历输出即可。

class Solution(object):
    def findDuplicate(self, nums):
        """
        :type nums: List[int]
        :rtype: int
        """
        dic = dict()
        for n in nums:
            dic[n] = dic.get(n, 0) + 1
            if dic[n] >= 2:
                return n

s = Solution()
nums = [1,2,3,3]
print(s.findDuplicate(nums))

思路 3

思路 3 则是更加完整的 \(O(n)\) 的解法，现在我还没有完全搞懂，我先写在下面吧，大佬们可以提前学习了解。

class Solution(object):
    def findDuplicate(self, nums):
        # The "tortoise and hare" step.  We start at the end of the array and try
        # to find an intersection point in the cycle.
        slow = 0
        fast = 0

        # Keep advancing 'slow' by one step and 'fast' by two steps until they
        # meet inside the loop.
        while True:
            slow = nums[slow]
            fast = nums[nums[fast]]

            if slow == fast:
                break

        # Start up another pointer from the end of the array and march it forward
        # until it hits the pointer inside the array.
        finder = 0
        while True:
            slow   = nums[slow]
            finder = nums[finder]

            # If the two hit, the intersection index is the duplicate element.
            if slow == finder:
                return slow

s = Solution()
nums = [1,2,3,3]
print(s.findDuplicate(nums))

我们一直在努力

apachecn/AiLearning