692-前K个高频单词(Top K Frequent Words)

原文链接: https://leetcode-cn.com/problems/top-k-frequent-words

英文原文

Given an array of strings words and an integer k, return the k most frequent strings.

Return the answer sorted by the frequency from highest to lowest. Sort the words with the same frequency by their lexicographical order.

Example 1:

Input: words = ["i","love","leetcode","i","love","coding"], k = 2
Output: ["i","love"]
Explanation: "i" and "love" are the two most frequent words.
Note that "i" comes before "love" due to a lower alphabetical order.

Example 2:

Input: words = ["the","day","is","sunny","the","the","the","sunny","is","is"], k = 4
Output: ["the","is","sunny","day"]
Explanation: "the", "is", "sunny" and "day" are the four most frequent words, with the number of occurrence being 4, 3, 2 and 1 respectively.

Constraints:

1 <= words.length <= 500
1 <= words[i] <= 10
words[i] consists of lowercase English letters.
k is in the range [1, The number of unique words[i]]

Follow-up: Could you solve it in O(n log(k)) time and O(n) extra space?

中文题目

给一非空的单词列表，返回前 k 个出现次数最多的单词。

返回的答案应该按单词出现频率由高到低排序。如果不同的单词有相同出现频率，按字母顺序排序。

示例 1：

输入: ["i", "love", "leetcode", "i", "love", "coding"], k = 2
输出: ["i", "love"]
解析: "i" 和 "love" 为出现次数最多的两个单词，均为2次。
    注意，按字母顺序 "i" 在 "love" 之前。

示例 2：

输入: ["the", "day", "is", "sunny", "the", "the", "the", "sunny", "is", "is"], k = 4
输出: ["the", "is", "sunny", "day"]
解析: "the", "is", "sunny" 和 "day" 是出现次数最多的四个单词，
    出现次数依次为 4, 3, 2 和 1 次。

注意：

假定 k 总为有效值， 1 ≤ k ≤ 集合元素数。
输入的单词均由小写字母组成。

扩展练习：

尝试以 O(n log k) 时间复杂度和 O(n) 空间复杂度解决。

通过代码

高赞题解

哈希表 & 优先队列（堆）

这道题是在「优先队列（堆）」裸题的基础上增加了字典序大小的比较。

相应的，我们不能只根据「词频大小」构建小根堆来获取前 $k$ 个元素，还需要结合字典序大小来做。

具体的，我们可以使用「哈希表」&「优先队列」进行求解：

使用「哈希表」来统计所有的词频
构建大小为 $k$ 按照「词频升序 + (词频相同)字典序倒序」的优先队列：
- 如果词频不相等，根据词频进行升序构建，确保堆顶元素是堆中词频最小的元素
- 如果词频相等，根据字典序大小进行倒序构建，结合 $2.1$ 可以确保堆顶元素是堆中「词频最小 & 字典序最大」的元素
对所有元素进行遍历，尝试入堆：
- 堆内元素不足 $k$ 个：直接入堆
- 词频大于堆顶元素：堆顶元素不可能是前 $k$ 大的元素。将堆顶元素弹出，并将当前元素添加到堆中
- 词频小于堆顶元素；当前元素不可能是前 $k$ 大的元素，直接丢弃。
- 词频等于堆顶元素：根据当前元素与堆顶元素的字典序大小决定（如果字典序大小比堆顶元素要小则入堆）
输出堆内元素，并翻转

代码：

[]
class Solution {
    public List<String> topKFrequent(String[] ws, int k) {
        Map<String, Integer> map = new HashMap<>();
        for (String w : ws) map.put(w, map.getOrDefault(w, 0) + 1);
        PriorityQueue<Object[]> q = new PriorityQueue<>(k, (a, b)->{ 
            // 如果词频不同，根据词频升序
            int c1 = (Integer)a[0], c2 = (Integer)b[0];
            if (c1 != c2) return c1 - c2;
            // 如果词频相同，根据字典序倒序
            String s1 = (String)a[1], s2 = (String)b[1];
            return s2.compareTo(s1);
        });
        for (String s : map.keySet()) {
            int cnt = map.get(s);
            if (q.size() < k) { // 不足 k 个，直接入堆
                q.add(new Object[]{cnt, s});
            } else {
                Object[] peek = q.peek();
                if (cnt > (Integer)peek[0]) { // 词频比堆顶元素大，弹出堆顶元素，入堆
                    q.poll();
                    q.add(new Object[]{cnt, s});
                } else if (cnt == (Integer)peek[0]) { // 词频与堆顶元素相同
                    String top = (String)peek[1];
                    if (s.compareTo(top) < 0) { // 且字典序大小比堆顶元素小，弹出堆顶元素，入堆
                        q.poll();
                        q.add(new Object[]{cnt, s});
                    }
                }
            }
        }
        List<String> ans = new ArrayList<>();
        while (!q.isEmpty()) ans.add((String)q.poll()[1]);
        Collections.reverse(ans);
        return ans;
    }
}

时间复杂度：使用哈希表统计词频，复杂度为 $O(n)$；使用最多 $n$ 个元素维护一个大小为 $k$ 的堆，复杂度为 $O(n\log{k})$；输出答案复杂度为 $O(k)$（同时 $k \leq n$）。整体复杂度为 $O(n\log{k})$
空间复杂度：$O(n)$

统计信息

通过次数	提交次数	AC比率
70779	122879	57.6%

提交历史

提交时间	提交结果	执行时间	内存消耗	语言

相似题目

题目	难度
前 K 个高频元素	中等
最接近原点的 K 个点	中等

凤起丶宛南

从来没有真正的绝境，只有心灵的迷途

英文原文

中文题目

通过代码

高赞题解

哈希表 & 优先队列（堆）

统计信息

提交历史

相似题目

你的赏识是我前进的动力

英文原文

中文题目

通过代码

高赞题解

哈希表 & 优先队列（堆）

统计信息

提交历史

相似题目

你的赏识是我前进的动力

微信扫一扫：分享