加载中...
面试题 17.13-恢复空格(Re-Space LCCI)
发表于:2021-12-03 | 分类: 中等
字数统计: 1.5k | 阅读时长: 6分钟 | 阅读量:

原文链接: https://leetcode-cn.com/problems/re-space-lcci

英文原文

Oh, no! You have accidentally removed all spaces, punctuation, and capitalization in a lengthy document. A sentence like "I reset the computer. It still didn't boot!" became "iresetthecomputeritstilldidntboot''. You'll deal with the punctuation and capi­talization later; right now you need to re-insert the spaces. Most of the words are in a dictionary but a few are not. Given a dictionary (a list of strings) and the document (a string), design an algorithm to unconcatenate the document in a way that minimizes the number of unrecognized characters. Return the number of unrecognized characters.

Note: This problem is slightly different from the original one in the book.

 

Example:

Input: 
dictionary = ["looked","just","like","her","brother"]
sentence = "jesslookedjustliketimherbrother"
Output:  7
Explanation:  After unconcatenating, we got "jess looked just like tim her brother", which containing 7 unrecognized characters.

Note:

  • 0 <= len(sentence) <= 1000
  • The total number of characters in dictionary is less than or equal to 150000.
  • There are only lowercase letters in dictionary and sentence.

中文题目

哦,不!你不小心把一个长篇文章中的空格、标点都删掉了,并且大写也弄成了小写。像句子"I reset the computer. It still didn’t boot!"已经变成了"iresetthecomputeritstilldidntboot"。在处理标点符号和大小写之前,你得先把它断成词语。当然了,你有一本厚厚的词典dictionary,不过,有些词没在词典里。假设文章用sentence表示,设计一个算法,把文章断开,要求未识别的字符最少,返回未识别的字符数。

注意:本题相对原题稍作改动,只需返回未识别的字符数

 

示例:

输入:
dictionary = ["looked","just","like","her","brother"]
sentence = "jesslookedjustliketimherbrother"
输出: 7
解释: 断句后为"jess looked just like tim her brother",共7个未识别字符。

提示:

  • 0 <= len(sentence) <= 1000
  • dictionary中总字符数不超过 150000。
  • 你可以认为dictionarysentence中只包含小写字母。

通过代码

高赞题解

🙋 今日打卡~

哦,不!工作日打卡题不全都是 EASY 嘛?!🤷‍♀️

题目分析

题意:给定字符串,尽可能多地匹配字典内的单词,即最少未匹配数。

:拿到题目,先想贪心能不能做,显然不行。比如给定字符串:abcdef,字典:["abcd", "ab", "def"],贪心匹配是 abcd ef,有2个字符未匹配。显然有更优匹配策略: ab c def ,只有 1 个字符未匹配。所以这题需要用动态规划来解决(怎么思考一个题目可不可以用 dp 来解决,先假设 dp 表达含义,再想转移方程,很多时候自然而然就出来了)。

1、状态定义:

dp[i] 表示字符串的前 i 个字符的最少未匹配数。

2、状态转移:

假设当前我们已经考虑完了前 i 个字符了,对于前 i + 1 个字符对应的最少未匹配数:

  1. i + 1 个字符未匹配,则 dp[i + 1] = dp[i] + 1,即不匹配数加 1;
  2. 遍历前 i 个字符,若以其中某一个下标 idx 为开头、以第 i + 1 个字符为结尾的字符串正好在词典里,则 dp[i] = min(dp[i], dp[idx]) 更新 dp[i]

于是,有了解法一

3、解法一代码:

时间复杂度是 $O(n^2)$,$n$ 为待匹配字符串的长度。

class Solution {
    public int respace(String[] dictionary, String sentence) {
        Set<String> dict = new HashSet<>(Arrays.asList(dictionary));
        int n = sentence.length();
        int[] dp = new int[n + 1];
        for (int i = 1; i <= n; i++) {
            dp[i] = dp[i - 1] + 1;
            for (int idx = 0; idx < i; idx++) {
                if (dict.contains(sentence.substring(idx, i))) {
                    dp[i] = Math.min(dp[i], dp[idx]);
                }
            }
        }
        return dp[n];
    }
}

4、使用 Trie 进行优化:

对于上述解法,计算 dp[i + 1]时,我们需要用 idx 来遍历前 i 个字符,逐个判断以 idx 为开头,以第 i + 1 个字符为结尾的字符串是否在字典里。
这一步可以利用字典树来加速,通过字典树我们可以查询以第 i + 1 个字符为结尾的单词有哪些(构建字典树时将单词逆序插入即可)。
关于字典树的学习,可以看下 这篇题解,这就是解法二
时间复杂度是 $O(m + n^2)$,$m$ 是字典长度,$n$ 为待匹配字符串的长度。为什么还是 $n^2$ 呢?因为有可能状态转移的时候,每个位置都需要转移,这是最坏情况,绝大多数情况下远小于 $n$,所以解法二最终耗时会远小于解法一。

5、解法二代码:

class Solution {
    public int respace(String[] dictionary, String sentence) {
        // 构建字典树
        Trie trie = new Trie();
        for (String word: dictionary) {
            trie.insert(word);
        }
        // 状态转移,dp[i] 表示字符串的前 i 个字符的最少未匹配数
        int n = sentence.length();
        int[] dp = new int[n + 1];
        for (int i = 1; i <= n; i++) {
            dp[i] = dp[i - 1] + 1;
            for (int idx: trie.search(sentence, i - 1)) {
                dp[i] = Math.min(dp[i], dp[idx]);
            }
        }
        return dp[n];
    }
}

class Trie {
    TrieNode root;

    public Trie() {
        root = new TrieNode();
    }

    // 将单词倒序插入字典树
    public void insert(String word) {
        TrieNode cur = root;
        for (int i = word.length() - 1; i >= 0; i--) {
            int c = word.charAt(i) - 'a';
            if (cur.children[c] == null) {
                cur.children[c] = new TrieNode();
            }
            cur = cur.children[c];
        }
        cur.isWord = true;
    }

    // 找到 sentence 中以 endPos 为结尾的单词,返回这些单词的开头下标。
    public List<Integer> search(String sentence, int endPos) {
        List<Integer> indices = new ArrayList<>(); 
        TrieNode cur = root;
        for (int i = endPos; i >= 0; i--) {
            int c = sentence.charAt(i) - 'a';
            if (cur.children[c] == null) {
                break;
            }
            cur = cur.children[c];
            if (cur.isWord) {
                indices.add(i);
            }  
        }
        return indices;
    }
}

class TrieNode {
    boolean isWord;
    TrieNode[] children = new TrieNode[26];

    public TrieNode() {}
}

统计信息

通过次数 提交次数 AC比率
25006 45285 55.2%

提交历史

提交时间 提交结果 执行时间 内存消耗 语言
上一篇:
面试题 17.12-BiNode(BiNode LCCI)
下一篇:
面试题 17.14-最小K个数(Smallest K LCCI)
本文目录
本文目录