This article was last updated on 2025-12-17; its content may be out of date.

Learning to use the jieba word segmentation module for word frequency counting

Count the 20 most frequent person names in the novel 《大奉打更人》

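import jieba.posseg as posseg
import pandas as pd

if __name__ == '__main__':
    # Read the whole novel as UTF-8 text
    with open('./大奉打更人.txt', 'r', encoding='utf-8') as fin:
        content = fin.read()
    # Keep only the tokens that jieba tags as person names ('nr')
    words = [word for word, flag in posseg.cut(content) if flag == 'nr']
    # value_counts() sorts by frequency in descending order; keep the top 20
    top20 = pd.Series(words).value_counts().head(20)
    # to_string() leaves out the trailing name/dtype footer, so the output
    # does not need to be sliced by a fixed number of characters
    print(top20.to_string(), end='')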
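
The counting step can also be done with the standard library alone. Below is a minimal sketch using `collections.Counter` instead of pandas. The helper name `top_names` and the sample pairs are hypothetical, made up for illustration; in the real script the pairs would come from `posseg.cut(content)`.

```python
from collections import Counter

def top_names(tagged_words, n=20, name_flag='nr'):
    """Return the n most common words whose tag equals name_flag.

    tagged_words is any iterable of (word, flag) pairs.
    top_names is a hypothetical helper, not part of jieba.
    """
    counts = Counter(word for word, flag in tagged_words if flag == name_flag)
    return counts.most_common(n)

# Hand-made (word, flag) pairs standing in for the tagger's output
sample = [('张三', 'nr'), ('说', 'v'), ('张三', 'nr'), ('李四', 'nr'), ('北京', 'ns')]
for name, count in top_names(sample, n=2):
    print(name, count)
```

`Counter.most_common(n)` returns `(word, count)` tuples already sorted by count, so no separate sorting or slicing is needed.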