凑个小热闹：python采集《狂飙》评论-百木园

2023年首部爆款剧集《狂飙》一度冲上热搜第一，害的我两倍速熬夜看完了。

“是非面前稍不留神，就会步入万丈深渊，唯有坚守信仰，才能守得初心”

面对这么多广大网友的讨论，我也来凑上一个热闹

用python爬取《狂飙》评论数据

代码展示

部分代码展示

import requests
import parsel
# 我还录制了详细讲解的视频，直接在这个裙 708525271 自取，包括完整代码

headers = {
    \'Cookie\': \'数据我都删除了，建议用自己的\',
    \'Host\': \'\',
    \'User-Agent\': \'\',
}
for page in range(0, 4000):
    print(page)
    url = f\'https://movie.douban.com/subject/35465232/comments?start={page*20}&limit=20&status=P&sort=new_score\'
    response = requests.get(url=url, headers=headers)
    select = parsel.Selector(response.text)
    comments = select.css(\'.comment-item .comment\')
    for comment in comments:
        name = comment.css(\'.comment-info a::text\').get()
        try:
            score_str = comment.css(\'.comment-info .rating::attr(class)\').get()
            score = score_str.replace(\'0 rating\', \'\').replace(\'allstar\', \'\')
        except:
            score = 0
        comment_time = comment.css(\'.comment-info .comment-time::text\').get().strip()
        vote_count = comment.css(\'.comment-vote .votes.vote-count::text\').get()
        comment_content = comment.css(\'.comment-content span::text\').get()
        print(name, score, comment_time, vote_count, comment_content)

效果展示

不登录的话，只能采集部分，全部评论需要登录后才能爬取。

浏览器数据容易泄密，我都删掉了，大家自己修改一下。

最后

感谢你观看我的文章~本次航班到这里就结束?

希望本篇文章有对你带来帮助 ?，有学习到一点知识~

躲起来的星星?也在努力发光，你也要努力加油（让我们一起努力）

来源：https://www.cnblogs.com/hahaa/p/17105263.html
本站部分图文来源于网络，如有侵权请联系删除。

凑个小热闹：python采集《狂飙》评论

相关推荐

热门文章