<link href="/image.php?url=https://csdnimg.cn/release/download_crawler_static/css/base.min.css" rel="stylesheet"/><link href="/image.php?url=https://csdnimg.cn/release/download_crawler_static/css/fancy.min.css" rel="stylesheet"/><link href="/image.php?url=https://csdnimg.cn/release/download_crawler_static/89500456/raw.css" rel="stylesheet"/><div id="sidebar" style="display: none"><div id="outline"></div></div><div class="pf w0 h0" data-page-no="1" id="pf1"><div class="pc pc1 w0 h0"><img alt="" class="bi x0 y0 w1 h1" src="/image.php?url=https://csdnimg.cn/release/download_crawler_static/89500456/bg1.jpg"/><div class="c x1 y1 w2 h2"><div class="t m0 x0 h3 y2 ff1 fs0 fc0 sc0 ls0 ws0">1</div></div><div class="t m0 x2 h4 y3 ff2 fs1 fc0 sc0 ls0 ws0"> </div><div class="t m0 x3 h5 y4 ff3 fs2 fc0 sc1 ls0 ws0">Tianjin Agricultural University</div><div class="t m0 x4 h5 y5 ff3 fs2 fc0 sc1 ls0 ws0">In-Class Lab Assignment</div><div class="t m0 x5 h6 y6 ff4 fs3 fc0 sc0 ls0 ws0">Academic Year 2023-2024, Semester 2</div><div class="t m0 x6 h7 y7 ff1 fs3 fc0 sc0 ls0 ws0"> </div><div class="t m0 x7 h6 y8 ff3 fs3 fc0 sc1 ls0 ws0">Course name: <span class="ff5 sc0">Big Data Collection and Cleaning</span></div><div class="t m0 x7 h6 y9 ff3 fs3 fc0 sc1 ls0 ws0">In-class practice hours: <span class="ff5 sc0">24 hours</span></div><div class="t m0 x7 h6 ya ff3 fs3 fc0 sc1
ls0 ws0">Major and class: <span class="ff5 sc0">2022 cohort, Big Data Classes 1 and 2</span></div><div class="t m0 x7 h6 yb ff3 fs3 fc0 sc1 ls0 ws0">Lecturer: <span class="ff5 sc0">王育欣</span></div><div class="t m0 x8 h7 yc ff1 fs3 fc0 sc0 ls0 ws0"> </div></div><div class="pi" data-data='{"ctm":[1.611830,0.000000,0.000000,1.611830,0.000000,0.000000]}'></div></div><div id="pf2" class="pf w0 h0" data-page-no="2"><div class="pc pc2 w0 h0"><img class="bi x0 y0 w1 h1" alt="" src="/image.php?url=https://csdnimg.cn/release/download_crawler_static/89500456/bg2.jpg"><div class="c x1 y1 w2 h2"><div class="t m0 x0 h3 y2 ff1 fs0 fc0 sc0 ls0 ws0">2</div></div><div class="t m0 x9 h8 yd ff6 fs4 fc0 sc1 ls0 ws0">In-Class Lab Assignment</div><div class="c xa ye w3 h9"><div class="t m0 xb ha yf ff6 fs5 fc0 sc1 ls0 ws0">Experiment 1</div></div><div class="c xc ye w4 h9"><div class="t m0 xd ha yf ff7 fs5 fc0 sc0 ls0 ws0">Scraping the Douban Movie TOP Chart</div><div class="t m0 xe hb y10 ff7 fs6 fc0 sc0 ls0 ws0">(2022 cohort, Big Data Class 2, 穆俊, 03)</div></div><div class="c xa y11 w5 hc"><div class="t m0 xf hd y12 ff3 fs5 fc0 sc1 ls0 ws0">Lab requirements:</div><div class="t m0 x10
h6 y13 ff5 fs3 fc0 sc0 ls0 ws0">Working in groups of three, choose the specific TOP chart content to scrape for the experiment topic, then redesign the experiment content and the</div><div class="t m0 x11 h6 y14 ff5 fs3 fc0 sc0 ls0 ws0">division of labor within the group. Tasks must be both divided and coordinated (for example, in debugging the code and handling the different problems the three members encounter while debugging).</div><div class="t m0 x10 h6 y15 ff5 fs3 fc0 sc0 ls0 ws0">Formatting requirements:</div><div class="t m0 x10 h6 y16 ff5 fs3 fc0 sc0 ls0 ws0">(<span class="ff4">1</span>) Set the title in size-4 (四号) Hei typeface (黑体), centered.</div><div class="t m0 x10 h6 y17 ff5 fs3 fc0 sc0 ls0 ws0">(<span class="ff4">2</span>) Below the title, state your personal information: year, major, class, name, and student ID. Enclose the personal information</div><div class="t m0 x11 h6 y18 ff5 fs3 fc0 sc0 ls0 ws0">in parentheses, for example "(2022 cohort, <span
class="_ _9"></span>Big Data Class 1, XX, 03)". Set the personal information in size-5 (五号) Hei typeface</div><div class="t m0 x11 h6 y19 ff5 fs3 fc0 sc0 ls0 ws0">(黑体), centered.</div><div class="t m0 x10 h6 y1a ff5 fs3 fc0 sc0 ls0 ws0">(<span class="ff4">3</span>) Set the body text in small size-4 (小四号) Song typeface (宋体).</div><div class="t m0 x10 h6 y1b ff5 fs3 fc0 sc0 ls0 ws0">(<span class="ff4">4</span>) Use standard character spacing and 1.5 line spacing; place page numbers at the bottom right of the page.</div><div class="t m0 x10 h6 y1c ff5 fs3 fc0 sc0 ls0 ws0">Submission notes:</div><div class="t m0 x10 h6 y1d ff5 fs3 fc0 sc0 ls0 ws0">Name the electronic document in the format "last two digits of student ID + name", for example "03 XX". Place the electronic copies of the lab report and</div><div class="t m0 x11 h6 y1e ff5 fs3 fc0 sc0 ls0 ws0">source program files in a folder you create, and name the folder following the "22级大数据<span
class="ff4">1</span>班 + last two digits of student ID</div><div class="t m0 x11 h6 y1f ff4 fs3 fc0 sc0 ls0 ws0">+ name" format.</div></div></div><div class="pi" data-data='{"ctm":[1.611830,0.000000,0.000000,1.611830,0.000000,0.000000]}'></div></div><div id="pf3" class="pf w0 h0" data-page-no="3"><div class="pc pc3 w0 h0"><img class="bi x0 y0 w1 h1" alt="" src="/image.php?url=https://csdnimg.cn/release/download_crawler_static/89500456/bg3.jpg"><div class="c x1 y1 w2 h2"><div class="t m0 x0 h3 y2 ff1 fs0 fc0 sc0 ls0 ws0">3</div></div><div class="c xa y20 w5 he"><div class="t m0 xf hd y21 ff3 fs5 fc0 sc1 ls0 ws0">Experiment Design</div><div class="t m0 x10 h6 y22 ff5 fs3 fc0 sc0 ls0 ws0">I. Objective:</div><div class="t m0 x10 h6 y23 ff5 fs3 fc0 sc0 ls0 ws0">Through the case of scraping the Douban Movie TOP chart, learn how to analyze request URLs and implement crawler code.</div><div class="t m0 x10 h6 y24 ff5 fs3 fc0 sc0 ls0 ws0">II. Key points and difficulties:</div><div class="t m0 x10 h6 y25 ff4 fs3 fc0 sc0 ls0 ws0">Using the requests module and the XPath parser in the lxml module.</div><div class="t m0 x10 h6 y26 ff5 fs3 fc0 sc0 ls0 ws0">III. Software environment:</div><div class="t m0 x10 h6 y27 ff4 fs3 fc0 sc0 ls0 ws0">PyCharm Community Edition 2024.1.1</div><div class="t m0 x10 h6 y28 ff4 fs3 fc0 sc0 ls0 ws0">Python 3.6</div><div class="t m0 x10 h6 y29 ff5 fs3 fc0 sc0 ls0 ws0">IV. Procedure:</div><div class="t m0 x10 h6 y2a ff4 fs3 fc0 sc0 ls0 ws0">1. Install the required libraries</div><div class="t m0 x10 h6 y2b ff4 fs3 fc0 sc0 ls0 ws0">2. Open the Douban page to be scraped in a browser</div></div></div><div class="pi" data-data='{"ctm":[1.611830,0.000000,0.000000,1.611830,0.000000,0.000000]}'></div></div>
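The procedure stops at locating the page in a browser. A minimal sketch of how the remaining steps might look with requests and the XPath support in lxml follows. It is not the author's source code: the URL, the `start` query parameter, the sample markup, and the class names `item`, `title`, and `rating_num` are assumptions made for illustration, as are the helper names `fetch_page` and `parse_movies`. Both libraries can be installed with `pip install requests lxml`.

```python
from lxml import etree

# Assumed URL: the assignment only says "the Douban movie TOP chart".
TOP_URL = "https://movie.douban.com/top250"

# Hypothetical sample markup so the XPath step can be tried offline.
# The real page layout may differ; inspect it in the browser first.
SAMPLE_HTML = """
<ol>
  <li><div class="item">
    <span class="title">Movie A</span>
    <span class="rating_num">9.7</span>
  </div></li>
  <li><div class="item">
    <span class="title">Movie B</span>
    <span class="rating_num">9.6</span>
  </div></li>
</ol>
"""


def fetch_page(url=TOP_URL, start=0):
    """Download one page of the chart and return its HTML text."""
    import requests  # third-party; imported here so parsing also works offline

    # A browser-like User-Agent; many sites reject the default client string.
    headers = {"User-Agent": "Mozilla/5.0"}
    # `start` as a paging offset is an assumption about the site's URL scheme.
    resp = requests.get(url, params={"start": start}, headers=headers, timeout=10)
    resp.raise_for_status()
    return resp.text


def parse_movies(html_text):
    """Extract title and rating from each movie block using XPath."""
    tree = etree.HTML(html_text)
    movies = []
    for item in tree.xpath('//div[@class="item"]'):
        # Relative XPath (leading dot) searches only inside this block.
        titles = item.xpath('.//span[@class="title"]/text()')
        ratings = item.xpath('.//span[@class="rating_num"]/text()')
        movies.append({
            "title": titles[0].strip() if titles else "",
            "rating": ratings[0].strip() if ratings else "",
        })
    return movies
```

Calling `parse_movies(SAMPLE_HTML)` exercises the XPath logic without network access. Against the live site, something like `parse_movies(fetch_page(start=25))` would be the next thing to try, adjusting the XPath expressions to whatever structure the browser's developer tools show.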