2024--2025-1爬虫复习题库 (1).zip (Web Crawler Review Question Bank, 2024-2025 Semester 1)
<link href="/image.php?url=https://csdnimg.cn/release/download_crawler_static/css/base.min.css" rel="stylesheet"/><link href="/image.php?url=https://csdnimg.cn/release/download_crawler_static/css/fancy.min.css" rel="stylesheet"/><link href="/image.php?url=https://csdnimg.cn/release/download_crawler_static/90182467/2/raw.css" rel="stylesheet"/><div id="sidebar" style="display: none"><div id="outline"></div></div>
<div class="pf w0 h0" data-page-no="1" id="pf1"><div class="pc pc1 w0 h0"><img alt="" class="bi x0 y0 w1 h1" src="/image.php?url=https://csdnimg.cn/release/download_crawler_static/90182467/bg1.jpg"/>
<div class="t m0 x1 h2 y1 ff1 fs0 fc0 sc0 ls0 ws0">Python crawler course: questions and answers for the chapter on analyzing page data with the etree module of lxml:</div>
<div class="t m0 x1 h3 y2 ff3 fs0 fc0 sc0 ls0 ws0">Questions</div>
<div class="t m0 x1 h3 y3 ff3 fs0 fc0 sc0 ls0 ws0">I. Multiple Choice</div>
<div class="t m0 x2 h2 y4 ff1 fs0 fc0 sc0 ls0 ws0">1.<span class="_ _2"> </span>In a Python crawler, when analyzing page data with the etree module of lxml, what object must you</div>
<div class="t m0 x3 h2 y5 ff2 fs0 fc0 sc0 ls0 ws0">instantiate first?</div>
<div class="t m0 x3 h2 y6 ff1 fs0 fc0 sc0 ls0 ws0">A. HTML object</div>
<div class="t m0 x3 h2 y7 ff1 fs0 fc0 sc0 ls0 ws0">B. etree object</div>
<div class="t m0 x3 h2 y8 ff1 fs0 fc0 sc0 ls0 ws0">C. XML object</div>
<div class="t m0 x3 h2 y9 ff1 fs0 fc0 sc0 ls0 ws0">D. Parser object</div>
<div class="t m0 x2 h2 ya ff1 fs0 fc0 sc0 ls0 ws0">2.<span class="_ _2"> </span>When calling the xpath method of an etree object, which symbol locates elements starting from any node?</div>
<div class="t m0 x3 h2 yb ff1 fs0 fc0 sc0 ls0 ws0">A. /</div>
<div class="t m0 x3 h2 yc ff1 fs0 fc0 sc0 ls0 ws0">B. //</div>
<div class="t m0 x3 h2 yd ff1 fs0 fc0 sc0 ls0 ws0">C. @</div>
<div class="t m0 x3 h2 ye ff1 fs0 fc0 sc0 ls0 ws0">D. []</div>
<div class="t m0 x2 h2 yf ff1 fs0 fc0 sc0 ls0 ws0">3.<span class="_ _2"> </span>In an xpath expression, how do you locate div elements whose class attribute has a specific value?</div>
<div class="t m0 x3 h2 y10 ff1 fs0 fc0 sc0 ls0 ws0">A. //div[@class]</div>
<div class="t m0 x3 h2 y11 ff1 fs0 fc0 sc0 ls0 ws0">B. //div[class="specific-value"]</div>
<div class="t m0 x3 h2 y12 ff1 fs0 fc0 sc0 ls0 ws0">C. //div/class="specific-value"</div>
<div class="t m0 x3 h2 y13 ff1 fs0 fc0 sc0 ls0 ws0">D. //div[@class="specific-value"]</div>
<div class="t m0 x2 h2 y14 ff1 fs0 fc0 sc0 ls0 ws0">4.<span class="_ _2"> </span>When parsing an HTML document with etree, which xpath expression returns only the direct text of</div>
<div class="t m0 x3 h2 y15 ff2 fs0 fc0 sc0 ls0 ws0">a tag named tag (excluding text inside its child elements)?</div>
<div class="t m0 x3 h2 y16 ff1 fs0 fc0 sc0 ls0 ws0">A. //text()</div>
<div class="t m0 x3 h2 y17 ff1 fs0 fc0 sc0 ls0 ws0">B. /text()</div>
<div class="t m0 x3 h2 y18 ff1 fs0 fc0 sc0 ls0 ws0">C. tag/text()</div>
<div class="t m0 x3 h2 y19 ff1 fs0 fc0 sc0 ls0 ws0">D. tag//text()</div>
<div class="t m0 x2 h2 y1a ff1 fs0 fc0 sc0 ls0 ws0">5.<span class="_ _2"> </span>Which function loads the source of a local HTML document into an etree object?</div>
<div class="t m0 x3 h2 y1b ff1 fs0 fc0 sc0 ls0 ws0">A. etree.HTML()</div>
<div class="t m0 x3 h2 y1c ff1 fs0 fc0 sc0 ls0 ws0">B. etree.parse()</div>
<div class="t m0 x3 h2 y1d ff1 fs0 fc0 sc0 ls0 ws0">C. etree.tostring()</div>
<div class="t m0 x3 h2 y1e ff1 fs0 fc0 sc0 ls0 ws0">D. etree.XPath()</div>
<div class="t m0 x1 h3 y1f ff3 fs0 fc0 sc0 ls0 ws0">II. Short Answer</div>
<div class="t m0 x2 h2 y20 ff1 fs0 fc0 sc0 ls0 ws0">1.<span class="_ _2"> </span>Briefly describe the main steps of analyzing page data with the etree module of lxml.</div>
<div class="t m0 x2 h2 y21 ff1 fs0 fc0 sc0 ls0 ws0">2.<span class="_ _2"> </span>In an xpath expression, how do you use an index to locate a specific element? Give an example.</div>
<div class="t m0 x2 h2 y22 ff1 fs0 fc0 sc0 ls0 ws0">3.<span class="_ _2"> </span>Explain the difference between “/” and “//” in xpath expressions.</div>
<div class="t m0 x1 h3 y23 ff3 fs0 fc0 sc0 ls0 ws0">Answers</div>
<div class="t m0 x1 h3 y24 ff3 fs0 fc0 sc0 ls0 ws0">I. Multiple Choice</div>
<div class="t m0 x2 h2 y25 ff1 fs0 fc0 sc0 ls0 ws0">1.<span class="_ _2"> </span>B. etree object</div>
<div class="t m0 x3 h2 y26 ff2 fs0 fc0 sc0 ls0 ws0">Explanation: analyzing page data with lxml's etree starts by instantiating an etree object that holds the page source.</div>
<div class="t m0 x2 h2 y27 ff1 fs0 fc0 sc0 ls0 ws0">2.<span class="_ _2"> </span>B. //</div>
<div class="t m0 x3 h2 y28 ff2 fs0 fc0 sc0 ls0 ws0">Explanation: in an xpath expression, // matches nodes at any depth, so locating can start from any node.</div>
<div class="t m0 x2 h2 y29 ff1 fs0 fc0 sc0 ls0 ws0">3.<span class="_ _2"> </span>D. //div[@class="specific-value"]</div>
<div class="t m0 x3 h2 y2a ff2 fs0 fc0 sc0 ls0 ws0">Explanation: in xpath, @ marks an attribute and [...] holds a filter condition, so div elements whose class is a</div>
<div class="t m0 x3 h2 y2b ff2 fs0 fc0 sc0 ls0 ws0">specific value are located with //div[@class="specific-value"].</div>
<div class="t m0 x2 h2 y2c ff1 fs0 fc0 sc0 ls0 ws0">4.<span class="_ _2"> </span>C. tag/text()</div></div><div class="pi" data-data='{"ctm":[1.611830,0.000000,0.000000,1.611830,0.000000,0.000000]}'></div></div>
<div id="pf2" class="pf w0 h0" data-page-no="2"><div class="pc pc2 w0 h0"><img class="bi x0 y0 w1 h1" alt="" src="/image.php?url=https://csdnimg.cn/release/download_crawler_static/90182467/bg2.jpg">
<div class="t m0 x3 h2 y1 ff2 fs0 fc0 sc0 ls0 ws0">Explanation: when parsing an HTML document with etree, tag/text() returns only the text nodes that are direct</div>
<div class="t m0 x3 h2 y2d ff2 fs0 fc0 sc0 ls0 ws0">children of tag, whereas tag//text() also collects text nested in its descendant elements.</div>
<div class="t m0 x2 h2 y3 ff1 fs0 fc0 sc0 ls0 ws0">5.<span class="_ _2"> </span>B. etree.parse()</div>
<div class="t m0 x3 h2 y2e ff2 fs0 fc0 sc0 ls0 ws0">Explanation: etree.parse() loads the source of a local HTML file into an etree object; for HTML, pass etree.HTMLParser() as its second argument.</div>
<div class="t m0 x1 h3 y2f ff3 fs0 fc0 sc0 ls0 ws0">II. Short Answer</div>
<div class="t m0 x2 h2 y30 ff1 fs0 fc0 sc0 ls0 ws0">1.<span class="_ _2"> </span>The main steps of analyzing page data with the etree module of lxml are:</div>
<div class="t m0 x4 h2 y31 ff4 fs1 fc0 sc0 ls0 ws0">o<span class="_ _9"> </span><span class="ff2 fs0">Instantiate an etree object and load the page source to be parsed into it (etree.parse() for a local file, etree.HTML() for a source string).</span></div>
<div class="t m0 x4 h2 y32 ff4 fs1 fc0 sc0 ls0 ws0">o<span class="_ _9"> </span><span class="ff2 fs0">Call the xpath method of the etree object with xpath expressions to locate tags and extract their</span></div>
<div class="t m0 x5 h2 y9 ff2 fs0 fc0 sc0 ls0 ws0">text content or attribute values.</div>
<div class="t m0 x2 h2 ya ff1 fs0 fc0 sc0 ls0 ws0">2.<span class="_ _2"> </span>In an xpath expression, an index in square brackets locates a specific element. For example, to locate the</div>
<div class="t m0 x3 h2 yb ff2 fs0 fc0 sc0 ls0 ws0">first a tag in the div element whose class is "title", use the xpath expression //div[@class="title"]/a[1]. Here</div>
<div class="t m0 x3 h2 yc ff2 fs0 fc0 sc0 ls0 ws0">[1] selects the element at index 1, i.e. the first a tag; xpath indices start at 1, not 0.</div>
<div class="t m0 x2 h2 y33 ff1 fs0 fc0 sc0 ls0 ws0">3.<span class="_ _2"> </span>In an xpath expression, “/” steps down exactly one level (and, at the start of an expression, begins at the root node), whereas “//” spans</div>
<div class="t m0 x3 h2 y34 ff2 fs0 fc0 sc0 ls0 ws0">any number of levels, so it can start locating from any node. For example, /html/head/title starts at the root node html</div>
<div class="t m0 x3 h2 y35 ff2 fs0 fc0 sc0 ls0 ws0">and steps down to head and then title, whereas //title locates title tags anywhere, regardless of their position in the</div>
<div class="t m0 x3 h2 y10 ff2 fs0 fc0 sc0 ls0 ws0">document.</div>
<div class="t m0 x1 h2 y36 ff2 fs0 fc0 sc0 ls0 ws0">Code fill-in questions:</div>
<div class="t m0 x1 h2 y14 ff1 fs0 fc0 sc0 ls0 ws0">1.<span class="_ _d"> </span>The source code of an HTML page is shown below.</div>
<div class="c x2 y37 w2 h4">
<div class="t m0 x6 h5 y38 ff5 fs0 fc0 sc0 ls0 ws0">&lt;html&gt;</div>
<div class="t m0 x6 h5 y39 ff5 fs0 fc0 sc0 ls0 ws0">&lt;head&gt;</div>
<div class="t m0 x6 h6 y3a ff5 fs0 fc0 sc0 ls0 ws0"> <span class="_ _e"> </span>&lt;title&gt;Sample Page&lt;/title&gt;</div>
<div class="t m0 x6 h5 y3b ff5 fs0 fc0 sc0 ls0 ws0">&lt;/head&gt;</div>
<div class="t m0 x6 h5 y3c ff5 fs0 fc0 sc0 ls0 ws0">&lt;body&gt;</div>
<div class="t m0 x6 h5 y3d ff5 fs0 fc0 sc0 ls0 ws0"> <span class="_ _e"> </span>&lt;div class="content"&gt;</div>
<div class="t m0 x6 h6 y3e ff5 fs0 fc0 sc0 ls0 ws0"> <span class="_ _f"> </span>&lt;h1&gt;Welcome to My Website&lt;/h1&gt;</div>
<div class="t m0 x6 h6 y3f ff5 fs0 fc0 sc0 ls0 ws0"> <span class="_ _f"> </span>&lt;p class="description"&gt;This is a sample page that shows how to use the lxml</div>
<div class="t m0 x6 h6 y40 ff6 fs0 fc0 sc0 ls0 ws0">module.<span class="ff5">&lt;/p&gt;</span></div>
<div class="t m0 x6 h5 y41 ff5 fs0 fc0 sc0 ls0 ws0"> <span class="_ _f"> </span>&lt;ul&gt;</div>
<div class="t m0 x6 h6 y42 ff5 fs0 fc0 sc0 ls0 ws0"> <span class="_ _10"> </span>&lt;li&gt;&lt;a href="http://example.com/page1"&gt;Page 1&lt;/a&gt;&lt;/li&gt;</div>
<div class="t m0 x6 h6 y43 ff5 fs0 fc0 sc0 ls0 ws0"> <span class="_ _10"> </span>&lt;li&gt;&lt;a href="http://example.com/page2"&gt;Page 2&lt;/a&gt;&lt;/li&gt;</div>
<div class="t m0 x6 h6 y44 ff5 fs0 fc0 sc0 ls0 ws0"> <span class="_ _10"> </span>&lt;li&gt;&lt;a href="http://example.com/page3"&gt;Page 3&lt;/a&gt;&lt;/li&gt;</div>
<div class="t m0 x6 h5 y45 ff5 fs0 fc0 sc0 ls0 ws0"> <span class="_ _f"> </span>&lt;/ul&gt;</div>
<div class="t m0 x6 h5 y46 ff5 fs0 fc0 sc0 ls0 ws0"> <span class="_ _e"> </span>&lt;/div&gt;</div>
<div class="t m0 x6 h5 y47 ff5 fs0 fc0 sc0 ls0 ws0">&lt;/body&gt;</div>
<div class="t m0 x6 h5 y48 ff5 fs0 fc0 sc0 ls0 ws0">&lt;/html&gt;</div>
</div>
<div class="t m0 x2 h2 y49 ff2 fs0 fc0 sc0 ls0 ws0">Choose the correct answers to complete the following Python code so that it extracts the content of the HTML above.</div>
<div class="c x1 y4a w3 h7">
<div class="t m0 x6 h2 y4b ff1 fs0 fc0 sc0 ls0 ws0">from lxml import etree</div>
<div class="t m0 x6 h2 y4c ff1 fs0 fc0 sc0 ls0 ws0"># Assume html_content holds the HTML snippet above as a string</div>
<div class="t m0 x6 h2 y4d ff1 fs0 fc0 sc0 ls0 ws0">html_content = '''</div></div></div><div class="pi" data-data='{"ctm":[1.611830,0.000000,0.000000,1.611830,0.000000,0.000000]}'></div></div>
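Short-answer question 1 describes two steps: load the page source into an etree object, then query it with xpath. Multiple-choice questions 1 and 5 cover the loading functions. The sketch below illustrates these under stated assumptions. `HTML_SOURCE`, `load_from_string` and `load_from_file` are hypothetical names. The markup mirrors the sample page of the fill-in question, with the text in English. A temporary file stands in for a local HTML document. Passing `etree.HTMLParser()` to `etree.parse()` is an added choice, so that ordinary HTML that is not well-formed XML still parses.

```python
import os
import tempfile

from lxml import etree

# Hypothetical sample markup, modeled on the page in the fill-in question
# (text shown in English).
HTML_SOURCE = """<html>
<head><title>Sample Page</title></head>
<body>
  <div class="content">
    <h1>Welcome to My Website</h1>
    <p class="description">This is a sample page that shows how to use the lxml module.</p>
    <ul>
      <li><a href="http://example.com/page1">Page 1</a></li>
      <li><a href="http://example.com/page2">Page 2</a></li>
      <li><a href="http://example.com/page3">Page 3</a></li>
    </ul>
  </div>
</body>
</html>"""


def load_from_string(source):
    """Step 1 for page source held in memory: etree.HTML() returns the root element."""
    return etree.HTML(source)


def load_from_file(path):
    """Step 1 for a local file: etree.parse() returns an ElementTree.

    An HTMLParser is passed because the default parser expects well-formed XML.
    """
    return etree.parse(path, etree.HTMLParser())


if __name__ == "__main__":
    # Step 2: call xpath() on the loaded object to locate tags and extract data.
    root = load_from_string(HTML_SOURCE)
    print(root.xpath("//title/text()"))  # ['Sample Page']

    # Write the markup to a temporary file that stands in for a local HTML document.
    with tempfile.NamedTemporaryFile(
        "w", suffix=".html", delete=False, encoding="utf-8"
    ) as f:
        f.write(HTML_SOURCE)
        path = f.name
    try:
        tree = load_from_file(path)
        print(tree.xpath("//li/a/@href"))
    finally:
        os.remove(path)
```

`etree.HTML()` returns the root element, while `etree.parse()` returns an ElementTree wrapper. Both expose the same `xpath()` method, which is why the second step is identical for either source.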
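Multiple-choice questions 2 to 4 and short-answer questions 2 and 3 cover several xpath rules: "/" versus "//", `[@class="..."]` filters, `tag/text()` versus `tag//text()`, and 1-based indices. The sketch below checks each rule against a small page. The markup is a hypothetical variant of the sample page. A nested b element is added inside the paragraph only to show the difference between direct and descendant text. The variable names are illustrative.

```python
from lxml import etree

# Hypothetical page: the nested <b> element exists only to contrast
# direct text (tag/text()) with descendant text (tag//text()).
HTML_SOURCE = """<html>
<head><title>Sample Page</title></head>
<body>
  <div class="content">
    <p class="description">Direct text <b>nested text</b></p>
    <ul>
      <li><a href="http://example.com/page1">Page 1</a></li>
      <li><a href="http://example.com/page2">Page 2</a></li>
      <li><a href="http://example.com/page3">Page 3</a></li>
    </ul>
  </div>
</body>
</html>"""

root = etree.HTML(HTML_SOURCE)

# A leading "/" starts at the root node and each "/" steps down one level.
absolute_title = root.xpath("/html/head/title/text()")
# "//" matches at any depth, wherever the element sits.
anywhere_title = root.xpath("//title/text()")

# [@class="..."] keeps only elements whose class attribute has that value.
content_divs = root.xpath('//div[@class="content"]')

# tag/text() returns only the element's direct text nodes ...
direct_text = root.xpath('//p[@class="description"]/text()')
# ... while tag//text() also collects text inside descendants such as <b>.
all_text = root.xpath('//p[@class="description"]//text()')

# XPath indices start at 1: li[1] is the first <li> child of the <ul>.
first_link_text = root.xpath('//div[@class="content"]/ul/li[1]/a/text()')
# @attr selects attribute values instead of elements.
hrefs = root.xpath("//ul/li/a/@href")

print(absolute_title, anywhere_title)  # ['Sample Page'] ['Sample Page']
print(direct_text)  # ['Direct text ']
print(all_text)  # ['Direct text ', 'nested text']
print(first_link_text)  # ['Page 1']
print(hrefs)
```

`xpath()` returns a list even when a single node matches. A lone value is therefore typically read as `result[0]`, after checking that the list is not empty.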