Crawler: the search results take a few seconds to appear. How do I scrape the page after the results have shown up?

# coding=utf-8
import requests

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.87 Safari/537.36",
    "Accept-Language": "zh-CN,zh;q=0.8",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8",
    "Referer": "http://www.itaotm.com/search.php?seat=%E7%94%B3%E8%AF%B7%E4%BA%BA&searchKey=%E5%8C%97%E4%BA%AC",
    "Upgrade-Insecure-Requests": "1",
}
session = requests.session()


def get_detailpage():
    # The query string is passed via params only, so requests percent-encodes
    # each value exactly once (a pre-encoded value would be encoded twice).
    url = "http://www.itaotm.com/search!page.php"
    data = {
        "pageNo": 1,
        "l": "20161019113636",
        "gjfls": ";".join(str(i) for i in range(1, 35)),  # 1;2;...;34
        "gjfl": 0,
        "seat": "申请人",
        "searchKey": "北京",
    }
    html = session.get(url=url, headers=headers, params=data)
    print(html.text)


get_detailpage()

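After I fill in the search conditions and click Query, the page says the results will take a few seconds, and then the results page appears directly.
The URL I am scraping is the one shown once the results have appeared, but the content I get back is: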

<div class="cls"></div>
<div class="mainBox"> <div class="jump"> 数据量巨大,正在努力查询中.... 预计<span id="jumpTo">5</span>秒后出结果 </div> <div class="cls"></div>
</div> 

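This is the waiting page (its text says the data volume is huge, the query is in progress, and results are expected in 5 seconds), not the final result I want. How can I scrape the correct results without using selenium?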

The only way is to capture the traffic and analyze it.

A simpler way: look at the network requests in the browser's developer tools and see how the final data is actually fetched.
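Alongside the network panel, it can help to dump the scripts of the waiting page from Python and read them for whatever triggers the follow-up request. A rough sketch using only the standard library; `ScriptCollector` and `collect_scripts` are names made up for this example, and nothing here is specific to the site in question:

```python
from html.parser import HTMLParser


class ScriptCollector(HTMLParser):
    """Collect inline <script> bodies and the src of external scripts."""

    def __init__(self):
        super().__init__()
        self.inline = []     # text of inline scripts
        self.external = []   # src attributes of external scripts
        self._buffer = None  # list of text chunks while inside a <script>

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            src = dict(attrs).get("src")
            if src:
                self.external.append(src)
            self._buffer = []

    def handle_data(self, data):
        # Script text may arrive in several chunks, so buffer it.
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._buffer is not None:
            text = "".join(self._buffer).strip()
            if text:
                self.inline.append(text)
            self._buffer = None


def collect_scripts(html):
    """Return (inline_scripts, external_script_urls) found in html."""
    parser = ScriptCollector()
    parser.feed(html)
    parser.close()
    return parser.inline, parser.external
```

Calling `collect_scripts(html.text)` on the first response gives the inline code to read and the external script URLs to fetch and inspect next.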

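If you capture the traffic, you will find that the URL is actually requested twice. My guess is that the page returned by the first request contains a piece of JS code that performs the jump.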
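If that guess is right, the second visit can be imitated without a browser: request the URL, and if the waiting page comes back, sleep for the countdown it shows and request again with the same session. This is only a sketch under that assumption; I have not verified how the site decides that the wait is over (cookies, server-side state, or something else). `wait_seconds` and `fetch_results` are hypothetical helpers, and `session` is expected to behave like a `requests.Session`:

```python
import re
import time

# Countdown shown on the waiting page, e.g. <span id="jumpTo">5</span>
JUMP_RE = re.compile(r'<span id="jumpTo">\s*(\d+)\s*</span>')


def wait_seconds(html):
    """Return the countdown from the waiting page, or None if it is absent."""
    match = JUMP_RE.search(html)
    return int(match.group(1)) if match else None


def fetch_results(session, url, max_tries=3, sleep=time.sleep, **kwargs):
    """Request url until the response is no longer the waiting page.

    Extra keyword arguments (headers, params, ...) are passed to session.get.
    Returns the last HTML received, which may still be the waiting page
    if max_tries is exhausted.
    """
    html = ""
    for _ in range(max_tries):
        html = session.get(url, **kwargs).text
        delay = wait_seconds(html)
        if delay is None:
            return html  # no countdown found, assume these are the results
        sleep(delay)     # wait as long as the page asked for, then retry
    return html
```

With the question's code this would be called as `fetch_results(session, url, headers=headers, params=data)`. Reusing one session matters here because any cookie set by the first response is sent again on the retry.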

Use PhantomJS.
