Q&A


Running scrapy crawl inside the crawler project raises:

Traceback (most recent call last):
  File "/usr/bin/scrapy", line 11, in <module>
    load_entry_point('Scrapy==1.6.0', 'console_scripts', 'scrapy')()
  File "/usr/lib/python2.7/site-packages/Scrapy-1.6.0-py2.7.egg/scrapy/cmdline.py", line 150, in execute
    _run_print_help(parser, _run_command, cmd, args, opts)
  File "/usr/lib/python2.7/site-packages/Scrapy-1.6.0-py2.7.egg/scrap
2018-06-20 01:03:53 [twisted] CRITICAL:
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 1299, in _inlineCallbacks
    result = g.send(result)
  File "/usr/local/lib/python2.7/dist-packages/scrapy/crawler.py", line 98, in crawl
    six.reraise(*exc_info)
  File "/usr/local/lib/python2.7/dist-packages/scrapy/crawler.py", line 79, in crawl
    self.spi
The OS is Ubuntu 17. Using selenium inside scrapy raises an error; the traceback is as follows:

  File "/usr/local/lib/python2.7/dist-packages/selenium-3.0.0b2-py2.7.egg/selenium/webdriver/firefox/webdriver.py", line 65, in __init__
    self.service.start()
  File "/usr/local/lib/python2.7/dist-packages/selenium-3.0.0b2-py2.7.egg/selenium/webdriver/common/service.py", line 71, in start
    os.path.basename(self.path), self.start_error_message)
selenium.co
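Selenium 3 drives Firefox through the separate geckodriver binary, so a failure inside service.py's start() like the one above usually means the driver executable cannot be found on PATH (an assumption here, since the message is truncated). A minimal stdlib check, written with Python 3's shutil.which (on the asker's Python 2.7, distutils.spawn.find_executable plays the same role):

```python
import shutil


def driver_on_path(name):
    """Return True if the named webdriver binary can be found on PATH."""
    return shutil.which(name) is not None


if __name__ == "__main__":
    if not driver_on_path("geckodriver"):
        print("geckodriver not found; download it from the mozilla/geckodriver "
              "releases page and place it in a directory on PATH")
```

Alternatively, Selenium 3's Firefox driver accepts an explicit path, e.g. webdriver.Firefox(executable_path="/usr/local/bin/geckodriver"), which avoids relying on PATH at all.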
I use the code below to crawl across multiple pages, but it seems the Request is never issued, or its callback is never invoked?
def parse(self, response):
    # horizontal crawl: follow the pagination links
    # (requires: from scrapy import Request; import urlparse)
    next_selector = response.xpath('//*[contains(@class, "house-lst-page-box")]//a[last()]/@href')
    for url in next_selector.extract():
        yield Request(urlparse.urljoin(response.url, url))
    # vertical crawl
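When a yielded Request silently disappears, the usual suspects are the spider's allowed_domains filtering the joined URL out, or the duplicate filter dropping it (scrapy's Request accepts dont_filter=True to bypass the latter). It is also worth verifying what urljoin actually produces for the extracted hrefs. A small sketch of that join logic with hypothetical URLs, written so it imports on both Python 2 and 3:

```python
try:
    from urllib.parse import urljoin  # Python 3
except ImportError:
    from urlparse import urljoin      # Python 2, as in the question

# A relative href is resolved against the current page URL.
print(urljoin("http://example.com/list/pg1/", "pg2/"))
# -> http://example.com/list/pg1/pg2/

# An href starting with "/" replaces everything after the host.
print(urljoin("http://example.com/list/pg1/", "/list/pg2/"))
# -> http://example.com/list/pg2/
```

If the joined URL looks right, attaching an errback to the Request will surface download-level failures that otherwise vanish silently.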

The OS in my virtual machine is Ubuntu 17.10, installed with English as the default language and in command-line mode only, without a graphical interface. I then ran a scrapy experiment scraping pages from a Chinese website. The site is utf-8 encoded. Here is my parse function:
def parse(self, response):
    title_list = response.xpath('//ul[@class="sellListContent"]/li/div[1]/div[1]/a/text()').extract()
    price_list = response.xpath('//ul[@class="sellListContent"]/li/div[1]/div[6]/div[1]/span/text()').re('[,0-9]+')
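On a console-only Ubuntu install the locale is often POSIX/ASCII, so printing the unicode strings that .extract() returns can raise UnicodeEncodeError even though the page itself is valid utf-8. An assumption about what goes wrong here, since the question text is truncated; a minimal sketch with a hypothetical title string showing the explicit-encoding workaround:

```python
# -*- coding: utf-8 -*-
# Encoding to utf-8 explicitly sidesteps the terminal's locale,
# which a minimal (no-GUI) install frequently leaves as ASCII.
title = u'\u4e2d\u6587'              # the two characters "中文"
encoded = title.encode('utf-8')      # 6 bytes: e4 b8 ad e6 96 87
assert encoded.decode('utf-8') == title
```

Exporting LANG=en_US.UTF-8 in the shell fixes the terminal side instead; for feed output, scrapy's FEED_EXPORT_ENCODING = 'utf-8' setting keeps Chinese characters unescaped in exported JSON.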