The OS in my virtual machine is Ubuntu 17.10, installed with English as the default language and in command-line mode only (no GUI). I'm using Scrapy to crawl pages from a Chinese website; the site is UTF-8 encoded. Here is my parse function:
def parse(self, response):
    title_list = response.xpath('//ul[@class="sellListContent"]/li/div[1]/div[1]/a/text()').extract()
    price_list = response.xpath('//ul[@class="sellListContent"]/li/div[1]/div[6]/div[1]/span/text()').re('[,0-9]+')
    desc_list = response.xpath('//ul[@class="sellListContent"]/li/div[1]/div[2]/div/text()').extract()
    addr_list = response.xpath('//ul[@class="sellListContent"]/li/div[1]/div[2]/div/a/text()').extract()
    image_urls_list = response.xpath('//ul[@class="sellListContent"]/li/a/img/@src').extract()
    for i, j, k, l, m in zip(title_list, price_list, desc_list, addr_list, image_urls_list):
        self.log("title:%s" % i.encode('utf-8'))
        self.log("price:%s" % j.encode('utf-8'))
        self.log("descr:%s" % k.encode('utf-8'))
        self.log("addr:%s" % l.encode('utf-8'))
        self.log("image_urls:%s" % m)
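One detail that may matter here (assuming the spider runs under Python 3, which recent Scrapy requires): `str.encode('utf-8')` returns a `bytes` object, and `%s` formatting inserts that object's repr, so the log shows escaped hex bytes rather than readable Chinese, independently of any terminal issue. A minimal sketch of the difference:

```python
# -*- coding: utf-8 -*-
# Under Python 3, str.encode('utf-8') returns a bytes object, and
# "%s" formatting inserts its repr, i.e. escaped hex byte sequences.
s = "中文"
print("title:%s" % s.encode('utf-8'))  # prints: title:b'\xe4\xb8\xad\xe6\x96\x87'
print("title:%s" % s)                  # prints: title:中文
```

If that matches the kind of garbage seen in the log, dropping the `.encode('utf-8')` calls and logging the strings directly would be worth trying.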
I connect to the VM through Xshell and run the `scrapy crawl ..` command, but all the Chinese text in the output appears as garbled characters (mojibake). What could be causing this?
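To help narrow this down, a minimal sketch (my own diagnostic, not from the spider) for checking which encodings the server-side Python session reports; if these don't agree with the encoding Xshell is configured to display (e.g. the terminal expects UTF-8 but the locale is POSIX/C), Chinese output will be garbled:

```python
import sys
import locale

# The encoding Python believes stdout uses, and the encoding the
# system locale prefers; a mismatch with the terminal's configured
# encoding is a common cause of mojibake over SSH sessions.
print("stdout encoding:", sys.stdout.encoding)
print("locale preferred encoding:", locale.getpreferredencoding())
```

Checking the `LANG`/`LC_ALL` environment variables on the Ubuntu guest and the terminal encoding setting in Xshell against this output would show whether the problem is on the locale side or in the code itself.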