Multiple items
This part is straightforward: create the corresponding item classes in items.py, then import the right one in each spider.
items.py
import scrapy

class MymultispiderItem(scrapy.Item):
    # define the fields for your item here like:
    # name = scrapy.Field()
    pass

class Myspd1spiderItem(scrapy.Item):
    name = scrapy.Field()

class Myspd2spiderItem(scrapy.Item):
    name = scrapy.Field()

class Myspd3spiderItem(scrapy.Item):
    name = scrapy.Field()
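As a side note, `scrapy.Item` instances behave like dicts, but only accept keys that were declared as `Field()`s; assigning an undeclared key raises `KeyError`. The minimal stand-in below (a hypothetical `DemoItem`, not part of the project) sketches that behavior without requiring Scrapy to be installed:

```python
class DemoItem(dict):
    """Minimal stand-in mimicking scrapy.Item's declared-fields check."""
    fields = {'name'}  # corresponds to: name = scrapy.Field()

    def __setitem__(self, key, value):
        # Reject any key that was not declared as a field
        if key not in self.fields:
            raise KeyError(f'DemoItem does not support field: {key}')
        super().__setitem__(key, value)

item = DemoItem()
item['name'] = 'myspd1'   # 'name' is declared, so this works
print(item['name'])       # myspd1

try:
    item['age'] = 1       # 'age' was never declared
except KeyError as e:
    print('rejected:', e)
```

This is why each spider needs its item class to declare every field it intends to fill.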
Use the corresponding item inside the spider:
import scrapy
from mymultispider.items import Myspd1spiderItem

class Myspd1Spider(scrapy.Spider):
    name = 'myspd1'
    allowed_domains = ['sina.com.cn']
    start_urls = ['http://sina.com.cn/']

    def parse(self, response):
        print('myspd1')
        item = Myspd1spiderItem()
        item['name'] = 'myspd1的pipelines'
        yield item
4. Specifying pipelines
1. There are also two ways to do this. Method one: define multiple pipeline classes.
In pipelines.py:
class Myspd1spiderPipeline:
    def process_item(self, item, spider):
        print(item['name'])
        return item

class Myspd2spiderPipeline:
    def process_item(self, item, spider):
        print(item['name'])
        return item

class Myspd3spiderPipeline:
    def process_item(self, item, spider):
        print(item['name'])
        return item
1.1 Enable the pipelines in settings.py:
ITEM_PIPELINES = {
    'mymultispider.pipelines.Myspd1spiderPipeline': 300,
    'mymultispider.pipelines.Myspd2spiderPipeline': 300,
    'mymultispider.pipelines.Myspd3spiderPipeline': 300,
}
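Note that with all three pipelines enabled globally like this, every item yielded by every spider passes through all of them, in ascending order of the priority number (lower runs first), each pipeline receiving whatever the previous one returned. A rough pure-Python sketch of that chaining (hypothetical `P1`/`P2` classes and priorities, not Scrapy's actual internals) illustrates the ordering:

```python
# Hypothetical simulation of Scrapy's item pipeline chain: pipelines run in
# ascending priority order, each receiving the item the previous one returned.
class P1:
    def process_item(self, item, spider):
        item['seen'].append('P1')
        return item

class P2:
    def process_item(self, item, spider):
        item['seen'].append('P2')
        return item

# instance -> priority, analogous to the ITEM_PIPELINES dict in settings.py
pipelines = {P2(): 400, P1(): 300}

def run_chain(item, spider=None):
    # Sort by priority value, then thread the item through each pipeline
    for pipe, _prio in sorted(pipelines.items(), key=lambda kv: kv[1]):
        item = pipe.process_item(item, spider)
    return item

result = run_chain({'seen': []})
print(result['seen'])  # ['P1', 'P2'] -- priority 300 runs before 400
```

This is why, without further filtering, an item from myspd1 would also be printed by Myspd2spiderPipeline and Myspd3spiderPipeline; the per-spider assignment below avoids that.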
1.2 Assign the pipeline inside each spider:
import scrapy
from mymultispider.items import Myspd1spiderItem

class Myspd1Spider(scrapy.Spider):
    name = 'myspd1'
    allowed_domains = ['sina.com.cn']
    start_urls = ['http://sina.com.cn/']
    custom_settings = {
        'ITEM_PIPELINES': {'mymultispider.pipelines.Myspd1spiderPipeline': 300},
    }

    def parse(self, response):
        print('myspd1')
        item = Myspd1spiderItem()
        item['name'] = 'myspd1的pipelines'
        yield item
The code that specifies the pipeline:
custom_settings = {
    'ITEM_PIPELINES': {'mymultispider.pipelines.Myspd1spiderPipeline': 300},
}
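The text mentions two methods but only shows `custom_settings`; the other common approach (an assumption here, not shown in the original) is to enable a single shared pipeline globally and branch on `spider.name` inside `process_item`. A minimal sketch, using a hypothetical `FakeSpider` stand-in so it runs without Scrapy:

```python
class SharedPipeline:
    """One globally enabled pipeline that branches per spider by name."""
    def process_item(self, item, spider):
        if spider.name == 'myspd1':
            item['handled_by'] = 'myspd1 branch'   # myspd1-specific handling
        elif spider.name == 'myspd2':
            item['handled_by'] = 'myspd2 branch'   # myspd2-specific handling
        return item

class FakeSpider:
    """Hypothetical stand-in for a scrapy.Spider instance."""
    def __init__(self, name):
        self.name = name

pipe = SharedPipeline()
out = pipe.process_item({'name': 'x'}, FakeSpider('myspd1'))
print(out['handled_by'])  # myspd1 branch
```

The `custom_settings` approach keeps each spider's configuration self-contained, while the branching approach keeps all routing logic in one pipeline class; which reads better depends on how much the per-spider handling diverges.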