大米CMS Official Forum, 大米 Webmaster Alliance, 大米 Webmaster Home, 大米 Developer Community

Title: Using multiple items and multiple pipelines in Python Scrapy

Author: 追影    Time: 2023-6-30 17:28
Multiple items
This part is straightforward: define one class per item in items.py, then import the class you need in the spider.
items.py
    import scrapy


    class MymultispiderItem(scrapy.Item):
        # define the fields for your item here like:
        # name = scrapy.Field()
        pass


    # One item class per spider, each with a single "name" field.
    class Myspd1spiderItem(scrapy.Item):
        name = scrapy.Field()


    class Myspd2spiderItem(scrapy.Item):
        name = scrapy.Field()


    class Myspd3spiderItem(scrapy.Item):
        name = scrapy.Field()


Use the matching item class inside the spider
    import scrapy
    from mymultispider.items import Myspd1spiderItem


    class Myspd1Spider(scrapy.Spider):
        name = 'myspd1'
        allowed_domains = ['sina.com.cn']
        start_urls = ['http://sina.com.cn/']

        def parse(self, response):
            print('myspd1')
            # Fill the item class that belongs to this spider.
            item = Myspd1spiderItem()
            item['name'] = 'myspd1的pipelines'  # "pipelines of myspd1"
            yield item




4. Specifying pipelines

1. There are two ways to do this as well. Method 1: define multiple pipeline classes.

In pipelines.py:

    # Each pipeline prints the item's name and passes the item on.
    class Myspd1spiderPipeline:
        def process_item(self, item, spider):
            print(item['name'])
            return item


    class Myspd2spiderPipeline:
        def process_item(self, item, spider):
            print(item['name'])
            return item


    class Myspd3spiderPipeline:
        def process_item(self, item, spider):
            print(item['name'])
            return item
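The three classes above only print the item name. As a rough sketch of what a per-spider pipeline might do in practice, the class below writes each item to a JSON Lines file named after the spider. `JsonLinesPerSpiderPipeline`, its `output_dir` argument, and the `demo` helper are hypothetical names introduced here; `open_spider`, `close_spider`, and `process_item` are the usual Scrapy pipeline hooks. A `SimpleNamespace` stands in for the spider so the sketch runs without Scrapy installed.

```python
import json
import tempfile
from pathlib import Path
from types import SimpleNamespace


class JsonLinesPerSpiderPipeline:
    """Hypothetical pipeline: one JSON Lines file per spider."""

    def __init__(self, output_dir="."):
        self.output_dir = Path(output_dir)
        self.file = None

    def open_spider(self, spider):
        # Called once when the spider starts: open <spider name>.jsonl.
        path = self.output_dir / f"{spider.name}.jsonl"
        self.file = path.open("w", encoding="utf-8")

    def close_spider(self, spider):
        # Called once when the spider finishes.
        self.file.close()

    def process_item(self, item, spider):
        # dict(item) works for plain dicts and for scrapy.Item objects.
        self.file.write(json.dumps(dict(item), ensure_ascii=False) + "\n")
        return item  # hand the item on to the next pipeline


def demo():
    # Stand-in for a real spider; only the .name attribute is used here.
    spider = SimpleNamespace(name="myspd1")
    with tempfile.TemporaryDirectory() as tmp:
        pipeline = JsonLinesPerSpiderPipeline(output_dir=tmp)
        pipeline.open_spider(spider)
        pipeline.process_item({"name": "myspd1的pipelines"}, spider)
        pipeline.close_spider(spider)
        return (Path(tmp) / "myspd1.jsonl").read_text(encoding="utf-8")


print(demo())
```

Because the file name comes from `spider.name`, the same class could be shared by several spiders without their output mixing.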



1.1 Enable the pipelines in settings.py


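    # Enabled here, all three pipelines process items from every spider.
    # Lower values run first; values are customarily in the 0-1000 range.
    ITEM_PIPELINES = {
        'mymultispider.pipelines.Myspd1spiderPipeline': 300,
        'mymultispider.pipelines.Myspd2spiderPipeline': 300,
        'mymultispider.pipelines.Myspd3spiderPipeline': 300,
    }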
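The number attached to each pipeline is its priority: Scrapy passes items through the enabled pipelines from the lowest value to the highest. Giving each class a distinct value makes the intended order explicit. The sketch below only imitates that ordering in plain Python; `pipeline_order` and the 100/200/300 values are assumptions for illustration, not Scrapy internals or the author's settings.

```python
# Hypothetical priorities, deliberately listed out of order.
ITEM_PIPELINES = {
    "mymultispider.pipelines.Myspd3spiderPipeline": 300,
    "mymultispider.pipelines.Myspd1spiderPipeline": 100,
    "mymultispider.pipelines.Myspd2spiderPipeline": 200,
}


def pipeline_order(item_pipelines):
    """Return the class names in the order the values request: lowest first."""
    ranked = sorted(item_pipelines.items(), key=lambda pair: pair[1])
    return [path.rsplit(".", 1)[-1] for path, _priority in ranked]


order = pipeline_order(ITEM_PIPELINES)
print(order)
# ['Myspd1spiderPipeline', 'Myspd2spiderPipeline', 'Myspd3spiderPipeline']
```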


1.2 Set the pipeline in the spider

    import scrapy
    from mymultispider.items import Myspd1spiderItem


    class Myspd1Spider(scrapy.Spider):
        name = 'myspd1'
        allowed_domains = ['sina.com.cn']
        start_urls = ['http://sina.com.cn/']
        # Per-spider override: only this pipeline runs for myspd1.
        custom_settings = {
            'ITEM_PIPELINES': {'mymultispider.pipelines.Myspd1spiderPipeline': 300},
        }

        def parse(self, response):
            print('myspd1')
            item = Myspd1spiderItem()
            item['name'] = 'myspd1的pipelines'  # "pipelines of myspd1"
            yield item
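This works because a spider's `custom_settings` take precedence over the project-wide values in settings.py, and the `ITEM_PIPELINES` dict supplied by the spider replaces the project-level one instead of being merged into it. The plain-Python sketch below imitates that precedence. `effective_settings` is a hypothetical helper, not a Scrapy function, and the two spider classes are bare stand-ins (`Myspd2Spider` is an assumed name) so that the sketch runs without Scrapy.

```python
PROJECT_SETTINGS = {
    "ITEM_PIPELINES": {
        "mymultispider.pipelines.Myspd1spiderPipeline": 300,
        "mymultispider.pipelines.Myspd2spiderPipeline": 300,
        "mymultispider.pipelines.Myspd3spiderPipeline": 300,
    },
}


class Myspd1Spider:
    # Stand-in for the scrapy.Spider subclass; only these attributes matter here.
    name = "myspd1"
    custom_settings = {
        "ITEM_PIPELINES": {"mymultispider.pipelines.Myspd1spiderPipeline": 300},
    }


class Myspd2Spider:
    # Hypothetical spider without an override: project settings apply.
    name = "myspd2"
    custom_settings = None


def effective_settings(project_settings, spider_cls):
    """Imitate the precedence: spider-level keys replace project-level keys."""
    merged = dict(project_settings)
    merged.update(spider_cls.custom_settings or {})
    return merged


result = {}
for spider_cls in (Myspd1Spider, Myspd2Spider):
    pipelines = effective_settings(PROJECT_SETTINGS, spider_cls)["ITEM_PIPELINES"]
    result[spider_cls.name] = sorted(path.rsplit(".", 1)[-1] for path in pipelines)

print(result["myspd1"])  # ['Myspd1spiderPipeline']
print(result["myspd2"])  # all three pipeline class names
```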






The part of the spider that selects the pipeline:


    custom_settings = {
        'ITEM_PIPELINES': {'mymultispider.pipelines.Myspd1spiderPipeline': 300},
    }
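The text mentions two methods but shows only the first. One common alternative, offered here as an assumption and not as the author's second method, is a single pipeline that branches on the item class with `isinstance`. To keep the sketch runnable without Scrapy, the item classes below are `dict` subclasses standing in for the `scrapy.Item` classes from items.py; `MultiItemPipeline` and its handler methods are hypothetical names.

```python
class Myspd1spiderItem(dict):
    """Stand-in for the scrapy.Item subclass defined in items.py."""


class Myspd2spiderItem(dict):
    """Stand-in for the scrapy.Item subclass defined in items.py."""


class MultiItemPipeline:
    """Hypothetical single pipeline that routes items by their class."""

    def __init__(self):
        self.handled = []  # kept only so the demo can show the routing

    def process_item(self, item, spider):
        if isinstance(item, Myspd1spiderItem):
            self.handle_myspd1(item)
        elif isinstance(item, Myspd2spiderItem):
            self.handle_myspd2(item)
        # Items of any other class pass through untouched.
        return item

    def handle_myspd1(self, item):
        self.handled.append(("myspd1", item["name"]))

    def handle_myspd2(self, item):
        self.handled.append(("myspd2", item["name"]))


pipeline = MultiItemPipeline()
pipeline.process_item(Myspd1spiderItem(name="first"), spider=None)
pipeline.process_item(Myspd2spiderItem(name="second"), spider=None)
pipeline.process_item({"name": "other"}, spider=None)
print(pipeline.handled)  # [('myspd1', 'first'), ('myspd2', 'second')]
```

With this approach only the one pipeline class would need to be listed in `ITEM_PIPELINES`, and no per-spider `custom_settings` would be required.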







