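Python Scrapy crawler problem

I'm using the Scrapy framework to crawl job postings from Zhilian (zhaopin.com), and it reports errors I can't make sense of.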
2019-04-09 23:29:10 [selenium.webdriver.remote.remote_connection] DEBUG: Finished Request
2019-04-09 23:29:10 [selenium.webdriver.remote.remote_connection] DEBUG: POST http://127.0.0.1:50132/session/b97f6963939467e28aa83493fcf91f9d/url {"url": "https://zhaopin.com", "sessionId": "b97f6963939467e28aa83493fcf91f9d"}
[7964:9720:0409/232912.471:ERROR:ssl_client_socket_impl.cc(964)] handshake failed; returned -1, SSL error code 1, net_error -100
[7964:9720:0409/232912.505:ERROR:ssl_client_socket_impl.cc(964)] handshake failed; returned -1, SSL error code 1, net_error -100
[7964:10376:0409/232913.146:ERROR:platform_sensor_reader_win.cc(242)] NOT IMPLEMENTED
2019-04-09 23:29:14 [urllib3.connectionpool] DEBUG: http://127.0.0.1:50132 "POST /session/b97f6963939467e28aa83493fcf91f9d/url HTTP/1.1" 200 72
2019-04-09 23:29:14 [selenium.webdriver.remote.remote_connection] DEBUG: Finished Request
2019-04-09 23:29:14 [selenium.webdriver.remote.remote_connection] DEBUG: GET http://127.0.0.1:50132/session/b97f6963939467e28aa83493fcf91f9d/window_handle {"sessionId": "b97f6963939467e28aa83493fcf91f9d"}
2019-04-09 23:29:14 [urllib3.connectionpool] DEBUG: http://127.0.0.1:50132 "GET /session/b97f6963939467e28aa83493fcf91f9d/window_handle HTTP/1.1" 200 111
2019-04-09 23:29:14 [selenium.webdriver.remote.remote_connection] DEBUG: Finished Request
2019-04-09 23:29:14 [selenium.webdriver.remote.remote_connection] DEBUG: POST http://127.0.0.1:50132/session/b97f6963939467e28aa83493fcf91f9d/element {"using": "class name", "value": "zp-search__input", "sessionId": "b97f6963939467e28aa83493fcf91f9d"}
2019-04-09 23:29:14 [urllib3.connectionpool] DEBUG: http://127.0.0.1:50132 "POST /session/b97f6963939467e28aa83493fcf91f9d/element HTTP/1.1" 200 102
2019-04-09 23:29:14 [selenium.webdriver.remote.remote_connection] DEBUG: Finished Request
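
A note on reading this output: the lines tagged DEBUG are routine WebDriver traffic (each request above comes back with a 200), while the bracketed ERROR lines are in Chromium's own log format and come from the browser process. Below is a minimal sketch, assuming the standard library logging module and the two logger names visible in the log, that hides the DEBUG chatter so the remaining messages stand out. The helper name quiet_webdriver_logs is made up for illustration.

```python
import logging

# Logger names copied from the log output above.
NOISY_LOGGERS = (
    "selenium.webdriver.remote.remote_connection",
    "urllib3.connectionpool",
)


def quiet_webdriver_logs(level=logging.WARNING):
    """Raise the threshold of the noisy loggers so their DEBUG lines are dropped.

    Hypothetical helper, not part of Scrapy or Selenium.
    """
    for name in NOISY_LOGGERS:
        logging.getLogger(name).setLevel(level)


quiet_webdriver_logs()
```

This only changes what Python logs. The Chrome ERROR lines are written by the browser itself and are not affected by it.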

Here is the code:
class JobsSpider(scrapy.Spider):
name = 'jobs'
allowed_domains = ['zhaopin.com']
start_urls = ['https://www.zhaopin.com/']

def start_requests(self):
browser = webdriver.Chrome()
browser.get("https://zhaopin.com")
windows = browser.current_window_handle
input = browser.find_element_by_class_name('zp-search__input')
input.send_keys('Python')
time.sleep(1)
button = browser.find_element_by_class_name('zp-search__btn')
button.click()
all_handles = browser.window_handles
for handle in all_handles:
if handle != windows:
browser.switch_to.window(handle)
url = browser.current_url
yield Request(url,callback = self.parse)

def parse(self, response):
le = LinkExtractor(restrict_css='div.contentpile__content__wrapper__item.clearfix')
for link in le.extract_links(response):
yield scrapy.Request(link.url,callback=self.parse_job)

def parse_job(self,response):
jobs = JobItem()
sel = response.css('div.main')
jobs['jobname'] = sel.css('hi.l.info-h3::text').extract_first()
jobs['Cname'] = sel.css('div.company 1::text').extract_first()
jobs['salary'] = sel.css('div.l.info-money strong::text').extract_first()
jobs['joblocation'] = sel.css('span.icon-address::text').extract_first()
jobs['experience'] = sel.css('div.info-three.1').xpath('(.//span)[1].text()').extract_first()
jobs['education'] =sel.css('div.info-three.1').xpath('(.//span)[2].text()').extract_first()
jobs['count'] =sel.css('div.info-three.1').xpath('(.//span)[3].text()').extract_first()
jobs['jobintro'] = sel.css('div.pos-ul').extract
yield jobs

Could this have something to do with cookies? Any help would be appreciated.

Addendum 1 · 2019-04-10 16:01:45 +08:00

import time

import scrapy
from scrapy.linkextractors import LinkExtractor
from selenium import webdriver

from ..items import JobItem  # assumed location of JobItem inside the project


class JobsSpider(scrapy.Spider):
    name = 'jobs'
    allowed_domains = ['zhaopin.com']
    start_urls = ['https://www.zhaopin.com/']

    def start_requests(self):
        # Drive a real browser through the site search to get the results URL.
        browser = webdriver.Chrome()
        browser.get("https://zhaopin.com")
        windows = browser.current_window_handle
        search_input = browser.find_element_by_class_name('zp-search__input')
        search_input.send_keys('Python')
        time.sleep(1)
        button = browser.find_element_by_class_name('zp-search__btn')
        button.click()
        # The search results open in a new window; switch to it.
        for handle in browser.window_handles:
            if handle != windows:
                browser.switch_to.window(handle)
        url = browser.current_url
        yield scrapy.Request(url, callback=self.parse)

    def parse(self, response):
        # Follow every link found inside a job card on the results page.
        le = LinkExtractor(restrict_css='div.contentpile__content__wrapper__item.clearfix')
        for link in le.extract_links(response):
            yield scrapy.Request(link.url, callback=self.parse_job)

    def parse_job(self, response):
        jobs = JobItem()
        sel = response.css('div.main')
        # Assumption: 'hi' and '.1' in the posted selectors were typos for 'h1' and '.l'.
        jobs['jobname'] = sel.css('h1.l.info-h3::text').extract_first()
        # Left as posted: '1' is not a valid tag name, so check this selector against the page.
        jobs['Cname'] = sel.css('div.company 1::text').extract_first()
        jobs['salary'] = sel.css('div.l.info-money strong::text').extract_first()
        jobs['joblocation'] = sel.css('span.icon-address::text').extract_first()
        # XPath selects text nodes with '/text()', not '.text()'.
        jobs['experience'] = sel.css('div.info-three.l').xpath('(.//span)[1]/text()').extract_first()
        jobs['education'] = sel.css('div.info-three.l').xpath('(.//span)[2]/text()').extract_first()
        jobs['count'] = sel.css('div.info-three.l').xpath('(.//span)[3]/text()').extract_first()
        # extract is a method and has to be called.
        jobs['jobintro'] = sel.css('div.pos-ul').extract()
        yield jobs

3 replies

huisezhiyin

2019-04-10 15:13:00 +08:00

The way your code is pasted makes it really hard to read.

idotfish

2019-04-10 15:46:10 +08:00

@huisezhiyin Sorry, I've only just started with Python and don't really know how this works. Would it be OK to post a screenshot of the code instead?

huisezhiyin

2019-04-10 16:17:04 +08:00

@idotfish Just search for the ERROR text and you'll find the answer.
For example, search for: error:ssl_client_socket_impl.cc(964)] handshake failed
Here is an answer on Stack Overflow:
https://stackoverflow.com/questions/37883759/errorssl-client-socket-openssl-cc1158-handshake-failed-with-chromedriver-chr
If that doesn't work, try the other answers there.
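
A commonly cited workaround for this handshake message is to start Chrome with flags that tell it to ignore certificate errors. Here is a rough sketch, assuming a Selenium version whose webdriver.Chrome accepts an options argument. The helper name make_browser and the CHROME_FLAGS list are made up for illustration, and whether the flags actually silence the message may depend on the Chrome and ChromeDriver versions.

```python
# Flags often suggested for the "handshake failed" message (assumption:
# they are still honored by the installed Chrome build).
CHROME_FLAGS = ["--ignore-certificate-errors", "--ignore-ssl-errors"]


def make_browser(flags=CHROME_FLAGS):
    """Start Chrome with the given command-line flags (hypothetical helper)."""
    # Imported here so this module can be loaded without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    for flag in flags:
        options.add_argument(flag)
    return webdriver.Chrome(options=options)
```

In the spider, browser = webdriver.Chrome() in start_requests would become browser = make_browser(). Note that in the log above every WebDriver request still returns 200, so the handshake lines may be harmless noise rather than the reason the crawl fails.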