1
seki 2015-09-05 18:48:27 +08:00
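The data is submitted via POST to the same address; the page simply reloads after each change.
Inspect the select element and you can see the onchange binding. From there you can look for the <script>, whose code is also in plain text.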
2
seki 2015-09-05 18:51:24 +08:00 1
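As for the Python part, use urllib2 or requests to construct the same POST request.
Whether the backend has any anti-scraping detection, I don't know. My conservative guess is that it doesn't; deal with it if you run into it.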
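A minimal sketch of what "constructing the same POST request" could look like in Python 3, using the standard library's urllib.request (the successor to urllib2). The helper name `build_post` and the User-Agent string are assumptions for illustration, and a single `Sel_Date` field is not the whole form; the full field list is discussed further down the thread.

```python
from urllib import parse, request

URL = "http://www.icbc.com.cn/ICBCDynamicSite2/other/rmbdeposit.aspx"


def build_post(url, fields):
    """Return a Request that will POST `fields` as a URL-encoded form."""
    body = parse.urlencode(fields).encode("utf-8")
    req = request.Request(url, data=body, method="POST")
    req.add_header("Content-Type", "application/x-www-form-urlencoded")
    req.add_header("User-Agent", "Mozilla/5.0")  # assumed value, not from the thread
    return req


# Build the request only; nothing is sent at this point.
req = build_post(URL, {"Sel_Date": "2012-07-06"})

# To actually send it (requires network access):
# with request.urlopen(req) as resp:
#     html = resp.read().decode("utf-8", errors="replace")
```

Keeping the request construction separate from `urlopen` makes it easy to inspect the body before hitting the server.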
3
ljcarsenal 2015-09-05 19:33:47 +08:00
It's not ajax. You can see that the select's onchange has an event handler bound to it.
4
rwalle 2015-09-05 19:40:23 +08:00
Look at the Network tab.
5
1130335361 2015-09-05 19:51:49 +08:00 1
6
explist OP
I can't view the Network tab, perhaps because this is a bank website.
I did see the onchange, but... I simply can't make sense of it (I know very little about HTML).
7
explist OP
from urllib import request, parse

def ghtest():
    url = r'http://www.icbc.com.cn/ICBCDynamicSite2/other/rmbdeposit.aspx'
    req = request.Request(url)
    req.add_header("User-Agent", '')
    g = ghHtml()  # my HTMLParser subclass; the dates end up in g.dates
    with request.urlopen(req) as f:
        g.feed(f.read().decode())
    dataDict = {}
    for item in g.dates:
        dataDict['id'] = item
        log = parse.urlencode(dataDict).encode('utf-8')
        f = request.urlopen(url, log)
        # do something
        f.close()
8
paradoxs 2015-09-05 19:59:16 +08:00
9
Shy07 2015-09-05 20:25:36 +08:00
I wrote a Ruby version; you only need to change the date.
```ruby require 'net/http' params = { 'Sel_Date' => '2012-07-06', # 修改日期即可 '__EVENTTARGET' => 'Sel_Date', '__EVENTARGUMENT' => '', '__LASTFOCUS' => '', '__VIEWSTATE' => '/wEPDwUJNDkwNDM1MTYwD2QWAgIDD2QWAgIBD2QWBmYPEGQPFiFmAgECAgIDAgQCBQIGAgcCCAIJAgoCCwIMAg0CDgIPAhACEQISAhMCFAIVAhYCFwIYAhkCGgIbAhwCHQIeAh8CIBYhEAUP6K+36YCJ5oup5pe26Ze0ZWcQBQoyMDE1LTA4LTI2BQoyMDE1LTA4LTI2ZxAFCjIwMTUtMDYtMjgFCjIwMTUtMDYtMjhnEAUKMjAxNS0wNS0xMQUKMjAxNS0wNS0xMWcQBQoyMDE1LTAzLTAxBQoyMDE1LTAzLTAxZxAFCjIwMTQtMTEtMjIFCjIwMTQtMTEtMjJnEAUKMjAxMi0wNy0wNgUKMjAxMi0wNy0wNmcQBQoyMDEyLTA2LTA4BQoyMDEyLTA2LTA4ZxAFCjIwMTEtMDctMDcFCjIwMTEtMDctMDdnEAUKMjAxMS0wNC0wNgUKMjAxMS0wNC0wNmcQBQoyMDExLTAyLTA5BQoyMDExLTAyLTA5ZxAFCjIwMTAtMTItMjYFCjIwMTAtMTItMjZnEAUKMjAxMC0xMC0yMAUKMjAxMC0xMC0yMGcQBQoyMDA4LTEyLTIzBQoyMDA4LTEyLTIzZxAFCjIwMDgtMTEtMjcFCjIwMDgtMTEtMjdnEAUKMjAwOC0xMC0zMAUKMjAwOC0xMC0zMGcQBQoyMDA4LTEwLTA5BQoyMDA4LTEwLTA5ZxAFCjIwMDctMTItMjEFCjIwMDctMTItMjFnEAUKMjAwNy0wOS0xNQUKMjAwNy0wOS0xNWcQBQoyMDA3LTA4LTIyBQoyMDA3LTA4LTIyZxAFCjIwMDctMDctMjEFCjIwMDctMDctMjFnEAUKMjAwNy0wNS0xOQUKMjAwNy0wNS0xOWcQBQoyMDA3LTAzLTE4BQoyMDA3LTAzLTE4ZxAFCjIwMDYtMDgtMTkFCjIwMDYtMDgtMTlnEAUKMjAwNC0xMC0yOQUKMjAwNC0xMC0yOWcQBQoyMDAyLTAyLTIxBQoyMDAyLTAyLTIxZxAFCjE5OTktMDYtMTAFCjE5OTktMDYtMTBnEAUKMTk5OC0xMi0wNwUKMTk5OC0xMi0wN2cQBQoxOTk4LTA3LTAxBQoxOTk4LTA3LTAxZxAFCjE5OTgtMDMtMjUFCjE5OTgtMDMtMjVnEAUKMTk5Ny0xMC0yMwUKMTk5Ny0xMC0yM2cQBQoxOTk2LTA4LTIzBQoxOTk2LTA4LTIzZxAFCjE5OTYtMDUtMDEFCjE5OTYtMDUtMDFnFgFmZAIBDxYCHgRUZXh0BQoyMDE1LTA4LTI2ZAICDxYCHwAFohU8dGFibGUgYm9yZGVyPSIxIiBjZWxscGFkZGluZz0iMCIgY2VsbHNwYWNpbmc9IjAiIHdpZHRoPSI4NSUiICBydWxlcz0iYWxsIiBmcmFtZT0iYm9yZGVyIiBzdHlsZT0iYm9yZGVyLWNvbGxhcHNlOmNvbGxhcHNlOyBib3JkZXItY29sb3I6ICNDQ0NDQ0M7Ij48dGJvZHk+PHRyPjx0ZCB3aWR0aD0iNTclIiAgdmFsaWduPSJjZW50ZXIiIGJnY29sb3I9IiNlOGU4ZTgiPjxwIGFsaWduPSJjZW50ZXIiPjxiPumhueebrjwvYj48L3RkPjx0ZCB3aWR0aD0iNDMlIiBiZ2NvbG9yPSIjZThlOGU4IiBoZWlnaHQ9IjE5Ij48cCBhbGlnbj0iY2VudGVyIj48Yj7lubTliKnnjoc8L2I+JTwvdGQ+PC90cj48dHI+PHRkIHdpZHRoPSI1NyUiIGhlaWdodD0iMTkiIGFsaWduPSJsZW
Z0Ij7kuIDjgIHln47kuaHlsYXmsJHlj4rljZXkvY3lrZjmrL48L3RkPjx0ZCBoZWlnaHQ9IjE5IiB3aWR0aD0iNDMlIiBhbGlnbj0iY2VudGVyIj4mbmJzcDs8L3RkPjwvdHI+PHRyPjx0ZCB3aWR0aD0iNTclIiBoZWlnaHQ9IjE5Ij48ZGl2IGFsaWduPSJsZWZ0Ij7vvIjkuIDvvInmtLvmnJ88L2Rpdj48L3RkPjx0ZCBoZWlnaHQ9IjE5IiB3aWR0aD0iNDMlIiBhbGlnbj0iY2VudGVyIj4wLjM1PC90ZD48L3RyPjx0cj48dGQgd2lkdGg9IjU3JSIgaGVpZ2h0PSIxOSIgYWxpZ249ImxlZnQiPu+8iOS6jO+8ieWumuacnzwvdGQ+PHRkIGhlaWdodD0iMTkiIHdpZHRoPSI0MyUiIGFsaWduPSJjZW50ZXIiPiZuYnNwOzwvdGQ+PC90cj48dHI+PHRkIHdpZHRoPSI1NyUiIGhlaWdodD0iMTkiPjxkaXYgYWxpZ249ImxlZnQiPjEu5pW05a2Y5pW05Y+WPC9kaXY+PC90ZD48dGQgaGVpZ2h0PSIxOSIgd2lkdGg9IjQzJSIgYWxpZ249ImNlbnRlciI+Jm5ic3A7PC90ZD48L3RyPjx0cj48dGQgd2lkdGg9IjU3JSIgYWxpZ249ImNlbnRlciIgaGVpZ2h0PSIxOSI+5LiJ5Liq5pyIPC90ZD48dGQgd2lkdGg9IjQzJSIgYWxpZ249ImNlbnRlciIgaGVpZ2h0PSIxOSI+MS42PC90ZD48L3RyPjx0cj48dGQgd2lkdGg9IjU3JSIgYWxpZ249ImNlbnRlciIgaGVpZ2h0PSIxOSI+5Y2K5bm0PC90ZD48dGQgd2lkdGg9IjQzJSIgYWxpZ249ImNlbnRlciIgaGVpZ2h0PSIxOSI+MS44PC90ZD48L3RyPjx0cj48dGQgd2lkdGg9IjU3JSIgYWxpZ249ImNlbnRlciIgaGVpZ2h0PSIxOSI+5LiA5bm0PC90ZD48dGQgd2lkdGg9IjQzJSIgYWxpZ249ImNlbnRlciIgaGVpZ2h0PSIxOSI+MjwvdGQ+PC90cj48dHI+PHRkIHdpZHRoPSI1NyUiIGFsaWduPSJjZW50ZXIiIGhlaWdodD0iMTkiPuS6jOW5tDwvdGQ+PHRkIHdpZHRoPSI0MyUiIGFsaWduPSJjZW50ZXIiIGhlaWdodD0iMTkiPjIuNTwvdGQ+PC90cj48dHI+PHRkIHdpZHRoPSI1NyUiIGFsaWduPSJjZW50ZXIiIGhlaWdodD0iMTkiPuS4ieW5tDwvdGQ+PHRkIHdpZHRoPSI0MyUiIGFsaWduPSJjZW50ZXIiIGhlaWdodD0iMTkiPjM8L3RkPjwvdHI+PHRyPjx0ZCB3aWR0aD0iNTclIiBhbGlnbj0iY2VudGVyIiBoZWlnaHQ9IjE5Ij7kupTlubQ8L3RkPjx0ZCB3aWR0aD0iNDMlIiBhbGlnbj0iY2VudGVyIiBoZWlnaHQ9IjE5Ij4zLjA1PC90ZD48L3RyPjx0cj48dGQgd2lkdGg9IjU3JSIgaGVpZ2h0PSIxOSI+PGRpdiBhbGlnbj0ibGVmdCI+Mi7pm7blrZjmlbTlj5bjgIHmlbTlrZjpm7blj5bjgIHlrZjmnKzlj5bmga88L2Rpdj48L3RkPjx0ZCB3aWR0aD0iNDMlIiBoZWlnaHQ9IjE5Ij4mbmJzcDs8L3RkPjwvdHI+PHRyPjx0ZCB3aWR0aD0iNTclIiBhbGlnbj0iY2VudGVyIiBoZWlnaHQ9IjE5Ij7kuIDlubQ8L3RkPjx0ZCB3aWR0aD0iNDMlIiBhbGlnbj0iY2VudGVyIiBoZWlnaHQ9IjE5Ij4xLjY8L3RkPjwvdHI+PHRyPjx0ZCB3aWR0aD0iNTclIiBhbGlnbj0iY2VudGVyIiBoZWlnaHQ9IjE5Ij7kuInlubQ8L3RkPjx0ZC
B3aWR0aD0iNDMlIiBhbGlnbj0iY2VudGVyIiBoZWlnaHQ9IjE5Ij4xLjg8L3RkPjwvdHI+PHRyPjx0ZCB3aWR0aD0iNTclIiBhbGlnbj0iY2VudGVyIiBoZWlnaHQ9IjE5Ij7kupTlubQ8L3RkPjx0ZCB3aWR0aD0iNDMlIiBhbGlnbj0iY2VudGVyIiBoZWlnaHQ9IjE5Ij4xLjg1PC90ZD48L3RyPjx0cj48dGQgaGVpZ2h0PSIxOSI+PGRpdiBhbGlnbj0ibGVmdCI+My7lrprmtLvkuKTkvr88L2Rpdj48L3RkPjx0ZCBjb2xzcGFuPSIyIiBoZWlnaHQ9IjE5IiBhbGlnbj0ibGVmdCI+5oyJ5LiA5bm05Lul5YaF5a6a5pyf5pW05a2Y5pW05Y+W5ZCM5qGj5qyh5Yip546H5omTNuaKmDwvdGQ+PC90cj48dHI+PHRkIGhlaWdodD0iMTkiPjxkaXYgYWxpZ249ImxlZnQiPuS6jOOAgeWNj+WumuWtmOasvjwvZGl2PjwvdGQ+PHRkIGNvbHNwYW49IjIiIGFsaWduPSJjZW50ZXIiIGhlaWdodD0iMTkiPjEuMTU8L3RkPjwvdHI+PHRyPjx0ZCB3aWR0aD0iNTclIiBoZWlnaHQ9IjE5Ij48ZGl2IGFsaWduPSJsZWZ0Ij7kuInjgIHpgJrnn6XlrZjmrL48L2Rpdj48L3RkPjx0ZCB3aWR0aD0iNDMlIiBoZWlnaHQ9IjE5Ij48Zm9udCBjb2xvcj0iI2ViZWJlYiI+LjwvZm9udD48L3RkPjwvdHI+PHRyPjx0ZCB3aWR0aD0iNTclIiBhbGlnbj0iY2VudGVyIiBoZWlnaHQ9IjE5Ij7kuIDlpKk8L3RkPjx0ZCB3aWR0aD0iNDMlIiBhbGlnbj0iY2VudGVyIiBoZWlnaHQ9IjE5Ij4wLjg8L3RkPjwvdHI+PHRyPjx0ZCB3aWR0aD0iNTclIiBhbGlnbj0iY2VudGVyIiBoZWlnaHQ9IjE5Ij7kuIPlpKk8L3RkPjx0ZCB3aWR0aD0iNDMlIiBhbGlnbj0iY2VudGVyIiBoZWlnaHQ9IjE5Ij4xLjM1PC90ZD48L3RyPjwvdGFibGU+ZGRDrgsxnIFuzBq+7MoE9zn85XGzBQ==' } uri = URI.parse ("http://www.icbc.com.cn/ICBCDynamicSite2/other/rmbdeposit.aspx") res = Net::HTTP.post_form uri, params puts res.body ``` |
10
Shy07 2015-09-05 20:58:35 +08:00
Finished version:
require 'net/http'
require 'uri'

uri = URI.parse("http://www.icbc.com.cn/ICBCDynamicSite2/other/rmbdeposit.aspx")
html = Net::HTTP.get(uri)

# Collect every date offered in the <select>.
dates = html.scan(/<option value="(\d{4}-\d{2}-\d{2})">/).flatten

# Grab the hidden __VIEWSTATE field so it can be sent back with each POST.
viewstate = html[/name="__VIEWSTATE" id="__VIEWSTATE" value="([^"]*)" \/>/, 1]

params = {
  '__EVENTTARGET' => 'Sel_Date',
  '__EVENTARGUMENT' => '',
  '__LASTFOCUS' => '',
  '__VIEWSTATE' => viewstate
}

dates.each do |date|
  params['Sel_Date'] = date
  res = Net::HTTP.post_form(uri, params)
  # Extracting the actual content with a regex is left out; this just dumps the HTML =_=b
  File.open("#{date}.html", 'w') { |io| io.write(res.body) }
end
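Since the original question was about Python, here is a rough Python counterpart of the two extraction steps in the Ruby script above: pulling the date list and the `__VIEWSTATE` value out of the page. The function names and the `SAMPLE` markup are made up for illustration. The regexes assume the attribute order seen in the Ruby pattern (name before value), so treat this as a sketch rather than a robust parser.

```python
import re

# Dates appear as <option value="YYYY-MM-DD"> inside the select.
OPTION_RE = re.compile(r'<option[^>]*\bvalue="(\d{4}-\d{2}-\d{2})"')
# Hidden ASP.NET state field; assumes name="..." comes before value="...".
VIEWSTATE_RE = re.compile(r'<input[^>]*\bname="__VIEWSTATE"[^>]*\bvalue="([^"]*)"')


def extract_dates(html):
    """Return all date strings offered by the page's <option> tags."""
    return OPTION_RE.findall(html)


def extract_viewstate(html):
    """Return the __VIEWSTATE value, or None if the field is absent."""
    match = VIEWSTATE_RE.search(html)
    return match.group(1) if match else None


# Hypothetical stand-in for the downloaded page.
SAMPLE = '''
<input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="dGVzdA==" />
<select name="Sel_Date">
  <option value="">--</option>
  <option value="2015-08-26">2015-08-26</option>
  <option value="2012-07-06">2012-07-06</option>
</select>
'''

print(extract_dates(SAMPLE))
print(extract_viewstate(SAMPLE))
```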
11
ljdawn 2015-09-05 21:10:53 +08:00
First scrape the list of dates on the left, then POST each one in turn.
12
explist OP
Once I have the list of dates, how do I construct the POST request?
14
Shy07 2015-09-05 22:01:32 +08:00 via iPhone
@explist
The form has just five parameters; POST them back to the original address:
'__EVENTTARGET' => 'Sel_Date' (fixed)
'__EVENTARGUMENT' => '' (fixed)
'__LASTFOCUS' => '' (fixed)
'__VIEWSTATE' => that Base64 string (fixed)
'Sel_Date' => the date (variable)
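In Python, the five parameters listed above could be assembled roughly like this. `build_form` and `encode_form` are hypothetical helper names, and `"dGVzdA=="` is a placeholder standing in for the real Base64 string taken from the page.

```python
from urllib import parse


def build_form(date, viewstate):
    """Return the five form fields, with only Sel_Date varying per request."""
    return {
        "__EVENTTARGET": "Sel_Date",
        "__EVENTARGUMENT": "",
        "__LASTFOCUS": "",
        "__VIEWSTATE": viewstate,
        "Sel_Date": date,
    }


def encode_form(date, viewstate):
    """URL-encode the fields into bytes suitable as a POST body."""
    return parse.urlencode(build_form(date, viewstate)).encode("utf-8")


# Placeholder viewstate; the real value is the long Base64 string from the page.
body = encode_form("2012-07-06", "dGVzdA==")
print(body)
```

The resulting bytes can be passed as the `data` argument of `urllib.request.urlopen(url, data)`, which makes the call a POST.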
16
miemiekurisu 2015-09-05 22:23:03 +08:00
....Why not just spin up scrapy and grab the page data with xpath? It saves time and effort...
17
Shy07 2015-09-05 22:34:42 +08:00 via iPhone 1
@explist
Look at its JS: in the end it calls submit, so you just need to find every submittable form element on the page.
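One way to "find every submittable form element" in Python is a small `html.parser.HTMLParser` subclass, sketched below. The class name `FormFieldCollector` and the `SAMPLE` markup are assumptions for illustration. It only handles `<input>` and `<select>`, and mimics a browser by taking the selected `<option>`, falling back to the first one.

```python
from html.parser import HTMLParser


class FormFieldCollector(HTMLParser):
    """Collect name/value pairs from <input> tags and <select> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}
        self._select = None        # name of the <select> currently open
        self._seen_option = False  # whether that <select> has had an <option> yet

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "input" and a.get("name"):
            self.fields[a["name"]] = a.get("value") or ""
        elif tag == "select" and a.get("name"):
            self._select = a["name"]
            self._seen_option = False
            self.fields.setdefault(self._select, "")
        elif tag == "option" and self._select is not None:
            # Use the selected option; otherwise keep the first one seen.
            if "selected" in a or not self._seen_option:
                self.fields[self._select] = a.get("value") or ""
            self._seen_option = True

    def handle_endtag(self, tag):
        if tag == "select":
            self._select = None


# Hypothetical stand-in for the bank page's form.
SAMPLE = '''
<form method="post" action="rmbdeposit.aspx">
<input type="hidden" name="__EVENTTARGET" id="__EVENTTARGET" value="" />
<input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="dGVzdA==" />
<select name="Sel_Date">
  <option value="2015-08-26">2015-08-26</option>
  <option selected="selected" value="2012-07-06">2012-07-06</option>
</select>
</form>
'''

collector = FormFieldCollector()
collector.feed(SAMPLE)
print(collector.fields)
```

The collected dictionary can then be updated with the desired `Sel_Date` and URL-encoded as the POST body.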
18
explist OP
Asking for learning purposes:
How can this China Construction Bank page be scraped: http://www.ccb.com/cn/personal/interest/rmbdeposit.html ? There is no table tag in its source code.