I ran into a problem when trying to scrape Baidu Tongji data.
When I logined successfully, there was a 302 redirect to the main page.
That is, the redirect went from https://tongji.baidu.com/web/welcome/ico?s=sdfsdfsdfsdf to s://tongji.baidu.com/web/12323243/overview/index?siteId=sdfsf.
I would like to know how the program (maybe the browser? I am not sure, LOL) passes the cookie from the 302 page to the destination page, and why.
Could anyone help me out? Thanks in advance.
Addendum:
302 page : https://tongji.baidu.com/web/welcome/ico?s=sdfsdfsdfsdf
destination page: s://tongji.baidu.com/web/12323243/overview/index?siteId=sdfsf
1
Cooky 2017-12-08 16:31:01 +08:00 via Android
Chinese please or go to Stack Overflow
|
2
shanechiu OP @Cooky I am a little worried about whether this question lives up to Stack Overflow's strict standards.
|
3
vincenttone 2017-12-08 16:35:05 +08:00
Not sure whether you can read Chinese.
Answer in Chinese: 1. HTTP is stateless. 2. Cookies are passed via headers. 3. Pay attention to the cookie's domain. |
4
hhacker 2017-12-08 16:38:11 +08:00
Because the cookie is shared across the same domain.
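
To make the "same domain" point concrete, here is a rough sketch of the domain-match test that RFC 6265 describes for deciding whether a stored cookie is sent to a host. The function name `domain_matches` is made up for illustration, and the `Domain=baidu.com` value is an assumption; the thread does not show what attributes Baidu actually sets on its cookies.

```python
import ipaddress


def domain_matches(host, cookie_domain):
    """Simplified sketch of the RFC 6265 domain-match test (hypothetical helper)."""
    host = host.lower()
    # A leading dot in the Domain attribute is ignored.
    cookie_domain = cookie_domain.lower().lstrip(".")
    if host == cookie_domain:
        return True
    try:
        ipaddress.ip_address(host)
        return False  # IP addresses only ever match exactly
    except ValueError:
        pass
    # Otherwise the host must be a subdomain of the cookie's domain.
    return host.endswith("." + cookie_domain)


# Assumed cookie attribute: Domain=baidu.com (not confirmed by the thread).
print(domain_matches("tongji.baidu.com", "baidu.com"))
print(domain_matches("notbaidu.com", "baidu.com"))
```

Under that assumption, both the 302 page and the destination page live on `tongji.baidu.com`, so the same cookie qualifies for both requests.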
|
5
shanechiu OP @vincenttone Well, does that mean the cookie from the 302 page's request is also passed to the destination page via the header, and that it acts as a request cookie there too?
|
6
fml87 2017-12-08 16:50:47 +08:00
What is "logined"?
|
7
shanechiu OP @fml87 It is a past tense of the word "login"; it means the event or action happened in the past.
|
8
vincenttone 2017-12-08 16:54:42 +08:00
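@shanechiu If you want to understand how cookies behave on a 302 page, you first have to understand how cookies behave on an ordinary page.
As I said earlier: 1. HTTP is stateless. That is the premise. Cookies are stored locally, and because the protocol is stateless, it does not matter whether a 302 redirect happened in between. |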
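
A small local experiment may help illustrate this. The sketch below is not Baidu's behaviour, just a toy server on 127.0.0.1 with made-up paths (`/login`, `/dest`) and a made-up cookie (`sid=abc123`). The 302 response sets the cookie, and the destination page echoes back whatever `Cookie` header it receives. With a client-side cookie jar the cookie arrives at the destination; without one it does not, since the server keeps no state of its own.

```python
import http.cookiejar
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/login":
            # The 302 response both sets the cookie and names the destination.
            self.send_response(302)
            self.send_header("Set-Cookie", "sid=abc123; Path=/")
            self.send_header("Location", "/dest")
            self.send_header("Content-Length", "0")
            self.end_headers()
        else:
            # Echo the Cookie header the client sent with this request.
            body = self.headers.get("Cookie", "").encode()
            self.send_response(200)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def fetch_dest_cookie(use_cookie_jar):
    """Request /login, follow the redirect, return what /dest saw as Cookie."""
    server = HTTPServer(("127.0.0.1", 0), Handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        # Empty ProxyHandler so a system proxy cannot intercept localhost.
        handlers = [urllib.request.ProxyHandler({})]
        if use_cookie_jar:
            jar = http.cookiejar.CookieJar()  # the client-side cookie store
            handlers.append(urllib.request.HTTPCookieProcessor(jar))
        opener = urllib.request.build_opener(*handlers)
        url = f"http://127.0.0.1:{server.server_port}/login"
        with opener.open(url) as resp:
            return resp.read().decode()
    finally:
        server.shutdown()
        server.server_close()


print(repr(fetch_dest_cookie(use_cookie_jar=True)))
print(repr(fetch_dest_cookie(use_cookie_jar=False)))
```

The redirect itself carries no cookie. The client stores the cookie from `Set-Cookie` and then attaches it as a `Cookie` request header on the next request to a matching domain and path, which here happens to be the redirect target.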
9
shanechiu OP @vincenttone Thanks for your kindness and patience. That gives me an outline of how this works.
|
10
knightdf 2017-12-08 18:04:17 +08:00
Showing off, are we? Judging by your post history, don't you know Chinese?
|
11
yospan 2017-12-09 15:19:00 +08:00
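I did exactly this recently. Use a session: set a third-party viewing password in the Tongji admin console, POST it to them, keep the session to request the other pages, and then you can pull whatever data you like from the stats. Take this as a reference; I'm a Python beginner.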
```
import json

from requests import Session

# REQ_HEADERS is not shown in the original post; this value is a placeholder.
REQ_HEADERS = {"User-Agent": "Mozilla/5.0"}

## Baidu Tongji third-party viewing password: log in and get the session and siteid
idwd = {'passwd': '66666'}
S = Session()
logined = S.post("https://tongji.baidu.com/web/welcome/ico?s=8dfdafdafadfa4bccd", data=idwd, headers=REQ_HEADERS)

# The session follows the 302, so logined.url is the destination URL.
# Extract siteid (after "=") and the account id (5th path segment) as strings.
siteid = str(logined.url.split("=")[1])
webid = str(logined.url.split("/")[4])

## POST parameters for the search-term report
keyjson = {
    "siteId": siteid, "st": "", "et": "", "st2": "", "et2": "",
    "indicators": "['pv_count','visitor_count','ip_count','bounce_ratio','avg_visit_time']",
    "order": "pv_count,desc", "offset": "", "pageSize": "", "target": "-1",
    "flag": "indicator", "source": "", "isGroup": "0", "clientDevice": "all",
    "reportId": "12", "method": "source/searchword/a", "queryId": "",
}
readkeyjson = S.post("https://tongji.baidu.com/web/" + webid + "/ajax/post", data=keyjson, headers=REQ_HEADERS)

# Read the response as text
jsondata = readkeyjson.text
# Parse the JSON
readjsondict = json.loads(jsondata)
keyNamejson = readjsondict['data']['items'][0]
# Print the name of each entry
for items in keyNamejson:
    print(items[0]['name'])
```
|