The requests library and XPath in Python
Typical example
import requests
The Response object
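Member | Description |
---|---|
status_code | Returns the HTTP status code (200 is OK, 404 is Not Found) |
text | Returns the content of the response as a str (Unicode), decoded using encoding |
apparent_encoding | Returns the encoding guessed from the response body (may differ from the encoding declared in the headers) |
encoding | Returns the encoding used to decode r.text |
json() | Returns the body parsed as JSON; raises an exception if the body is not valid JSON |
ok | Returns True if status_code is less than 400, otherwise False |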
close() | Closes the connection to the server |
content | Returns the content of the response, in bytes |
cookies | Returns a CookieJar object with the cookies sent back from the server |
elapsed | Returns a timedelta object with the time elapsed from sending the request to the arrival of the response |
headers | Returns a dictionary of response headers |
history | Returns a list of Response objects from the redirect history of the request |
is_permanent_redirect | Returns True if the response is a permanent redirect (301 or 308 with a Location header), otherwise False |
is_redirect | Returns True if the response is a well-formed redirect (a redirect status code with a Location header), otherwise False |
iter_content() | Iterates over the response body in chunks |
iter_lines() | Iterates over the lines of the response |
links | Returns the parsed Link header of the response, if any |
next | Returns a PreparedRequest object for the next request in a redirect chain, if there is one |
raise_for_status() | Raises an HTTPError if the status code is 4xx or 5xx; otherwise returns None |
reason | Returns a text corresponding to the status code |
request | Returns the request object that requested this response |
url | Returns the URL of the response |
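
To see a few of these members in action without depending on an external site, here is a minimal sketch that starts a throwaway local HTTP server and queries it. The handler class, the `/json` and `/missing` paths, and the `inspect_responses` helper are all made up for illustration; only the `requests` calls reflect the library itself.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests


class DemoHandler(BaseHTTPRequestHandler):
    """Hypothetical handler: /json returns a JSON body, anything else a 404."""

    def do_GET(self):
        if self.path == "/json":
            body = json.dumps({"msg": "hello"}).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "application/json; charset=utf-8")
        else:
            body = b"not found"
            self.send_response(404)
            self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence the default per-request logging


def inspect_responses():
    """Start a local server and collect a few Response members."""
    server = HTTPServer(("127.0.0.1", 0), DemoHandler)  # port 0: pick a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    base = f"http://127.0.0.1:{server.server_port}"

    session = requests.Session()
    session.trust_env = False  # ignore proxy environment variables for localhost
    try:
        good = session.get(base + "/json", timeout=5)
        bad = session.get(base + "/missing", timeout=5)
        result = {
            "status_code": good.status_code,
            "ok": good.ok,
            "encoding": good.encoding,
            "json": good.json(),
            "content_type": good.headers["content-type"],  # case-insensitive lookup
            "bad_ok": bad.ok,
        }
        try:
            bad.raise_for_status()  # raises for 4xx/5xx, otherwise returns None
        except requests.HTTPError as exc:
            result["error_status"] = exc.response.status_code
        return result
    finally:
        session.close()
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(inspect_responses())
```

Against a real site the same members apply; the local server only keeps the sketch reproducible offline.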
Status code quick reference
2xx: success
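
As a rough companion to the quick reference, the sketch below maps a status code to its class using the first digit, following the standard HTTP grouping. The `status_class` helper is a made-up name for illustration; `HTTPStatus` comes from the Python standard library.

```python
from http import HTTPStatus


def status_class(code: int) -> str:
    """Map an HTTP status code to the name of its class (first digit)."""
    classes = {
        1: "informational",
        2: "success",
        3: "redirection",
        4: "client error",
        5: "server error",
    }
    return classes.get(code // 100, "unknown")


for code in (200, 301, 404, 500):
    # HTTPStatus(code).phrase is the standard reason phrase, e.g. "Not Found"
    print(code, HTTPStatus(code).phrase, "->", status_class(code))
```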
POST
payload = {'username': 'admin', 'password': '123456'}
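
To check what such a payload looks like on the wire without sending anything, one option is to build a prepared request and inspect it. This is only a sketch: the URL `http://example.com/login` is a placeholder, not an endpoint from the original post. Passing the dict via `data=` produces a form-encoded body; passing it via `json=` would produce a JSON body instead.

```python
import requests

payload = {'username': 'admin', 'password': '123456'}

# Placeholder endpoint; nothing is sent until the request is handed to a Session.
request = requests.Request('POST', 'http://example.com/login', data=payload)
prepared = request.prepare()

print(prepared.method)                   # POST
print(prepared.headers['Content-Type'])  # application/x-www-form-urlencoded
print(prepared.body)                     # username=admin&password=123456
```

To actually send it, `requests.post(url, data=payload)` does the same preparation and returns a Response object.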
XPath in Python
For other XPath syntax, see the element-selection XPath section of:
Playwright installation and common functions | Min的博客 (xxminxx.love)
Installation
conda install lxml
Usage
Typical usage in web scraping
Import the library
from lxml import etree
Parse a string into an etree (as HTML)
html = etree.HTML(text)
XPath matching
html.xpath(<xpath>)
For example:
html.xpath('//li/a')
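
Putting the three steps together, a minimal sketch might look like the following. The HTML fragment is made up for illustration; in a scraper, `text` would typically be the `text` of a response.

```python
from lxml import etree

# Made-up HTML fragment; note the unclosed last <li>, which the parser tolerates.
text = '''
<ul>
  <li><a href="/a.html">first</a></li>
  <li><a href="/b.html">second</a></li>
  <li><a href="/c.html">third</a>
</ul>
'''

html = etree.HTML(text)

links = html.xpath('//li/a')           # list of <a> elements
texts = html.xpath('//li/a/text()')    # list of text nodes
hrefs = html.xpath('//li/a/@href')     # list of attribute values

print(len(links), texts, hrefs)
print(links[0].get('href'), links[0].text)
```

An XPath that selects elements returns element objects, while one ending in `text()` or `@attr` returns strings, so the choice depends on whether further navigation from each match is needed.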
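Other
- Besides a string, an etree can also be created from a text file
Open the file ./test.html:
html = etree.parse('./test.html', etree.HTMLParser())
- Print the tree
Taking the etree instance html as an example: serialize it with etree.tostring, which returns bytes, then decode the bytes into a str
print(etree.tostring(html).decode('utf-8'))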
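
The two items above can be tried end to end with the sketch below. It writes a small made-up page to a temporary file so that it runs on its own; the file contents and variable names are illustrative only. Note that `etree.parse` returns an ElementTree while `etree.HTML` returns the root element, and both support `xpath`.

```python
import os
import tempfile

from lxml import etree

# Write a small made-up page to a temporary file so the sketch is self-contained.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'test.html')
    with open(path, 'w', encoding='utf-8') as f:
        f.write('<html><body><ul><li><a href="/x">link</a></li></ul></body></html>')

    tree = etree.parse(path, etree.HTMLParser())   # returns an ElementTree

raw = etree.tostring(tree)                          # bytes
pretty = etree.tostring(tree, pretty_print=True, encoding='unicode')  # str

print(type(raw).__name__, type(pretty).__name__)
print(pretty)
```

Passing `encoding='unicode'` makes `tostring` return a str directly, which avoids the separate decode step.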
Unless otherwise stated, all articles on this blog are licensed under CC BY-NC-SA 4.0. Please credit Min的博客 when reposting.