A Quick Start to Web Crawlers

Hands-on: downloading a kitten picture, hahaha

    # -*- coding:UTF-8 -*-
    import urllib.request  # import the module

    response = urllib.request.urlopen("http://placekitten.com/g/200/300")
    # What is urlopen? You can simply think of it as "open"; it takes a URL string
    meinv_img = response.read()  # read the response body (the image bytes)

    with open("meinvtu.jpg", 'wb') as f:  # open a local file in binary write mode
        f.write(meinv_img)  # save the image

    >>> response.geturl()
    'http://placekitten.com/g/200/300'
    >>> response.info()
    <http.client.HTTPMessage object at 0x04763E70>
    >>> print(response.info())
    Date: Wed, 19 Jun 2019 16:06:49 GMT
    Content-Length: 6327
    Connection: close
    Set-Cookie: __cfduid=d1ea0fa0d89b88bce166015e474dfc91c1560960409; expires=Thu, 18-Jun-20 16:06:49 GMT; path=/; domain=.placekitten.com; HttpOnly
    Access-Control-Allow-Origin: *
    Cache-Control: public, max-age=86400
    Expires: Thu, 20 Jun 2019 16:06:49 GMT
    CF-Cache-Status: HIT
    Accept-Ranges: bytes
    Vary: Accept-Encoding
    Server: cloudflare
    CF-RAY: 4e96c09e7b469971-LAX
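As an aside (my own minimal sketch, not from the original write-up): urllib.request also provides urlretrieve, which downloads a URL straight into a local file in one call, so the read/open/write steps above collapse into one line.

    import urllib.request

    # download the image body directly into a local file
    urllib.request.urlretrieve("http://placekitten.com/g/200/300", "meinvtu.jpg")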

Example 2

Doing online translation

What the browser actually sends is covered in detail in the earlier article 渗透基础之浅谈HTTP请求 (a beginner-friendly piece).

First, open http://fanyi.youdao.com in your browser.

Press F12 to open the element inspector (you can also open it from the right-click menu).

In the inspector, click over to the request (network) panel.

Then type 渗透云笔记 into the translate box and watch the requests that fire.

Click the request link to view its details.

Copy the form data out.

These fields are exactly what our crawler has to send.

They look like this:

    Request URL: http://fanyi.youdao.com/translate_o?smartresult=dict&smartresult=rule

    i: 渗透云笔记
    from: AUTO
    to: AUTO
    smartresult: dict
    client: fanyideskweb
    salt: 15609628090667
    sign: 6c2f918076d0c0a5426e1b7bcbf4b33a
    ts: 1560962809066
    bv: 8eb5748fd9d9cf1da538ed0cc7b0c0e5
    doctype: json
    version: 2.1
    keyfrom: fanyi.web
    action: FY_BY_CLICKBUTTION
  1. The code is as follows:

    # -*- coding:UTF-8 -*-
    import urllib.request
    import urllib.parse

    url = 'http://fanyi.youdao.com/translate?smartresult=dict&smartresult=rule'
    # The URL copied straight from the inspector raises an error;
    # the "_o" in translate_o has to be removed for this to work
    #url = "http://fanyi.youdao.com/translate_o?smartresult=dict&smartresult=rule"

    data = {}
    data['i'] = '渗透云笔记'
    data['from'] = 'AUTO'
    data['to'] = 'AUTO'
    data['smartresult'] = 'dict'
    data['client'] = 'fanyideskweb'
    data['salt'] = '15609628090667'
    data['sign'] = '6c2f918076d0c0a5426e1b7bcbf4b33a'
    data['ts'] = '1560962809066'
    data['bv'] = '8eb5748fd9d9cf1da538ed0cc7b0c0e5'
    data['doctype'] = 'json'
    data['version'] = '2.1'
    data['keyfrom'] = 'fanyi.web'
    data['action'] = 'FY_BY_CLICKBUTTION'

    # urllib.parse.urlencode() converts data into the form the server expects,
    # then we encode it as utf-8 bytes
    data = urllib.parse.urlencode(data).encode('utf-8')
    response = urllib.request.urlopen(url, data)
    html = response.read().decode('utf-8')
    print(html)
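Before running it, a quick aside on what that urlencode/encode step actually produces. This is my own minimal interactive sketch with simpler values (not from the original post); with a dict, the field order can vary by Python version:

    >>> import urllib.parse
    >>> urllib.parse.urlencode({'i': 'hello', 'from': 'AUTO'})
    'i=hello&from=AUTO'
    >>> urllib.parse.urlencode({'i': 'hello', 'from': 'AUTO'}).encode('utf-8')
    b'i=hello&from=AUTO'

So urlopen receives the form fields as percent-encoded bytes in its data argument, which is what turns the request into a POST.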

Running the program gives:

    >>> {"type":"ZH_CN2EN","errorCode":0,"elapsedTime":1,"translateResult":[[{"src":"渗透云笔记","tgt":"Penetrate cloud notes"}]]}
    >>>

Much better.

At first glance you might think: whoa, isn't that a Python dict?

    >>> type(html)
    <class 'str'>
    >>>

But what actually came back is a string. That is because the response is in JSON format; put plainly, JSON just wraps a Python-like data structure in a string.
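To see that, json.loads turns the string back into real Python objects. Here is a quick interactive sketch of mine using the html string returned above (the access path matches the result we just printed):

    >>> import json
    >>> target = json.loads(html)
    >>> type(target)
    <class 'dict'>
    >>> target["translateResult"][0][0]['tgt']
    'Penetrate cloud notes'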

Now let's polish our program a bit.

    # -*- coding:UTF-8 -*-
    import urllib.request
    import urllib.parse
    import json

    content = input("Enter the text to translate: ")
    url = 'http://fanyi.youdao.com/translate?smartresult=dict&smartresult=rule'
    # Again, the URL copied straight from the inspector raises an error;
    # the "_o" in translate_o has to be removed
    #url = "http://fanyi.youdao.com/translate_o?smartresult=dict&smartresult=rule"

    data = {}
    data['i'] = content
    data['from'] = 'AUTO'
    data['to'] = 'AUTO'
    data['smartresult'] = 'dict'
    data['client'] = 'fanyideskweb'
    data['salt'] = '15609628090667'
    data['sign'] = '6c2f918076d0c0a5426e1b7bcbf4b33a'
    data['ts'] = '1560962809066'
    data['bv'] = '8eb5748fd9d9cf1da538ed0cc7b0c0e5'
    data['doctype'] = 'json'
    data['version'] = '2.1'
    data['keyfrom'] = 'fanyi.web'
    data['action'] = 'FY_BY_CLICKBUTTION'

    # Convert data into the expected form with urllib.parse.urlencode() and encode it (utf-8)
    data = urllib.parse.urlencode(data).encode('utf-8')
    response = urllib.request.urlopen(url, data)
    html = response.read().decode('utf-8')
    target = json.loads(html)
    print("Translation result: %s" % (target["translateResult"][0][0]['tgt']))

Output:

    Enter the text to translate: 卧槽
    Translation result: Oh my god
    >>>

Installing Beautiful Soup

The install command, as before, is either of these:

    pip install beautifulsoup4
    easy_install beautifulsoup4

As before, open cmd and change into C:\Python34\Scripts> first.
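Since the next post will lean on it, here is a minimal sketch of what Beautiful Soup is for (my own example, not from the original article): parse an HTML string and pull text out of its tags.

    from bs4 import BeautifulSoup

    # a tiny HTML snippet to parse (made up for this example)
    html = "<html><body><h1>渗透云笔记</h1><p class='intro'>爬虫入门</p></body></html>"
    soup = BeautifulSoup(html, "html.parser")  # use Python's built-in parser

    print(soup.h1.text)                          # -> 渗透云笔记
    print(soup.find("p", class_="intro").text)   # -> 爬虫入门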

Tomorrow: crawling a novel site.

Time to sleep.

Tips are appreciated, as always.

This article originally appeared on the WeChat public account 渗透云笔记.
