
asyncio's reputation speaks for itself, so no introduction needed.

Today let's look at how to use aiohttp, which is built on top of it.

Installation

shell
pip install aiohttp

Multiple tasks without return values

python
import time
import asyncio
import aiohttp

urls = [
    'https://www.baidu.com', 
    'https://edgeapi.rubyonrails.org/', 
    'https://www.cnblogs.com',
    'https://www.bing.com', 
    'https://www.zhihu.com/',
]

async def get(url):  # coroutine functions are defined with async def
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as result:
            print(result.status, result.url)

t1 = time.time()
loop = asyncio.get_event_loop()  # create an event loop
tasks = [loop.create_task(get(i)) for i in urls]  # wrap each coroutine in a Task
# (on Python 3.11+ asyncio.wait() rejects bare coroutines, so wrap them first)
loop.run_until_complete(asyncio.wait(tasks))  # run until every task finishes
print('running time: ', time.time() - t1)

In `async with aiohttp.ClientSession() as session:`, the `async with aiohttp.ClientSession() as` part is fixed syntax; the name after `as` (here `session`) is up to you.
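The `as session` binding works because `ClientSession` implements the async context-manager protocol (`__aenter__`/`__aexit__`). A minimal sketch of that protocol — `DemoSession` below is a hypothetical stand-in, not aiohttp's real class:

```python
import asyncio

class DemoSession:
    """Stand-in that mimics how ClientSession supports `async with`."""

    async def __aenter__(self):
        self.closed = False
        return self          # whatever __aenter__ returns is bound after `as`

    async def __aexit__(self, exc_type, exc, tb):
        self.closed = True   # cleanup on exit, like ClientSession.close()

async def use_session():
    async with DemoSession() as session:   # `session` is just a local name
        return session

result = asyncio.run(use_session())
print(result.closed)  # the session was closed when the block exited
```

This is why the `async with ... as` part is fixed: both dunder methods are coroutines, so a plain `with` cannot drive them.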

We can print the results now, but how do we get the return values back?

Multiple tasks with return values

python
import time
import asyncio
import aiohttp
from fake_useragent import UserAgent  # pip install fake_useragent

urls = [
    'https://www.baidu.com', 
    'https://edgeapi.rubyonrails.org/', 
    'https://www.cnblogs.com',
    'https://www.bing.com', 
    'https://www.zhihu.com/',
]

async def get(url):
    async with aiohttp.ClientSession() as session:
        headers = {'User-Agent': UserAgent().random}
        async with session.request(method='get', url=url, headers=headers) as result:
            return result.status, result.url

t1 = time.time()
loop = asyncio.get_event_loop()
# keep the Task objects (created with loop.create_task) so each result can be read back afterwards
tasks = [loop.create_task(get(i)) for i in urls]
loop.run_until_complete(asyncio.wait(tasks))
for i in tasks:
    print(i.result())  # read each task's return value
loop.close()
print('running time: ', time.time() - t1)

The example above also shows how to send request headers.

If `session.request(method='get', url=url, headers=headers)` looks familiar, that's no accident: aiohttp's API is basically the same as the requests module's.
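`session.request('GET', ...)` and the shortcut `session.get(...)` make the same request, and the body is read with `await resp.text()` — which must happen inside the `async with` block, before the response is released. A sketch, tested against a tiny local aiohttp server so no external network is needed (the port 8099 is an arbitrary choice):

```python
import asyncio
import aiohttp
from aiohttp import web

async def handler(request):
    return web.Response(text='hello')

async def demo():
    # start a throwaway local server to request against
    app = web.Application()
    app.router.add_get('/', handler)
    runner = web.AppRunner(app)
    await runner.setup()
    site = web.TCPSite(runner, '127.0.0.1', 8099)
    await site.start()
    try:
        async with aiohttp.ClientSession() as session:
            # long form: explicit method name
            async with session.request(method='GET', url='http://127.0.0.1:8099/') as r1:
                body1 = await r1.text()   # read body while the response is open
            # shortcut form: one method per HTTP verb
            async with session.get('http://127.0.0.1:8099/') as r2:
                body2 = await r2.text()
    finally:
        await runner.cleanup()
    return body1, body2

bodies = asyncio.run(demo())
print(bodies)
```

Both forms return the same body; `session.get` is just sugar for `session.request('GET', ...)`.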

Next, a slightly more encapsulated version:

python
import time
import asyncio
import aiohttp
from fake_useragent import UserAgent  # pip install fake_useragent


urls = [
    'https://www.baidu.com', 'https://edgeapi.rubyonrails.org/', 'https://www.cnblogs.com',
    'https://www.bing.com', 'https://www.zhihu.com/',
]

async def get(url):
    async with aiohttp.ClientSession() as session:
        headers = {'User-Agent': UserAgent().random}
        async with session.request(method='get', url=url, headers=headers) as result:
            return result.status, result.url

async def main():
    task_l = [get(i) for i in urls]
    for ret in asyncio.as_completed(task_l):  # yields awaitables as each finishes
        res = await ret
        print(res)
        
t1 = time.time()
loop = asyncio.get_event_loop()
loop.run_until_complete(main())
loop.close()
print('running time: ', time.time() - t1)
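On Python 3.7+, the `get_event_loop` / `run_until_complete` / `close` boilerplate in all of the examples above can be replaced by `asyncio.run()`, and `asyncio.gather()` collects return values in input order. A sketch of that pattern — `fake_get` is a placeholder coroutine standing in for the aiohttp `get()` above, so this runs without any network:

```python
import asyncio

async def fake_get(url):
    # stand-in for an aiohttp request: yield to the loop, then "respond"
    await asyncio.sleep(0)
    return 200, url

async def main(urls):
    # gather() runs the coroutines concurrently and returns results in order
    return await asyncio.gather(*(fake_get(u) for u in urls))

results = asyncio.run(main(['https://www.baidu.com', 'https://www.bing.com']))
print(results)
```

Unlike `asyncio.wait()`, `gather()` hands back the return values directly, so there is no need to keep Task objects around and call `.result()` on each one.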

see also: Python aiohttp async crawler (a beginner-friendly read)