HOME/Articles/

2017-8-27 python 协程加速

Article Outline

情景描述:

上周,由于产品嫌报告生成太慢,经过使用profile/gprof2dot研究后,发现主要时间耗费在接口网络请求上,于是我决定在项目中大量处理I/O网络请求的地方使用gevent,以缓解报告生成压力。

<!--more-->

实现代码:

import gevent
from gevent import monkey, pool
monkey.patch_socket()
p = pool.Pool(300)

def requests_parse(self, tel_tuple):
    """
    主要处理requests请求
    """
    print('Size of pool', len(p))
    ……
    ……

def generator_label(self, tel_data):
    """
    使用gevent实现协程处理I/O网络请求
    """
    jobs = [p.spawn(self.requests_parse, tel) for tel in tel_data]
    gevent.joinall(jobs)

    tlist = [x.value for x in jobs]
    if None in tlist:
        message_list = [x.exception for x in jobs]
        self.logger.error(tlist)
        self.logger.error(message_list)
        raise Exception(message_list)
    return tlist

def update_data_step(self, tel):
    """
    主要将请求回来处理好的数据写入数据库(mongo)
    """
    res = self.db.update()
    ……
    ……

tel_data = [……] # 一大堆待请求参数list
tlist = self.generator_label(set(tel_data))
map(lambda x: self.update_data_step(x[0])(x[1],x[2],x[3],x[4],x[5],x[6]), tlist)

问题:

上线后发现,代码运行一段时间后,请求一上来,任务数直线上升,一直增加: enter description here 但是,pool的数量是正常的: enter description here 之后log里报错信息:

  File "run.py", line 42, in update_data
  File "report/generate/calls_sa_by_tel.py", line 396, in run
  File "report/generate/calls_sa_by_tel.py", line 38, in base_call
  File "venv/lib/python2.7/site-packages/pymongo/cursor.py", line 729, in count
  File "venv/lib/python2.7/site-packages/pymongo/collection.py", line 1344, in _count
  File "/usr/lib64/python2.7/contextlib.py", line 17, in __enter__
  File "venv/lib/python2.7/site-packages/pymongo/mongo_client.py", line 904, in _socket_for_reads
  File "/usr/lib64/python2.7/contextlib.py", line 17, in __enter__
  File "venv/lib/python2.7/site-packages/pymongo/mongo_client.py", line 870, in _get_socket
  File "/usr/lib64/python2.7/contextlib.py", line 17, in __enter__
  File "venv/lib/python2.7/site-packages/pymongo/server.py", line 168, in get_socket
  File "/usr/lib64/python2.7/contextlib.py", line 17, in __enter__
  File "venv/lib/python2.7/site-packages/pymongo/pool.py", line 844, in get_socket
  File "venv/lib/python2.7/site-packages/pymongo/pool.py", line 881, in _get_socket_no_auth
  File "venv/lib/python2.7/site-packages/pymongo/pool.py", line 817, in connect
  File "venv/lib/python2.7/site-packages/pymongo/pool.py", line 263, in _raise_connection_failure
AutoReconnect: xxx.xxx.xxx.117:27017: [Errno 24] Too many open files

解决办法:

经过分析,解决办法:monkey.patch_socket()换为monkey.patch_all(),或者,在使用完gevent后使用reload(socket)将socket初始化。 原因:应该是mongo写数据是阻塞的,请求快于写操作,导致写操作堆积越来越多,最终导致程序抛出Too many open files错误。 最终代码,如下:

import gevent
from gevent import monkey, pool
monkey.patch_all()

def generator_label(self, tel_data):
    p = pool.Pool(10)
    jobs = [p.spawn(self.requests_parse, tel) for tel in tel_data]
    gevent.joinall(jobs)
    # print [x.value for x in jobs]
    tlist = [x.value for x in jobs]
    if None in tlist:
        message_list = [x.exception for x in jobs]
        self.logger.error(tlist)
        self.logger.error(message_list)
        raise Exception(message_list)
    return tlist


# 或者使用
p = pool.Pool(10)
jobs = [p.spawn(self.requests_parse, tel) for tel in tel_data]
gevent.joinall(jobs)
import socket
reload(socket)

另一个用例:

给传递两个参数,直接后面跟着就行,逗号分开; 返回如是多个情况的,value是一个以tuple为元素的list。

    p = pool.Pool(10)
    jobs = []
# date ('2017-07-01', '2017-07-14')
jobs = [p.spawn(self.crawl_a_call_log, date[0], date[1]) for date in dates]
gevent.joinall(jobs)
data_list = [x.value for x in jobs]
print data_list
if None in data_list:
    message_list = [x.exception for x in jobs]
    self.log('crawler', data_list, message_list)
    raise Exception(message_list)

```

总结

协程(gevent)是把双刃剑,monkey.patch 是一个邪恶的东西; 提升效果不要太好,耗时足足降了60%,而且,请求越多,效果越明显。