0%

使用迭代器优化pymysql大量查询

缘由

当我们需要使用pymysql进行大量数据的提取时,发现越来越慢。直到阻塞,down。

解决

使用SSDictCursor无缓冲游标来操作,使用fetchall_unbuffered,来进行。同时辅以分页查询即可。

首先是建立连接,curror 的选择

1
2
3
4
5
6
7
8
9
10
11
12
from pymysql import connect, cursors

client = connect(**{
"host": "127.0.0.1",
"port": 3306,
"user": "root",
"password": "123321",
"database": "test",
'charset': 'utf8'
}, cursorclass=cursors.SSCursor)

cursor = client.cursor()

注意在此可以选择无缓冲游标, 来操作。即cursors.SSCursor,cursors.SSDictCursor

1
2
3
4
5
6
page_count = 1000
for pg in range(1000):
query_sql = f"""select * from xiaohongshu_comment_note_2 limit {page_number * page_count}, {page_count};"""
cursor.execute(query_sql)
for results in cursor.fetchall_unbuffered():
print(results)

当然还可以在函数中这样写

1
2
3
4
5
6
7
8
9
10
11
12
13
14
def unbuffer_query(page_count=1000, max_page=1000):
"""
:param page_count:
:param max_page:
:return:
"""
for page_number in range(max_page):
query_sql = f"""select * from xiaohongshu_comment_note_2 limit {page_number * page_count}, {page_count};"""
cursor.execute(query_sql)
yield from cursor.fetchall_unbuffered()


for i in unbuffer_query(page_count=999, max_page=9999):
print(i)

瑞思拜~