Cucco’s Compute Hack

コンピュータ関係の記事を書いていきます。

LEFT OUTER JOINの評価

LEFT OUTER JOINの評価

一瞬で終わる。メモリも食わない。
UNIQUE属性付けておくとさらに早い。

実施内容

データのテーブルと、クラス分け結果のテーブルのJoin。
クラス分け結果は適当に%演算で作った。

ソースコード
import sqlite3
from faker import Faker
import random
from datetime import datetime

tic0 = datetime.now()

con = sqlite3.connect(":memory:")
con.isolation_level = None  # None で自動コミットモード
cur = con.cursor()
cur.execute('PRAGMA temp_store=MEMORY;')
cur.execute('PRAGMA journal_mode=MEMORY;')
#cur.execute('PRAGMA rock_mode=OFF;')

listsize = 100000
classsize= 128

sql_create_data = "CREATE TABLE data(number INTEGER UNIQUE, name TEXT, score REAL)"
sql_create_class = "CREATE TABLE class(number INTEGER UNIQUE, label INTEGER)"
sql_instert_data = "INSERT INTO data VALUES(?,?,?)"
sql_instert_class = "INSERT INTO class VALUES(?,?)"

sql_select_join = "SELECT data.number, data.name, data.score, class.label FROM data LEFT OUTER JOIN class ON data.number = class.number"
sql_select_join_cl1 = "SELECT data.number, data.name, data.score, class.label FROM data LEFT OUTER JOIN class ON data.number = class.number WHERE class.label = 1 "
#list_data = [[None] for i in range(listsize)]
list_data = [None] * listsize
list_class = [None] * listsize
fake = Faker()


print("データ準備 開始")
for i in range(listsize):
    list_data[i] = [i, fake.name(), random.random()]
    list_class[i] = [i, i%classsize]
print("データ準備 完了")

cur.execute(sql_create_data)
cur.execute(sql_create_class)
print("テーブルの定義を表示")
table_defs = cur.execute("SELECT * FROM sqlite_master WHERE type='table'")
for item in table_defs:
    print(item)

print("データの挿入")
cur.executemany(sql_instert_data, list_data)
cur.executemany(sql_instert_class, list_class)

# ここからjoin時間計測
tic = datetime.now()

cur.execute(sql_select_join)

toc = datetime.now()

print("テーブルの中身を表示")
print(cur.fetchone())
print(cur.fetchone())
print(cur.fetchone())
print(cur.fetchone())
print(cur.fetchone())
print(cur.fetchone())
print(cur.fetchone())
print(cur.fetchone())
print(cur.fetchone())
print(cur.fetchone())
print(cur.fetchone())
print(cur.fetchone())
print(cur.fetchone())
print(cur.fetchone())

print("classが1のテーブルの中身を表示")
cur.execute(sql_select_join_cl1)
print(cur.fetchone())
print(cur.fetchone())
print(cur.fetchone())
print(cur.fetchone())
print(cur.fetchone())
print(cur.fetchone())
print(cur.fetchone())
print(cur.fetchone())
print(cur.fetchone())
print(cur.fetchone())
print(cur.fetchone())
print(cur.fetchone())
print(cur.fetchone())
print(cur.fetchone())

print("join {0}件 処理時間 {1}".format(listsize, toc-tic))
print("全体処理時間 {0}件 処理時間 {1}".format(listsize, datetime.now()-tic0))
con.close()
結果
(base) C:\dev\SampleCodes\hello_sqlite>python outerjoin.py
データ準備 開始
データ準備 完了
テーブルの定義を表示
('table', 'data', 'data', 2, 'CREATE TABLE data(number INTEGER UNIQUE, name TEXT, score REAL)')
('table', 'class', 'class', 4, 'CREATE TABLE class(number INTEGER UNIQUE, label INTEGER)')
データの挿入
テーブルの中身を表示
(0, 'Troy Wagner', 0.44436625795068907, 0)
(1, 'Lee Robertson', 0.8527352596455694, 1)
(2, 'Mark Franco', 0.9942993187450221, 2)
(3, 'Joel Sandoval', 0.9462360044917911, 3)
(4, 'Katherine Ho', 0.6177953257612685, 4)
(5, 'Shannon Adams', 0.6044842942129696, 5)
(6, 'Jacob Wong', 0.4513990366427033, 6)
(7, 'Howard Bennett', 0.6643356684420646, 7)
(8, 'Scott Mitchell', 0.5792325135939987, 8)
(9, 'Joshua Tyler', 0.4882390253804837, 9)
(10, 'Steven Day', 0.03463822764171387, 10)
(11, 'Heather Hughes', 0.888992343863056, 11)
(12, 'Margaret Thomas DDS', 0.402020535579534, 12)
(13, 'David Rosario', 0.8354810420435498, 13)
classが1のテーブルの中身を表示
(1, 'Lee Robertson', 0.8527352596455694, 1)
(129, 'April Pittman', 0.48442848761646606, 1)
(257, 'Mercedes Gallagher', 0.9811398290695581, 1)
(385, 'Xavier Harvey', 0.39333389385880146, 1)
(513, 'Laura Powell', 0.11691465694144565, 1)
(641, 'Mrs. Lori Gibbs MD', 0.13424141037411286, 1)
(769, 'Christina Jacobson', 0.7144345369983663, 1)
(897, 'Gregory Lowe', 0.2939644032693347, 1)
(1025, 'Michael Santos', 0.9319930754332403, 1)
(1153, 'Olivia Montgomery', 0.3756360549631732, 1)
(1281, 'Joseph Eaton', 0.7451317465818261, 1)
(1409, 'Julia Adams', 0.5858535015316725, 1)
(1537, 'George Kelley', 0.12250922691704258, 1)
(1665, 'Patricia Johnson DVM', 0.09949616494939173, 1)
join 100000件 処理時間 0:00:00
全体処理時間 100000件 処理時間 0:00:14.371519