Python is, in a lot of ways, a very rich language. After years of using
it, I still regularly discover new parts of the ecosystem, even in the
standard library. In particular, there are a few modules that are not
very well known but can be very useful in some situations. Today I
discovered dbm, a persistent key/value store.
Quick start:
import dbm

with dbm.open('my_store', 'c') as db:
    db['key'] = 'value'
    print(db.keys())    # [b'key']
    print(db['key'])    # b'value'
    print('key' in db)  # True
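
Because the store lives on disk, the data survives the process. Reopening the same file in a later run gets the value back (a quick sketch reusing the my_store file created above):

import dbm

# Reopen the store read-only; the value written earlier is still there
with dbm.open('my_store', 'r') as db:
    print(db['key'])  # b'value'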
It behaves a lot like a dict, except:
- It persists its values on disk
- You can only use str or bytes as keys and values (see the sketch below)
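
Both restrictions are easy to observe (a minimal sketch): values always come back as bytes, and anything other than str or bytes is rejected.

import dbm

with dbm.open('my_store', 'c') as db:
    db['key'] = 'value'   # str is accepted on the way in...
    print(db['key'])      # ...but comes back as b'value'
    try:
        db['count'] = 42  # anything else raises TypeError
    except TypeError as e:
        print(e)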
Performance is also slower than a dict, but faster than sqlite3.
The benchmark consists of performing 10k writes and 10k random reads:
from random import random
import time

operations = 10000
writes = [str(i) for i in range(operations)]
reads = [str(int(random() * operations)) for _ in range(operations)]

# Create some records
for i in writes:
    db[i] = 'x'

# Read values in random order
for i in reads:
    x = db[i]
Here are the results:
dict              -- total: 0.002 seconds,     0.23398 μs/record x 1
dbm               -- total: 0.054 seconds,     5.35984 μs/record x 27
sqlite3 in-memory -- total: 2.468 seconds,   246.84346 μs/record x 1234
sqlite3 file      -- total: 42.407 seconds, 4240.69593 μs/record x 21207
Why is sqlite3 so slow? Well, the benchmark is probably not representative of the typical workload for sqlite (lots of individual insertions and selections). If we perform the same operations using executemany and one select of all the keys at once, we get:
:memory: -- total: 0.038 seconds, 3.81415 μs/record x 19
file     -- total: 0.071 seconds, 7.07369 μs/record x 35
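
The batched version boils down to something like this (a condensed, self-contained sketch of bench_sqlite from the full script below):

import sqlite3
from random import random

MAX_RECORDS = 10000
WRITES = [str(i) for i in range(MAX_RECORDS)]
RANDOM_READS = [str(int(random() * MAX_RECORDS)) for _ in range(MAX_RECORDS)]

con = sqlite3.connect(':memory:')
con.execute('CREATE TABLE store(key TEXT, value TEXT)')

# One transaction for all 10k inserts instead of one per statement
with con:
    con.executemany('INSERT INTO store VALUES (?,?)',
                    ((k, 'x') for k in WRITES))

# One SELECT covering every key at once (note: very old SQLite builds
# cap the number of placeholders at 999)
placeholders = ', '.join('?' for _ in RANDOM_READS)
rows = con.execute(
    'SELECT key, value FROM store WHERE key IN (%s)' % placeholders,
    RANDOM_READS,
).fetchall()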
It's much better, but still not as fast as dbm (when we persist to a file). So, if you have a workload where keys and values are added and retrieved very often, are always str or bytes, and need to be persisted on disk, dbm is a serious contender!
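
One caveat: dbm is just a front-end, and it picks whichever backend is available (dbm.gnu, dbm.ndbm, or the slow pure-Python dbm.dumb fallback), so the numbers above will vary from machine to machine. You can check which backend produced a given file with dbm.whichdb (a quick sketch):

import dbm

with dbm.open('my_store', 'c') as db:
    db['key'] = 'value'

# Returns the backend module name, e.g. 'dbm.gnu', 'dbm.ndbm' or 'dbm.dumb'
print(dbm.whichdb('my_store'))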
The code used for the benchmark can be found here: dbm_bench.py
import dbm
import sqlite3
import os
from random import random
import time

MAX_RECORDS = 10000

WRITES = [str(i) for i in range(MAX_RECORDS)]
RANDOM_READS = [
    str(int(random() * MAX_RECORDS))
    for _ in range(MAX_RECORDS)
]


class SqliteDict:
    def __init__(self, con):
        self.con = con
        self.con.execute('CREATE TABLE store(key TEXT, value TEXT)')

    def __setitem__(self, key, value):
        with self.con:
            self.con.execute('INSERT INTO store VALUES (?,?)', (key, value))

    def __getitem__(self, key):
        # Fetch the row; returning the bare cursor would never read the value
        row = self.con.execute(
            'SELECT value FROM store WHERE key=?', (key,)
        ).fetchone()
        return row[0]


def sqlite3_mem_open():
    with sqlite3.connect(':memory:') as con:
        yield SqliteDict(con)


def sqlite3_file_open():
    with sqlite3.connect('file.sql') as con:
        yield SqliteDict(con)
    try:
        os.remove('file.sql')
    except FileNotFoundError:
        print('Could not find', 'file.sql')


def dbm_open():
    # The filename actually created depends on the dbm backend in use;
    # on this machine it ends up as 'dbm.db'
    name = 'dbm.db'
    with dbm.open('dbm', 'c') as db:
        yield db
    try:
        os.remove(name)
    except FileNotFoundError:
        print('Could not find', name)


def dict_open():
    yield {}


def bench(db_gen):
    for db in db_gen():
        t = time.time()
        # Create some records
        for i in WRITES:
            db[i] = 'x'
        # Do some random reads
        for i in RANDOM_READS:
            x = db[i]
        time_taken = time.time() - t
        print("Took %0.3f seconds, %0.5f microseconds / record"
              % (time_taken, (time_taken * 1000000) / MAX_RECORDS))


def bench_sqlite():
    with sqlite3.connect('db.sql') as con:
        with con:
            con.execute('CREATE TABLE store(key TEXT, value TEXT)')
        t = time.time()
        with con:
            con.executemany('INSERT INTO store VALUES (?,?)', (
                (k, 'x') for k in WRITES
            ))
        with con:
            con.execute(
                'SELECT * FROM store WHERE key in ({0})'.format(
                    ', '.join('?' for _ in RANDOM_READS)
                ),
                RANDOM_READS,
            ).fetchall()
        time_taken = time.time() - t
        print("Took %0.3f seconds, %0.5f microseconds / record"
              % (time_taken, (time_taken * 1000000) / MAX_RECORDS))
    # Clean up so the CREATE TABLE doesn't fail on the next run
    try:
        os.remove('db.sql')
    except FileNotFoundError:
        print('Could not find', 'db.sql')


if __name__ == "__main__":
    bench_sqlite()
    bench(dict_open)
    bench(dbm_open)
    bench(sqlite3_mem_open)
    bench(sqlite3_file_open)