We needed a JSON document store for a microservice. The documents are relatively large, about 1MB, and they should be retrieved by id. Mongo and Postgres are the obvious candidates – Mongo because that’s what it does and Postgres because it has a reputation for good JSON support.
We tested how long it takes to add a JSON document and retrieve it again. We added the documents one at a time (both Mongo and Postgres support multi-document inserts, which are a lot faster, but in real-world situations you may not have multiple documents to insert at once).
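For reference, here is a minimal sketch of what the batched path looks like with both clients, using the same table and collection as the scripts below. The function names and the `docs` shape are ours, purely for illustration; we did not use batching in the benchmark:

```python
# batched-insert sketch (not used in our benchmark): both clients accept
# many documents per round trip, which is considerably faster
import json
import psycopg2.extras

def insert_batch_postgres(cursor, docs):  # docs: list of (id, dict) pairs
    # execute_values expands the VALUES %s placeholder into one multi-row insert
    psycopg2.extras.execute_values(
        cursor,
        "INSERT INTO documents (id, document) VALUES %s",
        [(id, json.dumps(doc)) for id, doc in docs])

def insert_batch_mongo(items, docs):  # items: a pymongo collection
    # insert_many sends the whole batch in one operation
    items.insert_many([dict(doc, _id=id) for id, doc in docs])
```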
We ran each database in its own Docker container and used a Python script to insert 10,000 documents and retrieve them by id (using the pymongo and psycopg2 client libraries).
| | insert a document | retrieve a document |
|---|---|---|
| mongo | 24ms | 12ms |
| postgres | 34ms | 16ms |
Note: we also tested Redis and Elasticsearch, although neither is really appropriate for our needs (Redis because it's memory-based, and Elasticsearch because we don't need search capabilities). But anyway, just for interest, here are the results:
| | insert a document | retrieve a document |
|---|---|---|
| redis | 9ms | 7ms |
| elasticsearch | 66ms | 10ms |
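We haven't included the Redis and Elasticsearch scripts. For Redis the core of the test was essentially the following (a sketch assuming the redis-py client; the key naming is illustrative):

```python
# redis variant (sketch): store each document as a JSON string under its id
import json
import redis

r = redis.Redis(host="localhost", port=6379)

def insert_document(id, document):
    r.set(f"doc:{id}", json.dumps(document))

def retrieve_document(id):
    raw = r.get(f"doc:{id}")
    return json.loads(raw) if raw else None
```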
docker-compose.yml:

```yaml
version: "3.8"
services:
  mongo:
    image: mongo
    container_name: mongo
    ports:
      - 27017:27017
    mem_limit: 1500m
    command: --wiredTigerEngineConfigString="cache_size=256M"
    restart: always
    volumes:
      - ./mongo/data:/data/db
  postgres:
    image: postgres
    container_name: postgres
    restart: always
    environment:
      - POSTGRES_USER=postgres
      - POSTGRES_PASSWORD=postgres
    ports:
      - "5432:5432"
    volumes:
      - ./postgres/data:/var/lib/postgresql/data
```
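The benchmark scripts assume both containers are up and accepting connections. A small helper like the following (not part of the original scripts, names are ours) can be used to wait for them before running the tests:

```python
# wait_for_services.py (optional helper): poll both containers until they
# accept connections, so the benchmarks don't fail on a cold start
import time
import psycopg2
import pymongo
from pymongo.errors import PyMongoError

def wait_for_postgres(retries=30):
    for _ in range(retries):
        try:
            psycopg2.connect(user="postgres", host="localhost",
                             password="postgres").close()
            return
        except psycopg2.OperationalError:
            time.sleep(1)
    raise RuntimeError("postgres did not become ready")

def wait_for_mongo(retries=30):
    client = pymongo.MongoClient("mongodb://localhost:27017/",
                                 serverSelectionTimeoutMS=1000)
    for _ in range(retries):
        try:
            client.admin.command("ping")  # cheap liveness check
            return
        except PyMongoError:
            time.sleep(1)
    raise RuntimeError("mongo did not become ready")

if __name__ == "__main__":
    wait_for_postgres()
    wait_for_mongo()
    print("both containers are ready")
```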
test-postgres.py:

```python
from utils import documents
import time
import json
import psycopg2
import psycopg2.extras

# connect to the default database so we can create the test database
try:
    connection = psycopg2.connect("user='postgres' host='localhost' password='postgres'")
    connection.set_isolation_level(psycopg2.extensions.ISOLATION_LEVEL_AUTOCOMMIT)
except psycopg2.OperationalError:
    print("unable to connect to the database server")
    raise

cursor = connection.cursor(cursor_factory=psycopg2.extras.RealDictCursor)
try:
    cursor.execute("CREATE DATABASE test")
except psycopg2.errors.DuplicateDatabase:
    pass  # the database already exists from a previous run
connection.close()

# reconnect, this time to the test database
try:
    connection = psycopg2.connect("user='postgres' host='localhost' password='postgres'", database='test')
    connection.set_isolation_level(psycopg2.extensions.ISOLATION_LEVEL_AUTOCOMMIT)
except psycopg2.OperationalError:
    print("unable to connect to the database test")
    raise

cursor = connection.cursor(cursor_factory=psycopg2.extras.RealDictCursor)
cursor.execute("DROP TABLE IF EXISTS documents")
cursor.execute("""
    CREATE TABLE documents (
        id integer NOT NULL PRIMARY KEY,
        document json
    )
""")

# insert the documents one at a time, timing only the execute() call
documents_to_add = 10000
elapsed_time = 0
for id in range(1, documents_to_add + 1):
    document = documents.get_document(id)
    if id % 100 == 0:
        print(id)
    start_time = time.time_ns()
    cursor.execute("INSERT INTO documents (id, document) VALUES (%s, %s);",
                   (id, json.dumps(document)))
    elapsed_time += time.time_ns() - start_time
print(f"time to insert one document = {elapsed_time/documents_to_add/1000000:.3f} ms")

# retrieve each document by id (a primary-key lookup), timing the whole loop
documents_to_search = documents_to_add
start_time = time.time_ns()
count = 0
for id in range(1, documents_to_search + 1):
    count += 1
    cursor.execute("SELECT document FROM documents WHERE id = %s", (id,))
    document = cursor.fetchone()["document"]
    if not document:
        print(f"failed to find document {id}")
print(f"time to find and retrieve one document = {(time.time_ns() - start_time)/count/1000000:.3f} ms")
```
test-mongo.py:

```python
import pymongo
from bson.objectid import ObjectId
import time
from utils import documents

myclient = pymongo.MongoClient("mongodb://localhost:27017/")
mydb = myclient["test"]
items = mydb["items"]

# map an integer id onto a zero-padded 24-character ObjectId, and back
def object_id_from_int(n):
    s = str(n)
    s = '0' * (24 - len(s)) + s
    return ObjectId(s)

def int_from_object_id(obj):
    return int(str(obj))

items.drop()  # start from an empty collection

# insert the documents one at a time, timing only the insert_one() call
documents_to_add = 10000
elapsed_time = 0
for id in range(1, documents_to_add + 1):
    document = documents.get_document(id)
    document["_id"] = object_id_from_int(id)
    if id % 100 == 0:
        print(id)
    start_time = time.time_ns()
    items.insert_one(document)
    elapsed_time += time.time_ns() - start_time
print(f"time to insert one document = {elapsed_time/documents_to_add/1000000:.3f} ms")

# retrieve each document by _id, timing the whole loop
documents_to_search = documents_to_add
start_time = time.time_ns()
count = 0
for id in range(1, documents_to_search + 1):
    count += 1
    document = items.find_one({"_id": object_id_from_int(id)})
    if not document:
        print(f"failed to find document {id}")
print(f"time to find and retrieve one document = {(time.time_ns() - start_time)/count/1000000:.3f} ms")
```
