We needed a JSON document store for a microservice. The documents are relatively large (about 1 MB) and are retrieved by id. Mongo and Postgres are the obvious candidates: Mongo because document storage is exactly what it is built for, and Postgres because it has a reputation for good JSON support.
We tested how long it takes to insert a JSON document and retrieve it again. We inserted the documents one at a time; both Mongo and Postgres support multi-document inserts, and batching is a lot faster, but in real-world situations you may not have multiple documents to insert at once (a sketch of a batched version follows).
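For comparison, a batched version (which we did not benchmark here) might look like the following sketch, using pymongo's `insert_many` and psycopg2's `execute_values`; the connection details match the test setup below, and the sample documents are just placeholders:

```python
import json
import psycopg2
import psycopg2.extras
import pymongo

# Mongo: one round trip inserts the whole batch
items = pymongo.MongoClient("mongodb://localhost:27017/")["test"]["items"]
items.insert_many([{"_id": i, "value": f"doc-{i}"} for i in range(1, 101)])

# Postgres: execute_values expands the batch into a single multi-row INSERT
connection = psycopg2.connect("user='postgres' host='localhost' password='postgres'", dbname="test")
cursor = connection.cursor()
psycopg2.extras.execute_values(
    cursor,
    "INSERT INTO documents (id, document) VALUES %s",
    [(i, json.dumps({"value": f"doc-{i}"})) for i in range(1, 101)],
)
connection.commit()
```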
We ran each database in a Docker container and used a Python script to insert 10,000 documents and retrieve each one by id, using the pymongo and psycopg2 client libraries (a sketch of the `utils.documents` helper used by both scripts appears below).
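Both test scripts import a small `utils.documents` module that generates the test documents. That module isn't shown in the post; a minimal hypothetical sketch, assuming it only needs to produce a JSON-serializable dict of roughly 1 MB per id, might look like:

```python
# utils/documents.py -- hypothetical sketch; the real helper isn't shown in the post
import random
import string

def get_document(id):
    # a JSON-serializable dict of roughly 1 MB, keyed by id
    return {
        "id": id,
        "name": f"document-{id}",
        "payload": "".join(random.choices(string.ascii_letters, k=1_000_000)),
    }
```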
| | insert a document | retrieve a document |
|---|---|---|
| mongo | 24 ms | 12 ms |
| postgres | 34 ms | 16 ms |
Note: we also tested Redis and Elasticsearch, although neither is really appropriate for our needs (Redis because it's memory-based, Elasticsearch because we don't need search capabilities). But anyway, just for interest, here are the results (a sketch of the Redis test follows the table):
| | insert a document | retrieve a document |
|---|---|---|
| redis | 9 ms | 7 ms |
| elasticsearch | 66 ms | 10 ms |
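The Redis and Elasticsearch scripts aren't shown here; a minimal sketch of how the Redis version could look, assuming the redis-py client, a local Redis on the default port, and the same `utils.documents` helper:

```python
import json
import time
import redis
from utils import documents

r = redis.Redis(host="localhost", port=6379)

documents_to_add = 10000
elapsed_time = 0
for id in range(1, documents_to_add + 1):
    document = documents.get_document(id)
    start_time = time.time_ns()
    # Redis has no document type, so store the serialized JSON under the id
    r.set(str(id), json.dumps(document))
    elapsed_time += time.time_ns() - start_time
print(f"time to insert one document = {elapsed_time / documents_to_add / 1000000:.3f} ms")

start_time = time.time_ns()
for id in range(1, documents_to_add + 1):
    document = json.loads(r.get(str(id)))
print(f"time to find and retrieve one document = {(time.time_ns() - start_time) / documents_to_add / 1000000:.3f} ms")
```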
docker-compose.yml:
version: "3.8" services: mongo: image: mongo container_name: mongo ports: - 27017:27017 mem_limit: 1500m command: --wiredTigerEngineConfigString="cache_size=256M" restart: always volumes: - ./mongo/data:/data/db postgres: image: postgres container_name: postgres restart: always environment: - POSTGRES_USER=postgres - POSTGRES_PASSWORD=postgres ports: - "5432:5432" volumes: - ./postgres/data:/var/lib/postgresql/data
test-postgres.py:
```python
from utils import documents
import time
import json
import psycopg2
import psycopg2.extras

# connect to the default database first so we can create the test database
try:
    connection = psycopg2.connect("user='postgres' host='localhost' password='postgres'")
    connection.set_isolation_level(psycopg2.extensions.ISOLATION_LEVEL_AUTOCOMMIT)
except psycopg2.Error:
    print("unable to connect to the database server")

cursor = connection.cursor(cursor_factory=psycopg2.extras.RealDictCursor)
try:
    cursor.execute("CREATE DATABASE test")
except psycopg2.Error:
    pass  # the database already exists
connection.close()

try:
    connection = psycopg2.connect("user='postgres' host='localhost' password='postgres'", dbname="test")
    connection.set_isolation_level(psycopg2.extensions.ISOLATION_LEVEL_AUTOCOMMIT)
except psycopg2.Error:
    print("unable to connect to the database test")

cursor = connection.cursor(cursor_factory=psycopg2.extras.RealDictCursor)
cursor.execute("DROP TABLE IF EXISTS documents")
cursor.execute("""
    CREATE TABLE documents (
        id integer NOT NULL PRIMARY KEY,
        document json
    )
""")

documents_to_add = 10000
elapsed_time = 0
for id in range(1, documents_to_add + 1):
    document = documents.get_document(id)
    if id % 100 == 0:
        print(id)
    # time only the INSERT itself, not the document generation
    start_time = time.time_ns()
    cursor.execute("INSERT INTO documents (id, document) VALUES (%s, %s);",
                   (id, json.dumps(document)))
    elapsed_time += time.time_ns() - start_time
print(f"time to insert one document = {elapsed_time / documents_to_add / 1000000:.3f} ms")

documents_to_search = documents_to_add
start_time = time.time_ns()
count = 0
for id in range(1, documents_to_search + 1):
    count += 1
    # use a parameterized query rather than string interpolation
    cursor.execute("SELECT document FROM documents WHERE id = %s", (id,))
    row = cursor.fetchone()
    if not row:
        print(f"failed to find document {id}")
        continue
    document = row["document"]
print(f"time to find and retrieve one document = {(time.time_ns() - start_time) / count / 1000000:.3f} ms")
```
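One aside: the table uses a plain `json` column, which is fine when documents are only fetched by id. If you later need to query by fields inside the documents, Postgres's `jsonb` type (a parsed binary representation that supports GIN indexing) is usually recommended; a hypothetical variant of the table setup:

```python
# hypothetical jsonb variant -- not what we benchmarked above
cursor.execute("DROP TABLE IF EXISTS documents")
cursor.execute("""
    CREATE TABLE documents (
        id integer NOT NULL PRIMARY KEY,
        document jsonb
    )
""")
# a GIN index supports containment queries such as: document @> '{"name": "x"}'
cursor.execute("CREATE INDEX documents_document_idx ON documents USING GIN (document)")
```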
test-mongo.py:
```python
import time

import pymongo
from bson.objectid import ObjectId

from utils import documents

myclient = pymongo.MongoClient("mongodb://localhost:27017/")
mydb = myclient["test"]
items = mydb["items"]

def object_id_from_int(n):
    # an ObjectId is a 24-character hex string; decimal digits are valid hex,
    # so zero-padding the integer id yields a valid, deterministic _id
    s = str(n)
    s = "0" * (24 - len(s)) + s
    return ObjectId(s)

def int_from_object_id(obj):
    return int(str(obj))

items.drop()

documents_to_add = 10000
elapsed_time = 0
for id in range(1, documents_to_add + 1):
    document = documents.get_document(id)
    document["_id"] = object_id_from_int(id)
    if id % 100 == 0:
        print(id)
    # time only the insert itself, not the document generation
    start_time = time.time_ns()
    items.insert_one(document)
    elapsed_time += time.time_ns() - start_time
print(f"time to insert one document = {elapsed_time / documents_to_add / 1000000:.3f} ms")

documents_to_search = documents_to_add
start_time = time.time_ns()
count = 0
for id in range(1, documents_to_search + 1):
    count += 1
    document = items.find_one({"_id": object_id_from_int(id)})
    if not document:
        print(f"failed to find document {id}")
print(f"time to find and retrieve one document = {(time.time_ns() - start_time) / count / 1000000:.3f} ms")
```
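Deriving each `_id` deterministically from the integer id means both stores are addressed the same way, by primary key, which keeps the two tests comparable; `int_from_object_id` simply reverses the mapping.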