Mongo vs Postgres as simple JSON object store

We needed a JSON document store for a microservice. The documents are relatively large, about 1MB, and they should be retrieved by id. Mongo and Postgres are the obvious candidates – Mongo because that’s what it does and Postgres because it has a reputation for good JSON support.

We tested how long it takes to add a JSON document and retrieve it again. We added the documents one at a time (both Mongo and Postgres support multiple document insert and its a lot faster, but in real world situations, you may not have multiple documents to insert).

We used two docker containers to test and a python script to insert 10000 documents and retrieve the documents by id (using the pymongo and psycopg2 client libraries).

insert a document retrieve a document
mongo 24ms 12ms
postgres 34ms 16ms

Note: we also tested redis and elasticsearch, although neither are really appropriate for our needs (redis because its memory-based and elasticsearch since we don’t need search capabilities). But anyway, just for interest, here are the results:

insert a document retrieve a document
redis 9ms 7ms
elasticsearch 66ms 10ms

docker-compose.yml:

version: "3.8"

services:
  mongo:
    image: mongo
    container_name: mongo
    ports:
      - 27017:27017
    mem_limit: 1500m
    command: --wiredTigerEngineConfigString="cache_size=256M"
    restart: always
    volumes:
      - ./mongo/data:/data/db

  postgres:
    image: postgres
    container_name: postgres
    restart: always
    environment:
      - POSTGRES_USER=postgres
      - POSTGRES_PASSWORD=postgres
    ports:
      - "5432:5432"
    volumes:
      - ./postgres/data:/var/lib/postgresql/data

test-postgres.py

from utils import documents
import time
import json
import psycopg2
import psycopg2.extras

try:
    connection = psycopg2.connect("user='postgres' host='localhost' password='postgres'")
    connection.set_isolation_level(psycopg2.extensions.ISOLATION_LEVEL_AUTOCOMMIT)
except:
    print ("unable to connect to the database server")
    
cursor = connection.cursor(cursor_factory=psycopg2.extras.RealDictCursor)

try:
    cursor.execute("CREATE DATABASE test")
except:
    pass

connection.close()

try:
    connection = psycopg2.connect("user='postgres' host='localhost' password='postgres'", database='test')
    connection.set_isolation_level(psycopg2.extensions.ISOLATION_LEVEL_AUTOCOMMIT)
except:
    print ("unable to connect to the database test")
    
cursor = connection.cursor(cursor_factory=psycopg2.extras.RealDictCursor)

cursor.execute("DROP TABLE IF EXISTS documents")

cursor.execute("""
                CREATE TABLE documents (
                    id integer NOT NULL PRIMARY KEY,
                    document json
                )
                """)

documents_to_add = 10000

elapsed_time = 0
for id in range(1,documents_to_add+1):
    document = documents.get_document(id)
    if id % 100 == 0:
        print(id)
    start_time = time.time_ns()
    cursor.execute("INSERT INTO documents (id, document) VALUES (%s, %s);", (id, json.dumps(document)))
    elapsed_time += time.time_ns()-start_time
print(f"time to insert one document = {(elapsed_time)/documents_to_add/1000000:3} ms")

documents_to_search = documents_to_add
start_time = time.time_ns()
count = 0
for id in range(1,documents_to_search+1):
    count += 1
    cursor.execute(f"select document from documents where id={id}")
    document = cursor.fetchone()["document"]
    if not document:
        print (f"failed to find document {id}")
print(f"time to find and retrieve one document = {(time.time_ns()-start_time)/count/1000000:3} ms")

test-mongo.py:

import pymongo
from bson.objectid import ObjectId
from datetime import datetime
import time
from utils import documents

myclient = pymongo.MongoClient("mongodb://localhost:27017/")
mydb = myclient["test"]
items = mydb["items"]

def object_id_from_int(n):
    s = str(n)
    s = '0' * (24 - len(s)) + s
    return ObjectId(s)

def int_from_object_id(obj):
    return int(str(obj))

items.drop()
documents_to_add = 10000

elapsed_time = 0
for id in range(1,documents_to_add+1):
    document = documents.get_document(id)
    document["_id"] = object_id_from_int(id)
    if id % 100 == 0:
        print(id)
    start_time = time.time_ns()
    x = items.insert_one(document)
    elapsed_time += time.time_ns()-start_time
print(f"time to insert one document = {(elapsed_time)/documents_to_add/1000000:3} ms")

documents_to_search = documents_to_add
start_time = time.time_ns()
count = 0
for id in range(1,documents_to_search+1):
    count += 1
    document = items.find_one({"_id": object_id_from_int(id)})
    if not document:
        print (f"failed to find document {id}")
print(f"time to find and retrieve one document = {(time.time_ns()-start_time)/count/1000000:3} ms")

Leave a Reply

Your email address will not be published.

This site uses Akismet to reduce spam. Learn how your comment data is processed.