A search microservice with Solr, Docker and Postgres

Having just joined the team at Armstrong Consulting, I was encouraged to explore the capabilities of Solr within a Docker container and to research ways to index existing data stored in a Postgres database. In this article I present my findings by building a custom Docker image for that purpose.

Using Docker, booting up a Solr instance is straightforward:

$ docker run -d -p 8983:8983 --name my_solr solr
$ docker exec -it my_solr solr create_core -c my_core
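Once the container is up, we can verify that the new core responds, for example via its ping endpoint (assuming the default port mapping from above):

```shell
# Hit the default ping handler of the core we just created
curl "http://localhost:8983/solr/my_core/admin/ping"
```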

First, we run the Solr image; then we can execute commands against it, such as creating a core. Since executing commands manually is inconvenient, the Docker Solr image provides scripts that handle the job for us:

$ docker run -d -p 8983:8983 --name my_solr solr solr-precreate my_core

This isn’t very useful on its own, because the core still needs to be configured using a custom configset.


Creating the custom Solr image

As a starting point we will use the default configset. We can get it by copying it out of a Solr container like this:

$ docker create --name copier solr
$ docker cp copier:/opt/solr/server/solr/configsets/_default myconfig
$ docker rm copier

Now we have to create a file that contains the database driver, URL, user and password, as well as the query and the fields we want Solr to index. We'll call the file db-data-config.xml and put it into the myconfig/conf directory.

<dataConfig>
  <dataSource type="JdbcDataSource" driver="org.postgresql.Driver" url="jdbc:postgresql://db:1234/users" user="my-username" password="my-secure-password" batchSize="1" />
  <document name="Users">
    <entity name="User" query="SELECT * FROM user">
      <field column="id" name="id" />
      <field column="username" name="username" />
    </entity>
  </document>
</dataConfig>

Next, we have to modify solrconfig.xml and add a new request handler that uses the configuration we just created. Add the following just beneath the </requestDispatcher> tag, where the request handler definitions start.

<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">db-data-config.xml</str>
  </lst>
</requestHandler> 

Additionally, we have to add two more lines to solrconfig.xml so Solr knows where to look for the necessary dependencies, such as the database driver. Add the following two lines inside the <config> tag, below the other <lib /> tags. Solr will look for files in the specified directory that match the regex and add them to the classpath.

<lib dir="${solr.install.dir:../../../..}/dist/" regex="solr-dataimporthandler-.*\.jar" />
<lib dir="${solr.install.dir:../../../..}/contrib/dataimporthandler-extras/lib/" regex="postgresql-.*\.jar" />

Note that the Docker Solr image doesn't include the database driver. We will have to copy the driver into our custom-built Solr image later.
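If the driver jar isn't already at hand, it can be downloaded from Maven Central (the version number here is just an example; pick the one matching your setup):

```shell
# Fetch the Postgres JDBC driver (version is an assumption)
PG_VERSION=42.2.7
curl -fLO "https://repo1.maven.org/maven2/org/postgresql/postgresql/${PG_VERSION}/postgresql-${PG_VERSION}.jar"
```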

As the last step of configuring Solr, we have to add the fields we want to be able to query. We do this by adding them to the managed-schema file, where all the other fields are defined. Since there is already a definition for id, all we have to do is add a field for username like this:

<field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" />
<field name="username" type="string" indexed="true" stored="true" />

Now let's create a Dockerfile that contains all of our configurations:

FROM solr

COPY myconfig /opt/solr/server/solr/configsets/myconfig
COPY postgresql-42.2.7.jar /opt/solr/contrib/dataimporthandler-extras/lib

First, we copy our custom configuration folder into the image; then we copy the database driver into the dataimporthandler-extras/lib/ directory we referenced in solrconfig.xml.

First we build the custom image:

$ docker build . -t custom_solr

Then we can run it using docker-compose with the following docker-compose.yml:

version: "3.0"

services:
  solr_test:
    container_name: my_solr
    image: custom_solr
    ports:
      - 8983:8983
    command:
      - solr-precreate
      - my_core # core name
      - /opt/solr/server/solr/configsets/myconfig # custom configset

networks:
  default:
    external:
      name: database_default

Note that we can not only pass the name of the core to the solr-precreate script, but also provide a custom configset. In addition, we have to make sure that our Solr container shares the same network with the database container.

The same behavior can be achieved by mounting the configset and the driver into the container, instead of creating a custom image that contains the necessary files.
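For the mount-based variant, a sketch of the relevant docker-compose section could look like this (the host paths are assumptions based on the layout above):

```yaml
services:
  solr_test:
    image: solr   # plain image, no custom build
    ports:
      - 8983:8983
    volumes:
      - ./myconfig:/opt/solr/server/solr/configsets/myconfig
      - ./postgresql-42.2.7.jar:/opt/solr/contrib/dataimporthandler-extras/lib/postgresql-42.2.7.jar
    command:
      - solr-precreate
      - my_core
      - /opt/solr/server/solr/configsets/myconfig
```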

That's it! Now we can go to localhost:8983, index our database, and query usernames as intended. However, our image still has some limitations, such as hardcoded database credentials. I want to address this issue in the following part by extending the collection of default scripts Solr provides with a custom one.
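Instead of clicking through the admin UI, the import and a first query can also be triggered from the shell (core and field names as configured above):

```shell
# Kick off a full import via the DataImportHandler we registered
curl "http://localhost:8983/solr/my_core/dataimport?command=full-import"

# Query the indexed usernames
curl "http://localhost:8983/solr/my_core/select?q=username:*"
```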


Writing a custom script

Right now the database credentials are hardcoded into the db-data-config.xml file. This might not be much of an issue, but it’s still worth improving on.

The Solr Docker image provides some bash scripts out of the box, such as the solr-precreate script we used earlier. The scripts are located in /opt/docker-solr/scripts, and we can simply pass the name of the script and its parameters to the command section of the docker-compose file like this:

command:
  - solr-precreate #name of the script
  - my_core # argument 1
  - /opt/solr/server/solr/configsets/myconfig # argument 2

What we want to do is something like this:

command:
  - set-db-variables # custom script
  - jdbc:postgresql://db:1234/users # db url
  - my-username # db user
  - my-secure-password # db password
  - solr-precreate # execute solr-precreate script with the parameters below
  - my_core # core name
  - /opt/solr/server/solr/configsets/myconfig # custom configset

To achieve this behaviour, our script needs to accept all of the arguments, use the first three to configure the database credentials, and execute the rest as a command.

First, we create a template for the db-data-config.xml file, which we will call db-data-config_template.xml, and put placeholders in the fields we want to change.

<dataConfig>
  <dataSource type="JdbcDataSource" driver="org.postgresql.Driver" url="dburl" user="dbuser" password="dbpassword" batchSize="1" />
  <document name="Users">
    <entity name="User" query="SELECT * FROM user">
      <field column="id" name="id" />
      <field column="username" name="username" />
    </entity>
  </document>
</dataConfig>

Then, we can write a script that swaps the placeholders with the values provided as arguments:

#!/bin/bash

# use the first 3 arguments to fill in db-data-config.xml

# escape the url for sed
escapedUrl=$(echo "$1" | sed -e 's/[\/&]/\\&/g')
sed "s/dburl/$escapedUrl/g" /opt/solr/server/solr/configsets/myconfig/conf/db-data-config_template.xml |
sed "s/dbuser/$2/g" |
sed "s/dbpassword/$3/g" > /opt/solr/server/solr/configsets/myconfig/conf/db-data-config.xml

# execute the remaining arguments as a command
exec "${@:4}"
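The URL-escaping step is easy to sanity-check locally; a hypothetical JDBC URL survives the round trip through sed:

```shell
# Demonstrate the escaping used in the script above
url="jdbc:postgresql://db:1234/users"
escapedUrl=$(echo "$url" | sed -e 's/[\/&]/\\&/g')

# Substitute into a line containing the placeholder, as the script does
echo 'url="dburl"' | sed "s/dburl/$escapedUrl/g"
# prints: url="jdbc:postgresql://db:1234/users"
```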

Finally, we have to copy the script into the image. We do this by extending the Dockerfile with one more line:

COPY set-db-variables /opt/docker-solr/scripts/set-db-variables

Now we can easily switch between different databases without rebuilding the image. As seen in the example above, scripts provide a powerful way to automate tasks and improve the usability of the Docker Solr image.
