A search microservice with Solr, Docker and Postgres

As I have just joined the team at Armstrong Consulting, I was encouraged to explore the capabilities of Solr within a Docker container and to research ways to index existing data stored in a Postgres database. In this article I want to present my findings by building a custom Docker image for that purpose.

Using Docker, booting up a Solr instance is straightforward:
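A minimal sketch (the container name my_solr and the core name users are my own choices):

```shell
# Start Solr in the background and expose the admin UI on port 8983
docker run -d -p 8983:8983 --name my_solr solr

# Create a core named "users" inside the running container
docker exec -it my_solr solr create_core -c users
```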

First, we run the Solr image; then we can execute commands against it, such as creating a core. Since executing commands manually is inconvenient, the Solr Docker image provides scripts that handle the job for us:
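With the bundled solr-precreate script, running the container and creating the core collapse into a single command (again, users is just an example core name):

```shell
docker run -d -p 8983:8983 --name my_solr solr solr-precreate users
```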

This isn’t very useful on its own, because the core still needs to be configured with a custom configset.


Creating the custom Solr image

As a starting point we will use the default configset. We can get it by copying it out of a Solr container like this:
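For example (the configset path below is where Solr 8.x keeps it; it may differ in other versions):

```shell
# Create a stopped container just to copy files out of the image
docker create --name solr_copy solr
docker cp solr_copy:/opt/solr/server/solr/configsets/_default myconfig
docker rm solr_copy
```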

Now we have to create a file that contains the database driver, URL, user, and password, as well as the query and the fields we want Solr to index. We’re going to call the file db-data-config.xml and put it into the myconfig/conf directory.
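A sketch of db-data-config.xml, assuming a users table with id and username columns and a Postgres container reachable under the hostname db (the database name and credentials are placeholders):

```xml
<dataConfig>
  <dataSource driver="org.postgresql.Driver"
              url="jdbc:postgresql://db:5432/mydb"
              user="myuser"
              password="mypassword"/>
  <document>
    <entity name="user" query="SELECT id, username FROM users">
      <field column="id" name="id"/>
      <field column="username" name="username"/>
    </entity>
  </document>
</dataConfig>
```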

Next, we have to modify solrconfig.xml and add a new request handler that uses the configuration we just created. Add the following just beneath the </requestDispatcher> tag, where the request handler definitions start.
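The handler definition looks roughly like this, pointing at the config file we just created:

```xml
<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">db-data-config.xml</str>
  </lst>
</requestHandler>
```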

Additionally, we have to add two more lines to solrconfig.xml so Solr knows where to look for the necessary dependencies, such as the database driver. Add the following two lines inside the <config> tag, below the other <lib /> tags. Solr will look for files in the specified directory that match the regex and add them to the classpath.
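The two <lib /> lines:

```xml
<lib dir="${solr.install.dir:../../../..}/dist/" regex="solr-dataimporthandler-.*\.jar" />
<lib dir="${solr.install.dir:../../../..}/contrib/dataimporthandler-extras/lib/" regex=".*\.jar" />
```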

Note that the Docker Solr image doesn’t include the database driver; we will have to copy the driver into our custom-built Solr image later.

As the last step of configuring Solr, we have to add the fields we want to be able to query. We do this by adding them to the managed-schema file, where all the other fields are defined. As there is already a definition for id, all we have to do is add a field for username like this:
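Assuming username is a plain string, the field definition could look like this:

```xml
<field name="username" type="string" indexed="true" stored="true"/>
```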

Now let’s create a Dockerfile that contains all of our configurations:
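A minimal sketch (the Solr base tag and the driver jar version are assumptions; adjust them to your setup):

```dockerfile
FROM solr:8

COPY myconfig /opt/solr/server/solr/configsets/myconfig
COPY postgresql-42.2.18.jar /opt/solr/contrib/dataimporthandler-extras/lib/
```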

First, we copy our custom configuration folder into the image on line 3, then we copy the database driver into the directory dataimporthandler-extras/lib/ we configured in the solrconfig.xml.

Once we have built the custom image, we can run it using docker-compose:
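A sketch of the compose file (the image and network names are my own; the network is assumed to already exist and to contain the Postgres container):

```yaml
version: "3"
services:
  solr:
    image: my-custom-solr
    ports:
      - "8983:8983"
    command:
      - solr-precreate
      - users
      - /opt/solr/server/solr/configsets/myconfig
    networks:
      - db-net

networks:
  db-net:
    external: true
```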

Note that we can pass not only the name of the core to the solr-precreate script, but a custom configset as well. In addition, we have to make sure that our Solr container shares the same network as the database container.

The same behavior can be achieved by mounting the configset and the driver into the container, instead of creating a custom image that contains the necessary files.

That’s it! Now we can go to localhost:8983 and index our database as well as query usernames as intended. However, our image still has some limitations, such as hardcoded database credentials. I want to address this issue in the following part by extending the collection of default scripts Solr provides with a custom one.
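For example, a full import and a query can also be triggered from the command line (assuming the data import request handler is registered under /dataimport and the core is named users):

```shell
# Trigger a full import from Postgres
curl 'http://localhost:8983/solr/users/dataimport?command=full-import'

# Query for a username
curl 'http://localhost:8983/solr/users/select?q=username:alice'
```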


Writing a custom script

Right now the database credentials are hardcoded into the db-data-config.xml file. This might not be much of an issue, but it’s still worth improving on.

The Docker Solr image provides some bash scripts out of the box, such as the solr-precreate script we used earlier. The scripts are located in /opt/docker-solr/scripts, and we can simply pass the name of the script and its parameters to the command section of the docker-compose file like this:
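For example, pre-creating a core with our configset looks like this in docker-compose:

```yaml
command:
  - solr-precreate
  - users
  - /opt/solr/server/solr/configsets/myconfig
```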

What we want to do is something like this:
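Something along these lines, where set-db-credentials is the (hypothetical) name of our custom script and the first three arguments are the database URL, user, and password:

```yaml
command:
  - set-db-credentials
  - jdbc:postgresql://db:5432/mydb
  - myuser
  - mypassword
  - solr-precreate
  - users
  - /opt/solr/server/solr/configsets/myconfig
```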

To achieve this behaviour our script needs to accept all of the arguments, use only the first three to configure the database credentials, and execute the rest of them.

First, we create a template for the db-data-config.xml file we are going to call db-data-config_template.xml and put placeholders into the fields we want to change.
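The template is the same file with the credentials swapped for placeholders (the placeholder names are my own convention):

```xml
<dataConfig>
  <dataSource driver="org.postgresql.Driver"
              url="DB_URL_PLACEHOLDER"
              user="DB_USER_PLACEHOLDER"
              password="DB_PASSWORD_PLACEHOLDER"/>
  <document>
    <entity name="user" query="SELECT id, username FROM users">
      <field column="id" name="id"/>
      <field column="username" name="username"/>
    </entity>
  </document>
</dataConfig>
```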

Then, we can write a script that swaps the placeholders with the values provided as arguments:
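A sketch of such a script, here called set-db-credentials (the script name, placeholder names, and configset path are my own assumptions):

```shell
#!/bin/bash
# set-db-credentials: render db-data-config.xml from the template,
# then hand off to whatever command follows the first three arguments.

# Substitute the placeholders in a template with the given credentials.
render_config() {
  local url="$1" user="$2" password="$3" template="$4" target="$5"
  sed -e "s|DB_URL_PLACEHOLDER|$url|g" \
      -e "s|DB_USER_PLACEHOLDER|$user|g" \
      -e "s|DB_PASSWORD_PLACEHOLDER|$password|g" \
      "$template" > "$target"
}

# Location of the configset inside the image (overridable for testing).
CONF_DIR="${CONF_DIR:-/opt/solr/server/solr/configsets/myconfig/conf}"

if [ "$#" -ge 4 ]; then
  render_config "$1" "$2" "$3" \
    "$CONF_DIR/db-data-config_template.xml" \
    "$CONF_DIR/db-data-config.xml"
  shift 3
  # Execute the remaining arguments, e.g. "solr-precreate users ..."
  exec "$@"
fi
```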

Finally, we have to copy the script inside the image. We do this by extending the Dockerfile with one more line:
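Assuming the script file is called set-db-credentials and is marked executable, the additional line could look like this:

```dockerfile
COPY set-db-credentials /opt/docker-solr/scripts/
```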

Now we can easily switch between different databases without rebuilding the image. As the example above shows, scripts are a powerful way to automate tasks and improve the usability of the Docker Solr image.
