Having just joined the team at Armstrong Consulting, I was encouraged to explore the capabilities of Solr within a Docker container and to research ways to index existing data stored in a Postgres database. In this article I want to present my findings by building a custom Docker image for that purpose.
Using Docker, booting up a Solr instance is straightforward:
$ docker run -d -p 8983:8983 --name my_solr solr
$ docker exec -it my_solr solr create_core -c my_core
First, we run the Solr image; then we can execute commands against it, such as creating a core. Since having to execute commands manually is inconvenient, the official Solr Docker image provides scripts that handle the job for us:
$ docker run -d -p 8983:8983 --name my_solr solr solr-precreate my_core
This isn’t very useful on its own, because the core still needs to be configured using a custom configset.
Creating the custom Solr image
As a starting point we will use the default configset. We can get it by copying it out of a Solr container like this:
$ docker create --rm --name copier solr
$ docker cp copier:/opt/solr/server/solr/configsets/_default myconfig
$ docker rm copier
Now we have to create a file that contains information about the database driver, URL, user and password, as well as the query and the fields we want Solr to index. We're going to call the file db-data-config.xml and put it into the myconfig/conf directory.
<dataConfig>
  <dataSource type="JdbcDataSource" driver="org.postgresql.Driver" url="jdbc:postgresql://db:1234/users" user="my-username" password="my-secure-password" batchSize="1" />
  <document name="Users">
    <entity name="User" query='SELECT * FROM "user"'>
      <field column="id" name="id" />
      <field column="username" name="username" />
    </entity>
  </document>
</dataConfig>
Note that user is a reserved word in Postgres, so the table name has to be quoted in the query (which is also why the query attribute uses single quotes).
Next, we have to modify solrconfig.xml and add a new request handler that uses the configuration we just created. Add the following just beneath the </requestDispatcher> tag, where the request handler definitions start.
<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">db-data-config.xml</str>
  </lst>
</requestHandler>
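With the handler registered, an import can later be triggered over plain HTTP. A minimal sketch, assuming the core is named my_core as in the rest of this article and that Solr is running and reachable on localhost:

```shell
# Assumed host and core name from this article; requires a running Solr.
SOLR_CORE="http://localhost:8983/solr/my_core"

# Trigger a full import through the /dataimport handler we just added:
curl "$SOLR_CORE/dataimport?command=full-import"

# Check the status of a running or finished import:
curl "$SOLR_CORE/dataimport?command=status"
```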
Additionally, we have to add two more lines to solrconfig.xml so Solr knows where to look for the necessary dependencies, such as the database driver. Add the following two lines inside the <config> tag, below the other <lib /> tags. Solr will look for files in the specified directories that match the regex and add them to the classpath.
<lib dir="${solr.install.dir:../../../..}/dist/" regex="solr-dataimporthandler-.*\.jar" />
<lib dir="${solr.install.dir:../../../..}/contrib/dataimporthandler-extras/lib/" regex="postgresql-.*\.jar" />
Note that the Solr Docker image doesn't include the database driver. We will have to copy the driver into our custom-built Solr image later.
As the last step of configuring Solr, we have to add the fields we want to be able to query. We do this by adding them to the managed-schema file where all the other fields are defined. As there is already a definition for id, all we have to do is add a field for username like this:
<field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" />
<field name="username" type="string" indexed="true" stored="true" />
Now let's create a Dockerfile that contains all of our configuration:
FROM solr
COPY myconfig /opt/solr/server/solr/configsets/myconfig
COPY postgresql-42.2.7.jar /opt/solr/contrib/dataimporthandler-extras/lib
First, we copy our custom configuration folder into the image; then we copy the database driver into the dataimporthandler-extras/lib directory we configured in solrconfig.xml.
First, we build the custom image:
$ docker build . -t custom_solr
Once the image is built, we can run it using docker-compose with a docker-compose.yml like this:
version: "3.0"
services:
  solr_test:
    container_name: my_solr
    image: custom_solr
    ports:
      - "8983:8983"
    command:
      - solr-precreate
      - my_core # core name
      - /opt/solr/server/solr/configsets/myconfig # custom configset
networks:
  default:
    external:
      name: database_default
Note that we can pass not only the name of the core to the solr-precreate script, but also a custom configset. In addition, we have to make sure that our Solr container shares the same network as the database container.
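Because the network is declared as external in the compose file, docker-compose will not create it on its own. A quick sketch of preparing it manually, in case the database stack hasn't created it already (the name is taken from the compose file above):

```shell
# Name of the shared network, taken from the compose file above:
NETWORK=database_default

# The network is marked "external", so docker-compose will not create
# it; create it once if the database stack hasn't done so already:
docker network create "$NETWORK" 2>/dev/null || true
```

After that, docker-compose up -d brings the Solr service up on the shared network.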
The same behavior can also be achieved by mounting the configset and the driver into the container, instead of baking them into a custom image.
That's it! Now we can go to localhost:8983 and index our database as well as query usernames as intended. However, our image still has some limitations, such as hardcoded database credentials. I want to address this issue in the following part by extending the collection of default scripts Solr provides with a custom one.
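To verify the setup end to end, the indexed fields can be queried through Solr's select endpoint. A hedged sketch, assuming the core name from this article and that the stack is up and a full import has completed:

```shell
# Assumed host and core name from this article; requires a running
# Solr with imported data.
SELECT_URL="http://localhost:8983/solr/my_core/select"

# Look up documents by the username field we added to the schema:
curl "$SELECT_URL?q=username:my-username"

# Or fetch everything that was indexed:
curl "$SELECT_URL?q=*:*"
```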
Writing a custom script
Right now the database credentials are hardcoded into the db-data-config.xml file. This might not be much of an issue, but it's still worth improving on.
The Solr Docker image provides some bash scripts out of the box, such as the solr-precreate script we used earlier. The scripts are located in /opt/docker-solr/scripts, and we can simply pass the name of the script and its parameters to the command section of the docker-compose file like this:
command:
  - solr-precreate # name of the script
  - my_core # argument 1
  - /opt/solr/server/solr/configsets/myconfig # argument 2
What we want to do is something like this:
command:
  - set-db-variables # custom script
  - jdbc:postgresql://db:1234/users # db url
  - my-username # db user
  - my-secure-password # db password
  - solr-precreate # execute solr-precreate script with the parameters below
  - my_core # core name
  - /opt/solr/server/solr/configsets/myconfig # custom configset
To achieve this behaviour, our script needs to accept all of the arguments, use only the first three to configure the database credentials, and execute the remaining arguments as a command.
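The argument handling relies on bash's ${@:4} slice, which expands to every positional parameter from the fourth onward. A tiny self-contained illustration, where echo stands in for the solr-precreate script:

```shell
#!/bin/bash
# Simulate the container command: three db settings followed by the
# command to run (echo stands in for solr-precreate here).
set -- jdbc:postgresql://db:1234/users my-username my-secure-password echo hello

echo "db settings: $1 $2 $3"

# Execute everything from the fourth argument onward as a command:
"${@:4}"   # runs: echo hello
```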
First, we create a template for the db-data-config.xml file, which we are going to call db-data-config_template.xml, and put placeholders into the fields we want to change.
<dataConfig>
  <dataSource type="JdbcDataSource" driver="org.postgresql.Driver" url="dburl" user="dbuser" password="dbpassword" batchSize="1" />
  <document name="Users">
    <entity name="User" query='SELECT * FROM "user"'>
      <field column="id" name="id" />
      <field column="username" name="username" />
    </entity>
  </document>
</dataConfig>
Then, we can write a script that swaps the placeholders with the values provided as arguments:
#!/bin/bash
# Use the first three arguments to fill in db-data-config.xml.

# Escape the URL so it can be used in a sed replacement
escapedUrl=$(echo "$1" | sed -e 's/[\/&]/\\&/g')

sed "s/dburl/$escapedUrl/g" /opt/solr/server/solr/configsets/myconfig/conf/db-data-config_template.xml |
sed "s/dbuser/$2/g" |
sed "s/dbpassword/$3/g" > /opt/solr/server/solr/configsets/myconfig/conf/db-data-config.xml

# Execute the remaining arguments as a command
exec "${@:4}"
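The sed substitution itself can be checked locally without Docker. A small sketch using an inline stand-in for one line of the template (paths and values are just examples):

```shell
#!/bin/bash
# Stand-in for one line of db-data-config_template.xml:
template='url="dburl" user="dbuser" password="dbpassword"'

url='jdbc:postgresql://db:1234/users'

# Escape the slashes so the URL can be used in a sed replacement:
escapedUrl=$(echo "$url" | sed -e 's/[\/&]/\\&/g')

result=$(echo "$template" |
  sed "s/dburl/$escapedUrl/g" |
  sed "s/dbuser/my-username/g" |
  sed "s/dbpassword/my-secure-password/g")

echo "$result"
# → url="jdbc:postgresql://db:1234/users" user="my-username" password="my-secure-password"
```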
Finally, we have to copy the script into the image and make sure it is executable (e.g. chmod +x set-db-variables before building). We do this by extending the Dockerfile with one more line:
COPY set-db-variables /opt/docker-solr/scripts/set-db-variables
Now we can easily switch between different databases without rebuilding the image. As seen in the example above, scripts provide a powerful way to automate tasks and improve the usability of the Solr Docker image.