When NOT to use MongoDB background unique indexing

This is based on my personal experience working on MongoDB with a Java app.

My Java app is a Spring Boot app that uses Spring Data MongoDB to connect to the database. Here is the Maven dependency that I used.


  <dependency>
   <groupId>org.springframework.boot</groupId>
   <artifactId>spring-boot-starter-data-mongodb</artifactId>
  </dependency>

Before jumping into the problem, let me briefly describe the indexes that we are using here.


Indexes

Indexes are added to a collection so that queries that filter on the indexed fields run faster and more efficiently.

Unique index: This is an index on a field whose value is unique for each document in the collection. Such a field is a candidate key, that is, a field you could use to uniquely identify a document. Sometimes these indexes are created as a validation at the database level, so that you cannot insert two documents with the same value for that field.

Background index: This option controls how the index is built on a collection (comparable to a table in a relational DB), not when the index entries for new records are written. Once an index exists, every insert updates it as part of the write. Before MongoDB 4.2, the default (foreground) build blocks all read and write operations on the database that holds the collection until the build finishes. A background build lets the application keep reading and writing while the index is being built, and it typically takes longer to complete. By default, an index is built in the foreground unless you specify background = true. (By the way, from MongoDB 4.2 onwards the background option is ignored: every index build uses the same process, which locks the collection only briefly at the start and at the end of the build.)

I must say, I'm not an expert in how MongoDB's internals work, but I believe this explanation is enough to follow the rest of the post.

In the Java representation, you can configure the indexes like below.


import org.springframework.data.annotation.Id;
import org.springframework.data.mongodb.core.index.Indexed;
import org.springframework.data.mongodb.core.mapping.Document;

@Document(collection = "items")
public class Item {
    
    @Id
    private String id;
    @Indexed(unique = true, background = true)
    private String serialNumber;
    private String name;

    // rest of the code
}

We have a Java entity called Item (just an example representation of anything, for instance an item in an order list). It has an id, annotated with @Id, which is the primary key of the document. It also has a serial number, annotated with @Indexed(unique = true, background = true), which asks for a unique index on serialNumber that is built in the background.


Problem scenario


Let's say my collection is expected to hold an extremely large amount of data, such as several million records. Building an index on such a collection takes some time (depending on the DB server capacity, of course), especially when the insert rate is also high. In this kind of scenario, I would choose to build the index in the background, because I don't want to slow down my transactions and hurt the response time of my application.

Let's say my transactions don't enforce uniqueness themselves, and I rely on the DB unique index instead. However, my index is being built in the background. Therefore, one of the transactions can insert a duplicate record with the same serial number into my DB, because the unique index is not completely created yet.

So now I have duplicate records: since the index was not complete, the DB allowed the duplicate to be inserted, or left existing duplicates in place. When the indexing process reaches the duplicated record, it fails with a duplicate key error. If the index is declared with an annotation on the Java entity, this surfaces as an exception.

org.springframework.dao.DuplicateKeyException: Write failed with error code 11000 and error message 'E11000 duplicate key error collection: <some db name>.items index: serialNumber dup key

If duplicates like this already exist in the data, a new instance of the Java app that declares the unique index will fail to start.
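To make the failure mode concrete, here is a toy model in plain Java. It is only a sketch of the idea and not how MongoDB builds indexes internally: the class ToyUniqueIndexBuild, its nested Item class, and the build method are hypothetical names made up for this illustration. The "index" is just a map from serial number to document id, and the build fails as soon as it meets a serial number it has already seen.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model for illustration only; MongoDB's real index build works differently inside.
class ToyUniqueIndexBuild {

    // Hypothetical stand-in for a stored document of the items collection.
    static final class Item {
        final String id;
        final String serialNumber;

        Item(String id, String serialNumber) {
            this.id = id;
            this.serialNumber = serialNumber;
        }
    }

    // Scans the existing documents and maps each serial number to one document id.
    // Throws as soon as a second document carries an already indexed serial number.
    static Map<String, String> build(List<Item> existingItems) {
        Map<String, String> index = new HashMap<>();
        for (Item item : existingItems) {
            String previousId = index.putIfAbsent(item.serialNumber, item.id);
            if (previousId != null) {
                throw new IllegalStateException("duplicate key: serialNumber " + item.serialNumber
                        + " (documents " + previousId + " and " + item.id + ")");
            }
        }
        return index;
    }

    public static void main(String[] args) {
        List<Item> items = List.of(
                new Item("1", "SN-1"),
                new Item("2", "SN-2"),
                new Item("3", "SN-1")); // slipped in while uniqueness was not enforced yet
        try {
            build(items);
        } catch (IllegalStateException e) {
            System.out.println("Index build failed: " + e.getMessage());
        }
    }
}
```

The point of the sketch is that the duplicate is accepted at write time and only noticed later, when the build reaches it, which is the same shape as the startup failure described above.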

Solution

If you need to create a unique index, don't build it in the background. If you have to build it in the background anyway, make sure that there are no duplicates in your data source. You can enforce this at the code level, by checking whether the value already exists before each insert. Otherwise, you need a mechanism to remove duplicates from the source data before building the index.
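As a rough sketch of the "find the duplicates first" step, the plain Java below counts how often each serial number occurs and keeps only those that occur more than once. It assumes the serial numbers have already been loaded into a list (in a real app you would more likely run this as a query or aggregation on the database side), and DuplicateSerialFinder and findDuplicates are hypothetical names, not part of Spring Data MongoDB.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper for illustration; not part of Spring Data MongoDB.
class DuplicateSerialFinder {

    // Returns every serial number that occurs more than once, mapped to its count.
    // The list must not contain null values, because groupingBy rejects null keys.
    static Map<String, Long> findDuplicates(List<String> serialNumbers) {
        Map<String, Long> counts = serialNumbers.stream()
                .collect(Collectors.groupingBy(s -> s, LinkedHashMap::new, Collectors.counting()));
        // Drop the serial numbers that are already unique.
        counts.values().removeIf(count -> count < 2);
        return counts;
    }

    public static void main(String[] args) {
        List<String> serials = List.of("SN-1", "SN-2", "SN-1", "SN-3", "SN-2", "SN-1");
        System.out.println(findDuplicates(serials)); // prints {SN-1=3, SN-2=2}
    }
}
```

If the result is empty, the data is safe for a unique index on that field. Otherwise the returned serial numbers tell you which records to clean up before creating the index.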


