Flask — MongoDB: Validate Your Data Using JSON Schema

Obi Kastanya
6 min read · Jan 6, 2023

MongoDB is a great tool when we want to work with dynamic data, where each record can have its own structure, independent of the others. It's a great idea, but sometimes it can also be a nightmare.

When we talk about data integrity, dynamic data is a pain in the ass. Imagine we have two documents that use the same field name but store different data types in it: which one is right? That's just one example.

Another problem with a dynamic data structure is that we don't have any documentation of the structure. If you have ever used an SQL database, you know you can see the data structure just by looking at the table description.

PostgreSQL table structure viewed using pgAdmin

It's easy to describe what the data looks like, and it is also well documented.

But with a dynamic data structure, it's hard to keep track of what our data looks like. Since MongoDB doesn't store the structure by default, we need to document it somewhere else, or we have to read our program's code just to figure out what the data looks like.
Oftentimes, this inconsistency in the data creates issues in our programs.

Is there any way to keep our data structure consistent and well-documented?

Yes, and that's why I wrote this article. We can use JSON Schema.

A schema is a JSON object that defines the structure and contents of your data.
JSON Schema allows us to validate our JSON data before it's written into our MongoDB collection.
To make it clearer, let's do some experiments in MongoDB Compass.

Create a collection named ‘contact’ in our MongoDB database. Run the following command in the MongoDB Compass shell.

That command will create a collection named “contact” with a JSON Schema attached.

In that JSON Schema, we create a validator for our JSON data. Inside the validator, we list all the required attributes and describe each property.
For more schema specifications, you can visit the official MongoDB documentation.
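For readers who prefer to stay in Python, here is a minimal sketch of an equivalent validator created through PyMongo instead of the Compass shell. Only the schema title and the 'age' description are taken from the error output shown in this article; the required list, the other property rules, and the helper name create_contact_collection are assumptions made for illustration.

```python
# Hypothetical reconstruction of the 'contact' validator.
# The title and the 'age' description come from the article's error output;
# the required list and the other property rules are assumptions.
CONTACT_SCHEMA = {
    "bsonType": "object",
    "title": "Contact Object Validation",
    "required": ["name", "phone_number", "age"],
    "properties": {
        "name": {
            "bsonType": "string",
            "description": "'Name' must be a string",
        },
        "phone_number": {
            "bsonType": "string",
            "description": "'Phone number' must be a string",
        },
        "age": {
            "bsonType": "int",
            "minimum": 0,
            "maximum": 150,
            "description": "'Age' must be an integer and between 0-150",
        },
    },
}


def create_contact_collection(db):
    """Create the 'contact' collection with the schema attached as its validator.

    `db` is expected to be a pymongo Database object.
    """
    return db.create_collection("contact", validator={"$jsonSchema": CONTACT_SCHEMA})
```

With the db.py file used later in this article, this could be called as create_contact_collection(Connection('flask_mongo_crud')).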

Let’s insert a new document where the age is not valid.
Run the following command in the MongoDB Compass shell.

db.contact.insertOne({
  "_id": "1",
  "name": "Obi Kastanya",
  "phone_number": "087876782767",
  "age": -23
})

We will receive an error like this.

It says that our document failed validation.

Let's take a deeper look at the error.

To make the error detail easier to read, let me just paste it here.

{
  "failingDocumentId": "1",
  "details": {
    "operatorName": "$jsonSchema",
    "title": "Contact Object Validation",
    "schemaRulesNotSatisfied": [
      {
        "operatorName": "properties",
        "propertiesNotSatisfied": [
          {
            "propertyName": "age",
            "description": "'Age' must be an integer and between 0-150",
            "details": [
              {
                "operatorName": "minimum",
                "specifiedAs": { "minimum": 0 },
                "reason": "comparison failed",
                "consideredValue": -23
              }
            ]
          }
        ]
      }
    ]
  }
}

“schemaRulesNotSatisfied” means that our data didn't match the rules in our schema. It could be a property that violates a constraint, a missing field, an unwanted additional field, and so on.

“propertiesNotSatisfied” means that the value of a field didn't fit that property's specification. In this example, our “age” field violates the minimum: it must be a value between 0 and 150, but we sent -23.

MongoDB returns this error description, so we can use it in our application.
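As a rough illustration of how an application might consume this structure, the sketch below walks an error document shaped like the one pasted above and turns every failing rule into one readable line. The function name summarize_validation_error is made up for this example, and the "required" branch assumes MongoDB reports missing fields under a "missingProperties" key.

```python
def summarize_validation_error(err_info):
    """Return one readable message per failing rule in a $jsonSchema error document."""
    messages = []
    rules = err_info.get("details", {}).get("schemaRulesNotSatisfied", [])
    for rule in rules:
        operator = rule.get("operatorName")
        if operator == "properties":
            # One entry per field whose value broke a constraint.
            for prop in rule.get("propertiesNotSatisfied", []):
                failed = ", ".join(d.get("operatorName", "?") for d in prop.get("details", []))
                text = prop.get("description") or f"failed rule(s): {failed}"
                messages.append(f"{prop.get('propertyName')}: {text}")
        elif operator == "required":
            # Assumed shape: missing field names listed under 'missingProperties'.
            for name in rule.get("missingProperties", []):
                messages.append(f"{name}: field is required")
        else:
            messages.append(f"rule '{operator}' not satisfied")
    return messages
```

Passing the error document above to this function would yield a single message for the “age” field, built from the description we wrote in the schema.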

For now, try to insert new data that fits our schema, to make sure that it works.

Fantastic. The data was successfully inserted.

Now let's try to do it in our Flask application.

Create 2 files in our project: main.py and db.py.

main.py:

from flask import (
    Flask,
    request
)
from db import Connection

app = Flask(__name__)
db = Connection('flask_mongo_crud')


if __name__ == "__main__":
    app.run(debug=True, port=8887)

db.py:

from pymongo import MongoClient

config = {
    "host": "localhost",
    "port": 27017,
    "username": "",
    "password": ""
}

class Connection:
    def __new__(cls, database):
        connection = MongoClient(**config)
        return connection[database]

If you followed my previous tutorial about Flask and MongoDB, you should already have those files.

Now add some API endpoints to the main.py file.
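The endpoint code itself is not reproduced in this text, so the following is only a sketch of what such endpoints might look like. The route paths, the response messages, and the helper names insert_document and register_routes are all assumptions; the collection names simply mirror the product and product_category schema files used later in this article.

```python
def insert_document(collection, payload):
    """Insert one document and return a Flask-style (body, status) pair."""
    try:
        result = collection.insert_one(payload)
    except Exception as e:  # deliberately broad in this sketch
        return {"message": str(e)}, 500
    return {"message": "Data created", "id": str(result.inserted_id)}, 201


def register_routes(app, db):
    """Attach two hypothetical POST endpoints to an existing Flask app."""
    # Imported here so insert_document can be exercised without Flask installed.
    from flask import request

    @app.route("/product", methods=["POST"])
    def create_product():
        return insert_document(db["product"], request.get_json())

    @app.route("/product-category", methods=["POST"])
    def create_product_category():
        return insert_document(db["product_category"], request.get_json())
```

In main.py, register_routes(app, db) would be called once, before app.run(). Keeping the insert logic in a plain function makes the try-except block easy to change later.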

Run the server using the following command.

python main.py

At this point the API works, but nothing validates the data yet.

Check the API using Postman with this JSON payload.

{
  "title": "OPPOF19",
  "description": "OPPO F19 is officially announced on April 2021.",
  "price": "230",
  "discount_percentage": "129",
  "rating": 7,
  "stock": "123",
  "brand": "OPPO",
  "category": "smartphones",
  "thumbnail": "https://i.dummyjson.com/data/products/4/thumbnail.jpg",
  "images": ["https://i.dummyjson.com/data/products/4/1.jpg"]
}

You’ll receive the following result:

As you can see, it works. But the data is not valid: the discount percentage is more than 100%, and it's also a string, not a float or double.

Now, let's add the JSON Schema to validate the data.

There are 2 ways to add a JSON Schema: include the schema while creating a new collection, or run a command that attaches the schema to an existing collection.

In this tutorial, we are going to attach our schema to an existing collection.

Create a folder named “schema” in our project, then add the following files.

product_category.json:

product.json:
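The contents of these schema files are not reproduced in this text. As a purely hypothetical illustration, the sketch below builds a product schema in Python and serializes it to the JSON that could be stored in product.json. The field names come from the payload used in this article; every rule (types, required list, limits, descriptions) is an assumption chosen to reject that invalid payload.

```python
import json

# Hypothetical product schema; field names mirror the payload in this article,
# while all types, limits, and descriptions are assumptions.
PRODUCT_SCHEMA = {
    "bsonType": "object",
    "title": "Product Object Validation",
    "required": ["title", "price", "discount_percentage", "stock", "category"],
    "properties": {
        "title": {"bsonType": "string", "description": "'title' must be a string"},
        "price": {
            "bsonType": "double",
            "minimum": 0,
            "description": "'price' must be a double and not negative",
        },
        "discount_percentage": {
            "bsonType": "double",
            "minimum": 0,
            "maximum": 100,
            "description": "'discount_percentage' must be a double between 0-100",
        },
        "stock": {
            "bsonType": "int",
            "minimum": 0,
            "description": "'stock' must be an integer and not negative",
        },
        "category": {"bsonType": "string", "description": "'category' must be a string"},
    },
}


def schema_as_json(schema):
    """Serialize a schema dict to the text that would be saved in a .json file."""
    return json.dumps(schema, indent=2)
```

With rules like these, the earlier payload would be rejected because "discount_percentage" is sent as the string "129" instead of a double no greater than 100.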

Then, create a file named schema.py.

The idea is to store our JSON Schemas inside the ‘schema’ folder. The schema.py script reads all the files inside that folder and creates a collection named after each JSON file.
After creating the collection, the script reads the schema inside the JSON file and attaches it to the collection.
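The idea described above could be sketched roughly as follows. This is not the original schema.py; it assumes that each JSON file contains the bare schema object (without a "$jsonSchema" wrapper), and the function names load_schemas and apply_schemas are invented for this example. The schema is attached with MongoDB's collMod command.

```python
import json
from pathlib import Path


def load_schemas(folder="schema"):
    """Map each collection name (the file name without .json) to its schema."""
    return {
        path.stem: json.loads(path.read_text(encoding="utf-8"))
        for path in sorted(Path(folder).glob("*.json"))
    }


def apply_schemas(db, schemas):
    """Create any missing collection, then attach its schema as a validator.

    `db` is expected to be a pymongo Database object.
    """
    existing = set(db.list_collection_names())
    for name, schema in schemas.items():
        if name not in existing:
            db.create_collection(name)
        # collMod replaces the validator of an existing collection.
        db.command("collMod", name, validator={"$jsonSchema": schema})
```

A script built this way would end with something like apply_schemas(Connection('flask_mongo_crud'), load_schemas()), using the Connection class from db.py. Running it again after editing a JSON file simply replaces the validator.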

Now, update the main.py file.

Import the PyMongo exception.

...
from pymongo.errors import WriteError
...

Modify the try-except block to this:

try:
    ....
except WriteError as e:
    return {
        "message": e.details
    }, 500

Now run the following command in your console:

python schema.py

Check the APIs again. This is the result you'll receive.

Now we get errors because our payload doesn't fit the JSON Schema.
We got a lot of errors here.

Let's simplify the errors so the API consumer can make use of the error message. Instead of returning all the errors, let's return only the first one, so the user can fix them one by one.

Create a new file named error.py.

The SchemaValidationError class receives the pymongo error and extracts the first error message from it.
__new__() is a method that is called every time an instance of the class is created. By overriding the __new__() method, we can change the value that is returned when the class is called.

In this case, we return the error message string taken from the pymongo exception.
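The original error.py is not shown in this text, so here is a minimal sketch of how such a class might be written. It assumes that the exception's details attribute holds the write error document, with the validation report under "errInfo", and that missing required fields are listed under "missingProperties"; the fallback messages are made up for this example.

```python
class SchemaValidationError:
    """Calling this class returns a plain string, not an instance."""

    def __new__(cls, error):
        # Assumed shape: error.details["errInfo"]["details"]["schemaRulesNotSatisfied"]
        details = getattr(error, "details", None) or {}
        report = details.get("errInfo", {}).get("details", {})
        for rule in report.get("schemaRulesNotSatisfied", []):
            # First field whose value broke a constraint.
            for prop in rule.get("propertiesNotSatisfied", []):
                return prop.get("description") or f"'{prop.get('propertyName')}' is not valid"
            # First required field that is missing.
            for name in rule.get("missingProperties", []):
                return f"'{name}' is required"
        # Nothing recognizable found: fall back to the server's own message.
        return details.get("errmsg", "Document failed validation")
```

Because __new__() returns a str rather than an instance of the class, __init__() is never called, and the result can be placed directly in the JSON response.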

Let's change the main.py file.

Import the SchemaValidationError class.

...
from error import SchemaValidationError
...

Modify the try-except block to this:

try:
    ....
except WriteError as e:
    return {
        "message": SchemaValidationError(e)
    }, 500

Let's check the APIs again.

Perfect.

Now let's use the correct payload.

payload:

{
  "title": "OPPOF19",
  "description": "OPPO F19 is officially announced on April 2021.",
  "price": 230.00,
  "discount_percentage": 89.00,
  "rating": 4.7,
  "stock": 123,
  "brand": "OPPO",
  "category": "smartphones",
  "thumbnail": "https://i.dummyjson.com/data/products/4/thumbnail.jpg",
  "images": ["https://i.dummyjson.com/data/products/4/1.jpg"]
}

It should work like this.

Now, let's check the product-category API.

Perfect.

Thanks for reading my article. If you want to see the full script, you can visit my GitHub repository here.
