Data Compare Tool using Pandas, Flask with MongoDB and Docker + AWS ECR — Part 2

Subham Kumar Sahoo
8 min read · Jan 30, 2023

Data comparison between MySQL tables using Python Pandas and Flask. MongoDB for logging the statistics and CI/CD using Docker with AWS ECR.

Architecture Diagram

Previous part : https://medium.com/@subham-sahoo/data-compare-tool-using-pandas-flask-with-mongodb-and-docker-aws-ecr-part-1-6d8f3688c795

Hands-on time!!

Disclaimer : If you get stuck, do not get demotivated and leave things halfway. Believe me, the answers to your problems are just one internet search away (maybe more than one 😉).

GitHub repository : https://github.com/sksgit7/Data-compare-docker

Pre-requisites:

Installations on your system:

Step 1 : Create MySQL tables and add data

I hope that by now you have created a DB connection in MySQL Workbench and logged in. Now let's open an editor in Workbench and run the SQL statements below to:

  • Create the database “test”.
  • Create two tables, SRC_EMP and TGT_EMP.
  • Insert a few records into these tables.
CREATE DATABASE IF NOT EXISTS test;
USE test;

CREATE TABLE SRC_EMP(
ID INTEGER PRIMARY KEY,
ENAME VARCHAR(20),
SALARY INTEGER
);

CREATE TABLE TGT_EMP(
ID INTEGER PRIMARY KEY,
ENAME VARCHAR(20),
SALARY INTEGER
);

INSERT INTO SRC_EMP VALUES(1, 'Ram', 2000);
INSERT INTO SRC_EMP VALUES(2, 'Sam', 1000);
INSERT INTO SRC_EMP VALUES(3, 'Hari', 3000);

INSERT INTO TGT_EMP VALUES(1, 'Ram', 2200);
INSERT INTO TGT_EMP VALUES(2, 'Sam', 1000);
INSERT INTO TGT_EMP VALUES(4, 'Jay', 4000);

-- create a user that may connect from <your-ip> and grant it access to the "test" DB
CREATE USER '<username>'@'<your-ip>' IDENTIFIED BY '<password>';
GRANT ALL PRIVILEGES ON test.* TO '<username>'@'<your-ip>';

FLUSH PRIVILEGES;

Here we need to manually grant access so that the Python Flask app running in a Docker container can reach the MySQL DB running locally. If the Flask application were running locally as well, we could have used “localhost” or “127.0.0.1” as the hostname to connect to the local MySQL DB.

Run “ipconfig” in a terminal (Windows).

Take one of the IPv4 Ethernet addresses; that is the IP to put in the SQL commands above. The username and password can be anything.
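If you prefer to look up the address from Python, here is a minimal sketch using only the standard library. The helper name candidate_ipv4_addresses is my own for illustration and is not part of the project. Depending on how your machine resolves its hostname, it may return only a loopback address or nothing at all, so treat “ipconfig” as the reliable source.

```python
import ipaddress
import socket


def candidate_ipv4_addresses():
    """Return the IPv4 addresses the OS resolver reports for this machine's hostname."""
    try:
        # gethostbyname_ex returns (hostname, aliases, list_of_ipv4_addresses)
        _, _, addresses = socket.gethostbyname_ex(socket.gethostname())
    except OSError:
        # The hostname could not be resolved; fall back to an empty list
        return []
    # Keep only well-formed IPv4 addresses
    return [a for a in addresses if ipaddress.ip_address(a).version == 4]


for address in candidate_ipv4_addresses():
    print(address)
```

Any address printed here is only a candidate; pick the one that belongs to the network adapter your Docker containers can reach.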

Directory structure

Our parent directory is db-compare.

The docker directory contains db (the Dockerfile for mongo-express) and web (the Flask app code and its Dockerfile).

The tgt-docker directory is what we will mount into the Flask web application container so that the result csv files are replicated to local disk.

The compose.yaml file is used to build the docker images and run the docker containers.

The compose-ecr.yaml file only runs the docker containers, as it pulls the already built docker images from AWS ECR.

tgt and ve-db-compare (Python virtual environment) are for testing purposes only.

The Dockerfile listed last is not required.

Step 2 : Flask web application

Now we will create a Python script, app.py, that will:

  • Create a Flask web application through which we will submit our DB connection details and SQL queries for which we want to compare the results.
  • Connect to MySQL DB, execute the SQL queries and use Pandas dataframes to find the data differences.
  • Write the differences as csv files.
  • Connect to MongoDB database and log the statistics of the recent compare as a new document (record).
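To make the compare step concrete, here is a minimal sketch of one way to find row-level differences with Pandas, using the sample rows from Step 1 typed in by hand instead of read from MySQL. The function name rows_only_in and the keys of the stats dictionary are my own assumptions for illustration; the actual app.py in the repository may compute and log things differently.

```python
import pandas as pd

# Sample data mirroring the SRC_EMP and TGT_EMP rows inserted in Step 1
src = pd.DataFrame(
    {"ID": [1, 2, 3], "ENAME": ["Ram", "Sam", "Hari"], "SALARY": [2000, 1000, 3000]}
)
tgt = pd.DataFrame(
    {"ID": [1, 2, 4], "ENAME": ["Ram", "Sam", "Jay"], "SALARY": [2200, 1000, 4000]}
)


def rows_only_in(left, right):
    """Return the rows of `left` that have no exact match (on all columns) in `right`."""
    columns = list(left.columns)
    # indicator=True adds a "_merge" column telling where each row was found
    merged = left.merge(right, how="left", on=columns, indicator=True)
    only_left = merged.loc[merged["_merge"] == "left_only", columns]
    return only_left.reset_index(drop=True)


src_diff_tgt = rows_only_in(src, tgt)  # rows in source missing or different in target
tgt_diff_src = rows_only_in(tgt, src)  # rows in target missing or different in source

# Hypothetical statistics document that could be logged to MongoDB
stats = {
    "src_count": len(src),
    "tgt_count": len(tgt),
    "src_diff_tgt_count": len(src_diff_tgt),
    "tgt_diff_src_count": len(tgt_diff_src),
}

print(src_diff_tgt)
print(tgt_diff_src)
print(stats)
```

With this data, employee 1 shows up in both results (the salary differs), employee 3 only in src_diff_tgt and employee 4 only in tgt_diff_src, while employee 2 matches and is left out.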

Github link: https://github.com/sksgit7/Data-compare-docker

I have included most of the code explanations as code comments. Feel free to reach out if you need more clarity.

Along with the app.py Python Flask script, there will be:

  • A templates folder containing the “index.html” file. This is our front-end html code that shows a form where we can fill in the DB details.
  • A requirements.txt file listing the libraries and packages that we need for this project. These are installed automatically when the Docker image is built from the Dockerfile.

Step 3 : Creating Dockerfile

We will use Dockerfiles to build our docker images (Flask app and mongo-express) on top of base images. Of the three components (Flask app, MongoDB and mongo-express), we need not build an image for MongoDB, as we will directly use the official MongoDB docker image from DockerHub.

Note: Each file should be named exactly “Dockerfile” so that Docker recognizes it by default. The files can be found in the GitHub repository mentioned above.

For Flask application

FROM python:3.7-slim-buster
WORKDIR /app
COPY ./app /app
RUN pip install -r requirements.txt
EXPOSE 3000
CMD python ./app.py

Line 1 : Fetch the Python base image from DockerHub. I have used the Python 3.7 slim-buster image as there were some issues with the 3.9 and alpine versions.

Line 2 : Set the working directory to /app inside the image. The instructions that follow (COPY, RUN, CMD) run relative to this path.

Line 3 : Copy the contents of the local app directory (app.py, requirements.txt and the templates folder) into the working directory (/app).

Line 4 : Install the packages listed in requirements.txt into the image.

Line 5 : Declare that the container listens on port 3000. EXPOSE only documents the port; the actual mapping to a host port is done in the compose file.

Line 6 : Run “python ./app.py” to start the Flask app when the container starts.
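One detail worth keeping in mind with this Dockerfile: for the port mapping to work, the Flask app has to listen on all interfaces inside the container, not only on 127.0.0.1. Below is a minimal sketch of such an entry point. The route, the response text and the main function are placeholders of mine, not the real app.py from the repository.

```python
from flask import Flask

app = Flask(__name__)


@app.route("/")
def index():
    # Placeholder response; the real app renders templates/index.html
    return "db-compare is running"


def main():
    # host="0.0.0.0" makes Flask listen on every interface of the container,
    # so that the host can reach it through the published port 3000
    app.run(host="0.0.0.0", port=3000)
```

In a real script you would call main() under an `if __name__ == "__main__":` guard, which is what “python ./app.py” triggers when the container starts.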

For Mongo-Express

We could have used the latest mongo-express image from DockerHub directly rather than building it using a Dockerfile. But we need the “curl” package installed here, as we want to ping the mongo-db container for health checks and only then start our mongo-express container. If mongo-express starts before mongo-db is ready, it shows an error message.

FROM mongo-express:latest
RUN apk add --update curl
EXPOSE 8081

Line 1 : Fetch latest mongo-express image from DockerHub.

Line 2 : Install “curl”

Line 3 : Declare port 8081, the port mongo-express listens on inside the container. In the compose file we map it to local port 8080, so we will access it on 8080.

Step 4 : Creating Docker Compose file

We will use this file to build or pull (from DockerHub) our docker images and then run them as containers with the required configurations. All the containers defined in the file run on the same network, so one container can reach another using just its service name.

compose.yaml

version: '3'
services:
  db-compare-app:
    build: ./docker/web
    image: db-compare:4.0
    ports:
      - 3000:3000
    volumes:
      - H:\DevOps\docker-techworld\db-compare\tgt-docker:/app/tgt
  mongodb:
    image: mongo
    ports:
      - 27018:27017
    environment:
      - MONGO_INITDB_ROOT_USERNAME=admin
      - MONGO_INITDB_ROOT_PASSWORD=password
    volumes:
      - mongo-data:/data/db
    healthcheck:
      test: echo 'db.runCommand("ping").ok' | mongosh localhost:27017/test --quiet
      interval: 15s
      timeout: 5s
      retries: 4
      start_period: 10s

  mongo-express:
    build: ./docker/db
    image: mongo-express-db-compare
    depends_on:
      mongodb:
        condition: service_healthy
    restart: always # fixes MongoNetworkError when mongodb is not ready when mongo-express starts
    ports:
      - 8080:8081
    environment:
      - ME_CONFIG_MONGODB_ADMINUSERNAME=admin
      - ME_CONFIG_MONGODB_ADMINPASSWORD=password
      - ME_CONFIG_MONGODB_SERVER=mongodb
volumes:
  mongo-data:
    driver: local

Each of our three components is defined as its own service.

Flask app

db-compare-app:
  build: ./docker/web
  image: db-compare:4.0
  ports:
    - 3000:3000
  volumes:
    - H:\DevOps\docker-techworld\db-compare\tgt-docker:/app/tgt

db-compare-app : service name. Compose uses it when naming the container, and other services can reach this one by that name.

build : path of the directory containing the Dockerfile for the Flask application.

image : name to be given to the image that will be generated.

ports : <host>:<container>. The Flask app runs on port 3000 in the container and we attach local port 3000 to it. So, we will access the Flask app at “http://localhost:3000”.

volumes : <local path>:<container path>. Here we have mounted the local folder tgt-docker onto the tgt folder of the container. So, our result files “src_diff_tgt” and “tgt_diff_src” will be generated in the tgt folder inside the container and appear in our local directory at the same time.
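As a rough illustration of how the app side of this mount could look, the sketch below writes a result dataframe as a csv file into an output folder. Inside the container that folder would be /app/tgt; here a temporary directory stands in for it so the snippet runs anywhere. The helper name write_result and the “.csv” file extension are assumptions of mine.

```python
import tempfile
from pathlib import Path

import pandas as pd


def write_result(df, name, out_dir):
    """Write `df` to <out_dir>/<name>.csv and return the path of the file."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)  # create the folder if it is missing
    path = out_dir / f"{name}.csv"
    df.to_csv(path, index=False)  # index=False keeps the row index out of the file
    return path


# Stand-in for /app/tgt, the container path that compose.yaml mounts to tgt-docker
out_dir = Path(tempfile.mkdtemp()) / "tgt"

diff = pd.DataFrame({"ID": [3], "ENAME": ["Hari"], "SALARY": [3000]})
written = write_result(diff, "src_diff_tgt", out_dir)
print(written.read_text())
```

Because the container folder is bind-mounted, a file written this way inside the container would also be visible in the local tgt-docker folder without any extra copy step.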

MongoDB

mongodb:
  image: mongo
  ports:
    - 27018:27017
  environment:
    - MONGO_INITDB_ROOT_USERNAME=admin
    - MONGO_INITDB_ROOT_PASSWORD=password
  volumes:
    - mongo-data:/data/db
  healthcheck:
    test: echo 'db.runCommand("ping").ok' | mongosh localhost:27017/test --quiet
    interval: 15s
    timeout: 5s
    retries: 4
    start_period: 10s

image : We will pull the latest mongo image from DockerHub.

ports : Attach local port 27018 to port 27017 of the container, on which the MongoDB service runs (27017 is MongoDB's default port).

  • Note: We have used port 27018 on the host in case a MongoDB instance installed locally is already listening on the default port 27017, so this is just to avoid conflicts. Inside the Compose network the Flask container connects using the service name, i.e. “mongodb:27017”, rather than “localhost:27017”, so the host port does not matter there. If the Python app is not on the same network (for example, when it runs directly on the host) and connects to “localhost:27017”, it would reach that other local database rather than the mongo container.
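The difference between the two ways of reaching MongoDB can be sketched as connection strings. The helper mongo_uri is hypothetical; the credentials, service name and ports are the ones from compose.yaml. Whether the real app.py builds its connection string this way is an assumption on my part.

```python
from urllib.parse import quote_plus


def mongo_uri(user, password, host, port):
    """Build a MongoDB connection string, escaping special characters in the credentials."""
    return f"mongodb://{quote_plus(user)}:{quote_plus(password)}@{host}:{port}/"


# From the Flask container: use the service name and the container port
inside_network = mongo_uri("admin", "password", "mongodb", 27017)

# From the host machine: use localhost and the published host port
from_host = mongo_uri("admin", "password", "localhost", 27018)

print(inside_network)  # mongodb://admin:password@mongodb:27017/
print(from_host)       # mongodb://admin:password@localhost:27018/
```

Either string can then be handed to a MongoDB client; only the host and port change depending on where the Python code runs.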

environment : Here we provide the root username and password as environment variables.

volumes : We have attached a local named volume to the MongoDB data path (/data/db), so that when we delete the containers and start them again with the same volume name, the data from previous runs persists.

healthcheck : Health check for mongo-db, which the mongo-express container depends on. The test pings the database every 15 seconds, with a 5-second timeout per attempt; failures during the first 10 seconds after the container starts (start_period) are not counted. Once a ping succeeds the container is marked healthy and mongo-express is started. After 4 consecutive failures the container is marked unhealthy.
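The same “ping, wait, retry” idea can also be applied on the application side, which may be handy for the Flask app since it has no depends_on entry in compose.yaml. The sketch below is my own illustration and not Docker's implementation: wait_until_ready and fake_ping are hypothetical names, and the sleep function is injectable so the example runs instantly.

```python
import time


def wait_until_ready(check, interval=15.0, retries=4, sleep=time.sleep):
    """Call `check` up to `retries` times, pausing `interval` seconds between attempts.

    Returns the number of the attempt that succeeded, or raises TimeoutError.
    """
    for attempt in range(1, retries + 1):
        if check():
            return attempt
        if attempt < retries:
            sleep(interval)  # wait before the next attempt
    raise TimeoutError(f"service not ready after {retries} attempts")


# Simulated ping that only succeeds on the third call
calls = {"count": 0}


def fake_ping():
    calls["count"] += 1
    return calls["count"] >= 3


waits = []  # records the pauses instead of really sleeping
attempts = wait_until_ready(fake_ping, interval=15.0, retries=4, sleep=waits.append)
print(attempts, waits)  # 3 [15.0, 15.0]
```

In a real app the check function would try to ping MongoDB and return False on a connection error; the default time.sleep would then provide the actual waiting.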

Mongo-Express

mongo-express:
  build: ./docker/db
  image: mongo-express-db-compare
  depends_on:
    mongodb:
      condition: service_healthy
  restart: always # fixes MongoNetworkError when mongodb is not ready when mongo-express starts
  ports:
    - 8080:8081
  environment:
    - ME_CONFIG_MONGODB_ADMINUSERNAME=admin
    - ME_CONFIG_MONGODB_ADMINPASSWORD=password
    - ME_CONFIG_MONGODB_SERVER=mongodb

depends_on : Start mongo-express only after the mongo-db container is reported healthy by the health check defined for it.

restart : Restarts the mongo-express container in case of any failure (for example, mongo-db not being up and running when it started). We could choose to rely on this restart option alone instead of the health check.

ports : Attached 8080 host (local) port to 8081 container port where mongo-express service will run.

environment : Specifies username, password and name of mongo-db container.

volumes:
  mongo-data:
    driver: local

Finally, we declare the named volume used by the mongodb service.

In the next part we will use compose.yaml file to build and run our containers.

Next part : https://medium.com/@subham-sahoo/data-compare-tool-using-pandas-flask-with-mongodb-and-docker-aws-ecr-part-3-a7337362b416

If you liked this project, kindly do clap. Feel free to share it with others.

👉 Follow me on Medium for more such interesting content and projects.

Check out my other projects here : https://medium.com/@subham-sahoo/

Connect with me at LinkedIn. ✨

Thank you!!

References :

All references can be found here : https://github.com/sksgit7/Data-compare-docker/blob/main/references.txt
