Building and Deploying a Machine Learning Model: An End-to-End Guide with Docker, GitHub Actions, and Deployment

Machine learning (ML) has become a cornerstone of innovation across industries. However, creating an ML model is only one part of the process. Deploying that model in a way that ensures consistency, scalability, and maintainability is equally critical. This blog takes a deep dive into building and deploying an end-to-end ML pipeline for predicting California house prices, covering every step: data preprocessing and model training, containerization with Docker, CI/CD with GitHub Actions, and deployment of the final application.

Project Overview

In this project, we aim to predict house prices in California using linear regression. Our primary goals are:

  1. Model Development: Train a machine learning model on the California housing dataset.

  2. Containerization: Use Docker to create a consistent and portable runtime environment.

  3. Automation: Implement CI/CD pipelines with GitHub Actions to streamline testing and deployment.

  4. Deployment: Make the model available as a web service via a cloud platform.


Step 1: Data Preprocessing and Model Training

1.1 Dataset Overview

The dataset used is the California housing dataset, which provides block-group-level features such as median income, house age, and population, along with the target variable, the median house value. It is a popular choice for regression tasks.

1.2 Preprocessing Steps

Before training the model, we preprocess the dataset to ensure optimal performance:

  1. Handling Missing Data: Any missing values are filled with an appropriate statistic such as the mean or median (see the imputation sketch after the code below).

  2. Feature Scaling: Features are standardized to have a mean of 0 and a standard deviation of 1.

  3. Splitting Data: The data is divided into training and testing sets in an 80:20 ratio.

from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Load dataset
data = fetch_california_housing()
X = data.data
y = data.target

# Split into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Scale features
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
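The snippet above handles splitting and scaling. The copy of the California housing dataset bundled with scikit-learn contains no missing values, but if your data did, an imputation step along these lines (a minimal sketch using scikit-learn's SimpleImputer) could be slotted in before scaling:

from sklearn.impute import SimpleImputer

# Replace missing values with the median of each feature
# (only needed if the data actually contains NaNs)
imputer = SimpleImputer(strategy="median")
X_train = imputer.fit_transform(X_train)
X_test = imputer.transform(X_test)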

1.3 Model Training

We use a simple linear regression model to predict house prices.

from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Train the model
model = LinearRegression()
model.fit(X_train, y_train)

# Evaluate the model
predictions = model.predict(X_test)
mse = mean_squared_error(y_test, predictions)
print(f"Mean Squared Error: {mse}")

Step 2: Containerization with Docker

Docker is a powerful tool for creating portable and consistent environments. By containerizing our application, we ensure that it runs seamlessly across different systems.

2.1 Writing a Dockerfile

A Dockerfile is a plain-text file of instructions that defines the image: the base environment, the dependencies to install, and the command used to start the application.

FROM python:3.7
WORKDIR /app
# Install dependencies first so Docker can cache this layer
COPY requirements.txt .
RUN pip install -r requirements.txt
# Copy the rest of the application code
COPY . .
# Default port for local runs; can be overridden at runtime by the hosting platform
ENV PORT=5000
EXPOSE $PORT
CMD gunicorn --workers=4 --bind 0.0.0.0:$PORT app:app
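The gunicorn command above expects a WSGI entry point named app inside app.py, which the post itself doesn't show. A minimal Flask sketch along these lines would satisfy it; the /predict route, the JSON payload shape, and the model.pkl/scaler.pkl file names are assumptions carried over from the persistence sketch in Step 1:

import joblib
import numpy as np
from flask import Flask, request, jsonify

app = Flask(__name__)

# Load the artifacts saved during training (see Step 1)
model = joblib.load("model.pkl")
scaler = joblib.load("scaler.pkl")

@app.route("/predict", methods=["POST"])
def predict():
    # Expect a JSON body like
    # {"features": [8.33, 41.0, 6.98, 1.02, 322.0, 2.56, 37.88, -122.23]}
    features = np.array(request.get_json()["features"]).reshape(1, -1)
    prediction = model.predict(scaler.transform(features))
    return jsonify({"predicted_median_house_value": float(prediction[0])})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)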

2.2 Building and Running the Container

Commands to package and run the application:

# Build the image
docker build -t ml-housing-app .
# Run the container, publishing port 5000 (the default PORT set in the Dockerfile)
docker run -p 5000:5000 ml-housing-app
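With the container running, a quick smoke test confirms the service responds. Here is a sketch using Python's requests library, assuming the /predict route from the app.py sketch above:

import requests

# One sample in the dataset's feature order:
# MedInc, HouseAge, AveRooms, AveBedrms, Population, AveOccup, Latitude, Longitude
sample = {"features": [8.3252, 41.0, 6.9841, 1.0238, 322.0, 2.5556, 37.88, -122.23]}

response = requests.post("http://localhost:5000/predict", json=sample)
print(response.json())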

Step 3: Automating with GitHub Actions

Continuous Integration and Deployment (CI/CD) reduces manual effort by automating testing and deployment tasks.
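The workflow below focuses on building and deploying; if you also want the pipeline to run tests, a minimal pytest-style check along these lines could be added as a step (the file name test_model.py and the MSE threshold are our own choices, not part of the original project):

# test_model.py -- a minimal sanity check a CI job could run with `pytest`
from sklearn.datasets import fetch_california_housing
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler


def test_linear_regression_baseline():
    data = fetch_california_housing()
    X_train, X_test, y_train, y_test = train_test_split(
        data.data, data.target, test_size=0.2, random_state=42
    )
    scaler = StandardScaler()
    X_train = scaler.fit_transform(X_train)
    X_test = scaler.transform(X_test)

    model = LinearRegression().fit(X_train, y_train)
    mse = mean_squared_error(y_test, model.predict(X_test))

    # Linear regression on this split typically lands around 0.55 MSE;
    # fail the build if the model degrades badly.
    assert mse < 1.0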

3.1 The Workflow File

The pipeline is defined in a .github/workflows/main.yml file in the repository.

name: Deploy to Render

on:
  push:
    branches: [main]

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      # Check out your repository
      - name: Checkout Repository
        uses: actions/checkout@v2

      # Log in to Docker Hub
      - name: Log in to Docker Hub
        uses: docker/login-action@v2
        with:
          username: ${{ secrets.DOCKER_USERNAME }}  # Docker Hub username secret
          password: ${{ secrets.DOCKER_PASSWORD }}  # Docker Hub password/token secret

      # Build and push Docker image
      - name: Build and Push Docker Image
        run: |
          docker build -t ${{ secrets.DOCKER_USERNAME }}/my-app:latest .
          docker push ${{ secrets.DOCKER_USERNAME }}/my-app:latest

      # Deploy to Render via Render's API
      - name: Deploy to Render
        run: |
          curl -X POST \
            -H "Authorization: Bearer ${{ secrets.RENDER_API_KEY }}" \
            -H "Content-Type: application/json" \
            -d "{\"serviceId\":\"srv-ctlahm0gph6c739jiv5g\",\"clearCache\":true}" \
            https://api.render.com/v1/services/srv-ctlahm0gph6c739jiv5g/deploys

Step 4: Deployment

4.1 Choosing a Cloud Platform

Render is a cloud platform that offers simple deployment options for web applications, APIs, and Docker-based services. For this guide, we'll use Render to deploy a Dockerized application.

4.2 Deploying the Application to Render

Step 1: Create a Render Account

If you don't already have an account on Render, go to Render's website and sign up.

Step 2: Connect Your GitHub Repository

  1. Log into your Render account.

  2. Click on the New button and select Web Service (for web applications) or Private Service (for private applications).

  3. Connect your GitHub repository to Render by following the on-screen instructions.

4.3 Automating Deployment with CI/CD on Render

Step 1: Configure the Web Service

  1. Render will automatically detect the Dockerfile or your app's build settings. If you’re not using Docker, it will attempt to infer the build command based on the project type (e.g., Python, Node.js).

  2. Specify the Branch you want Render to use for deployments (e.g., main or master).

  3. Set the Start Command if your service needs one (e.g., python app.py for a plain Python app; a Docker-based service uses the CMD from its Dockerfile by default).

  4. Choose an instance type (e.g., the Free tier or a paid plan).

  5. If your app requires environment variables (e.g., API keys or secrets), set them under the Environment Variables section in the Render dashboard.

  6. Click Create Web Service.

Step 2: Trigger Automatic Deployment

With CI/CD enabled:

  • Every time you push changes to the specified branch in your GitHub repository, Render will automatically detect the new commit, pull the changes, and redeploy your application.

  • You can monitor the build and deployment logs in the Render dashboard under the Logs section.

Step 3: Access Your Application

Once deployment is complete, Render will provide a live URL for your application.

[Screenshot: California house price predictor]

Advantages of Using Render's GitHub Integration

  • Fully Automated: No need to manually trigger builds; every commit is deployed automatically.

  • Version Control: Easily roll back to previous commits if needed.

  • Monitoring: View logs and monitor application performance directly in the Render dashboard.

Conclusion

This guide demonstrates an efficient, modern approach to deploying machine learning models. By combining Docker, GitHub Actions, and Render, you get portability, scalability, and automation: key ingredients for real-world applications.