Create an AI chatbot from scratch using Docker

In the ever-evolving realm of artificial intelligence, Generative AI (GenAI) stands out as a transformative force in software development. However, despite its potential, developers often encounter numerous challenges when creating AI-powered applications. Today, we will explore how to build a fully functional Generative AI chatbot using Docker Model Runner, alongside observability tools like Prometheus, Grafana, and Jaeger. This guide aims to address common hurdles faced by developers, demonstrating how Docker Model Runner provides a streamlined solution, and guiding you through the process of building a production-ready chatbot with comprehensive monitoring capabilities.

The Current Challenges in GenAI Development

Generative AI is reshaping the landscape of software development, but constructing AI-driven applications is fraught with challenges. One significant issue is the fragmented AI landscape, where developers must integrate various libraries and frameworks that are not inherently compatible. Additionally, executing large language models efficiently necessitates specialized hardware configurations, which differ across platforms. This often results in teams maintaining separate environments for their application code and AI models, complicating the development process.

Another challenge is the lack of standardized methods for storing, versioning, and deploying models, leading to inconsistent practices. Moreover, relying on cloud-based AI services can lead to unpredictable costs that scale with usage. Sending data to external AI services also poses privacy and security risks, particularly for applications handling sensitive information. These challenges collectively create a frustrating developer experience, hampering experimentation and slowing down innovation at a time when businesses are eager to accelerate their AI adoption.

How Docker is Addressing These Challenges

Docker Model Runner offers a revolutionary approach to GenAI development by integrating AI model execution directly into familiar container workflows. This innovation simplifies the process for developers by making it easier to run AI models locally, within the existing Docker framework. Key benefits of using Docker Model Runner include:

  • Simplified Model Execution: Execute AI models locally with a simple Docker CLI command, eliminating the need for complex setup.
  • Hardware Acceleration: Gain direct access to GPU resources without the overhead associated with containerization.
  • Integrated Workflow: Seamlessly integrate with existing Docker tools and container development practices.
  • Standardized Packaging: Distribute models as Open Container Initiative (OCI) artifacts through the same registries you already use.
  • Cost Control: Eliminate unpredictable API costs by running models locally.
  • Data Privacy: Keep sensitive data within your infrastructure, with no external API calls.

    This approach fundamentally transforms the way developers can build and test AI-powered applications, making local development faster, more secure, and significantly more efficient.

    Building an AI Chatbot with Docker

    This guide will walk you through the process of building a comprehensive GenAI application, showcasing how to create a fully-featured chat interface powered by Docker Model Runner, complete with advanced observability tools to monitor and optimize your AI models.

    Project Overview

    The project is a complete Generative AI interface that demonstrates how to:

    1. Create a responsive React/TypeScript chat UI with streaming responses.
    2. Build a Go backend server that integrates with Docker Model Runner.
    3. Implement comprehensive observability with metrics, logging, and tracing.
    4. Monitor AI model performance with real-time metrics.

      Architecture

The application consists of several key components, and a chat request flows through them as follows:

1. Frontend: Sends chat messages to the backend API.
2. Backend: Formats the messages and sends them to the Model Runner.
3. LLM: Processes the input and generates a response.
4. Backend: Streams the tokens back to the frontend as they're generated.
5. Frontend: Displays the incoming tokens in real-time.
6. Observability Components: Collect metrics, logs, and traces throughout the process.

      This architecture enables a seamless flow of data between the frontend, backend, Model Runner, and observability tools like Prometheus, Grafana, and Jaeger.
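
To make the flow concrete, here is a minimal Go sketch of the relay step: the backend accepts a chat message, forwards it to the Model Runner's OpenAI-compatible chat completions endpoint with streaming enabled, and passes the streamed lines straight back to the browser. The base URL, model tag, and request shape are assumptions for illustration, not the project's actual code.

```go
// Minimal sketch of the backend relay. The Model Runner base URL, the
// model tag, and the payload shape are illustrative assumptions.
package main

import (
	"bufio"
	"bytes"
	"encoding/json"
	"log"
	"net/http"
)

// Assumed base URL; from inside a Compose network the Model Runner is
// commonly reached through a host such as model-runner.docker.internal.
const baseURL = "http://model-runner.docker.internal/engines/v1"

func chatHandler(w http.ResponseWriter, r *http.Request) {
	var req struct {
		Message string `json:"message"`
	}
	if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
		http.Error(w, err.Error(), http.StatusBadRequest)
		return
	}

	// Build an OpenAI-style streaming chat completion request.
	payload, _ := json.Marshal(map[string]any{
		"model":  "ai/llama3.2:1B-Q8_0",
		"stream": true,
		"messages": []map[string]string{
			{"role": "user", "content": req.Message},
		},
	})

	resp, err := http.Post(baseURL+"/chat/completions", "application/json", bytes.NewReader(payload))
	if err != nil {
		http.Error(w, err.Error(), http.StatusBadGateway)
		return
	}
	defer resp.Body.Close()

	// Relay the streamed lines to the frontend as they arrive.
	w.Header().Set("Content-Type", "text/event-stream")
	flusher, _ := w.(http.Flusher)
	scanner := bufio.NewScanner(resp.Body)
	for scanner.Scan() {
		w.Write(scanner.Bytes())
		w.Write([]byte("\n"))
		if flusher != nil {
			flusher.Flush()
		}
	}
}

func main() {
	http.HandleFunc("/api/chat", chatHandler)
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```

In the actual project, the handler additionally records metrics and traces around this call, which is what Prometheus, Grafana, and Jaeger visualize.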

      Project Structure

      The project is structured as follows:

```
.
├── Dockerfile
├── README-model-runner.md
├── README.md
├── backend.env
├── compose.yaml
├── frontend
│   ..
├── go.mod
├── go.sum
├── grafana
│   └── provisioning
├── main.go
├── main_branch_update.md
├── observability
│   └── README.md
├── pkg
│   ├── health
│   ├── logger
│   ├── metrics
│   ├── middleware
│   └── tracing
├── prometheus
│   └── prometheus.yml
├── refs
│   └── heads
..
```

      We’ll explore the key files and understand how they work together throughout this guide.

      Prerequisites

      Before starting, ensure you have:

  • Docker Desktop (version 4.40 or newer).
  • Docker Model Runner enabled.
  • At least 16GB of RAM for efficient AI model execution.
  • Familiarity with Go (for backend development).
  • Familiarity with React and TypeScript (for frontend development).

    Getting Started

    To run the application, follow these steps:

    1. Clone the Repository:

   ```bash
   git clone https://github.com/dockersamples/genai-model-runner-metrics
   cd genai-model-runner-metrics
   ```

    2. Enable Docker Model Runner in Docker Desktop:
      • Go to Settings > Features in Development > Beta tab.
      • Enable “Docker Model Runner”.
      • Select “Apply and restart”.
    3. Download the Model:

      For this demonstration, we’ll use Llama 3.2, but you can substitute any model of your choice:

   ```bash
   docker model pull ai/llama3.2:1B-Q8_0
   ```

    4. Start the Application:

   ```bash
   docker compose up -d --build
   ```

    5. Access the Chat Interface:

      Open your browser and navigate to http://localhost:3000. You’ll be greeted with a modern chat interface featuring a clean, responsive design with a dark/light mode toggle, a message input area ready for your first prompt, and model information displayed in the footer.

    6. View Metrics:

      Click on "Expand" to view metrics like input tokens, output tokens, total requests, average response time, and error rate.

      Implementation Details

      Let’s delve into the workings of the key components:

1. Frontend Implementation: The React frontend provides a clean, responsive chat interface built with TypeScript and modern React patterns. The core App.tsx component manages state for dark mode preferences and model metadata fetched from the backend's health endpoint. When the component mounts, the useEffect hook automatically retrieves information about the currently running AI model, displaying details like the model name directly in the footer.
2. Backend Implementation: The Go backend communicates with Docker Model Runner, leveraging its OpenAI-compatible API. The Model Runner exposes endpoints that match OpenAI's API structure, allowing standard client usage.
3. Metrics Flow: The backend acts as a metrics bridge, connecting to llama.cpp via the Model Runner API, collecting performance data from each API call, calculating metrics like tokens per second and memory usage, and exposing all metrics in Prometheus format.
4. llama.cpp Metrics Integration: The project provides detailed real-time metrics for llama.cpp models, including tokens per second (generation speed), context window size (maximum context length in tokens), prompt evaluation time (time spent processing the input prompt), memory per token (memory efficiency), thread utilization (CPU threads used), and batch size (token processing batch size).
5. Chat Implementation with Streaming: The chat endpoint implements streaming for real-time token generation, ensuring tokens appear in real-time in the user interface and providing a smooth, responsive chat experience.
6. Performance Measurement: The system measures various performance aspects of the model, including time to first token and tokens per second, helping to optimize the user experience.
7. Metrics Collection: The metrics.go file defines a comprehensive set of Prometheus metrics that allow monitoring of both application performance and llama.cpp model behavior.
8. Core Metrics Architecture: The file establishes a collection of Prometheus metric types, including counters for cumulative values, gauges for values that can increase and decrease, and histograms for measuring distributions of values (see the illustrative sketch after this list).
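
To give a feel for what these metrics look like in code, the following Go sketch defines a few Prometheus counters, gauges, and histograms and records them after a generation. Metric names, labels, and buckets are illustrative assumptions rather than the project's actual definitions in metrics.go.

```go
// Illustrative sketch of Prometheus metric definitions and usage.
// Names, labels, and buckets are assumptions, not the project's own.
package metrics

import (
	"time"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
)

var (
	// Counters: cumulative values such as requests and generated tokens.
	RequestsTotal = promauto.NewCounterVec(prometheus.CounterOpts{
		Name: "chat_requests_total",
		Help: "Total number of chat requests handled.",
	}, []string{"status"})

	OutputTokensTotal = promauto.NewCounter(prometheus.CounterOpts{
		Name: "chat_output_tokens_total",
		Help: "Total number of tokens generated by the model.",
	})

	// Gauge: a value that can rise and fall, e.g. generation speed.
	TokensPerSecond = promauto.NewGauge(prometheus.GaugeOpts{
		Name: "llm_tokens_per_second",
		Help: "Most recent generation speed in tokens per second.",
	})

	// Histogram: a distribution, e.g. time to first token.
	FirstTokenSeconds = promauto.NewHistogram(prometheus.HistogramOpts{
		Name:    "llm_first_token_seconds",
		Help:    "Time from request start to the first streamed token.",
		Buckets: prometheus.DefBuckets,
	})
)

// ObserveGeneration records the metrics for one completed chat request.
func ObserveGeneration(start, firstToken time.Time, tokens int) {
	elapsed := time.Since(start).Seconds()
	RequestsTotal.WithLabelValues("ok").Inc()
	OutputTokensTotal.Add(float64(tokens))
	FirstTokenSeconds.Observe(firstToken.Sub(start).Seconds())
	if elapsed > 0 {
		TokensPerSecond.Set(float64(tokens) / elapsed)
	}
}
```

The backend then exposes these metrics over HTTP in Prometheus format (typically via promhttp.Handler()) so Prometheus can scrape them and Grafana can chart them.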

      Docker Compose: LLM as a First-Class Service

With Docker Model Runner integration, Docker Compose treats AI model deployment as a standard service, akin to any other infrastructure component. The project's compose.yaml defines the entire application: the AI model, the application backend and frontend, the observability stack, and all networking and dependencies, with the llm service using Docker's model provider.
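
As a rough illustration of that idea (service names, model tag, and options below are assumptions, not the project's actual compose.yaml), a Compose file can declare the model through Docker's model provider and let the backend depend on it like any other service:

```yaml
# Illustrative sketch only; service names and options are assumptions.
services:
  llm:
    provider:
      type: model            # Docker Model Runner provides this service
      options:
        model: ai/llama3.2:1B-Q8_0

  backend:
    build: .
    depends_on:
      - llm
    ports:
      - "8080:8080"
```

Because the model is just another service, a single `docker compose up` starts the model, the application, and the observability stack together.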

      Conclusion

      The genai-model-runner-metrics project exemplifies a powerful approach to building AI-powered applications with Docker Model Runner while maintaining comprehensive visibility into performance metrics. By combining local model execution with extensive metrics, developers gain both privacy and cost benefits of local execution alongside the observability essential for production applications.

      Whether you’re developing a customer support bot, a content generation tool, or a specialized AI assistant, this architecture provides a solid foundation for reliable, observable, and efficient AI applications. The metrics-driven approach ensures continuous monitoring and optimization, leading to better user experiences and more efficient resource utilization.

      Learn More

  • Read our quickstart guide to Docker Model Runner.
  • Find documentation for Model Runner.
  • Subscribe to the Docker Navigator Newsletter.
  • New to Docker? Create an account.
  • Have questions? The Docker community is here to help.

    By following this guide, you can embark on your journey to build a robust, locally-executed, metrics-driven Generative AI application with Docker Model Runner.

For more information, refer to this article.

Neil S
Neil is a highly qualified Technical Writer with an M.Sc(IT) degree and an impressive range of IT and Support certifications including MCSE, CCNA, ACA(Adobe Certified Associates), and PG Dip (IT). With over 10 years of hands-on experience as an IT support engineer across Windows, Mac, iOS, and Linux Server platforms, Neil possesses the expertise to create comprehensive and user-friendly documentation that simplifies complex technical concepts for a wide audience.