Building OpenStatus: A Deep Dive into Our Infrastructure Architecture

Thibault Le Ouay Ducasse

Dec 12, 2024 · 3 min read

Infrastructure Overview

OpenStatus is a synthetic monitoring platform designed with resilience, scalability, and efficiency in mind. Our infrastructure is a carefully orchestrated ecosystem of multiple applications and services, each playing a crucial role.

Application Landscape

Our platform comprises several interconnected applications, each serving a specific purpose:

  1. Frontend Ecosystem: The core of our user interaction is built on two frontend applications. We chose technologies that balance performance and developer experience:
    • A Next.js application, hosted on Vercel, that powers our marketing site, user dashboard, and status page.
    • An Astro + Starlight documentation site, hosted on Cloudflare Pages, so our users have comprehensive, easily navigable documentation.
  2. Backend Infrastructure: All our backend services are hosted on Fly.io.
    • API server: our public API and our alerting engine.
    • Probes/Checker: a Go app deployed globally to monitor your service.
    • Screenshot app: a service that uses Playwright to take a screenshot of your website when we detect downtime.
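
To make the probe's job concrete, here is a minimal sketch of a single HTTP check. It is written in Python for brevity (the real checker is a Go app) and is not the OpenStatus implementation: the `CheckResult` fields, the `run_check` name, and the rule that any status from 200 to 399 counts as healthy are all assumptions made for illustration.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional
from urllib import error, request


@dataclass
class CheckResult:
    """Hypothetical shape of what one probe reports for one check."""
    url: str
    region: str
    status: Optional[int]  # HTTP status code, or None if no response arrived
    latency_ms: float
    ok: bool


def http_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code of a GET request to `url`."""
    try:
        with request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except error.HTTPError as exc:
        # 4xx/5xx responses raise HTTPError but still carry a status code.
        return exc.code


def run_check(
    url: str, region: str, fetch: Callable[[str], int] = http_status
) -> CheckResult:
    """Time one request and classify the outcome (assumed rule: 2xx/3xx is healthy)."""
    start = time.perf_counter()
    try:
        status: Optional[int] = fetch(url)
    except Exception:
        # DNS failures, refused connections, and timeouts all count as "no response".
        status = None
    latency_ms = (time.perf_counter() - start) * 1000
    ok = status is not None and 200 <= status < 400
    return CheckResult(url, region, status, latency_ms, ok)
```

The `fetch` parameter exists so the classification logic can be exercised without network access, for example `run_check("https://example.com", "ams", fetch=lambda _: 503)` returns a result with `ok` set to `False`. A failed result like that is the kind of signal that would be handed to the alerting engine and the screenshot service.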

[Figure: Hosting providers]

Managed Services

We also rely heavily on managed services so that we don't have to run this infrastructure ourselves. Here are some of the services we use:

Scheduling and Job Management

Because monitoring is critical, we rely on managed services for scheduling and job management:

  • Cron Jobs: We currently use Vercel Cron and plan to migrate to Google Cloud's managed cron service, both for a better user experience and to use our Google Cloud Platform credits for cost savings.
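
As an illustration of what a scheduled job has to do, the sketch below expands one cron tick into the individual checks that are due. It assumes that each cron schedule corresponds to one check frequency and that every monitor is checked from each of its configured regions; the `Monitor` fields and the `checks_for_tick` helper are hypothetical names, not OpenStatus code.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Monitor:
    """Hypothetical monitor record: what to check, how often, and from where."""
    id: int
    url: str
    periodicity: str          # e.g. "1m" or "10m"
    regions: Tuple[str, ...]  # e.g. ("ams", "iad")
    active: bool = True


def checks_for_tick(
    monitors: List[Monitor], periodicity: str
) -> List[Tuple[int, str, str]]:
    """Return one (monitor_id, url, region) tuple per check due on this tick."""
    return [
        (m.id, m.url, region)
        for m in monitors
        if m.active and m.periodicity == periodicity
        for region in m.regions
    ]
```

With two active one-minute monitors in two regions each, a "1m" tick would produce four checks. Paused monitors and monitors on other frequencies are skipped.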

Queue Architecture

Every check is pushed to a queue and processed by our probes. The probes are responsible for checking the status of your service.

  • Job Queue: Google Cloud Tasks provides our distributed task management, with separate queues for different check frequencies.
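
Segmenting queues by check frequency can be pictured as a lookup from a monitor's frequency to a queue name. The sketch below only builds a plain task description and does not call any queue provider's API; the queue names, the payload fields, and the `build_task` helper are assumptions for illustration.

```python
import json

# Hypothetical queue names, one per check frequency.
QUEUE_BY_PERIODICITY = {
    "30s": "checks-30s",
    "1m": "checks-1m",
    "5m": "checks-5m",
    "10m": "checks-10m",
}


def build_task(monitor_id: int, url: str, region: str, periodicity: str) -> dict:
    """Describe one check as a task bound for the queue of its frequency."""
    if periodicity not in QUEUE_BY_PERIODICITY:
        raise ValueError(f"no queue configured for periodicity {periodicity!r}")
    payload = {"monitor_id": monitor_id, "url": url, "region": region}
    return {
        "queue": QUEUE_BY_PERIODICITY[periodicity],
        # The body is what a probe would receive when the task is dispatched.
        "body": json.dumps(payload, sort_keys=True),
    }
```

Keeping frequencies on separate queues means a backlog of slow ten-minute checks cannot delay the thirty-second ones, and each queue can be given its own rate limit and retry policy.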

We've implemented a granular queue system to ensure efficient task processing:

  • Separate queues for frontend services
  • Dedicated queues for API server and alerting engine
  • Specialized queues for probes and screenshot services

[Figure: Queue providers]

Hosting Strategy

Our multi-cloud approach ensures flexibility and optimal performance:

  • Frontend: Hosted on Vercel for seamless deployment and edge networking
  • Probes: Currently on Fly.io, with plans to add more providers for our global monitoring system.
  • Queue Management: Leveraging Google Cloud Platform (benefiting from Google credits)

Data Infrastructure

We don't want to operate the data infrastructure ourselves either, so we rely on managed services:

  • Primary Database: Turso, providing a cost-efficient data storage solution.
  • Analytics Database: Tinybird, enabling complex analytical queries and insights.
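
To give a sense of the analytical side, the sketch below computes an uptime percentage and a 95th-percentile latency per region from raw check results. It is plain Python rather than a Tinybird query, and the input shape, the nearest-rank percentile method, and the `summarize` helper are assumptions chosen for illustration.

```python
from collections import defaultdict
from math import ceil
from typing import Dict, Iterable, List, Tuple


def percentile(values: List[float], pct: float) -> float:
    """Nearest-rank percentile of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def summarize(
    results: Iterable[Tuple[str, bool, float]]
) -> Dict[str, Dict[str, float]]:
    """Aggregate (region, ok, latency_ms) rows into per-region statistics."""
    by_region: Dict[str, List[Tuple[bool, float]]] = defaultdict(list)
    for region, ok, latency_ms in results:
        by_region[region].append((ok, latency_ms))

    summary = {}
    for region, rows in by_region.items():
        latencies = [latency for _, latency in rows]
        summary[region] = {
            "uptime_pct": 100 * sum(1 for ok, _ in rows if ok) / len(rows),
            "p95_ms": percentile(latencies, 95),
        }
    return summary
```

Aggregations like this run over every check result ever recorded, which is a very different workload from reading a monitor's configuration, and that difference is one reason to keep an analytics store separate from the primary database.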

Design Philosophy

Our infrastructure design is driven by several key principles:

  • Resilience: Ensuring high availability and fault tolerance
  • Scalability: Architectural choices that allow seamless growth
  • Cost-Efficiency: Leveraging managed services and cloud credits
  • Performance: Optimizing each component for maximum efficiency

Conclusion

Building a resilient synthetic monitoring platform requires a thoughtful approach to infrastructure design. By carefully orchestrating our applications, services, and managed solutions, we've created a robust ecosystem that delivers on our promise of reliable monitoring and alerting.

The drawback of this approach is complexity: with this many applications and managed services, it is hard to offer a version of OpenStatus that is easy to self-host.