Watchdog by Team Untitled

A Smart Alert System to proactively monitor and diagnose backend health issues.


May 05, 2025

Days before the Hackathon 🛀

The Brainstorming

Formed the team after the hackathon announcement and began brainstorming sessions.
Faced initial confusion about what qualified as an "internal operations optimization" idea.
Narrowed down to three initial concepts:
1. UI Health Checker: A bot to crawl customer stores, check UI consistency, ensure all elements are visible/functioning, take screenshots, and generate a report.
2. Smart Alert System: Monitor multiple backend tables (alerts, product updates, etc.), detect anomalies, investigate root causes, and auto-generate a support report.
3. Churn Predictor: Analyze merchant behavior, feature usage, and business impact to predict churn—currently done manually by the success team.

The Timeline

ETA: 48 hours 🚀

Final Idea Selection

Initially leaned towards the UI Health Checker, but dropped it because:
- Other teams were working on similar solutions.
- It was too narrow in scope for a 48-hour hackathon.
Pivoted to Watchdog, the Smart Alert System.

ETA: 32 hours 💪

Design & Planning

Mapped relevant database tables to monitor.
Created a detailed flowchart to define:
- Table inspection order.
- Agent behavior (when it activates/stops).
- Trigger mechanism (e.g., failure in the notification alert table).
- Output report format and Slack channel for sharing insights.
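To make the flowchart concrete, here is a hypothetical sketch of the agent's control loop. It is not our production logic: it assumes the agent wakes up when the failure ratio across the alert tables exceeds 20% (or when nothing was attempted at all), walks the tables in a fixed order, and stops at the first table that yields a finding. `INSPECTION_ORDER`, `FAILURE_RATIO_TRIGGER`, and the `inspect` callback are illustrative names; the table names are the ones listed under Data Preparation, but their order here is an assumption.

```typescript
// Hypothetical inspection order over the tables we studied.
const INSPECTION_ORDER = ["failed_alerts", "sent_alerts", "product_updates", "products", "visits"] as const;
type TableName = (typeof INSPECTION_ORDER)[number];

// Assumed trigger: activate when more than 20% of alert attempts failed.
const FAILURE_RATIO_TRIGGER = 0.2;

interface AlertCounts {
  sent: number;
  failed: number;
}

// Decides whether the agent should start an investigation.
function shouldActivate(counts: AlertCounts): boolean {
  const total = counts.sent + counts.failed;
  // No alert activity at all (e.g. Notifications Sent dropped to 0) is itself suspicious.
  if (total === 0) return true;
  return counts.failed / total > FAILURE_RATIO_TRIGGER;
}

// Walks the tables in order and stops at the first finding.
// `inspect` stands in for whatever per-table check the agent runs.
function runInspection(
  counts: AlertCounts,
  inspect: (table: TableName) => string | null
): { activated: boolean; visited: TableName[]; finding: string | null } {
  if (!shouldActivate(counts)) return { activated: false, visited: [], finding: null };
  const visited: TableName[] = [];
  for (const table of INSPECTION_ORDER) {
    visited.push(table);
    const finding = inspect(table);
    if (finding !== null) return { activated: true, visited, finding };
  }
  return { activated: true, visited, finding: null };
}
```

With `{ sent: 10, failed: 40 }` and an `inspect` callback that only reports a problem for `product_updates`, the agent activates, visits `failed_alerts`, `sent_alerts`, and `product_updates`, and stops there.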
Data Preparation:
- Decided not to use real customer data.
- Studied schemas of relevant tables (e.g., visits, products, sent_alerts, failed_alerts, product_updates).
- Generated mock data using Gemini (@Naga Priyanka).
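Our mock data came from Gemini. For repeatable demos, a deterministic generator is a handy complement; the sketch below is not what we used, only an illustration. It produces daily `sent_alerts`-style counts from a seeded pseudo-random generator (mulberry32) and forces the counts to zero for a chosen window, so every run reproduces the same "Notifications Sent dropped to 0" scenario. The row shape, function names, and parameters are assumptions.

```typescript
// Hypothetical row shape for a daily roll-up of the sent_alerts table.
interface DailyAlertCount {
  day: string; // ISO date, e.g. "2025-04-01"
  sent: number;
}

// mulberry32: a tiny seeded PRNG, so the same seed always yields the same data.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Generates `days` rows around `baseline` (+/-10% jitter), with sends forced
// to 0 from day index `outageStart` (inclusive) for `outageLength` days.
function mockSentAlerts(
  startDay: string,
  days: number,
  baseline: number,
  outageStart: number,
  outageLength: number,
  seed = 42
): DailyAlertCount[] {
  const rand = mulberry32(seed);
  const start = new Date(`${startDay}T00:00:00Z`);
  const rows: DailyAlertCount[] = [];
  for (let i = 0; i < days; i++) {
    const date = new Date(start.getTime() + i * 86_400_000);
    const inOutage = i >= outageStart && i < outageStart + outageLength;
    const jitter = 1 + (rand() - 0.5) * 0.2; // drawn every day to keep the sequence stable
    rows.push({
      day: date.toISOString().slice(0, 10),
      sent: inOutage ? 0 : Math.round(baseline * jitter),
    });
  }
  return rows;
}
```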

ETA: 20 hours 🥱

Implementation

Set up an MCP server to:
- Access Postgres
- Detect patterns
- Identify anomalies
- Generate a report
Faced some initial setup and syntax issues; support from @Senthil Murugan was crucial.
Cross-team collaboration included:
- @Ashutosh Singh sharing MCP expertise.
- @Nishaanth S P advising on table access and scope.
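The pattern and anomaly steps can start simple. A minimal sketch, assuming a daily metric series such as notifications sent per day: compare the latest value with the mean of a trailing window and flag it when it falls more than a configurable fraction below that baseline. `detectDrop`, the 7-day window, and the 50% threshold are illustrative assumptions, not what our server ships with.

```typescript
interface DropFinding {
  latest: number;
  baseline: number;
  dropRatio: number; // 0.6 means the latest value is 60% below baseline
}

// Flags a drop when the latest point is more than `threshold` below the
// mean of the preceding `window` points. Returns null when there is not
// enough history, the baseline is zero, or the drop is within tolerance.
function detectDrop(series: number[], window = 7, threshold = 0.5): DropFinding | null {
  if (series.length < window + 1) return null;
  const latest = series[series.length - 1];
  const history = series.slice(-(window + 1), -1);
  const baseline = history.reduce((sum, v) => sum + v, 0) / history.length;
  if (baseline === 0) return null;
  const dropRatio = (baseline - latest) / baseline;
  return dropRatio > threshold ? { latest, baseline, dropRatio } : null;
}
```

For example, `detectDrop([100, 102, 98, 101, 99, 100, 100, 0])` returns `{ latest: 0, baseline: 100, dropRatio: 1 }`, the "Notifications Sent dropped to 0" case. Using a trailing mean as the baseline, rather than only the previous day, makes the check less sensitive to a single noisy day in the history.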

ETA: 8 hours 😵‍💫

Demo scenarios

The Problem Statement

Our current alerting system is fragmented and noisy. Multiple Slack channels are flooded with alerts from various backend systems (product updates, user events, error logs, etc.), but most of these messages are either non-actionable or lack clear context. As a result, engineers and support often treat alert channels as passive logs rather than active warnings. Critical issues go unnoticed until a customer support ticket is raised, leading to a reactive rather than a proactive support model.

“Alerts? or Logs? - I don’t know” 🤷‍♂️

The Solution

We will build a Smart Alert System powered by a Model Context Protocol (MCP) architecture to proactively monitor and diagnose backend health issues.

MCP Server:
- Expose multiple dynamic tools over a streaming SSE-based MCP server.
- Tools will include:
  - A Postgres tool to query Swym’s BI and operational data layers.
  - A Shopify API tool to check critical components like webhooks and metafields.
  - A Swym Admin tool to check backend and frontend configs.
  - Custom analyzers built using prompts on the Gemini API to naturally query and detect anomalies across alerts, product updates, error logs, etc.
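Each tool is advertised to MCP clients as a name, a description, and a JSON Schema for its inputs, the same shape our `generate-sql-query` tool uses. A hypothetical catalog for the four tool families above might look like this; the tool names and parameters are assumptions for illustration, and only the `name`/`description`/`inputSchema` shape comes from MCP. The small validator catches two easy mistakes: duplicate names and required fields missing from `properties`.

```typescript
// Shape of a tool entry in an MCP tools/list response.
interface ToolDefinition {
  name: string;
  description: string;
  inputSchema: {
    type: "object";
    properties: Record<string, { type: string; description: string }>;
    required?: string[];
  };
}

// Hypothetical catalog; names and parameters are illustrative only.
const merchantInput = {
  type: "object" as const,
  properties: { merchant_pid: { type: "string", description: "Merchant PID" } },
  required: ["merchant_pid"],
};

const WATCHDOG_TOOLS: ToolDefinition[] = [
  {
    name: "run-sql-query",
    description: "Run a read-only SELECT against the BI and operational tables.",
    inputSchema: {
      type: "object",
      properties: { sql: { type: "string", description: "A single SELECT statement" } },
      required: ["sql"],
    },
  },
  {
    name: "check-shopify-webhooks",
    description: "Check a store's Shopify webhooks and metafields.",
    inputSchema: merchantInput,
  },
  {
    name: "check-swym-config",
    description: "Fetch a merchant's backend and frontend configs from Swym Admin.",
    inputSchema: merchantInput,
  },
  {
    name: "analyze-anomalies",
    description: "Ask Gemini to look for anomalies in alerts, product updates, and error logs.",
    inputSchema: {
      type: "object",
      properties: { table: { type: "string", description: "Table or log source to analyze" } },
      required: ["table"],
    },
  },
];

// Returns a list of problems; an empty list means the catalog looks sane.
function validateCatalog(tools: ToolDefinition[]): string[] {
  const problems: string[] = [];
  const seen = new Set<string>();
  for (const tool of tools) {
    if (seen.has(tool.name)) problems.push(`duplicate tool name: ${tool.name}`);
    seen.add(tool.name);
    for (const field of tool.inputSchema.required ?? []) {
      if (!(field in tool.inputSchema.properties)) {
        problems.push(`${tool.name}: required field "${field}" not in properties`);
      }
    }
  }
  return problems;
}
```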
MCP Client:
- Use Gemini (or other LLMs) to decide which MCP tools to invoke based on symptoms or observed trends, such as the Notifications Sent count dropping to 0.
- Run scheduled health check cron jobs daily (or at configurable intervals).
- Automatically detect falling trends (e.g., drops in wishlist activity, spikes in webhook failures).
- Investigate likely root causes by correlating across multiple datasets (internal and external).
- Auto-generate Smart Alerts — clean, structured, and actionable — that summarize:
  - What failed
  - Why it failed (root cause)
  - Impact analysis (severity, affected merchants/users)
  - Suggested remediation, drafted with the Gemini API
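The four parts of a Smart Alert map naturally onto a small data type. The sketch below is hypothetical: the field names, the P1 to P3 severity levels, and the plain-text layout are our assumptions. It shows how the client could render an alert once the root cause, impact, and remediation have been filled in, whether by Gemini or by hand.

```typescript
type Severity = "P1" | "P2" | "P3";

// Hypothetical Smart Alert: what failed, why, impact, and remediation.
interface SmartAlert {
  title: string; // what failed
  rootCause: string; // why it failed
  severity: Severity;
  affectedMerchants: number;
  affectedUsers: number;
  remediation: string[]; // suggested next steps, e.g. drafted by Gemini
}

// Renders the alert as plain text suitable for a chat message.
function renderSmartAlert(alert: SmartAlert): string {
  const steps = alert.remediation.map((step, i) => `  ${i + 1}. ${step}`).join("\n");
  return [
    `[${alert.severity}] ${alert.title}`,
    `Root cause: ${alert.rootCause}`,
    `Impact: ${alert.affectedMerchants} merchant(s), ${alert.affectedUsers} user(s)`,
    "Suggested remediation:",
    steps,
  ].join("\n");
}
```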
Slack Integration:
- Smart Alerts with RCA will be pushed proactively to specific Slack channels (like #smart-alerts) with clear priority tagging.
- Minimize noise and prevent random error flooding.
- Ensure actionable alerts are delivered before a support ticket is opened.
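To keep the alert channels quiet, the client can suppress repeats of the same alert and route by priority before posting. A minimal sketch, assuming each alert carries a fingerprint (for example, check name plus merchant) and that repeats within a fixed window are dropped; the one-hour window, the `#oncall` channel, and the fingerprint format are hypothetical. The returned objects only carry the `channel` and `text` arguments that Slack's `chat.postMessage` method accepts; actually sending them with a bot token is left out.

```typescript
type Priority = "P1" | "P2" | "P3";

interface SlackMessage {
  channel: string;
  text: string;
}

// Hypothetical routing: P1 alerts also go to an on-call channel.
const CHANNELS: Record<Priority, string[]> = {
  P1: ["#smart-alerts", "#oncall"],
  P2: ["#smart-alerts"],
  P3: ["#smart-alerts"],
};

class AlertGate {
  private readonly lastSent = new Map<string, number>();
  private readonly windowMs: number;

  constructor(windowMs = 60 * 60 * 1000) {
    this.windowMs = windowMs;
  }

  // Returns the messages to post, or [] if the same fingerprint was
  // already posted within the suppression window.
  route(fingerprint: string, priority: Priority, text: string, now: number = Date.now()): SlackMessage[] {
    const previous = this.lastSent.get(fingerprint);
    if (previous !== undefined && now - previous < this.windowMs) return [];
    this.lastSent.set(fingerprint, now);
    return CHANNELS[priority].map((channel) => ({ channel, text: `[${priority}] ${text}` }));
  }
}
```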

This new system will shift alerting from noisy and reactive to intelligent and proactive, with a focus on root causes, improving both operational efficiency and the customer experience.

The Engineering

Setting up Tools in our MCP server

```typescript
// setupTools() is a method of the MCPServer class shown in the next section.
// It assumes module-level `pool` (a pg Pool for the BI database) and `model`
// (a Gemini GenerativeModel from @google/generative-ai), plus:
// import { ListToolsRequestSchema, CallToolRequestSchema } from '@modelcontextprotocol/sdk/types.js';
private setupTools() {
  // Advertise the tools this server offers.
  this.server.setRequestHandler(ListToolsRequestSchema, async () => ({
    tools: [
      {
        name: 'generate-sql-query',
        description: 'Generate SQL based on natural language.',
        inputSchema: {
          type: 'object',
          properties: {
            question: { type: 'string', description: 'Natural language question' }
          },
          required: ['question']
        }
      }
    ]
  }));

  // Dispatch tool calls by name.
  this.server.setRequestHandler(CallToolRequestSchema, async (request) => {
    const { name: toolName, arguments: args } = request.params;

    if (!args || !toolName) throw new Error('Tool name or arguments missing');

    if (toolName === 'generate-sql-query') {
      const question = String(args.question ?? '');

      const client = await pool.connect();
      try {
        // Collect columns for every table in the business_intelligence schema in
        // one query. Filtering on table_schema also avoids mixing in columns from
        // same-named tables in other schemas.
        const cols = await client.query(
          `SELECT table_name, column_name, data_type
             FROM information_schema.columns
            WHERE table_schema = 'business_intelligence'
            ORDER BY table_name, ordinal_position`
        );

        let prompt = `Generate a read-only SQL query based on the following question: "${question}".\n\n`;
        prompt += 'Here are the relevant table schemas:\n';

        // Schema-qualify table names so the generated SQL works even when
        // business_intelligence is not on the search_path.
        let currentTable = '';
        for (const c of cols.rows) {
          if (c.table_name !== currentTable) {
            currentTable = c.table_name;
            prompt += `\nTable "business_intelligence.${currentTable}":\n`;
          }
          prompt += `- ${c.column_name} (${c.data_type})\n`;
        }

        prompt += '\nOnly use SELECT statements. Do not modify data.';

        const result = await model.generateContent(prompt);
        const sql = result.response.text();

        return {
          content: [{ type: 'text', text: sql }],
          isError: false
        };
      } catch (error) {
        // Report failures to the caller as tool errors instead of crashing.
        return {
          content: [{ type: 'text', text: String(error) }],
          isError: true
        };
      } finally {
        client.release();
      }
    }

    throw new Error(`Tool not found: ${toolName}`);
  });
}
```
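The prompt asks Gemini for SELECT statements only, but model output is not guaranteed to follow instructions, and it often arrives wrapped in a Markdown code fence. Before anything is executed, the text is worth checking. The sketch below is a hypothetical helper (`extractReadOnlySql` is our name, not part of any library). A keyword filter like this is deliberately conservative (it may reject a harmless query that mentions one of these words in a string literal) and is only a first line of defence; running generated queries under a database role that has only SELECT privileges is the safer design.

```typescript
// Keywords that should never appear in a read-only query (matched as whole words).
const FORBIDDEN = /\b(insert|update|delete|drop|alter|truncate|create|grant|revoke|copy)\b/i;

// Strips a surrounding Markdown code fence (if any) and returns the SQL
// only when it is a single SELECT/WITH statement without write keywords.
function extractReadOnlySql(modelOutput: string): string | null {
  let sql = modelOutput.trim();
  const fenced = sql.match(/^```(?:sql)?\s*([\s\S]*?)\s*```$/i);
  if (fenced) sql = fenced[1].trim();
  sql = sql.replace(/;\s*$/, ""); // allow one trailing semicolon
  if (sql.includes(";")) return null; // reject multiple statements
  if (!/^(select|with)\b/i.test(sql)) return null;
  if (FORBIDDEN.test(sql)) return null;
  return sql;
}
```

Because `_` counts as a word character, whole-word matching does not trip over column or table names such as `updated_at` or `product_updates`.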

Making the Server streamable

```typescript
import { Server } from '@modelcontextprotocol/sdk/server/index.js';
import { StreamableHTTPServerTransport } from '@modelcontextprotocol/sdk/server/streamableHttp.js';
import { isInitializeRequest } from '@modelcontextprotocol/sdk/types.js';
import { Request, Response } from 'express';
import { randomUUID } from 'crypto';

export class MCPServer {
  server: Server;
  // Active transports, keyed by the value of the Mcp-Session-Id header.
  transports: { [sessionId: string]: StreamableHTTPServerTransport } = {};

  constructor(server: Server) {
    this.server = server;
    this.setupTools();
  }

  // GET opens the server-to-client stream for an existing session.
  async handleGetRequest(req: Request, res: Response) {
    const sessionId = req.headers['mcp-session-id'] as string | undefined;
    if (!sessionId || !this.transports[sessionId]) {
      res.status(400).json(this.createErrorResponse('Bad Request: invalid session ID.'));
      return;
    }
    await this.transports[sessionId].handleRequest(req, res);
  }

  // POST carries client-to-server JSON-RPC messages.
  async handlePostRequest(req: Request, res: Response) {
    const sessionId = req.headers['mcp-session-id'] as string | undefined;

    try {
      // Existing session: route the message to its transport.
      if (sessionId && this.transports[sessionId]) {
        await this.transports[sessionId].handleRequest(req, res, req.body);
        return;
      }

      // New session: only a JSON-RPC "initialize" request may create one.
      if (!sessionId && isInitializeRequest(req.body)) {
        const transport = new StreamableHTTPServerTransport({
          sessionIdGenerator: () => randomUUID(),
        });

        // Forget the session once its transport closes.
        transport.onclose = () => {
          if (transport.sessionId) delete this.transports[transport.sessionId];
        };

        // Note: the SDK's Server is bound to one transport at a time, so serving
        // several concurrent sessions generally means one Server per session.
        await this.server.connect(transport);
        await transport.handleRequest(req, res, req.body);

        // The session ID is assigned while the initialize request is handled.
        if (transport.sessionId) {
          this.transports[transport.sessionId] = transport;
        }
        return;
      }

      res.status(400).json(this.createErrorResponse('Bad Request: invalid session ID.'));
    } catch (error) {
      console.error('Error handling MCP request:', error);
      // The transport may already have started streaming a response.
      if (!res.headersSent) {
        res.status(500).json(this.createErrorResponse('Internal server error.'));
      }
    }
  }

  async cleanup() {
    await this.server.close();
  }

  // JSON-RPC 2.0 error envelope; -32000 is in the implementation-defined server-error range.
  private createErrorResponse(message: string) {
    return { jsonrpc: '2.0', error: { code: -32000, message }, id: null };
  }

  private setupTools() {
    // Register the ListTools / CallTool handlers here (see setupTools above).
  }
}
```
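For reference, the body the POST handler must recognize on a client's first request is a JSON-RPC 2.0 message whose `method` is `initialize`; there is no `type` field. A hand-rolled guard in the spirit of the SDK's `isInitializeRequest` might look like the following. It only checks the envelope fields, whereas the SDK validates the full message schema, so treat it as an illustration; `looksLikeInitialize` is our own name, and `2025-03-26` is one published MCP protocol revision used as an example value.

```typescript
interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: string | number;
  method: string;
  params?: Record<string, unknown>;
}

// Checks the JSON-RPC envelope of an MCP initialize request.
function looksLikeInitialize(body: unknown): body is JsonRpcRequest {
  if (typeof body !== "object" || body === null) return false;
  const msg = body as Record<string, unknown>;
  return (
    msg.jsonrpc === "2.0" &&
    (typeof msg.id === "string" || typeof msg.id === "number") &&
    msg.method === "initialize"
  );
}

// Example of the first message a client sends when opening a session.
const exampleInitialize = {
  jsonrpc: "2.0",
  id: 1,
  method: "initialize",
  params: {
    protocolVersion: "2025-03-26",
    capabilities: {},
    clientInfo: { name: "mcp-client-for-sse-server", version: "1.0.0" },
  },
};
```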

Our simple client lists all available tools and calls each of them:

```typescript
import { Client } from '@modelcontextprotocol/sdk/client/index.js';
import { URL } from 'url';
import { StreamableHTTPClientTransport } from '@modelcontextprotocol/sdk/client/streamableHttp.js';
import { TextContentSchema } from '@modelcontextprotocol/sdk/types.js';

class MCPClient {
  tools: { name: string; description: string }[] = [];
  private client: Client;

  constructor(serverName: string) {
    this.client = new Client({
      name: `mcp-client-for-${serverName}`,
      version: '1.0.0'
    });
    // Register callbacks on the Client itself: assigning transport.onclose or
    // transport.onerror after connect() would replace the handlers the SDK installs.
    this.client.onerror = (error) => console.error('MCP client error:', error);
  }

  async connect(serverUrl: string) {
    const transport = new StreamableHTTPClientTransport(new URL(serverUrl));
    await this.client.connect(transport);
  }

  // Fetch the tool catalog advertised by the server.
  async listTools() {
    const toolsResult = await this.client.listTools();
    this.tools = toolsResult.tools.map((tool) => ({
      name: tool.name,
      description: tool.description ?? ''
    }));
  }

  // Call one tool and print any text content it returns.
  async callTool(name: string, params: Record<string, unknown> = {}) {
    const result = await this.client.callTool({ name, arguments: params });
    for (const item of result.content as object[]) {
      const parsed = TextContentSchema.safeParse(item);
      if (parsed.success) console.log(parsed.data.text);
    }
  }

  async cleanup() {
    await this.client.close();
  }
}

async function main() {
  const client = new MCPClient('sse-server');
  await client.connect('http://localhost:3000/mcp');

  try {
    await client.listTools();

    // Sample arguments for each tool we know about; other tools get {}.
    for (const tool of client.tools) {
      let params: Record<string, unknown> = {};

      if (tool.name === 'generate-sql-query') {
        params = { question: 'What is the total revenue for last month?' };
      } else if (tool.name === 'run-sql-query') {
        params = { sql: "SELECT * FROM orders WHERE order_date > NOW() - INTERVAL '7 days'" };
      } else if (tool.name === 'call-external-api') {
        params = { merchant_pid: 'T/po77FeJSjjEkCE=' };
      } else if (tool.name === 'send-slack-message') {
        params = { channel: '#alerts', message: 'System alert: All systems operational.' };
      }

      await client.callTool(tool.name, params);
    }
  } finally {
    // Every call above has completed, so the session can be closed right away.
    await client.cleanup();
  }
}

main().catch((error) => {
  console.error(error);
  process.exit(1);
});
```

We can extend the client with a prompt written by our engineering or support folks, letting Gemini (or any other LLM) decide which tools to use, run them in a sensible order, and finally send the “smart-slack-alert”.
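One way to structure that extension, sketched under our own assumptions: ask the LLM to reply with a JSON array of tool calls, keep only calls to tools the server actually advertised, run them in order, and always finish with a single Slack alert. The planner and the tool caller are passed in as plain functions, so Gemini and the MCP client can be plugged in, or replaced with stubs for testing. `runPlan`, `PlannedCall`, the JSON plan format, and the use of a `send-slack-message` tool are hypothetical.

```typescript
interface PlannedCall {
  tool: string;
  args: Record<string, unknown>;
}

// The planner returns JSON text, e.g. an LLM response to a support-written prompt.
type Planner = (symptom: string, toolNames: string[]) => Promise<string>;
type ToolCaller = (name: string, args: Record<string, unknown>) => Promise<string>;

const SLACK_TOOL = "send-slack-message";

// Parses the planner's JSON and drops calls to tools the server did not advertise.
function parsePlan(json: string, allowed: Set<string>): PlannedCall[] {
  const raw: unknown = JSON.parse(json);
  if (!Array.isArray(raw)) throw new Error("Plan must be a JSON array");
  const plan: PlannedCall[] = [];
  for (const step of raw) {
    if (typeof step?.tool !== "string" || !allowed.has(step.tool)) continue;
    const args = typeof step.args === "object" && step.args !== null ? step.args : {};
    plan.push({ tool: step.tool, args });
  }
  return plan;
}

// Runs the validated plan in order, then posts one summary alert.
async function runPlan(
  symptom: string,
  toolNames: string[],
  planner: Planner,
  callTool: ToolCaller
): Promise<string[]> {
  const steps = parsePlan(await planner(symptom, toolNames), new Set(toolNames));
  const outputs: string[] = [];
  for (const step of steps) {
    if (step.tool === SLACK_TOOL) continue; // the alert is sent exactly once, at the end
    outputs.push(await callTool(step.tool, step.args));
  }
  if (toolNames.includes(SLACK_TOOL)) {
    await callTool(SLACK_TOOL, {
      channel: "#smart-alerts",
      message: `Smart alert for "${symptom}":\n${outputs.join("\n")}`,
    });
  }
  return outputs;
}
```

Validating the plan against the advertised tool list keeps a hallucinated tool name from ever reaching the server.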

Team Contributions

Yashit: Led engineering efforts.
Priyanka & Tejesh: Supported with data and schema prep.
Dhanush: Designed the system flowchart.
Hassan: Helped with miscellaneous tasks.
Thanks for reading Swym Scrolls! Subscribe!

A guest post by Priyanka Budivarthi and Dhanush

Swym Stories
