System overview
Umami is a privacy-focused, open-source web analytics platform that serves as an alternative to Google Analytics. The platform offers several advantages over traditional analytics solutions: self-hosted deployment, multi-database support, and advanced reporting capabilities, all while maintaining strict privacy standards.

Core functionality
Umami operates as a complete analytics platform that tracks and analyzes website visitor behavior through multiple data collection methods:
Primary data collection
- Page view tracking with automatic URL change detection
- Custom event monitoring through data attributes and programmatic calls
- Session management with visitor identification and behavioral patterns
- UTM parameter analysis for marketing campaign attribution
- Revenue tracking through custom event data integration
Advanced analytics reports
The platform provides eight specialized report types for comprehensive business intelligence:
- Insights: Custom data exploration and visualization
- Funnel: Conversion pathway analysis through multi-step processes
- Retention: User return behavior and engagement patterns
- UTM: Marketing campaign performance tracking
- Goals: Conversion event monitoring and optimization
- Journey: User navigation flow analysis
- Revenue: Financial performance and monetization tracking
- Attribution: Marketing channel effectiveness measurement
Architecture and technical implementation
graph TD
A["Website Visitor"] --> B["Umami Tracker Script"]
B --> C{"Event Type"}
C -->|Page View| D["Automatic Collection"]
C -->|Custom Event| E["Element Interaction"]
C -->|User Identity| F["Identification Call"]
D --> G["Payload Assembly"]
E --> G
F --> G
G --> H["POST /api/send"]
H --> I["Request Validation"]
I --> J["Bot Detection & IP Check"]
J --> K["Client Info Extraction"]
K --> L["Session Management"]
L --> M{"Database Type"}
M -->|Relational| N["PostgreSQL/MySQL"]
M -->|Analytics| O["ClickHouse"]
N --> P["Session Table"]
N --> Q["WebsiteEvent Table"]
N --> R["EventData Table"]
O -->|Kafka Enabled| T["Kafka Producer"]
O -->|Kafka Off| S["ClickHouse Database"]
T --> AB["ClickHouse Consumer"]
AB --> S
P --> U["Analytics Queries"]
Q --> U
R --> U
S --> U
U --> V["Statistics API"]
U --> W["Realtime API"]
U --> X["Reports API"]
V --> Y["Dashboard Display"]
W --> Z["Live Analytics"]
X --> AA["Custom Reports"]
Technology stack
Umami leverages Next.js 15 as its core framework with React 19 for the user interface, ensuring both optimal performance and modern development practices. The platform operates through four integration layers:
- Client-side tracker for data collection
- API endpoints for data processing and validation
- Database layer with multi-engine support
- Analytics engine for report generation and visualization
Database architecture
The system supports three database engines to accommodate different scale requirements:
- PostgreSQL and MySQL for standard deployments with full relational capabilities
- ClickHouse for high-volume analytics with columnar storage optimization
Data structure design
The platform employs a hierarchical data model optimized for analytics performance:
Core entities:
- User and Team for access management and multi-tenant support
- Website for tracking configuration and ownership
- Session for visitor identification with device and location data
- WebsiteEvent for all user interactions and page views
- EventData and SessionData for custom analytics parameters
- Report for saved analytics configurations
The database schema includes strategic indexing on time-based queries and website-specific lookups to ensure optimal query performance across millions of analytics events.
Performance infrastructure:
ClickHouse integration: For high-scale deployments, Umami supports ClickHouse for analytics workloads. This includes optimized query functions for time-series data and advanced filtering capabilities.
function getUTCString(date?: Date | string | number) {
  return formatInTimeZone(date || new Date(), 'UTC', 'yyyy-MM-dd HH:mm:ss');
}

function getDateStringSQL(data: any, unit: string = 'utc', timezone?: string) {
  if (timezone) {
    return `formatDateTime(${data}, '${CLICKHOUSE_DATE_FORMATS[unit]}', '${timezone}')`;
  }
  return `formatDateTime(${data}, '${CLICKHOUSE_DATE_FORMATS[unit]}')`;
}

function getDateSQL(field: string, unit: string, timezone?: string) {
  if (timezone) {
    return `toDateTime(date_trunc('${unit}', ${field}, '${timezone}'), '${timezone}')`;
  }
  return `toDateTime(date_trunc('${unit}', ${field}))`;
}

function getDateQuery(filters: QueryFilters = {}) {
  const { startDate, endDate, timezone } = filters;

  if (startDate) {
    if (endDate) {
      if (timezone) {
        return `and created_at between toTimezone({startDate:DateTime64},{timezone:String}) and toTimezone({endDate:DateTime64},{timezone:String})`;
      }
      return `and created_at between {startDate:DateTime64} and {endDate:DateTime64}`;
    } else {
      if (timezone) {
        return `and created_at >= toTimezone({startDate:DateTime64},{timezone:String})`;
      }
      return `and created_at >= {startDate:DateTime64}`;
    }
  }
  return '';
}
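Concretely, these helpers emit ClickHouse SQL fragments. The sketch below shows the kind of output `getDateStringSQL` produces; the `CLICKHOUSE_DATE_FORMATS` values here are illustrative assumptions, not copied from Umami's source:

```typescript
// Illustrative stand-in for Umami's CLICKHOUSE_DATE_FORMATS lookup table.
const CLICKHOUSE_DATE_FORMATS: Record<string, string> = {
  minute: '%Y-%m-%d %H:%M:00',
  hour: '%Y-%m-%d %H:00:00',
  day: '%Y-%m-%d',
  month: '%Y-%m-01',
};

function getDateStringSQL(data: string, unit: string, timezone?: string) {
  if (timezone) {
    return `formatDateTime(${data}, '${CLICKHOUSE_DATE_FORMATS[unit]}', '${timezone}')`;
  }
  return `formatDateTime(${data}, '${CLICKHOUSE_DATE_FORMATS[unit]}')`;
}

// Bucket created_at by day in the viewer's timezone:
const fragment = getDateStringSQL('created_at', 'day', 'America/New_York');
```

Pushing the date formatting into ClickHouse this way avoids shipping raw timestamps to the application layer for every time-series chart.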
Caching strategy: Redis-based caching reduces database load for frequently accessed data, while JWT tokens enable stateless session management.
const cacheHeader = request.headers.get('x-umami-cache');

if (cacheHeader) {
  const result = await parseToken(cacheHeader, secret());

  if (result) {
    cache = result;
  }
}
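The real pipeline signs the cache payload with Umami's `secret()` helper and verifies it with `parseToken`. A minimal sketch of the same idea, using a bare HMAC signature rather than the actual JWT helpers (`createToken`/`parseToken` below are illustrative stand-ins, not the library functions):

```typescript
import { createHmac } from 'node:crypto';

// Sign a payload so the server can later trust a token echoed back by the client.
function sign(payload: string, secret: string): string {
  return createHmac('sha256', secret).update(payload).digest('hex');
}

function createToken(data: object, secret: string): string {
  const payload = Buffer.from(JSON.stringify(data)).toString('base64url');
  return `${payload}.${sign(payload, secret)}`;
}

// Returns the decoded payload only if the signature verifies; otherwise null.
function parseToken(token: string, secret: string): object | null {
  const [payload, signature] = token.split('.');
  if (!payload || signature !== sign(payload, secret)) return null;
  return JSON.parse(Buffer.from(payload, 'base64url').toString());
}
```

Because the token carries the session and visit IDs, a verified `x-umami-cache` header lets the server skip the session lookup entirely on repeat requests.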
Kafka streaming: For enterprise deployments, Kafka integration enables real-time event processing and horizontal scaling.
async function sendMessage(
  topic: string,
  message: { [key: string]: string | number } | { [key: string]: string | number }[],
): Promise<RecordMetadata[]> {
  try {
    await connect();

    return producer.send({
      topic,
      messages: Array.isArray(message)
        ? message.map(a => {
            return { value: JSON.stringify(a) };
          })
        : [
            {
              value: JSON.stringify(message),
            },
          ],
      timeout: SEND_TIMEOUT,
      acks: ACKS,
    });
  } catch (e) {
    console.log('KAFKA ERROR:', serializeError(e));
  }
}
Data access layer
Umami implements a sophisticated data access layer that abstracts database differences. The rawQuery function handles parameterized queries across different database types:
async function rawQuery(sql: string, data: object): Promise<any> {
  if (process.env.LOG_QUERY) {
    log('QUERY:\n', sql);
    log('PARAMETERS:\n', data);
  }

  const db = getDatabaseType();
  const params = [];

  if (db !== POSTGRESQL && db !== MYSQL) {
    return Promise.reject(new Error('Unknown database.'));
  }

  const query = sql?.replaceAll(/\{\{\s*(\w+)(::\w+)?\s*}}/g, (...args) => {
    const [, name, type] = args;
    const value = data[name];
    params.push(value);

    return db === MYSQL ? '?' : `$${params.length}${type ?? ''}`;
  });

  return process.env.DATABASE_REPLICA_URL
    ? client.$replica().$queryRawUnsafe(query, ...params)
    : client.$queryRawUnsafe(query, ...params);
}
This abstraction allows the same application code to work with different database backends by translating query syntax appropriately.
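The placeholder translation can be isolated into a small pure function to see what each engine actually receives; `translate` below is a self-contained sketch of that step, not the actual helper:

```typescript
// Translate {{name::type}} placeholders into engine-specific parameters:
// MySQL gets positional '?', PostgreSQL gets numbered '$1' markers with any
// '::type' cast preserved. Parameters are collected in order of appearance.
function translate(sql: string, data: Record<string, unknown>, db: 'mysql' | 'postgresql') {
  const params: unknown[] = [];
  const query = sql.replace(/\{\{\s*(\w+)(::\w+)?\s*}}/g, (...args) => {
    const [, name, type] = args;
    params.push(data[name]);
    return db === 'mysql' ? '?' : `$${params.length}${type ?? ''}`;
  });
  return { query, params };
}

const sql =
  'select * from website_event where website_id = {{websiteId::uuid}} and created_at >= {{startDate}}';
const pg = translate(sql, { websiteId: 'w1', startDate: 'd1' }, 'postgresql');
```

Keeping the query text in one neutral template and translating at the last moment is what lets the analytics queries stay engine-agnostic.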
Technical challenges and solutions
Privacy protection and bot detection
Umami addresses the core problem of privacy-compliant analytics through multiple protective mechanisms.
Do Not Track compliance:
Umami implements comprehensive Do Not Track (DNT) detection in the client-side tracker. The system checks multiple DNT sources:
- Browser's doNotTrack property
- Navigator's doNotTrack and msDoNotTrack properties
- Data attribute override (data-do-not-track="true")
The tracking is disabled when any DNT signal equals 1, '1', or 'yes'. Additionally, users can manually disable tracking by setting umami.disabled in localStorage, providing granular user control over data collection.
const hasDoNotTrack = () => {
  // window.doNotTrack, navigator.doNotTrack, navigator.msDoNotTrack
  const dnt = doNotTrack || ndnt || msdnt;

  return dnt === 1 || dnt === '1' || dnt === 'yes';
};
Bot filtering with isbot library:
The server-side API implements bot detection using the isbot npm library. When a bot is detected through user-agent analysis, the system returns a playful { beep: 'boop' } response instead of processing the analytics data.
This filtering can be disabled via the DISABLE_BOT_CHECK environment variable for testing scenarios. The bot detection occurs early in the request pipeline, preventing automated traffic from polluting analytics data.
IP address handling and anonymization:
Umami implements a layered IP address extraction system that supports multiple proxy headers. The system checks headers in priority order:
- CloudFlare: cf-connecting-ip
- Custom headers via CLIENT_IP_HEADER environment variable
- Standard proxy headers: x-forwarded-for, x-real-ip, etc.
For x-forwarded-for headers, only the first IP is extracted to avoid proxy chain pollution. The system also includes IP blocking functionality through the IGNORE_IP environment variable, supporting both exact matches and CIDR notation for network ranges.
export const IP_ADDRESS_HEADERS = [
  'cf-connecting-ip',
  'x-client-ip',
  'x-forwarded-for',
  'do-connecting-ip',
  'fastly-client-ip',
  'true-client-ip',
  'x-real-ip',
  'x-cluster-client-ip',
  'x-forwarded',
  'forwarded',
  'x-appengine-user-ip',
];
//-----
export function hasBlockedIp(clientIp: string) {
  const ignoreIps = process.env.IGNORE_IP;

  if (!ignoreIps) {
    return false;
  }

  const ips = ignoreIps.split(',').map(n => n.trim());

  return !!ips.find(ip => {
    // Exact match
    if (ip === clientIp) {
      return true;
    }

    // CIDR notation, e.g. 192.168.0.0/16
    if (ip.indexOf('/') > 0) {
      const addr = ipaddr.parse(clientIp);
      const range = ipaddr.parseCIDR(ip);

      return addr.kind() === range[0].kind() && addr.match(range);
    }

    return false;
  });
}
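The header-priority and first-hop logic described above can be sketched as a small standalone function (the function name and the shortened header list are illustrative, not the exact source):

```typescript
// Walk the headers in priority order; for comma-separated chains such as
// x-forwarded-for ("client, proxy1, proxy2"), keep only the first hop.
const IP_ADDRESS_HEADERS = ['cf-connecting-ip', 'x-forwarded-for', 'x-real-ip'];

function getClientIp(headers: Map<string, string>): string | undefined {
  for (const name of IP_ADDRESS_HEADERS) {
    const value = headers.get(name);
    if (value) {
      return value.split(',')[0].trim();
    }
  }
  return undefined;
}
```

Taking only the first hop matters because downstream proxies append their own addresses, and attributing a session to a proxy IP would corrupt both geolocation and session identity.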
Geolocation with privacy safeguards:
The geolocation system prioritizes privacy by first checking if the IP is localhost. For legitimate IPs, it uses a hierarchical approach:
- Header-based location (CloudFlare, Vercel) for faster processing
- MaxMind GeoLite2 database for IP-to-location mapping when headers unavailable
The system extracts only essential geographic data (country, region, city) without storing precise coordinates.
// Database lookup
if (!global[MAXMIND]) {
  const dir = path.join(process.cwd(), 'geo');

  global[MAXMIND] = await maxmind.open(path.resolve(dir, 'GeoLite2-City.mmdb'));
}

// When the client IP is extracted from headers, sometimes the value includes a port
const cleanIp = ip?.split(':')[0];
const result = global[MAXMIND].get(cleanIp);

if (result) {
  const country = result.country?.iso_code ?? result?.registered_country?.iso_code;
  const region = result.subdivisions?.[0]?.iso_code;
  const city = result.city?.names?.en;

  return {
    country,
    region: getRegionCode(country, region),
    city,
  };
}
Minimal data collection architecture:
Umami's data collection is designed around privacy-first principles. The core payload structure collects only essential analytics data:
- Website ID and screen resolution
- Page title and URL (with configurable exclusions)
- Language and referrer information
- Optional identity for user tracking
The system supports URL sanitization through excludeSearch and excludeHash options, allowing websites to exclude sensitive query parameters or hash fragments from analytics.
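A sketch of what excludeSearch and excludeHash do to a tracked URL, using the standard URL API (the function name and return shape here are assumptions for illustration):

```typescript
// Strip the query string and/or hash fragment from a URL before it is
// recorded, so sensitive parameters never reach the analytics payload.
function sanitizeUrl(raw: string, { excludeSearch = false, excludeHash = false } = {}) {
  const url = new URL(raw);
  if (excludeSearch) url.search = '';
  if (excludeHash) url.hash = '';
  return url.pathname + url.search + url.hash;
}
```

For example, a page URL carrying a one-time token in its query string would be recorded as just the path when excludeSearch is enabled.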
Performance optimization for high-volume analytics
The platform handles scale challenges through several architectural decisions:
Session Management:
- Unique session identification using UUID generation with website ID, IP address, user agent, and time-based salt
- Visit expiration logic with 30-minute timeouts to accurately track user engagement sessions
- Caching mechanism using JWT tokens to reduce database queries for repeated requests
const sessionSalt = hash(startOfMonth(createdAt).toUTCString());
const visitSalt = hash(startOfHour(createdAt).toUTCString());
const sessionId = id ? uuid(websiteId, id) : uuid(websiteId, ip, userAgent, sessionSalt);

// Find session
if (!clickhouse.enabled && !cache?.sessionId) {
  const session = await fetchSession(websiteId, sessionId);

  // Create a session if not found
  if (!session) {
    try {
      await createSession({
        id: sessionId,
        websiteId,
        browser,
        os,
        device,
        screen,
        language,
        country,
        region,
        city,
        distinctId: id,
      });
    } catch (e: any) {
      if (!e.message.toLowerCase().includes('unique constraint')) {
        return serverError(e);
      }
    }
  }
}

// Visit info
let visitId = cache?.visitId || uuid(sessionId, visitSalt);
let iat = cache?.iat || now;

// Expire visit after 30 minutes
if (!timestamp && now - iat > 1800) {
  visitId = uuid(sessionId, visitSalt);
  iat = now;
}
Database Query Optimization:
- Dual query system supporting both relational and columnar database engines
- Parallel processing for complex analytics reports across multiple data dimensions
- Time-based partitioning strategies for efficient data retrieval
async function pagedQuery(
  query: string,
  queryParams: { [key: string]: any },
  pageParams: PageParams = {},
) {
  const { page = 1, pageSize, orderBy, sortDescending = false } = pageParams;
  const size = +pageSize || DEFAULT_PAGE_SIZE;
  const offset = +size * (+page - 1);
  const direction = sortDescending ? 'desc' : 'asc';

  const statements = [
    orderBy && `order by ${orderBy} ${direction}`,
    +size > 0 && `limit ${+size} offset ${+offset}`,
  ]
    .filter(n => n)
    .join('\n');

  const count = await rawQuery(`select count(*) as num from (${query}) t`, queryParams).then(
    res => res[0].num,
  );
  const data = await rawQuery(`${query}${statements}`, queryParams);

  return { data, count, page: +page, pageSize: size, orderBy };
}
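The paging math can be seen in isolation with a small sketch of the clause assembly (the DEFAULT_PAGE_SIZE of 10 is an assumed value for the example):

```typescript
// Build the ORDER BY / LIMIT-OFFSET tail of a paged query: page numbers are
// 1-based, so page 3 at size 25 skips the first 50 rows.
const DEFAULT_PAGE_SIZE = 10;

function pagingClause({ page = 1, pageSize = 0, orderBy = '', sortDescending = false } = {}) {
  const size = +pageSize || DEFAULT_PAGE_SIZE;
  const offset = size * (+page - 1);
  const direction = sortDescending ? 'desc' : 'asc';

  return [
    orderBy && `order by ${orderBy} ${direction}`,
    size > 0 && `limit ${size} offset ${offset}`,
  ]
    .filter(n => n)
    .join('\n');
}
```

Filtering out falsy entries is what lets the same helper serve both sorted and unsorted queries without string surgery.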
Marketing attribution modeling
Umami provides sophisticated attribution analysis through configurable models:
- First-click attribution for customer acquisition analysis
- Last-click attribution for conversion optimization
-- First Click
model AS (select e.session_id, min(we.created_at) created_at
  from events e
  join website_event we
    on we.session_id = e.session_id
  where we.website_id = {{websiteId::uuid}}
    and we.created_at between {{startDate}} and {{endDate}}
  group by e.session_id)

-- Last Click
model AS (select e.session_id, max(we.created_at) created_at
  from events e
  join website_event we
    on we.session_id = e.session_id
  where we.website_id = {{websiteId::uuid}}
    and we.created_at between {{startDate}} and {{endDate}}
    and we.created_at < e.max_dt
  group by e.session_id)
- Revenue attribution with currency-specific tracking
WITH events AS (
  select
    we.session_id,
    max(ed.created_at) max_dt,
    sum(coalesce(cast(number_value as decimal(10,2)), cast(string_value as decimal(10,2)))) value
  from event_data ed
  join website_event we
    on we.event_id = ed.website_event_id
    and we.website_id = ed.website_id
  join (select website_event_id
        from event_data
        where website_id = {{websiteId::uuid}}
          and created_at between {{startDate}} and {{endDate}}
          and data_key ${like} '%currency%'
          and string_value = {{currency}}) currency
    on currency.website_event_id = ed.website_event_id
  where ed.website_id = {{websiteId::uuid}}
    and ed.created_at between {{startDate}} and {{endDate}}
    and ${column} = {{conversionStep}}
    and ed.data_key ${like} '%revenue%'
  group by 1),
Paid advertising detection: the tracker captures platform-specific click IDs from landing-page URLs and stores them in the database for later attribution:
- Google Ads: gclid parameter
- Facebook/Meta: fbclid parameter
- Microsoft Ads: msclkid parameter
- TikTok Ads: ttclid parameter
- LinkedIn Ads: li_fat_id parameter
- Twitter Ads: twclid parameter
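A minimal sketch of click-ID detection against the parameter list above (the function and channel labels are illustrative, not the actual source):

```typescript
// Map ad-platform click-ID query parameters to a channel label.
const CLICK_IDS: Record<string, string> = {
  gclid: 'Google Ads',
  fbclid: 'Facebook/Meta',
  msclkid: 'Microsoft Ads',
  ttclid: 'TikTok Ads',
  li_fat_id: 'LinkedIn Ads',
  twclid: 'Twitter Ads',
};

// Return the first matching paid channel for a landing-page URL, if any.
function detectPaidChannel(url: string): string | undefined {
  const params = new URL(url).searchParams;
  for (const [param, channel] of Object.entries(CLICK_IDS)) {
    if (params.has(param)) return channel;
  }
  return undefined;
}
```

Because these IDs are appended automatically by the ad platforms, they identify paid traffic even when no UTM parameters were configured.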
Attribution data analysis: the attribution report analyzes multiple marketing dimensions:
- Referrer domains: External websites driving traffic
- Paid advertising: Platform-specific click ID attribution
- UTM parameters: Campaign tracking across source, medium, campaign, content, and term
- Total metrics: Overall pageviews, visitors, and visits for context
The attribution results are displayed through specialized UI components that show both tabular data and pie charts for visual attribution analysis.
Implementation insights and best practices
Client-side data collection strategy
The tracking implementation employs several clever techniques for comprehensive yet unobtrusive data collection:
Automatic Event Detection:
- History API hooking to capture single-page application navigation without page reloads
- Click tracking with automatic event data extraction from HTML attributes; for example, adding a data-umami-event attribute to any element automatically tracks clicks on it without writing JavaScript
- Before-send callbacks: a flexible callback system allows custom data validation and modification before events are sent to the server. This enables developers to:
- Filter sensitive data from URLs or event parameters
- Add custom metadata to all events
- Implement client-side data validation rules
- Transform event data based on business logic
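For instance, a before-send callback might strip a sensitive token parameter before the payload leaves the browser. In this sketch the payload fields are a simplified assumption, and returning a falsy value is the conventional way such hooks drop an event:

```typescript
// Simplified payload shape for illustration; the real tracker payload
// carries more fields (screen, language, referrer, ...).
interface Payload { website: string; url: string; [key: string]: unknown }

function beforeSend(type: string, payload: Payload): Payload | false {
  // Parse relative URLs against a dummy base so the URL API can be used.
  const url = new URL(payload.url, 'https://placeholder.invalid');
  if (url.searchParams.has('token')) {
    url.searchParams.delete('token');
    payload.url = url.pathname + url.search + url.hash;
  }
  return payload; // return false instead to drop the event entirely
}
```

The same hook can attach metadata or enforce client-side validation rules, since whatever it returns is what gets sent.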
Data Quality Assurance:
- URL normalization with configurable search parameter and hash exclusion
- Search parameter exclusion: the system supports configurable exclusion of URL search parameters through excludeSearch options, preventing sensitive query parameters from being tracked
- Hash fragment handling: hash fragments can be optionally excluded via excludeHash configuration, useful for applications that use hash routing but don't want to track fragment changes
- Referrer validation to distinguish internal from external traffic sources
- Domain filtering for multi-site deployments with centralized analytics
Revenue and conversion tracking
The platform handles complex e-commerce analytics through flexible event data structures:
- Multi-currency support with automatic currency detection and conversion calculation
- Custom event parameters for detailed transaction and user behavior analysis
- Attribution modeling linking revenue events back to marketing touchpoints, enabling businesses to:
- Track revenue by marketing channel
- Calculate return on advertising spend (ROAS)
- Analyze conversion value across different traffic sources
- Support both first-click and last-click revenue attribution models
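As a rough illustration of the ROAS arithmetic this enables (the spend figures would come from the ad platforms themselves, not from Umami):

```typescript
// Return on advertising spend: attributed revenue divided by ad spend.
function roas(revenue: number, adSpend: number): number {
  if (adSpend <= 0) throw new Error('ad spend must be positive');
  return revenue / adSpend;
}

// e.g. $4,200 of attributed revenue on $1,050 of Google Ads spend → 4.0x
```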
Umami represents a comprehensive solution for privacy-focused web analytics that successfully balances detailed business intelligence with user privacy protection. The platform's multi-database architecture ensures scalability from small websites to enterprise-level deployments, while its extensible report system provides the analytical depth required for data-driven decision making.
The system's dual query implementation for both relational and columnar databases demonstrates sophisticated technical architecture that maintains consistent functionality across different performance and scale requirements. This approach ensures optimal performance whether processing thousands or millions of analytics events.
