cupug accelerates standard SQL without syntax changes. The patterns below benefit most from GPU execution because they involve large scans, parallel reductions, or hash-intensive joins.
The GPU builds hash tables in HBM at up to 8 TB/s, making hash joins over large tables significantly faster than CPU execution.
-- Fact-to-dimension join: 500M orders against 10K products
SELECT
    p.category,
    sum(o.amount) AS total_revenue
FROM orders o
JOIN products p ON o.product_id = p.id
WHERE o.created_at >= '2025-01-01'
GROUP BY p.category
ORDER BY total_revenue DESC;
Star schemas are ideal for GPU execution. Dimension tables are small enough to fit entirely in HBM, so the GPU can probe multiple hash tables without touching storage.
-- Star schema: fact table joined to three dimension tables
SELECT
    d.calendar_month,
    s.store_region,
    c.segment,
    sum(f.revenue) AS total_revenue,
    count(*) AS transaction_count
FROM sales_fact f
JOIN dim_date d ON f.date_id = d.id
JOIN dim_store s ON f.store_id = s.id
JOIN dim_customer c ON f.customer_id = c.id
WHERE d.calendar_year = 2025
  AND s.country = 'US'
GROUP BY d.calendar_month, s.store_region, c.segment
ORDER BY d.calendar_month, total_revenue DESC;
GROUP BY with aggregate functions maps naturally to GPU parallel reduction. The GPU partitions rows across thousands of threads, computes partial aggregates, and merges them.
-- Time-series aggregation over a billion-row event table
SELECT
    date_trunc('hour', ts) AS hour,
    event_type,
    count(*) AS event_count,
    avg(duration_ms) AS avg_duration,
    percentile_cont(0.99) WITHIN GROUP (ORDER BY duration_ms) AS p99
FROM events
WHERE ts >= '2025-06-01' AND ts < '2025-07-01'
GROUP BY 1, 2
ORDER BY 1, 2;
Window functions with PARTITION BY and ORDER BY benefit from GPU parallel sorting. Each partition is sorted independently across GPU threads.
-- 7-day rolling revenue per customer
SELECT
    customer_id,
    order_date,
    amount,
    sum(amount) OVER (
        PARTITION BY customer_id
        ORDER BY order_date
        RANGE BETWEEN interval '7 days' PRECEDING AND CURRENT ROW
    ) AS rolling_7d_revenue
FROM orders
ORDER BY customer_id, order_date;
On columnar tables, cupug reads only the columns referenced by the query. Combined with a selective WHERE clause, this skips the majority of stored data.
-- High-selectivity filter on a wide columnar table (200+ columns)
-- Only ts, user_id, and amount are read from storage
SELECT user_id, sum(amount)
FROM transactions
WHERE ts >= '2025-06-01' AND ts < '2025-06-02'
GROUP BY user_id;
Graph workloads store edges as relational rows. Columnar storage is a natural fit: the GPU reads (src, dst) pairs with parallel scatter/gather over the adjacency structure.
CREATE TABLE edges (
    src    bigint NOT NULL,
    dst    bigint NOT NULL,
    weight float8 DEFAULT 1.0
) USING cupug_tam WITH (storage = 'columnar');
A recursive CTE expresses breadth-first search in standard SQL. Each iteration joins the frontier against the edge table, and the GPU accelerates that join.
WITH RECURSIVE bfs AS (
    -- Seed: starting node
    SELECT dst AS node, 1 AS depth
    FROM edges
    WHERE src = 42
    UNION
    -- Expand frontier
    SELECT e.dst, b.depth + 1
    FROM bfs b
    JOIN edges e ON e.src = b.node
    WHERE b.depth < 6
)
SELECT node, min(depth) AS shortest_depth
FROM bfs
GROUP BY node
ORDER BY shortest_depth;
PageRank is a repeated join-and-aggregate over the edge table. Each iteration redistributes rank from source to destination nodes.
-- Initialize ranks uniformly over all nodes (sources and sinks alike)
CREATE TEMP TABLE pagerank AS
SELECT node, 1.0 / count(*) OVER () AS rank
FROM (
    SELECT src AS node FROM edges
    UNION
    SELECT dst FROM edges
) nodes;
-- One iteration (wrap in a loop or procedure for convergence)
WITH out_degree AS (
    SELECT src, count(*) AS degree FROM edges GROUP BY src
),
contributions AS (
    SELECT
        e.dst AS node,
        sum(pr.rank / od.degree) AS contrib
    FROM edges e
    JOIN pagerank pr ON pr.node = e.src
    JOIN out_degree od ON od.src = e.src
    GROUP BY e.dst
)
-- LEFT JOIN so nodes with no incoming edges are still updated and
-- receive the damping term; a plain FROM ... WHERE would skip them
-- and the COALESCE would never apply.
UPDATE pagerank pr
SET rank = 0.15 / (SELECT count(*) FROM pagerank)
         + 0.85 * COALESCE(c.contrib, 0)
FROM pagerank p2
LEFT JOIN contributions c ON c.node = p2.node
WHERE p2.node = pr.node;
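To drive the iteration to convergence, the statement can be wrapped in a procedural loop. A minimal PL/pgSQL sketch, assuming the pagerank and edges tables above; a fixed iteration count stands in for a real convergence test:

```sql
-- Illustrative driver: runs 20 Jacobi-style iterations.
-- Each UPDATE reads the pre-statement snapshot of pagerank,
-- so all contributions use the previous iteration's ranks.
DO $$
BEGIN
    FOR i IN 1..20 LOOP
        WITH out_degree AS (
            SELECT src, count(*) AS degree FROM edges GROUP BY src
        ),
        contributions AS (
            SELECT e.dst AS node, sum(pr.rank / od.degree) AS contrib
            FROM edges e
            JOIN pagerank pr ON pr.node = e.src
            JOIN out_degree od ON od.src = e.src
            GROUP BY e.dst
        )
        UPDATE pagerank pr
        SET rank = 0.15 / (SELECT count(*) FROM pagerank)
                 + 0.85 * COALESCE(c.contrib, 0)
        FROM pagerank p2
        LEFT JOIN contributions c ON c.node = p2.node
        WHERE p2.node = pr.node;
    END LOOP;
END $$;
```

A production version would track the maximum per-node rank delta between iterations and exit the loop once it falls below a tolerance.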
Prefer columnar for analytics, heap for OLTP. Columnar storage reads only the columns a query needs; heap storage supports efficient single-row updates and deletes.
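The choice is made per table at creation time. As a sketch (table names and columns are illustrative), the columnar form follows the edges example above, while a plain CREATE TABLE keeps PostgreSQL's default heap storage:

```sql
-- Analytics table: columnar, scanned column by column on the GPU
CREATE TABLE clicks (
    ts      timestamptz NOT NULL,
    user_id bigint,
    url     text
) USING cupug_tam WITH (storage = 'columnar');

-- OLTP table: default heap storage for frequent single-row
-- updates and deletes
CREATE TABLE accounts (
    id      bigint PRIMARY KEY,
    balance numeric NOT NULL
);
```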
Co-locate dimension tables in cupug. When dimension tables use cupug_tam, the GPU can build hash tables in HBM and probe them without crossing the PCIe bus.
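For the star schema example earlier, that means creating the dimension tables with cupug_tam as well. A sketch for one of them (the column list beyond the columns used in that query is illustrative):

```sql
-- Dimension table stored under cupug_tam so its hash table can be
-- built and probed entirely in HBM during star-schema joins
CREATE TABLE dim_store (
    id           bigint PRIMARY KEY,
    store_region text,
    country      text
) USING cupug_tam WITH (storage = 'columnar');
```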
Use EXPLAIN to confirm GPU execution. Look for CuPUG Scan nodes in the query plan to verify the GPU path is active.
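For example, on the fact-to-dimension join from earlier. The plan shape below is illustrative: node names other than CuPUG Scan, and the overall layout, will vary with your data and version:

```sql
EXPLAIN
SELECT p.category, sum(o.amount)
FROM orders o
JOIN products p ON o.product_id = p.id
GROUP BY p.category;

-- Illustrative plan shape (abridged):
--   HashAggregate
--     -> Hash Join
--          -> CuPUG Scan on orders o
--          -> Hash
--               -> CuPUG Scan on products p
```

If you see a plain Seq Scan instead, the query fell back to the CPU path.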
COPY over row-at-a-time INSERT for bulk loads. COPY streams data in bulk, letting cupug batch writes efficiently. Individual INSERT statements pay per-statement overhead.
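For example, loading a month of data into the transactions table with standard PostgreSQL COPY (the file path is illustrative):

```sql
-- Bulk load: one COPY statement instead of millions of INSERTs
COPY transactions (ts, user_id, amount)
FROM '/data/transactions_2025_06.csv'
WITH (FORMAT csv, HEADER);
```

From a client, \copy in psql does the same thing with a client-side file path.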