cupug accelerates standard SQL without syntax changes. The patterns below benefit most from GPU execution because they involve large scans, parallel reductions, or hash-intensive joins.
The GPU builds hash tables in HBM at up to 8 TB/s, making hash joins over large tables significantly faster than CPU execution.
-- Fact-to-dimension join: 500M orders against 10K products
SELECT
    p.category,
    sum(o.amount) AS total_revenue
FROM orders o
JOIN products p ON o.product_id = p.id
WHERE o.created_at >= '2025-01-01'
GROUP BY p.category
ORDER BY total_revenue DESC;
Star schemas are ideal for GPU execution. Dimension tables are small enough to fit entirely in HBM, so the GPU can probe multiple hash tables without touching storage.
-- Star schema: fact table joined to three dimension tables
SELECT
    d.calendar_month,
    s.store_region,
    c.segment,
    sum(f.revenue) AS total_revenue,
    count(*) AS transaction_count
FROM sales_fact f
JOIN dim_date d ON f.date_id = d.id
JOIN dim_store s ON f.store_id = s.id
JOIN dim_customer c ON f.customer_id = c.id
WHERE d.calendar_year = 2025
  AND s.country = 'US'
GROUP BY d.calendar_month, s.store_region, c.segment
ORDER BY d.calendar_month, total_revenue DESC;
GROUP BY with aggregate functions maps naturally to GPU parallel reduction. The GPU partitions rows across thousands of threads, computes partial aggregates, and merges them.
-- Time-series aggregation over a billion-row event table
SELECT
    date_trunc('hour', ts) AS hour,
    event_type,
    count(*) AS event_count,
    avg(duration_ms) AS avg_duration,
    percentile_cont(0.99) WITHIN GROUP (ORDER BY duration_ms) AS p99
FROM events
WHERE ts >= '2025-06-01' AND ts < '2025-07-01'
GROUP BY 1, 2
ORDER BY 1, 2;
Window functions with PARTITION BY and ORDER BY benefit from GPU parallel sorting. Each partition is sorted independently across GPU threads.
-- 7-day rolling revenue per customer
SELECT
    customer_id,
    order_date,
    amount,
    sum(amount) OVER (
        PARTITION BY customer_id
        ORDER BY order_date
        RANGE BETWEEN interval '7 days' PRECEDING AND CURRENT ROW
    ) AS rolling_7d_revenue
FROM orders
ORDER BY customer_id, order_date;
On columnar tables, cupug reads only the columns referenced by the query. Combined with a selective WHERE clause, this skips the majority of stored data.
-- High-selectivity filter on a wide columnar table (200+ columns)
-- Only ts, user_id, and amount are read from storage
SELECT user_id, sum(amount)
FROM transactions
WHERE ts >= '2025-06-01' AND ts < '2025-06-02'
GROUP BY user_id;
Graph workloads store edges as relational rows. Columnar storage is a natural fit: the GPU reads (src, dst) pairs with parallel scatter/gather over the adjacency structure.
CREATE TABLE edges (
    src    bigint NOT NULL,
    dst    bigint NOT NULL,
    weight float8 DEFAULT 1.0
) USING cupug_tam WITH (storage = 'columnar');
A recursive CTE expresses breadth-first search in standard SQL. Each iteration joins the frontier against the edge table, and the GPU accelerates that join.
WITH RECURSIVE bfs AS (
    -- Seed: starting node
    SELECT dst AS node, 1 AS depth
    FROM edges
    WHERE src = 42
    UNION
    -- Expand frontier
    SELECT e.dst, b.depth + 1
    FROM bfs b
    JOIN edges e ON e.src = b.node
    WHERE b.depth < 6
)
SELECT node, min(depth) AS shortest_depth
FROM bfs
GROUP BY node
ORDER BY shortest_depth;
PageRank is a repeated join-and-aggregate over the edge table. Each iteration redistributes rank from source to destination nodes.
-- Initialize ranks uniformly over all nodes (sources and sinks alike)
CREATE TEMP TABLE pagerank AS
SELECT node, 1.0 / count(*) OVER () AS rank
FROM (
    SELECT src AS node FROM edges
    UNION
    SELECT dst FROM edges
) nodes;
-- One iteration (wrap in a loop or procedure for convergence)
WITH out_degree AS (
    SELECT src, count(*) AS degree FROM edges GROUP BY src
),
contributions AS (
    SELECT
        e.dst AS node,
        sum(pr.rank / od.degree) AS contrib
    FROM edges e
    JOIN pagerank pr ON pr.node = e.src
    JOIN out_degree od ON od.src = e.src
    GROUP BY e.dst
)
-- LEFT JOIN so nodes with no incoming edges are still updated and
-- receive the damping term; a plain FROM ... WHERE would skip them
-- and the COALESCE would never apply.
UPDATE pagerank pr
SET rank = 0.15 / (SELECT count(*) FROM pagerank)
         + 0.85 * COALESCE(c.contrib, 0)
FROM pagerank p2
LEFT JOIN contributions c ON c.node = p2.node
WHERE p2.node = pr.node;
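To drive the iteration to convergence, the statement can be wrapped in a procedural loop. A minimal PL/pgSQL sketch, assuming the pagerank and edges tables above; a fixed iteration count stands in for a real convergence test:

```sql
-- Illustrative driver: runs 20 Jacobi-style iterations.
-- Each UPDATE reads the pre-statement snapshot of pagerank,
-- so all contributions use the previous iteration's ranks.
DO $$
BEGIN
    FOR i IN 1..20 LOOP
        WITH out_degree AS (
            SELECT src, count(*) AS degree FROM edges GROUP BY src
        ),
        contributions AS (
            SELECT e.dst AS node, sum(pr.rank / od.degree) AS contrib
            FROM edges e
            JOIN pagerank pr ON pr.node = e.src
            JOIN out_degree od ON od.src = e.src
            GROUP BY e.dst
        )
        UPDATE pagerank pr
        SET rank = 0.15 / (SELECT count(*) FROM pagerank)
                 + 0.85 * COALESCE(c.contrib, 0)
        FROM pagerank p2
        LEFT JOIN contributions c ON c.node = p2.node
        WHERE p2.node = pr.node;
    END LOOP;
END $$;
```

A production version would track the maximum per-node rank delta between iterations and exit the loop once it falls below a tolerance.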
Prefer columnar for analytics, heap for OLTP. Columnar storage reads only the columns a query needs; heap storage supports efficient single-row updates and deletes.
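The choice is made per table at creation time. As a sketch (table names and columns are illustrative), the columnar form follows the edges example above, while a plain CREATE TABLE keeps PostgreSQL's default heap storage:

```sql
-- Analytics table: columnar, scanned column by column on the GPU
CREATE TABLE clicks (
    ts      timestamptz NOT NULL,
    user_id bigint,
    url     text
) USING cupug_tam WITH (storage = 'columnar');

-- OLTP table: default heap storage for frequent single-row
-- updates and deletes
CREATE TABLE accounts (
    id      bigint PRIMARY KEY,
    balance numeric NOT NULL
);
```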
Co-locate dimension tables in cupug. When dimension tables use cupug_tam, the GPU can build hash tables in HBM and probe them without crossing the PCIe bus.
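For the star schema example earlier, that means creating the dimension tables with cupug_tam as well. A sketch for one of them (the column list beyond the columns used in that query is illustrative):

```sql
-- Dimension table stored under cupug_tam so its hash table can be
-- built and probed entirely in HBM during star-schema joins
CREATE TABLE dim_store (
    id           bigint PRIMARY KEY,
    store_region text,
    country      text
) USING cupug_tam WITH (storage = 'columnar');
```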
Use EXPLAIN to confirm GPU execution. Look for CuPUG Scan nodes in the query plan to verify the GPU path is active.
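For example, on the fact-to-dimension join from earlier. The plan shape below is illustrative: node names other than CuPUG Scan, and the overall layout, will vary with your data and version:

```sql
EXPLAIN
SELECT p.category, sum(o.amount)
FROM orders o
JOIN products p ON o.product_id = p.id
GROUP BY p.category;

-- Illustrative plan shape (abridged):
--   HashAggregate
--     -> Hash Join
--          -> CuPUG Scan on orders o
--          -> Hash
--               -> CuPUG Scan on products p
```

If you see a plain Seq Scan instead, the query fell back to the CPU path.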
COPY over row-at-a-time INSERT for bulk loads. COPY streams data in bulk, letting cupug batch writes efficiently. Individual INSERT statements pay per-statement overhead.
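For example, loading a month of data into the transactions table with standard PostgreSQL COPY (the file path is illustrative):

```sql
-- Bulk load: one COPY statement instead of millions of INSERTs
COPY transactions (ts, user_id, amount)
FROM '/data/transactions_2025_06.csv'
WITH (FORMAT csv, HEADER);
```

From a client, \copy in psql does the same thing with a client-side file path.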