CUP (Computer Use Protocol)

Performance & Optimization

Reduce latency and token usage with scope selection, depth limits, caching, and batch actions.

Choose the right scope

The scope parameter controls how much of the screen CUP captures. Choosing the smallest sufficient scope dramatically reduces both latency and token count.

| Scope | Use when... | Typical tokens |
| --- | --- | --- |
| `overview` | You only need to know what apps are open | ~50–100 |
| `foreground` | You're working with the active window (most common) | ~500–2,000 |
| `desktop` | You need desktop icons or widgets | ~200–500 |
| `full` | You need a specific non-foreground window | Varies |
```python
# Just need the window list? Use overview (near-instant)
windows = session.snapshot(scope="overview")

# Working with the active window? Use foreground (default)
screen = session.snapshot(scope="foreground")

# Need a background app? Filter by name
chrome = session.snapshot(scope="full", app="Chrome")
```

```typescript
const windows = await session.snapshot({ scope: "overview" });
const screen = await session.snapshot({ scope: "foreground" });
const chrome = await session.snapshot({ scope: "full", app: "Chrome" });
```

Compact vs full detail

The detail parameter controls tree pruning. Compact mode removes noise nodes (empty groups, redundant wrappers) and reduces token usage by ~75%.

| Detail level | What you get | Best for |
| --- | --- | --- |
| `compact` (default) | Pruned tree: noise removed, ~75% fewer tokens | AI agent workflows, most use cases |
| `full` | Unpruned tree: every node preserved | Debugging, finding hidden elements |
```python
# Default — compact, great for agents
screen = session.snapshot(detail="compact")

# Full — see everything, useful for debugging
screen = session.snapshot(detail="full")
```

```typescript
const screen = await session.snapshot({ detail: "compact" });
const screen = await session.snapshot({ detail: "full" });
```

Compact format achieves ~97% token reduction compared to raw JSON. A Spotify window snapshot is ~700 tokens in compact format vs ~25,000 as JSON.
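The gap comes from structure, not content: one line of compact notation carries the same facts as a nested JSON record. A rough sketch of the effect, using the common ~4-characters-per-token rule of thumb (the JSON shape and the exact compact line are assumptions for illustration):

```python
import json

# A single UI node as a raw accessibility-style JSON record (illustrative)
node = {
    "id": "e12", "role": "button", "name": "Play",
    "bounds": {"x": 480, "y": 912, "width": 32, "height": 32},
    "states": {"enabled": True, "focused": False, "offscreen": False},
    "children": [],
}

as_json = json.dumps(node)
as_compact = '[e12] button "Play" @480,912 32x32'  # one line per node

def approx_tokens(text: str) -> int:
    # Rough rule of thumb: ~4 characters per token
    return max(1, len(text) // 4)

reduction = 1 - approx_tokens(as_compact) / approx_tokens(as_json)
print(f"~{reduction:.0%} smaller per node")
```

Multiply that per-node saving across hundreds of nodes and the whole-snapshot numbers quoted above follow.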

Limit tree depth

For deeply nested UIs, use max_depth to cap how far CUP walks the tree. This reduces both capture time and output size.

```python
# Only capture 3 levels deep
screen = session.snapshot(max_depth=3)

# Effectively unlimited (by default no depth cap is applied)
screen = session.snapshot(max_depth=999)
```

```typescript
const screen = await session.snapshot({ maxDepth: 3 });
```

This is useful when you only need top-level navigation elements and don't care about deeply nested content.
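The effect of a depth cap is easiest to see with a plain tree walk. The tree shape and `walk` function below are invented for illustration; they are not CUP internals.

```python
def walk(node: dict, max_depth: int, depth: int = 0) -> list[str]:
    """Collect node labels, stopping once max_depth levels have been emitted."""
    if depth >= max_depth:
        return []
    lines = ["  " * depth + node["role"]]
    for child in node.get("children", []):
        lines += walk(child, max_depth, depth + 1)
    return lines

ui = {"role": "window", "children": [
    {"role": "toolbar", "children": [{"role": "button", "children": []}]},
    {"role": "group", "children": [
        {"role": "list", "children": [{"role": "listitem", "children": []}]},
    ]},
]}

print(len(walk(ui, max_depth=2)))   # 3: window, toolbar, group only
print(len(walk(ui, max_depth=99)))  # 6: the full tree
```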

Search the cached tree instead of re-capturing

find() searches the tree from the last snapshot() call without re-capturing. This is much faster than taking a new snapshot just to locate an element.

```python
screen = session.snapshot()

# Fast — searches the cached tree
buttons = session.find(query="submit button")
inputs = session.find(role="textbox")
focused = session.find(state="focused")

# Slower — re-captures the entire tree
screen = session.snapshot()  # only do this if the UI has changed
```

```typescript
await session.snapshot();

// Fast — searches the cached tree
const buttons = await session.find({ query: "submit button" });
const inputs = await session.find({ role: "textbox" });
const focused = await session.find({ state: "focused" });
```

Use page() for clipped content

When a scrollable container has more items than shown in the compact output, use page() to read them from the cached tree instead of scrolling and re-capturing.

```
[e5] list "Results" @10,100 400x600
  [e6] listitem "Result 1" ...
  [e7] listitem "Result 2" ...
  ... 48 more items — page("e5") to see
```

```python
# Reads from cache — no re-capture needed
page1 = session.page("e5", direction="down")
page2 = session.page("e5", direction="down")

# Only scroll + re-capture for virtual scrolling
session.action("e5", "scroll", direction="down")
screen = session.snapshot()
```

```typescript
const page1 = await session.page("e5", { direction: "down" });
const page2 = await session.page("e5", { direction: "down" });
```
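Paging works because the clipped children are already in the cached tree; each page() call just returns the next slice. A sketch of that slicing, with invented data standing in for the 50 listitems (page size is an assumption):

```python
def page_cached(items: list[str], page_size: int = 10):
    """Yield successive pages of an already-captured list of children."""
    for start in range(0, len(items), page_size):
        yield items[start:start + page_size]

results = [f"Result {i}" for i in range(1, 51)]  # the 50 listitems under e5
pages = list(page_cached(results, page_size=10))

print(len(pages))                    # 5 pages, no scrolling or re-capture
print(pages[0][0], pages[-1][-1])    # Result 1 ... Result 50
```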

Batch actions to reduce round-trips

Use batch() to execute multiple actions in a single call instead of individual action + snapshot cycles:

```python
# Instead of:
session.action("e5", "click")
session.action("e3", "type", value="hello")
session.press("tab")
session.action("e7", "type", value="world")

# Use batch:
results = session.batch([
    {"element_id": "e5", "action": "click"},
    {"element_id": "e3", "action": "type", "value": "hello"},
    {"action": "press", "keys": "tab"},
    {"element_id": "e7", "action": "type", "value": "world"},
])

# Check results — stops on first failure
for r in results:
    if not r.success:
        print(f"Failed: {r.error}")
        break
```

```typescript
const results = await session.batch([
  { element_id: "e5", action: "click" },
  { element_id: "e3", action: "type", value: "hello" },
  { action: "press", keys: "tab" },
  { element_id: "e7", action: "type", value: "world" },
]);

for (const r of results) {
  if (!r.success) {
    console.error(`Failed: ${r.error}`);
    break;
  }
}
```

Token budget planning for LLM agents

When integrating CUP with LLMs (via MCP or direct API calls), token usage matters. Here's a rough guide:

| Scenario | Typical tokens | Strategy |
| --- | --- | --- |
| Window list only | 50–100 | Use `overview` scope |
| Simple app (Calculator, Notepad) | 200–500 | Default `foreground` + `compact` |
| Medium app (Settings, file manager) | 500–1,500 | Default settings |
| Complex app (IDE, browser) | 1,500–3,000 | Limit depth with `max_depth=5` |
| Very complex app (Excel with data) | 3,000–5,000+ | Limit depth + use `find()` to narrow |
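Read as logic, the table is a lookup from expected snapshot size to strategy. A sketch with thresholds taken from the rows above (the function and its cutoffs are invented for illustration):

```python
def plan_strategy(estimated_tokens: int) -> str:
    """Map an expected snapshot size to a capture strategy (thresholds illustrative)."""
    if estimated_tokens <= 100:
        return "overview scope"
    if estimated_tokens <= 1500:
        return "foreground + compact (defaults)"
    if estimated_tokens <= 3000:
        return "limit depth with max_depth=5"
    return "limit depth and narrow with find()"

print(plan_strategy(80))    # window list only
print(plan_strategy(2500))  # complex app such as an IDE
print(plan_strategy(4500))  # very complex app
```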

Tips for LLM integration:

  • Start with overview to decide which app to target
  • Use foreground scope (not full) unless you need a background app
  • Keep detail="compact" (default) — full detail wastes tokens on noise
  • Use find() to search instead of asking the LLM to parse the entire tree
  • Use page() for long lists instead of including everything in context
