CUP (Computer Use Protocol)

Performance & Optimization

Reduce latency and token usage with scope selection, depth limits, caching, and batch actions.

Choose the right scope

The scope parameter controls how much of the screen CUP captures. Choosing the smallest sufficient scope dramatically reduces both latency and token count.

| Scope | Use when... | Typical tokens |
| --- | --- | --- |
| `overview` | You only need to know what apps are open | ~50–100 |
| `foreground` | You're working with the active window (most common) | ~500–2,000 |
| `desktop` | You need desktop icons or widgets | ~200–500 |
| `full` | You need a specific non-foreground window | Varies |
```python
# Just need the window list? Use overview (near-instant)
windows = session.snapshot(scope="overview")

# Working with the active window? Use foreground (default)
screen = session.snapshot(scope="foreground")

# Need a background app? Filter by name
chrome = session.snapshot(scope="full", app="Chrome")
```

```typescript
const windows = await session.snapshot({ scope: "overview" });
const screen = await session.snapshot({ scope: "foreground" });
const chrome = await session.snapshot({ scope: "full", app: "Chrome" });
```

Compact vs full detail

The detail parameter controls tree pruning. Compact mode removes noise nodes (empty groups, redundant wrappers) and reduces token usage by ~75%.

| Detail level | What you get | Best for |
| --- | --- | --- |
| `compact` (default) | Pruned tree: noise removed, ~75% fewer tokens | AI agent workflows, most use cases |
| `full` | Unpruned tree: every node preserved | Debugging, finding hidden elements |
```python
# Default — compact, great for agents
screen = session.snapshot(detail="compact")

# Full — see everything, useful for debugging
screen = session.snapshot(detail="full")
```

```typescript
const screen = await session.snapshot({ detail: "compact" });
const screen = await session.snapshot({ detail: "full" });
```

Compact format achieves ~97% token reduction compared to raw JSON. A Spotify window snapshot is ~700 tokens in compact format vs ~25,000 as JSON.
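The gap comes from structure, not content: one line of compact notation carries the same facts as a nested JSON record. A rough sketch of the effect, using the common ~4-characters-per-token rule of thumb (the JSON shape and the exact compact line are assumptions for illustration):

```python
import json

# A single UI node as a raw accessibility-style JSON record (illustrative)
node = {
    "id": "e12", "role": "button", "name": "Play",
    "bounds": {"x": 480, "y": 912, "width": 32, "height": 32},
    "states": {"enabled": True, "focused": False, "offscreen": False},
    "children": [],
}

as_json = json.dumps(node)
as_compact = '[e12] button "Play" @480,912 32x32'  # one line per node

def approx_tokens(text: str) -> int:
    # Rough rule of thumb: ~4 characters per token
    return max(1, len(text) // 4)

reduction = 1 - approx_tokens(as_compact) / approx_tokens(as_json)
print(f"~{reduction:.0%} smaller per node")
```

Multiply that per-node saving across hundreds of nodes and the whole-snapshot numbers quoted above follow.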

Limit tree depth

For deeply nested UIs, use max_depth to cap how far CUP walks the tree. This reduces both capture time and output size.

```python
# Only capture 3 levels deep
screen = session.snapshot(max_depth=3)

# Effectively unlimited (by default no depth cap is applied)
screen = session.snapshot(max_depth=999)
```

```typescript
const screen = await session.snapshot({ maxDepth: 3 });
```

This is useful when you only need top-level navigation elements and don't care about deeply nested content.
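The effect of a depth cap is easiest to see with a plain tree walk. The tree shape and `walk` function below are invented for illustration; they are not CUP internals.

```python
def walk(node: dict, max_depth: int, depth: int = 0) -> list[str]:
    """Collect node labels, stopping once max_depth levels have been emitted."""
    if depth >= max_depth:
        return []
    lines = ["  " * depth + node["role"]]
    for child in node.get("children", []):
        lines += walk(child, max_depth, depth + 1)
    return lines

ui = {"role": "window", "children": [
    {"role": "toolbar", "children": [{"role": "button", "children": []}]},
    {"role": "group", "children": [
        {"role": "list", "children": [{"role": "listitem", "children": []}]},
    ]},
]}

print(len(walk(ui, max_depth=2)))   # 3: window, toolbar, group only
print(len(walk(ui, max_depth=99)))  # 6: the full tree
```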

Search the cached tree instead of re-capturing

find() searches the tree from the last snapshot() call without re-capturing. This is much faster than taking a new snapshot just to locate an element.

```python
screen = session.snapshot()

# Fast — searches the cached tree
buttons = session.find(query="submit button")
inputs = session.find(role="textbox")
focused = session.find(state="focused")

# Slower — re-captures the entire tree
screen = session.snapshot()  # only do this if the UI has changed
```

```typescript
await session.snapshot();

// Fast — searches the cached tree
const buttons = await session.find({ query: "submit button" });
const inputs = await session.find({ role: "textbox" });
const focused = await session.find({ state: "focused" });
```

Use page() for clipped content

When a scrollable container has more items than shown in the compact output, use page() to read them from the cached tree instead of scrolling and re-capturing.

```
[e5] list "Results" @10,100 400x600
  [e6] listitem "Result 1" ...
  [e7] listitem "Result 2" ...
  ... 48 more items — page("e5") to see
```

```python
# Reads from cache — no re-capture needed
page1 = session.page("e5", direction="down")
page2 = session.page("e5", direction="down")

# Only scroll + re-capture for virtual scrolling
session.action("e5", "scroll", direction="down")
screen = session.snapshot()
```

```typescript
const page1 = await session.page("e5", { direction: "down" });
const page2 = await session.page("e5", { direction: "down" });
```
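Paging works because the clipped children are already in the cached tree; each page() call just returns the next slice. A sketch of that slicing, with invented data standing in for the 50 listitems (page size is an assumption):

```python
def page_cached(items: list[str], page_size: int = 10):
    """Yield successive pages of an already-captured list of children."""
    for start in range(0, len(items), page_size):
        yield items[start:start + page_size]

results = [f"Result {i}" for i in range(1, 51)]  # the 50 listitems under e5
pages = list(page_cached(results, page_size=10))

print(len(pages))                    # 5 pages, no scrolling or re-capture
print(pages[0][0], pages[-1][-1])    # Result 1 ... Result 50
```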

Batch actions to reduce round-trips

Use batch() to execute multiple actions in a single call instead of individual action + snapshot cycles:

```python
# Instead of:
session.action("e5", "click")
session.action("e3", "type", value="hello")
session.press("tab")
session.action("e7", "type", value="world")

# Use batch:
results = session.batch([
    {"element_id": "e5", "action": "click"},
    {"element_id": "e3", "action": "type", "value": "hello"},
    {"action": "press", "keys": "tab"},
    {"element_id": "e7", "action": "type", "value": "world"},
])

# Check results — stops on first failure
for r in results:
    if not r.success:
        print(f"Failed: {r.error}")
        break
```

```typescript
const results = await session.batch([
  { element_id: "e5", action: "click" },
  { element_id: "e3", action: "type", value: "hello" },
  { action: "press", keys: "tab" },
  { element_id: "e7", action: "type", value: "world" },
]);

for (const r of results) {
  if (!r.success) {
    console.error(`Failed: ${r.error}`);
    break;
  }
}
```

Token budget planning for LLM agents

When integrating CUP with LLMs (via MCP or direct API calls), token usage matters. Here's a rough guide:

| Scenario | Typical tokens | Strategy |
| --- | --- | --- |
| Window list only | 50–100 | Use `overview` scope |
| Simple app (Calculator, Notepad) | 200–500 | Default `foreground` + `compact` |
| Medium app (Settings, file manager) | 500–1,500 | Default settings |
| Complex app (IDE, browser) | 1,500–3,000 | Limit depth with `max_depth=5` |
| Very complex app (Excel with data) | 3,000–5,000+ | Limit depth + use `find()` to narrow |
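Read as logic, the table is a lookup from expected snapshot size to strategy. A sketch with thresholds taken from the rows above (the function and its cutoffs are invented for illustration):

```python
def plan_strategy(estimated_tokens: int) -> str:
    """Map an expected snapshot size to a capture strategy (thresholds illustrative)."""
    if estimated_tokens <= 100:
        return "overview scope"
    if estimated_tokens <= 1500:
        return "foreground + compact (defaults)"
    if estimated_tokens <= 3000:
        return "limit depth with max_depth=5"
    return "limit depth and narrow with find()"

print(plan_strategy(80))    # window list only
print(plan_strategy(2500))  # complex app such as an IDE
print(plan_strategy(4500))  # very complex app
```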

Tips for LLM integration:

  • Start with overview to decide which app to target
  • Use foreground scope (not full) unless you need a background app
  • Keep detail="compact" (default) — full detail wastes tokens on noise
  • Use find() to search instead of asking the LLM to parse the entire tree
  • Use page() for long lists instead of including everything in context
