CUP: Computer Use Protocol

Compact Format

The LLM-optimized text encoding that achieves ~97% token reduction vs raw JSON.

Overview

The compact format is CUP's primary output format, designed for LLM context windows. It encodes the UI tree as concise, human-readable text, one line per element. Combined with pruning, it uses approximately 97% fewer tokens than the raw, unpruned JSON representation of the same tree.

Line format

Each element is encoded as a single line:

[id] role "name" x,y wxh {states} [actions] val="value" (attrs)

Example:

[e14] btn "Submit" 120,340 88x36 {foc} [clk]

Part       Meaning
[e14]      Element ID
btn        Role short code (button)
"Submit"   Accessible name
120,340    Position (x, y)
88x36      Size (width x height)
{foc}      Active states (focused)
[clk]      Available actions (click)
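
To make the field order concrete, here is a minimal Python sketch of a parser for a single compact line. It is not part of CUP. The `parse_line` name, the returned dict shape, and the assumption that multiple states or actions are comma-separated (for example `{foc,sel}`) are illustrative only, since this page shows only single-entry examples.

```python
import re

# Hypothetical regex following the documented field order:
# [id] role "name" x,y wxh {states} [actions] val="value" (attrs)
# Assumption: names and values contain no unescaped double quotes.
LINE_RE = re.compile(
    r'^(?P<indent> *)'
    r'\[(?P<id>[^\]]+)\] '
    r'(?P<role>\S+)'
    r'(?: "(?P<name>[^"]*)")?'
    r'(?: (?P<x>-?\d+),(?P<y>-?\d+) (?P<w>\d+)x(?P<h>\d+))?'
    r'(?: \{(?P<states>[^}]*)\})?'
    r'(?: \[(?P<actions>[^\]]*)\])?'
    r'(?: val="(?P<value>[^"]*)")?'
    r'(?: \((?P<attrs>[^)]*)\))?$'
)


def parse_line(line: str) -> dict:
    """Parse one compact line into a dict, keeping only fields that are present."""
    m = LINE_RE.match(line)
    if m is None:
        raise ValueError(f"not a compact line: {line!r}")
    g = m.groupdict()
    # Indentation is 2 spaces per tree level.
    node = {"id": g["id"], "role": g["role"], "depth": len(g["indent"]) // 2}
    if g["name"] is not None:
        node["name"] = g["name"]
    if g["x"] is not None:
        node["bounds"] = tuple(int(g[k]) for k in ("x", "y", "w", "h"))
    if g["states"]:
        node["states"] = g["states"].split(",")  # assumed comma-separated
    if g["actions"]:
        node["actions"] = g["actions"].split(",")  # assumed comma-separated
    if g["value"] is not None:
        node["value"] = g["value"]
    if g["attrs"]:
        node["attrs"] = g["attrs"]  # left unparsed in this sketch
    return node


print(parse_line('[e14] btn "Submit" 120,340 88x36 {foc} [clk]'))
```

Because every field after the role is optional, each group in the regex is wrapped in `(?: ...)?`, mirroring the rule that absent fields are simply left out of the line.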

Tree hierarchy

Indentation (2 spaces per level) indicates parent-child relationships:

[e0] win "Spotify" 120,40 1680x1020
  [e1] doc "Spotify" 120,40 1680x1020
    [e2] btn "Back" 132,52 32x32 [clk]
    [e3] btn "Forward" 170,52 32x32 {dis} [clk]
    [e7] nav "Main" 120,88 240x972
      [e8] lnk "Home" 132,100 216x40 {sel} [clk]
      [e9] lnk "Search" 132,148 216x40 [clk]
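
Because depth is carried only by leading spaces, a consumer can rebuild the hierarchy with a stack of open ancestors. The following sketch does this for the Spotify tree above; `build_tree` and the dict layout it produces are hypothetical helpers, not part of CUP, and lines are assumed to be well-formed.

```python
import re

COMPACT = """\
[e0] win "Spotify" 120,40 1680x1020
  [e1] doc "Spotify" 120,40 1680x1020
    [e2] btn "Back" 132,52 32x32 [clk]
    [e3] btn "Forward" 170,52 32x32 {dis} [clk]
    [e7] nav "Main" 120,88 240x972
      [e8] lnk "Home" 132,100 216x40 {sel} [clk]
      [e9] lnk "Search" 132,148 216x40 [clk]
"""


def build_tree(text: str) -> list[dict]:
    """Turn 2-space-indented compact lines into nested dicts."""
    roots: list[dict] = []
    stack: list[tuple[int, dict]] = []  # (depth, node) along the current path
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and the metadata header
        stripped = line.lstrip(" ")
        depth = (len(line) - len(stripped)) // 2
        node_id = re.match(r"\[([^\]]+)\]", stripped).group(1)
        node = {"id": node_id, "line": stripped, "children": []}
        # Pop siblings and their descendants until the top is this node's parent.
        while stack and stack[-1][0] >= depth:
            stack.pop()
        if stack:
            stack[-1][1]["children"].append(node)
        else:
            roots.append(node)
        stack.append((depth, node))
    return roots


tree = build_tree(COMPACT)
doc = tree[0]["children"][0]
print([child["id"] for child in doc["children"]])
```

The stack holds exactly one node per open level, so each line is attached in constant amortized time and the whole tree is rebuilt in a single pass.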

Every compact output starts with a metadata header:

# CUP 0.1.0 | windows | 2560x1440
# app: Spotify
# 63 nodes (280 before pruning)

This tells the agent which protocol version, platform, screen dimensions, and application it's looking at, plus how aggressively the tree was pruned.
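
A minimal sketch of reading those header lines in Python follows. The `parse_header` name and the output keys are hypothetical, and it assumes the header lines use exactly the three shapes shown in the example.

```python
import re

HEADER = """\
# CUP 0.1.0 | windows | 2560x1440
# app: Spotify
# 63 nodes (280 before pruning)
"""


def parse_header(text: str) -> dict:
    """Extract version, platform, screen size, app, and node counts."""
    meta = {}
    for line in text.splitlines():
        if not line.startswith("# "):
            break  # the header ends at the first non-comment line
        body = line[2:]
        if m := re.fullmatch(r"CUP (\S+) \| (\S+) \| (\d+)x(\d+)", body):
            meta["version"], meta["platform"] = m.group(1), m.group(2)
            meta["screen"] = (int(m.group(3)), int(m.group(4)))
        elif body.startswith("app: "):
            meta["app"] = body[len("app: "):]
        elif m := re.fullmatch(r"(\d+) nodes \((\d+) before pruning\)", body):
            meta["nodes"], meta["nodes_before"] = int(m.group(1)), int(m.group(2))
    return meta


meta = parse_header(HEADER)
print(meta)
# Share of the original nodes that survived pruning.
print(f"kept {meta['nodes'] / meta['nodes_before']:.1%} of nodes")
```

The two node counts are what make "how aggressively the tree was pruned" measurable: in this example 63 of 280 nodes remain.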

Optional fields

Each field is emitted only when it has content; there are no empty brackets or null values:

  • Bounds: only for interactable elements (saves tokens for static containers)
  • States: only when at least one state is active
  • Actions: only when the element supports interactions
  • Value: only for inputs/sliders with a current value, e.g. val="hello"
  • Attributes: level (L2), placeholder (ph="Enter email"), orientation (h), range (range=0..100)
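
The omission rules can be sketched as a small formatter. This is a hypothetical illustration, not CUP's serializer: it assumes a node dict whose role is already a short code, comma-separated states and actions, attributes supplied as pre-rendered strings joined by spaces, and names and values without embedded quotes. Whether bounds are present is left to the caller.

```python
def format_line(node: dict, depth: int = 0) -> str:
    """Render one node as a compact line, skipping absent or empty fields.

    Hypothetical node shape: id, role (short code), and optional name,
    bounds=(x, y, w, h), states, actions, value, attrs (e.g. 'L2').
    """
    parts = [f"[{node['id']}]", node["role"]]
    if node.get("name"):
        parts.append(f'"{node["name"]}"')
    if node.get("bounds"):
        x, y, w, h = node["bounds"]
        parts.append(f"{x},{y} {w}x{h}")
    if node.get("states"):  # only when at least one state is active
        parts.append("{" + ",".join(node["states"]) + "}")
    if node.get("actions"):  # only when the element supports interactions
        parts.append("[" + ",".join(node["actions"]) + "]")
    if node.get("value") not in (None, ""):  # keeps numeric 0, drops missing/empty
        parts.append(f'val="{node["value"]}"')
    if node.get("attrs"):
        parts.append("(" + " ".join(node["attrs"]) + ")")
    return "  " * depth + " ".join(parts)


print(format_line({"id": "e14", "role": "btn", "name": "Submit",
                   "bounds": (120, 340, 88, 36), "states": ["foc"],
                   "actions": ["clk"]}))
# Empty state and action lists produce no brackets at all.
print(format_line({"id": "e7", "role": "nav", "name": "Main",
                   "states": [], "actions": []}, depth=2))
```

Testing each field for emptiness rather than for key presence is what guarantees that `{}` and `[]` never appear in the output.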

Pruning rules

The compact pipeline applies 11 pruning rules to reduce noise:

  1. Skip chrome/decorative: scrollbar, separator, titlebar, tooltip, status
  2. Skip zero-size: elements with no visible bounds
  3. Hoist unnamed generic: remove unnamed generic wrapper nodes and promote their children
  4. Hoist unnamed region: same for unnamed region wrappers
  5. Hoist unnamed group: same for unnamed groups without meaningful actions
  6. Skip unnamed images: decorative images without alt text
  7. Skip empty text: text nodes with no content
  8. Skip redundant labels: text that duplicates its parent's name
  9. Skip offscreen non-interactive: elements outside the viewport that have no actions
  10. Collapse single-child: remove structural containers that have only one child
  11. Drop focus action: remove the focus action, which is rarely useful for agents
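
Several of these rules compose naturally as a recursive pass in which each node returns the list of nodes that replace it: none when skipped, its children when hoisted, or itself otherwise. The sketch below covers only rules 1, 2, 3, 6, 7, and 11. It is a hypothetical reimplementation, not CUP's pipeline, and the node shape and role and action names (`generic`, `img`, `text`, `focus`) are assumptions.

```python
DECORATIVE = {"scrollbar", "separator", "titlebar", "tooltip", "status"}


def prune(node: dict) -> list[dict]:
    """Return the nodes that replace `node` after pruning (zero, one, or many).

    Hypothetical node shape: role, name, bounds=(x, y, w, h), actions, children.
    """
    role, name = node.get("role"), node.get("name", "")
    # Rule 1: drop chrome/decorative roles together with their subtrees.
    if role in DECORATIVE:
        return []
    # Rule 2: drop elements whose width or height is zero.
    bounds = node.get("bounds")
    if bounds is not None and (bounds[2] == 0 or bounds[3] == 0):
        return []
    # Rule 6: drop images without a name (alt text).
    if role == "img" and not name:
        return []
    # Rule 7: drop text nodes with no content.
    if role == "text" and not name.strip():
        return []
    # Prune children first so hoisting splices in already-pruned subtrees.
    children = [kept for child in node.get("children", []) for kept in prune(child)]
    # Rule 3: hoist unnamed generic wrappers by promoting their children.
    if role == "generic" and not name:
        return children
    # Rule 11: remove the focus action.
    actions = [a for a in node.get("actions", []) if a != "focus"]
    return [{**node, "actions": actions, "children": children}]


tree = {
    "role": "window", "name": "Demo", "bounds": (0, 0, 800, 600), "children": [
        {"role": "scrollbar", "bounds": (790, 0, 10, 600)},
        {"role": "generic", "children": [
            {"role": "button", "name": "OK", "bounds": (10, 10, 80, 30),
             "actions": ["click", "focus"]},
            {"role": "img", "bounds": (100, 10, 16, 16)},
            {"role": "button", "name": "Hidden", "bounds": (0, 0, 0, 0)},
        ]},
    ],
}
(root,) = prune(tree)
print([child["name"] for child in root["children"]])
print(root["children"][0]["actions"])
```

Returning a list from every call is the design choice that lets "skip" and "hoist" share one code path; the button ends up as a direct child of the window once the scrollbar, the unnamed wrapper, the unnamed image, and the zero-size button are gone.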

Use detail="full" to disable all pruning and get every node in the tree.

Token comparison

For a typical complex application (Spotify):

Format                 Tokens    Reduction vs raw JSON
Raw JSON (unpruned)    ~12,000   (baseline)
CUP JSON (pruned)      ~3,000    75%
CUP Compact            ~350      97%
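
The reduction column is measured against the raw, unpruned JSON. The short calculation below reproduces it from the approximate token counts in the table and also shows the saving of the compact encoding relative to pruned CUP JSON alone; the `reduction` helper is only for illustration.

```python
# Approximate token counts from the Spotify comparison in the table.
TOKENS = {"Raw JSON (unpruned)": 12_000, "CUP JSON (pruned)": 3_000, "CUP Compact": 350}


def reduction(tokens: int, baseline: int) -> float:
    """Fraction of tokens saved relative to the baseline."""
    return 1 - tokens / baseline


baseline = TOKENS["Raw JSON (unpruned)"]
for fmt, n in TOKENS.items():
    print(f"{fmt:<22} {n:>6,} tokens  {reduction(n, baseline):.0%} smaller")

# Saving from the compact text encoding alone, on top of pruning.
print(f"compact vs pruned JSON: {reduction(350, 3_000):.0%}")
```

So pruning accounts for the first 75%, and switching from pruned JSON to the compact text removes roughly a further 88% of what remains (350 / 3,000 tokens), which together give the ~97% figure.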
