CUPComputer Use Protocol
Guides

Quick Reference

One-page cheat sheet — session lifecycle, all actions, scopes, roles, and compact format at a glance.

Session lifecycle

1. Create      session = Session()          / await Session.create()
2. Capture     session.snapshot()           → compact text with element IDs
3. Find        session.find(query="...")     → ranked matches from cached tree
4. Act         session.action("eX", "...")   → ActionResult { success, message }
5. Re-capture  session.snapshot()           → fresh IDs (old ones are invalid)

All 15 actions

ActionParametersDescription
clickClick/invoke an element
doubleclickDouble-click
rightclickOpen context menu
longpressLong-press (touch/mobile)
typevalueType text into a field
setvaluevalueSet value programmatically (slider, spinbutton, text)
toggleToggle a checkbox or switch
scrolldirectionScroll a container (up / down / left / right)
selectSelect item in list, tree, or tab
expandExpand a collapsed element
collapseCollapse an expanded element
incrementIncrement a slider or spinbutton
decrementDecrement a slider or spinbutton
focusMove keyboard focus to element
dismissDismiss a dialog or popup

Session-level (no element ID):

ActionParametersDescription
presskeysSend keyboard shortcut (ctrl+s, enter, alt+f4)
open_appnameOpen app by name (fuzzy matched)

Scopes

ScopeWhat it capturesSpeed
overviewWindow list only (titles, PIDs, bounds)Near-instant
foregroundActive window tree + window list headerDefault, fast
desktopDesktop surface (icons, widgets)Fast
fullAll windows (use app param to filter)Slower

Compact format syntax

Each line in compact output represents one UI element:

[id] role "name" @x,y wxh {states} [actions] val="value"
PartExampleMeaning
[e14]Element IDUse for action() calls
buttonRoleARIA-derived role (59 total)
"Submit"NameAccessible name
@100,200PositionTop-left x,y in screen pixels
300x40SizeWidth x height
{focused,checked}StatesActive states
[click,toggle]ActionsAvailable actions
val="75"ValueCurrent value (sliders, inputs)

Indentation shows parent-child hierarchy. Two spaces = one level deeper.

Role categories

Common key combos for press()

KeysAction
enterConfirm / submit
escapeCancel / close
tabNext focus
shift+tabPrevious focus
ctrl+aSelect all
ctrl+cCopy
ctrl+vPaste
ctrl+zUndo
ctrl+sSave
ctrl+wClose tab
alt+f4Close window
ctrl+shift+pCommand palette (VS Code)

Quick snippets

import cup

# Quick capture
print(cup.snapshot())

# Window list
print(cup.overview())

# Full session workflow
s = cup.Session()
s.snapshot()
hits = s.find(query="Save button")
if hits:
    r = s.action(hits[0]["id"], "click")
    print(r.success, r.message)

# Keyboard shortcut
s.press("ctrl+shift+p")

# Open app
s.open_app("calculator")

# Screenshot
png = s.screenshot()
import { Session, snapshot, overview } from "computeruseprotocol";

// Quick capture
console.log(await snapshot());

// Window list
console.log(await overview());

// Full session workflow
const s = await Session.create();
await s.snapshot();
const hits = await s.find({ query: "Save button" });
if (hits.length > 0) {
  const r = await s.action(hits[0].id, "click");
  console.log(r.success, r.message);
}

// Keyboard shortcut
await s.press("ctrl+shift+p");

// Open app
await s.openApp("calculator");

// Screenshot
const png = await s.screenshot();

On this page