Session API
The Session class — CUP's primary interface for capturing UI trees and performing actions.
Overview
The Session class wraps a platform adapter and action executor. It is the main interface for both the Python and TypeScript SDKs. Convenience functions (snapshot(), overview(), etc.) use a default session internally.
Creating a session
import cup
session = cup.Session(platform=None) # auto-detect OS
session = cup.Session(platform="web") # force web adapterimport { Session } from "computeruseprotocol";
const session = await Session.create(); // auto-detect OS
const session = await Session.create({ platform: "web" }); // force web adapter| Parameter | Type | Description |
|---|---|---|
platform | string? | Force a platform adapter ("windows", "macos", "linux", "web") or None/undefined to auto-detect |
Convenience functions
For quick scripts that don't need session management:
import cup
text = cup.snapshot() # foreground window (compact text)
text = cup.snapshot("full") # all windows
envelope = cup.snapshot_raw() # foreground window (JSON dict)
text = cup.overview() # window list onlyimport { snapshot, snapshotRaw, overview } from "computeruseprotocol";
const text = await snapshot(); // foreground window (compact text)
const text = await snapshot("full"); // all windows
const envelope = await snapshotRaw(); // foreground window (JSON object)
const text = await overview(); // window list onlysnapshot()
Capture the UI tree from the current platform.
result = session.snapshot(
scope="foreground", # overview | foreground | desktop | full
app=None, # window title filter (full scope only)
max_depth=999, # max tree depth
compact=True, # True = text, False = dict
detail="compact", # compact | full
)const result = await session.snapshot({
scope: "foreground", // overview | foreground | desktop | full
app: undefined, // window title filter (full scope only)
maxDepth: 999, // max tree depth
compact: true, // true = text, false = object
detail: "compact", // compact | full
});| Parameter | Type | Default | Description |
|---|---|---|---|
scope | string | "foreground" | Capture scope. See Capture Scopes |
app | string? | None | Window title filter (only with full scope) |
max_depth | int | 999 | Maximum tree depth |
compact | bool | true | Return compact text (true) or JSON envelope (false) |
detail | string | "compact" | "compact" applies pruning; "full" preserves all nodes |
action()
Execute an action on an element from the last snapshot.
result = session.action("e14", "click")
result = session.action("e5", "type", value="hello world")
result = session.action("e9", "scroll", direction="down")
result = session.action("e3", "setvalue", value="42")const result = await session.action("e14", "click");
const result = await session.action("e5", "type", { value: "hello world" });
const result = await session.action("e9", "scroll", { direction: "down" });
const result = await session.action("e3", "setvalue", { value: "42" });For the full list of all 15 element-level actions (plus session-level press and wait), see the Actions Reference.
Action results
Every action returns an ActionResult:
@dataclass
class ActionResult:
success: bool # whether the action succeeded
message: str # human-readable description
error: str | None # error details if failedinterface ActionResult {
success: boolean; // whether the action succeeded
message: string; // human-readable description
error?: string; // error details if failed
}press()
Send a keyboard shortcut to the focused window. This is a session-level action that does not target a specific element.
session.press("ctrl+s") # save
session.press("ctrl+shift+p") # command palette
session.press("enter") # confirm
session.press("escape") # cancel
session.press("alt+f4") # close windowawait session.press("ctrl+s"); // save
await session.press("ctrl+shift+p"); // command palette
await session.press("enter"); // confirm
await session.press("escape"); // cancel
await session.press("alt+f4"); // close windowopen_app() / openApp()
Open an application by name with fuzzy matching. Waits for the app window to appear.
session.open_app("chrome") # fuzzy match
session.open_app("code") # opens VS Code
session.open_app("notepad")await session.openApp("chrome"); // fuzzy match
await session.openApp("code"); // opens VS Code
await session.openApp("notepad");find()
Search the last captured tree for elements matching criteria. Returns results sorted by relevance.
results = session.find(query="play button")
results = session.find(role="textbox", state="focused")
results = session.find(name="Submit")const results = session.find({ query: "play button" });
const results = session.find({ role: "textbox", state: "focused" });
const results = session.find({ name: "Submit" });| Parameter | Type | Description |
|---|---|---|
query | string? | Freeform semantic query (e.g., "play button") |
role | string? | Role filter (supports synonyms) |
name | string? | Fuzzy name match |
state | string? | Exact state match |
limit | int | Max results (default 5) |
find() searches the last captured tree, including elements pruned from the compact output. Call snapshot() first.
batch()
Execute a sequence of actions with optional waits between them.
results = session.batch([
{"element_id": "e3", "action": "click"},
{"action": "wait", "ms": 500},
{"element_id": "e7", "action": "type", "value": "hello"},
{"action": "press", "keys": "enter"},
])const results = await session.batch([
{ element_id: "e3", action: "click" },
{ action: "wait", ms: 500 },
{ element_id: "e7", action: "type", value: "hello" },
{ action: "press", keys: "enter" },
]);screenshot()
Capture a screenshot of the screen as a PNG image.
png_bytes = session.screenshot()
png_bytes = session.screenshot(region={"x": 100, "y": 200, "w": 800, "h": 600})const pngBuffer = await session.screenshot();
const pngBuffer = await session.screenshot({ region: { x: 100, y: 200, w: 800, h: 600 } });Python requires the screenshot extra: pip install computeruseprotocol[screenshot]