CUP: Computer Use Protocol

Quick Start

Capture your first UI tree and perform an action in 30 seconds.

Your first snapshot

Capture the foreground window

Python:

import cup

screen = cup.snapshot()
print(screen)

TypeScript:

import { snapshot } from "computeruseprotocol";

const screen = await snapshot();
console.log(screen);

This prints the current foreground window's UI tree in compact format.

Find an element

Python:

import cup

session = cup.Session()

# Capture the tree
screen = session.snapshot()

# Search for a button
results = session.find(query="submit button")
print(results)

TypeScript:

import { Session } from "computeruseprotocol";

const session = await Session.create();

// Capture the tree
const screen = await session.snapshot();

// Search for a button
const results = await session.find({ query: "submit button" });
console.log(results);

Perform an action

Python:

# Continues with the session created above.
# Element IDs such as "e14" come from the most recent snapshot.

# Click the element
result = session.action("e14", "click")
print(result)  # ActionResult(success=True, message="Clicked Submit")

# Type into a field
session.action("e5", "type", value="hello world")

# Press a keyboard shortcut
session.press("ctrl+s")

TypeScript:

// Continues with the session created above.
// Element IDs such as "e14" come from the most recent snapshot.

// Click the element
const result = await session.action("e14", "click");
console.log(result);  // { success: true, message: "Clicked Submit" }

// Type into a field
await session.action("e5", "type", { value: "hello world" });

// Press a keyboard shortcut
await session.press("ctrl+s");
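
The printed result above carries a success flag and a message, so an agent can stop as soon as an action fails. Here is a minimal sketch of such a guard. ActionResult is a local stand-in dataclass modeled on the printed output, and ensure_ok and ActionFailed are hypothetical helpers for illustration, not part of the cup package.

```python
from dataclasses import dataclass


@dataclass
class ActionResult:
    """Local stand-in mirroring the fields shown in the printed result above."""
    success: bool
    message: str


class ActionFailed(RuntimeError):
    """Hypothetical error raised when an action reports success=False."""


def ensure_ok(result):
    """Return the result unchanged if it succeeded, otherwise raise."""
    if not result.success:
        raise ActionFailed(result.message)
    return result


# With a real session this might be: ensure_ok(session.action("e14", "click"))
ok = ensure_ok(ActionResult(success=True, message="Clicked Submit"))
print(ok.message)
```

Raising on failure keeps the agent from continuing to act on a UI state it did not expect.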

The typical agent workflow

Most AI agents follow a simple loop:

1. snapshot()  → capture the current UI state
2. find()      → locate the target element
3. action()    → interact with it
4. snapshot()  → verify the result

Each snapshot() call returns fresh element IDs, so IDs taken from an earlier snapshot should not be reused. Always re-capture after performing an action, since the UI tree may have changed.
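
The four steps can be sketched as a single function. The version below runs against a small in-memory stand-in so it executes without a desktop. FakeSession, click_first, and the assumption that find() returns a list of dicts with an "id" key are illustrative only and are not taken from the cup API. With the real library you would pass a cup.Session() and adapt the ID lookup to whatever find() actually returns.

```python
from dataclasses import dataclass


@dataclass
class ActionResult:
    success: bool
    message: str


class FakeSession:
    """In-memory stand-in using the same method names as the calls above."""

    def __init__(self):
        self._generation = 0   # bumped on every snapshot
        self._tree = {}        # element ID -> label for the latest snapshot
        self._label = "Submit"

    def snapshot(self):
        # Every capture hands out fresh IDs, so older IDs become stale.
        self._generation += 1
        self._tree = {f"e{self._generation}": self._label}
        return dict(self._tree)

    def find(self, query):
        # Assumed result shape: a list of {"id", "name"} dicts.
        words = query.lower().split()
        return [
            {"id": eid, "name": name}
            for eid, name in self._tree.items()
            if any(word in name.lower() for word in words)
        ]

    def action(self, element_id, action):
        if element_id not in self._tree:
            return ActionResult(False, f"Unknown element {element_id}")
        name = self._tree[element_id]
        self._label = "Submitted"  # the click changes the UI
        return ActionResult(True, f"Clicked {name}")


def click_first(session, query):
    """One pass of the loop: snapshot, find, act, then snapshot again."""
    session.snapshot()                                   # 1. capture state
    matches = session.find(query=query)                  # 2. locate target
    if not matches:
        return None, session.snapshot()
    result = session.action(matches[0]["id"], "click")   # 3. interact
    after = session.snapshot()                           # 4. verify
    return result, after


session = FakeSession()
result, after = click_first(session, "submit button")
print(result.message)
print(after)
```

In this stand-in, reusing the ID from the first capture after the second snapshot fails, which illustrates why IDs should be re-read from the latest snapshot before each action.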

