Introduction
What is the Computer Use Protocol (CUP), and why does it exist?
The problem
Every desktop platform speaks a different accessibility language:
| Platform | Native API | How it names a button |
|---|---|---|
| Windows | UI Automation | ControlType.Button |
| macOS | AXUIElement | AXButton |
| Linux | AT-SPI2 | ROLE_PUSH_BUTTON |
| Web | ARIA | role="button" |
Building an AI agent that works across platforms means writing and maintaining four different tree parsers, four different action executors, and four different output formats.
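To make the divergence concrete, here is a minimal sketch of the kind of lookup table an agent would otherwise have to maintain by hand. It uses only the four native names from the table above. The canonical name `"button"`, the platform keys, and the `normalize_role` helper are illustrative assumptions, not part of any published API.

```python
# Native role identifiers are copied from the table above.
# The canonical name "button" and the platform keys are assumptions
# made for illustration only.
NATIVE_TO_CANONICAL = {
    ("windows", "ControlType.Button"): "button",
    ("macos", "AXButton"): "button",
    ("linux", "ROLE_PUSH_BUTTON"): "button",
    ("web", 'role="button"'): "button",
}


def normalize_role(platform: str, native_role: str) -> str:
    """Return the shared role name, or "unknown" for an unmapped pair."""
    return NATIVE_TO_CANONICAL.get((platform.lower(), native_role), "unknown")


if __name__ == "__main__":
    for platform, native in NATIVE_TO_CANONICAL:
        print(f"{platform}: {native} -> {normalize_role(platform, native)}")
```

This covers a single role on four platforms. A real agent would need an equivalent table for every role, state, and action it handles, which is the maintenance burden described above.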
The solution
CUP defines one canonical schema — 59 ARIA-derived roles, 16 states, 15 actions — that maps cleanly to every platform's native accessibility API. Write your agent logic once. CUP handles the translation.

    [e2] btn "Submit" 120,340 88x36 {foc} [clk]

That single line tells an LLM everything it needs: element ID, role, name, position, size, state, and available actions. It costs roughly 4 tokens.
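The compact line can be pulled apart with a small parser. The sketch below infers the field layout (`[id] role "name" x,y WxH {states} [actions]`) from the single example line only. The comma separator inside `{}` and `[]`, the optional trailing groups, and the `Element` and `parse_line` names are assumptions rather than the SDK's actual interface.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Layout inferred from one example line:
#   [id] role "name" x,y WxH {states} [actions]
# Comma-separated lists inside {} and [] are an assumption.
LINE_RE = re.compile(
    r'\[(?P<id>[^\]]+)\]\s+'
    r'(?P<role>\S+)\s+'
    r'"(?P<name>[^"]*)"\s+'
    r'(?P<x>-?\d+),(?P<y>-?\d+)\s+'
    r'(?P<w>\d+)x(?P<h>\d+)'
    r'(?:\s+\{(?P<states>[^}]*)\})?'
    r'(?:\s+\[(?P<actions>[^\]]*)\])?'
)


@dataclass
class Element:
    id: str
    role: str
    name: str
    x: int
    y: int
    width: int
    height: int
    states: List[str]
    actions: List[str]


def _split(group: Optional[str]) -> List[str]:
    """Split an optional comma-separated group into trimmed items."""
    if not group:
        return []
    return [item.strip() for item in group.split(",") if item.strip()]


def parse_line(line: str) -> Element:
    """Parse one compact line into an Element, or raise ValueError."""
    match = LINE_RE.fullmatch(line.strip())
    if match is None:
        raise ValueError(f"unrecognized compact line: {line!r}")
    return Element(
        id=match["id"],
        role=match["role"],
        name=match["name"],
        x=int(match["x"]),
        y=int(match["y"]),
        width=int(match["w"]),
        height=int(match["h"]),
        states=_split(match["states"]),
        actions=_split(match["actions"]),
    )


if __name__ == "__main__":
    print(parse_line('[e2] btn "Submit" 120,340 88x36 {foc} [clk]'))
```

The abbreviations (`btn`, `foc`, `clk`) are kept as-is because this page does not list their expansions.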
Key numbers
| Metric | Value |
|---|---|
| Roles | 59 (ARIA-derived) |
| States | 16 (only active states listed) |
| Actions | 15 canonical verbs |
| Platforms | 6 (Windows, macOS, Linux, Web, Android, iOS) |
| Token reduction | ~97% vs raw JSON |
| Format overhead | ~75% smaller in compact text |
Next steps
- Install the SDK for Python or TypeScript
- Capture your first UI tree in 30 seconds
- Learn the core concepts — trees, roles, states, actions