CUP: Computer Use Protocol

Introduction

What is the Computer Use Protocol and why does it exist?

The problem

Every desktop platform speaks a different accessibility language:

Platform   Native API      How it names a button
Windows    UI Automation   ControlType.Button
macOS      AXUIElement     AXButton
Linux      AT-SPI2         ROLE_PUSH_BUTTON
Web        ARIA            role="button"

Building an AI agent that works across platforms means writing and maintaining four different tree parsers, four different action executors, and four different output formats.
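
To make the duplication concrete, the button row of the table above can be sketched as a single lookup. This is only an illustration of the kind of normalization a cross-platform agent would otherwise maintain by hand: the native names come from the table, while the shared name "button", the platform keys, and the helper `normalize_role` are assumptions made for this sketch, not confirmed CUP identifiers.

```python
# Native role names are taken from the table above. The shared name "button"
# and the lowercase platform keys are assumptions for illustration only.
NATIVE_TO_SHARED = {
    ("windows", "ControlType.Button"): "button",
    ("macos", "AXButton"): "button",
    ("linux", "ROLE_PUSH_BUTTON"): "button",
    ("web", 'role="button"'): "button",
}


def normalize_role(platform: str, native_role: str) -> str:
    """Return the shared role name, or "unknown" if the pair is not mapped."""
    return NATIVE_TO_SHARED.get((platform.lower(), native_role), "unknown")


print(normalize_role("macOS", "AXButton"))
```

A real agent would need one such table per role, plus equivalents for states and actions, on every platform it supports.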

The solution

CUP defines one canonical schema — 59 ARIA-derived roles, 16 states, 15 actions — that maps cleanly to every platform's native accessibility API. Write your agent logic once. CUP handles the translation.

[e2] btn "Submit" 120,340 88x36 {foc} [clk]

That single line tells an LLM everything it needs: element ID, role, name, position, size, state, and available actions. It costs roughly 4 tokens.
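
As a rough sketch of how an agent might read such a line back into structured data, the parser below splits it into the seven fields just listed. The grammar is inferred from this one example only: the assumptions are that states and actions are comma-separated, that both groups may be absent, and that names contain no double quotes. `Element` and `parse_line` are hypothetical names, not part of any CUP SDK.

```python
import re
from dataclasses import dataclass
from typing import List

# Assumed layout, inferred from the single example line:
#   [id] role "name" x,y WxH {states} [actions]
LINE_RE = re.compile(
    r'\[(?P<id>[^\]]+)\]\s+'
    r'(?P<role>\S+)\s+'
    r'"(?P<name>[^"]*)"\s+'
    r'(?P<x>-?\d+),(?P<y>-?\d+)\s+'
    r'(?P<w>\d+)x(?P<h>\d+)'
    r'(?:\s+\{(?P<states>[^}]*)\})?'     # optional state group (assumption)
    r'(?:\s+\[(?P<actions>[^\]]*)\])?'   # optional action group (assumption)
    r'\s*$'
)


@dataclass
class Element:
    element_id: str
    role: str
    name: str
    x: int
    y: int
    width: int
    height: int
    states: List[str]
    actions: List[str]


def _split(group) -> List[str]:
    """Split a comma-separated group into trimmed, non-empty items."""
    return [item.strip() for item in (group or "").split(",") if item.strip()]


def parse_line(line: str) -> Element:
    """Parse one compact line; raise ValueError if it does not fit the assumed layout."""
    m = LINE_RE.match(line.strip())
    if m is None:
        raise ValueError(f"unrecognized compact line: {line!r}")
    return Element(
        element_id=m["id"],
        role=m["role"],
        name=m["name"],
        x=int(m["x"]),
        y=int(m["y"]),
        width=int(m["w"]),
        height=int(m["h"]),
        states=_split(m["states"]),
        actions=_split(m["actions"]),
    )


print(parse_line('[e2] btn "Submit" 120,340 88x36 {foc} [clk]'))
```

Raising on an unrecognized line, rather than returning a partial result, keeps a malformed tree from being silently misread.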

Key numbers

Metric            Value
Roles             59 (ARIA-derived)
States            16 (only active states listed)
Actions           15 canonical verbs
Platforms         6 (Windows, macOS, Linux, Web, Android, iOS)
Token reduction   ~97% vs raw JSON
Format overhead   ~75% smaller in compact text

Next steps

  1. Install the SDK for Python or TypeScript
  2. Capture your first UI tree in 30 seconds
  3. Learn the core concepts — trees, roles, states, actions
