# Puras — full documentation

> Single-file dump of every doc page on https://puras.co. Pages are ordered by category, then by their declared `order`. Each page keeps its original `## ...` headings.

---

# Overview

Source: https://puras.co/docs/overview
Category: Getting Started

> What Puras is and the moving parts you'll touch as a developer.

Puras is a multi-tenant **agentic Backend-as-a-Service**. You write **skills**, bundle them as a project, push the bundle, and invoke them as **jobs** from any client with a project API key.

A skill is just a directory with a `skill.yaml`. Its entrypoint decides how it runs:

- **`entrypoint: SKILL.md`** (any `.md` file) — the file is loaded as the system prompt and an LLM tool-use loop runs with the tools you declare. Agentic.
- **`entrypoint: main.py:run`** — the worker imports the module and calls the function in an isolated subprocess. Deterministic pipeline, no LLM in the loop.

Same submission API, same billing, same observability. The only difference is what runs inside the worker.

## When to reach for Puras

- You want an agent to run server-side (long-running, retryable, observable) rather than in the user's browser.
- You want a deterministic Python pipeline that can call models, save files to project storage, and bill against project credit — without writing infra.
- You're building from inside Claude Code or Cursor and want MCP tools to deploy and observe live.

## Moving parts

| Piece | What it does |
|---|---|
| **API** (FastAPI on Fly) | Projects, API keys, deployments, jobs, secrets, drive, billing. |
| **Worker** (Python on Fly) | Claims queued jobs, runs the agent tool-use loop or the deterministic Python subprocess, writes results. |
| **MCP server** (`purasbackend-mcp`) | Local stdio server. Claude Code calls it to push deployments, submit jobs, tail events. |
| **Frontend** (Next.js) | Dashboard for projects, deployments, jobs, usage, API keys. |
| **Supabase** | Postgres + Storage (skill bundles, drive). |

## The project-as-unit deployment model

A **deployment** is a single zip of your whole project, not a per-skill push. The bundle contains a `skills/` directory; each immediate child folder with a `skill.yaml` is auto-discovered as a skill. There is **no root manifest file** — adding a skill = creating a directory. Activating a new deployment is a rolling switch — in-flight jobs keep running on their original version. See [[concepts]] for the full model.

## Money

The pricing unit is **MICROS** (1 USD = 1,000,000 micros). Upstream cost (LLM tokens and media generation) is marked up by `PURAS_MARGIN_PCT` (default 20%). Jobs are only claimed while `projects.credit_balance_micros > 0`. See [[concepts]] for the marketplace billing model.

## Where to go next

- [[quickstart]] — wire up your first job in five minutes.
- [[example-project]] — a complete starter (skill.yaml + agentic skill + deterministic skill + app snippet) you can copy.
- [[concepts]] — projects, deployments, skills, jobs, billing, drive, secrets.
- [[mcp-tools]] — every tool this MCP server exposes.
- [[sdk-media]] — `from puras import media` for image/video/TTS inside skills.
- [[inputs-and-drive]] — how apps upload files and how skills read them (drive_path / url / base64).
- [[agent-attachments]] — sending images/PDFs to agentic skills, and the `file_read` tool.

---

# Quickstart

Source: https://puras.co/docs/quickstart
Category: Getting Started

> From zero to a running job in five minutes, end-to-end.

This walks you (or the AI agent driving your editor) from a fresh checkout to a job that returns a result.

## 0. Prerequisites

- A Puras project. Create one in the dashboard, then create an API key — it's shown **once** in the format `puras_live_<prefix8>.<secret32>`. The dot between prefix and secret is load-bearing; never strip it.
- The MCP server installed and registered:
  ```bash
  cd mcp && pip install -e .
  claude mcp add purasbackend purasbackend-mcp
  ```

## 1. Configure the MCP

In Claude Code, call `configure` once. The credentials are written to `~/.purasbackend/config.json` (chmod 600).

```
configure(
  api_base="https://puras-api.fly.dev",
  api_key="puras_live_AbCdEfGh.SecretSecretSecretSecretSecret32",
  project_id="<your-project-uuid>",
)
```

## 2. Lay out a project

A project is a directory with a `skills/` folder. Every immediate child of `skills/` that contains a `skill.yaml` is auto-discovered as a skill — there is no root manifest.

```
my-project/
  skills/
    hello/
      skill.yaml
      SKILL.md            # system prompt — agentic entrypoint
    echo/
      skill.yaml
      main.py             # deterministic entrypoint
```

`skills/hello/skill.yaml` (agentic — entrypoint is a `.md` file):

```yaml
description: Answer the user's question briefly.
entrypoint: SKILL.md
input_schema:
  type: object
  properties:
    prompt: { type: string }
  required: [prompt]
output_schema:
  type: object
  properties:
    answer: { type: string }
  required: [answer]
```

`skills/hello/SKILL.md`:

```markdown
You are a helpful assistant. Answer the user's question briefly.
When you are done, call the `set_output` tool with `{ "answer": "<your reply>" }`.
```

`skills/echo/skill.yaml` (deterministic — entrypoint is `<file>.py:<func>`):

```yaml
description: Echo the input back unchanged.
entrypoint: main.py:run
input_schema:
  type: object
  properties:
    text: { type: string }
  required: [text]
output_schema:
  type: object
  properties:
    echoed: { type: string }
  required: [echoed]
```

`skills/echo/main.py`:

```python
def run(inputs: dict) -> dict:
    return {"echoed": inputs["text"]}
```

Skill names must match `^[a-z0-9][a-z0-9_-]*$`. The directory name **is** the skill name — `skills/hello/` registers a skill called `hello`.
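The naming rule can be checked locally before a push. This is a minimal sketch built on the pattern quoted above; the helper name `is_valid_skill_name` is illustrative and not part of any Puras SDK.

```python
import re

# Pattern copied from the rule above; the helper itself is hypothetical.
SKILL_NAME_RE = re.compile(r"^[a-z0-9][a-z0-9_-]*$")


def is_valid_skill_name(name: str) -> bool:
    """Return True if a directory name would be accepted as a skill name."""
    return SKILL_NAME_RE.fullmatch(name) is not None


print(is_valid_skill_name("image-info"))  # lowercase letters and a hyphen
print(is_valid_skill_name("Hello"))       # contains an uppercase letter
print(is_valid_skill_name("_private"))    # starts with an underscore
```

Running it over the children of `skills/` before `push` catches a badly named directory early.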

## 3. Push the bundle

From Claude Code:

```
push(project_dir="/abs/path/to/my-project", notes="first cut", activate=true)
```

This zips the directory (excluding `.git`, `.venv`, `node_modules`, …), uploads it as a new deployment, and makes it the active one. In-flight jobs keep running on the previous deployment.
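For a rough picture of the bundling step, here is a sketch using only the standard library. It is not the MCP server's implementation: the function name `zip_project` is hypothetical, and the exclusion set holds only the three names listed above because the full list is not given here.

```python
import io
import zipfile
from pathlib import Path

# Only the names the text lists; the real exclusion list is longer.
EXCLUDED_DIRS = {".git", ".venv", "node_modules"}


def zip_project(project_dir: str) -> bytes:
    """Zip a project directory into memory, skipping excluded directories."""
    root = Path(project_dir)
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(root.rglob("*")):
            rel = path.relative_to(root)
            # Skip anything that lives under an excluded directory.
            if any(part in EXCLUDED_DIRS for part in rel.parts):
                continue
            if path.is_file():
                zf.write(path, rel.as_posix())
    return buf.getvalue()
```

Archive entries are stored relative to the project root, so `skills/<name>/skill.yaml` keeps the layout described in step 2.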

## 4. Submit a job

One endpoint, one body shape, regardless of whether the skill is agentic or deterministic:

```bash
# async: fire-and-forget, returns {id, status:"queued"} immediately
curl -X POST "$API/v1/jobs" -H "Authorization: Bearer $KEY" \
  -H "Content-Type: application/json" \
  -d '{"skill":"hello","inputs":{"prompt":"hi"}}'

# sync: block until done (or up to `timeout` seconds)
curl -X POST "$API/v1/jobs?wait=true&timeout=30" -H "Authorization: Bearer $KEY" \
  -H "Content-Type: application/json" \
  -d '{"skill":"echo","inputs":{"text":"hi"}}'

# stream: SSE of every tool call / model response as the worker emits it
curl -N -X POST "$API/v1/jobs?stream=true" -H "Authorization: Bearer $KEY" \
  -H "Content-Type: application/json" \
  -d '{"skill":"hello","inputs":{"prompt":"hi"}}'
```

The worker reads the skill's `skill.yaml` and dispatches to the agent loop (`.md` entrypoint) or the deterministic runner (`.py:func` entrypoint) automatically.
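The same three modes can be driven from Python. The sketch below only builds the request and leaves sending to the caller; `build_job_request` is a hypothetical helper, and the 1 to 60 second bound and the `wait`/`stream` exclusivity are taken from the Concepts page.

```python
import json
import urllib.parse
import urllib.request


def build_job_request(api_base, api_key, skill, inputs, *, wait=False, timeout=None, stream=False):
    """Build (but do not send) a POST /v1/jobs request for async, sync, or stream mode."""
    if wait and stream:
        raise ValueError("wait and stream are mutually exclusive")
    if timeout is not None and not 1 <= timeout <= 60:
        raise ValueError("timeout must be between 1 and 60 seconds")
    query = {}
    if wait:
        query["wait"] = "true"
        if timeout is not None:
            query["timeout"] = str(timeout)
    if stream:
        query["stream"] = "true"
    url = f"{api_base}/v1/jobs"
    if query:
        url += "?" + urllib.parse.urlencode(query)
    body = json.dumps({"skill": skill, "inputs": inputs}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
```

Pass the returned object to `urllib.request.urlopen` to actually submit the job. The body is identical in all three modes; only the query string changes.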

From the MCP convenience wrappers:

```
submit_job(skill="echo", inputs={"text": "hi"}, wait=true, timeout=10)
submit_job(skill="hello", inputs={"prompt": "What is Puras?"})
```

## 5. Observe

```
list_jobs(limit=10)
get_job(job_id="...")
tail_job(job_id="...", max_seconds=60)   # streams events until terminal
```

Status transitions: `queued → running → succeeded | failed | cancelled`.

## What you've got now

- A project with one agentic skill and one deterministic skill.
- A live deployment you can roll back via `list_deployments` + `activate_deployment`.
- Jobs you can submit from any HTTP client with the same API key (the MCP is a convenience, not a requirement).

Next: [[example-project]] for a complete starter you can copy as the shape of a real project, [[inputs-and-drive]] for passing files into jobs, [[concepts]] for the deeper model, [[sdk-media]] for image/video/TTS inside skills, [[mcp-tools]] for the full tool list.

---

# Example project

Source: https://puras.co/docs/example-project
Category: Getting Started

> A complete worked project — two skills (one deterministic, one agentic) + frontend snippet — you can copy as a starter.

A minimal but realistic project: an app uploads an image, a **deterministic skill** returns its dimensions and a downscaled thumbnail, and an **agentic skill** writes a one-line caption by looking at the photo directly via vision. Use this as the shape to copy when you start a new Puras project — `push` it as-is and it works.

## Layout

```
image-tools/
  requirements.txt              # optional; worker pip-installs into a per-deployment venv
  skills/
    image-info/                 # deterministic skill (Python entrypoint)
      skill.yaml
      main.py
    caption/                    # agentic skill (markdown entrypoint = system prompt)
      skill.yaml
      SKILL.md
```

No root manifest. Each `skills/<name>/skill.yaml` is auto-discovered; the directory name **is** the skill name.

## requirements.txt

```
pillow>=10
```

(The worker reads `requirements.txt` from the bundle root and installs it into the deployment's venv before any job runs. Skip the file if your skills only need the stdlib + `puras`.)

## skills/image-info/skill.yaml

```yaml
description: Return image dimensions and write a 512px thumbnail back into the drive.
entrypoint: main.py:run
input_schema:
  type: object
  properties:
    image:
      oneOf:
        - { type: string }
        - { type: object }
  required: [image]
output_schema:
  type: object
  properties:
    width:  { type: integer }
    height: { type: integer }
    format: { type: string }
    thumb:
      type: object
      properties:
        drive_path: { type: string }
      required: [drive_path]
  required: [width, height, format, thumb]
```

The `.py:func` entrypoint tells the worker this skill is deterministic — no LLM in the loop. The function runs in an isolated subprocess.

## skills/image-info/main.py

```python
"""Return image dimensions and write a 512px thumbnail back into the drive."""
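from pathlib import Path

from PIL import Image

from puras import load_path


def run(inputs: dict) -> dict:
    src = load_path(inputs["image"], suffix=".img")
    with Image.open(src) as im:
        # Capture the original size before thumbnail() shrinks the image in place.
        width, height, fmt = im.width, im.height, im.format

        im.thumbnail((512, 512))
        out_rel = f"thumbs/{src.stem}.jpg"
        out_abs = Path("drive") / out_rel
        # Make sure drive/thumbs/ exists before saving into it.
        out_abs.parent.mkdir(parents=True, exist_ok=True)
        im.convert("RGB").save(out_abs, "JPEG", quality=85)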

    return {
        "width": width,
        "height": height,
        "format": fmt,
        "thumb": {"drive_path": out_rel},
    }
```

The skill works whether the caller sent `{image: {drive_path}}`, `{image: {url}}`, `{image: {data: "data:..."}}`, or a bare string. `load_path` does the routing; you write the file logic once. See [[inputs-and-drive]] for the full taxonomy.

## skills/caption/skill.yaml

```yaml
description: Write a short caption for a product photo.
entrypoint: SKILL.md
model: claude/sonnet-4-7
input_schema:
  type: object
  properties:
    prompt: { type: string }
    attachments:
      type: array
      items: { type: object }
      minItems: 1
      maxItems: 1
  required: [attachments]
output_schema:
  type: object
  properties:
    caption: { type: string }
  required: [caption]
```

The `.md` entrypoint tells the worker this skill is agentic — the file's contents become the system prompt and the LLM tool-use loop runs. Because `output_schema` is set, the agent gets an auto-injected `set_output` tool and must call it once with `{ "caption": "..." }` to finish.

## skills/caption/SKILL.md

```markdown
You write short, punchy captions for product photos.

You'll receive the photo as an attachment in the first user message (you can
see it directly — no tool calls needed to "open" it) and a `tone` hint in the
prompt text (e.g. "playful", "minimal", "luxury"). Default tone is "minimal".

Reply with exactly one caption, 8–14 words, in the requested tone. No
preamble, no quotes, no markdown.
```

That's the whole skill. The agent natively sees the image via the attachments mechanism — see [[agent-attachments]] for the wire format and the supported file types.

## Pushing it

```
push(project_dir="/abs/path/to/image-tools", notes="starter", activate=true)
```

## Calling it from an app

```js
const API_BASE = "https://puras-api.fly.dev";
const KEY = "puras_live_AbCdEfGh.SecretSecretSecretSecretSecret32";

async function analyze(file /* a browser File */) {
  // 1) Upload once.
  const fd = new FormData(); fd.append("file", file);
  const up = await fetch(`${API_BASE}/v1/drive/upload`, {
    method: "POST",
    headers: { Authorization: `Bearer ${KEY}` },
    body: fd,
  }).then(r => r.json());
  // up: { drive_path, full_path, signed_url, bytes, content_type }

  // 2) Fast deterministic skill — wait inline.
  const info = await fetch(`${API_BASE}/v1/jobs?wait=true&timeout=20`, {
    method: "POST",
    headers: { Authorization: `Bearer ${KEY}`, "Content-Type": "application/json" },
    body: JSON.stringify({
      skill: "image-info",
      inputs: { image: { drive_path: up.drive_path } },
    }),
  }).then(r => r.json());
  // info.result → { width, height, format, thumb: { drive_path } }

  // 3) Kick off the agentic caption skill — attach the photo natively.
  const cap = await fetch(`${API_BASE}/v1/jobs`, {
    method: "POST",
    headers: { Authorization: `Bearer ${KEY}`, "Content-Type": "application/json" },
    body: JSON.stringify({
      skill: "caption",
      inputs: {
        prompt: "Tone: playful. Write the caption.",
        attachments: [{ drive_path: up.drive_path }],
      },
    }),
  }).then(r => r.json());
  // Poll GET /v1/jobs/{cap.id} or stream GET /v1/jobs/{cap.id}/stream

  return { info: info.result, captionJobId: cap.id };
}
```

Same submit body shape for both skills — no `type` field, no distinction at the call site. The worker reads each skill's `entrypoint` and dispatches accordingly.

To display the thumbnail in the UI, mint a signed URL for it:

```
GET /v1/drive/sign?path=<thumb-drive-path>&ttl=3600
```

## What this project demonstrates

- **Two skill styles, one API** — a deterministic Python skill for fast structured work, an agentic skill for multi-step LLM work, both submitted with the same `POST /v1/jobs` shape.
- **One source of truth for files** — the upload happens once; both skills reference it by `drive_path`.
- **Polymorphic file inputs in a deterministic skill** — the same code works for uploads, URLs, and inline base64 ([[inputs-and-drive]]).
- **Native vision in an agentic skill** — the agent looks at the image directly via `inputs.attachments`; no `bash cat`, no `media` tool round-trip, no manual URL signing ([[agent-attachments]]).

## Where to take it next

- Add a `tools:` list to `caption/skill.yaml` if you want the agent to call your own Python helpers mid-run (e.g. a `query_dimensions` tool that wraps the same Pillow logic).
- Add more skills under `skills/` for follow-up ops (crop, OCR, classify) — agentic skills can chain them via tool-use.
- Set project secrets (`set_secret`) for any third-party keys your skill code needs; they're injected as env vars at run time.

---

# Concepts

Source: https://puras.co/docs/concepts
Category: Getting Started

> Projects, deployments, skills, jobs, secrets, drive, billing.

## Project

The unit of tenancy. Holds API keys, deployments, secrets, drive files, credit balance, and jobs. A user can own multiple projects; an API key is always project-scoped.

## API key

Format: `puras_live_<prefix8>.<secret32>`. The **dot** separator is part of the key; do not strip or replace it. The prefix is stored in plaintext for fast lookup; the secret itself is never stored, only `sha256(secret)`. Keys are shown once at creation. Pass the key as `Authorization: Bearer <key>` on every API call.
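To make the key structure concrete, the sketch below splits a key at the dot and hashes the secret. It is an illustration, not the API's code: `split_api_key` is a hypothetical name, and the UTF-8 encoding and hex digest are assumptions about how `sha256(secret)` is computed. Lengths are deliberately not enforced.

```python
import hashlib

KEY_HEAD = "puras_live_"


def split_api_key(key: str) -> tuple[str, str]:
    """Split a key into (prefix, sha256 hex digest of the secret)."""
    head, sep, secret = key.partition(".")
    if not sep or not head.startswith(KEY_HEAD) or not secret:
        raise ValueError("expected puras_live_<prefix8>.<secret32>")
    prefix = head[len(KEY_HEAD):]
    if not prefix:
        raise ValueError("key prefix is empty")
    # Assumption: the digest is taken over the UTF-8 bytes of the secret.
    return prefix, hashlib.sha256(secret.encode("utf-8")).hexdigest()
```

A key with the dot stripped fails the first check, which is why the separator must survive copy and paste.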

## Deployment (project-as-unit)

A deployment is a zip of the whole project, not a per-skill push. The bundle is auto-discovered — there is no root manifest file. The worker scans `skills/*/skill.yaml`; each immediate child directory of `skills/` that contains a `skill.yaml` is registered as a skill, and the directory name **is** the skill name.
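The discovery rule is simple enough to reproduce locally, for example to preview which skills a bundle would register. This is a sketch of the rule as stated, not the worker's code; `discover_skills` is a hypothetical helper.

```python
from pathlib import Path


def discover_skills(project_dir: str) -> list[str]:
    """List skill names: immediate children of skills/ that contain skill.yaml."""
    skills_root = Path(project_dir) / "skills"
    if not skills_root.is_dir():
        return []
    return sorted(
        child.name
        for child in skills_root.iterdir()
        # Only direct children count; nested skill.yaml files are ignored.
        if child.is_dir() and (child / "skill.yaml").is_file()
    )
```

A directory without a `skill.yaml`, or a `skill.yaml` nested more than one level down, is not picked up.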

Activating a new deployment is a **rolling switch**: new jobs use the active deployment; jobs already running keep their original code until they terminate.

Bundle layout:

```
my-project/
  requirements.txt          # optional — extra pip deps for the deployment venv
  skills/
    ad-creative/
      skill.yaml
      SKILL.md              # system prompt (agentic entrypoint)
      tools/
        render_video.py     # per-skill tool, referenced from skill.yaml
    image-info/
      skill.yaml
      main.py               # deterministic entrypoint
```

## Skill

A directory under `skills/` containing a `skill.yaml`. The yaml's `entrypoint` decides what kind of skill it is:

- **`entrypoint: SKILL.md`** (or any `.md` file) → **agentic**. The file is read as the system prompt, the worker starts an LLM tool-use loop, exposes the platform tools (`bash`, `media`, `web_search`, …) plus any user tools you declared, and iterates until the agent stops.
- **`entrypoint: main.py:run`** → **deterministic**. The worker imports `main` from the skill directory and calls `run(inputs: dict)` in an isolated subprocess. No LLM in the loop unless your code calls one.
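The dispatch decision described in the two bullets can be sketched as a small classifier. The function name and the exact validation are assumptions; the worker's real parsing may be stricter or looser.

```python
def classify_entrypoint(entrypoint: str) -> str:
    """Return "agentic" for a .md entrypoint, "deterministic" for file.py:func."""
    if entrypoint.endswith(".md"):
        return "agentic"
    module_file, sep, func = entrypoint.partition(":")
    if sep and module_file.endswith(".py") and func.isidentifier():
        return "deterministic"
    raise ValueError(f"unrecognized entrypoint: {entrypoint!r}")


print(classify_entrypoint("SKILL.md"))     # agentic
print(classify_entrypoint("main.py:run"))  # deterministic
```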

`skill.yaml` shape:

```yaml
description: One-line summary shown in dashboards and playgrounds.
entrypoint: SKILL.md                       # or "main.py:run"
model: claude/opus-4-7                     # optional, agentic only — see docs/models
disable_bash: false                        # optional, agentic only
input_schema: { ... JSON Schema ... }      # validated before run
output_schema: { ... JSON Schema ... }     # validated after run (for deterministic skills)
                                           # or via the auto-injected `set_output` tool (agentic)
tools:                                     # optional, agentic only
  - name: render_video
    description: Render a video from a storyboard.
    entrypoint: tools/render_video.py:run  # path relative to the skill dir
    input_schema: { ... }
    output_schema: { ... }
```

Defaults: bash is enabled for agentic skills, and the platform default model is used if `model` is unset. See [[models]] for the available slugs.

### Tools inside an agentic skill

The `tools:` list on an agentic skill declares Python callables the model can invoke via tool-use. Each tool runs in the same subprocess runner the deterministic skills use, with the tool's `input_schema` enforced before dispatch and its `output_schema` enforced before the result goes back to the model. Tools are **per-skill** and namespaced by the skill — they're not a separate top-level concept.

## Job

Submitted via `POST /v1/jobs` (or the `submit_job` MCP tool):

```json
{ "skill": "<skill-name>", "inputs": { } }
```

The worker reads the named skill from the active deployment and dispatches to the agent loop or the deterministic runner based on the skill's entrypoint. There is no `type` field to pick — the skill's manifest is the source of truth.

Lifecycle: `queued → running → succeeded | failed | cancelled`.

### Three call modes

A single endpoint covers all three patterns — pick the one that matches the caller's appetite for latency vs simplicity:

| Mode   | Request                          | Response                                              | Use when                                      |
|--------|----------------------------------|-------------------------------------------------------|-----------------------------------------------|
| async  | `POST /v1/jobs`                  | `JobOut` immediately, `status="queued"`               | Fire and forget; caller polls `GET /v1/jobs/{id}` |
| sync   | `POST /v1/jobs?wait=true&timeout=N` (1–60s) | `JobOut` once terminal — or current row at timeout | Short jobs (deterministic skills, fast agentic skills) |
| stream | `POST /v1/jobs?stream=true`      | `text/event-stream` (SSE) of `job_events` live        | Long agentic skills where the caller wants tool calls / model responses in real time |

`wait` and `stream` are mutually exclusive. `stream` returns SSE frames as JSON-encoded `{id, type, payload}` blocks, terminated by an `event: end` frame with the final status. `GET /v1/jobs/{id}/stream` attaches to an in-flight job using the same protocol — useful for reconnects.

Delivery: the worker fires `pg_notify('puras_job_events:{job_id}', ...)` on every event row, so SSE latency is roughly one network roundtrip. A 15s heartbeat (`: ping`) keeps proxies from closing idle connections and acts as a safety net for missed notifies.
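A client reading the stream has to skip heartbeats and detect the end frame. The parser below follows the generic Server-Sent Events line format; it assumes every frame carries its JSON in `data:` lines, including the `end` frame, which this page does not spell out. `iter_sse_events` is a hypothetical name.

```python
import json


def iter_sse_events(lines):
    """Yield (event_name, data) pairs from an iterable of SSE text lines."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line closes the current frame.
            if data:
                yield event, json.loads("\n".join(data))
            event, data = "message", []
        elif line.startswith(":"):
            # Comment line, e.g. the ": ping" heartbeat.
            continue
        elif line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data.append(line[len("data:"):].lstrip(" "))
```

Feed it the decoded lines of the HTTP response and stop iterating once an event named `end` arrives.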

On `wait` timeout, the row is returned in its current non-terminal state — keep polling or call `tail_job`.
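Continuing after a timeout amounts to a polling loop over the lifecycle states. In this sketch `get_job` is any callable that returns the job row as a dict with a `status` key, for example a thin wrapper around `GET /v1/jobs/{id}`; the helper name and the defaults are assumptions.

```python
import time

TERMINAL_STATUSES = {"succeeded", "failed", "cancelled"}


def poll_until_terminal(get_job, job_id, *, interval=1.0, max_seconds=60.0, sleep=time.sleep):
    """Call get_job(job_id) until the status is terminal or the deadline passes."""
    deadline = time.monotonic() + max_seconds
    while True:
        job = get_job(job_id)
        if job["status"] in TERMINAL_STATUSES:
            return job
        if time.monotonic() >= deadline:
            # Still queued or running; the caller decides what to do next.
            return job
        sleep(interval)
```

Like `wait=true`, the loop can return a non-terminal row, so callers should check `status` rather than assume success.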

The worker claims jobs with a query of the form `SELECT ... FROM jobs WHERE status='queued' AND projects.credit_balance_micros > 0 FOR UPDATE SKIP LOCKED`, plus a `pg_notify` fast path. A job whose project has no credit sits in `queued` until the balance becomes positive.

## Secrets

Project-scoped key/value pairs. Names must match `^[A-Z_][A-Z0-9_]*$` (env-var style). Values are encrypted at rest and **never** returned by the API; only names are listable. Injected as environment variables into both the agent's bash tool and the skill subprocess at run time.
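Because secrets arrive as ordinary environment variables, skill code can read them with the standard library. The Media SDK page also lists a `secret(name)` helper; the sketch below shows the plain-environment route with a hypothetical `read_secret` function that validates the name first.

```python
import os
import re

# Pattern copied from the naming rule above.
SECRET_NAME_RE = re.compile(r"^[A-Z_][A-Z0-9_]*$")


def read_secret(name: str, environ=os.environ) -> str:
    """Read an injected secret from the environment, failing loudly if absent."""
    if SECRET_NAME_RE.fullmatch(name) is None:
        raise ValueError(f"invalid secret name: {name!r}")
    try:
        return environ[name]
    except KeyError:
        raise RuntimeError(f"secret {name} is not set for this project") from None
```

Failing at the first read gives a clearer error than a downstream authentication failure from the third-party service.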

## Drive

Per-project private storage (Supabase bucket under the hood). All projects share one `drive` bucket; isolation is by path prefix (`<project_id>/...`). Skills can write outputs (images, videos, audio) into the drive and refer to them by path.

Three HTTP surfaces operate on the drive. All of them authenticate with a JWT or an API key, and all are project-scoped:

- `POST /v1/drive/upload` (multipart) — apps push files in, get back a relative `drive_path`.
- `GET /v1/drive/list?prefix=...` — list direct children (folders + files).
- `GET /v1/drive/sign?path=...&ttl=...` — mint a signed URL to display or download. Also exposed as the `drive_sign` MCP tool.

Inside a running job the project drive is symlinked at `./drive/` — read/write it as plain files. See [[inputs-and-drive]] for the upload + read flow end-to-end.
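Writing into the drive from skill code is therefore ordinary file I/O. The helper below is a sketch: `write_to_drive` and its `drive_root` parameter are hypothetical, with the root defaulting to the `./drive/` mount and overridable so the function can be exercised outside a job.

```python
from pathlib import Path


def write_to_drive(rel_path: str, data: bytes, drive_root: str = "drive") -> dict:
    """Write bytes under the drive mount and return a {drive_path} reference."""
    root = Path(drive_root).resolve()
    target = (root / rel_path).resolve()
    # Refuse paths such as "../x" that would land outside the drive.
    if root not in target.parents:
        raise ValueError("rel_path must stay inside the drive")
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(data)
    return {"drive_path": target.relative_to(root).as_posix()}
```

Returning the relative path mirrors the `{drive_path}` shape that skills and callers pass around.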

## Billing

Currency: **MICROS**. `1 USD = 1_000_000 micros`. All balances and per-call costs are micros — there are no floats in the ledger.

Upstream cost (LLM tokens and media generation) is marked up by the platform margin `PURAS_MARGIN_PCT` (default 20%) before being debited from `projects.credit_balance_micros`. The model is a marketplace: you pay upstream + margin, atomically, when each call lands.
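The arithmetic can be done entirely in integers. This is a sketch of the upstream-plus-margin rule; the rounding direction (up to the next micro) is an assumption, and both function names are hypothetical.

```python
MICROS_PER_USD = 1_000_000


def billed_micros(upstream_micros: int, margin_pct: int = 20) -> int:
    """Apply the margin with integer math; rounding up is an assumption."""
    numerator = upstream_micros * (100 + margin_pct)
    return -(-numerator // 100)  # ceiling division without floats


def micros_to_usd_str(micros: int) -> str:
    """Format micros as a USD string without going through float."""
    sign = "-" if micros < 0 else ""
    whole, frac = divmod(abs(micros), MICROS_PER_USD)
    return f"{sign}${whole}.{frac:06d}"


print(billed_micros(560_000))                     # 672000
print(micros_to_usd_str(billed_micros(560_000)))  # $0.672000
```

An upstream cost of 560,000 micros with the default 20% margin comes to 672,000 micros, or $0.672.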

Balance is checked on **claim**, not per-call mid-job. A long job can run the balance negative on its last call; subsequent jobs will not claim until you top up. Admin top-up: `scripts/grant_credits.sh <project_id> <usd_amount>`.

### Per-job cost

Every job carries a denormalized rollup field `cost_micros` on `JobOut`. It accumulates every charge that landed against the job: LLM steps (provider/model from the active deployment), `media.run` calls, and web tool calls. The dashboard and the API caller can therefore show "this job cost $X" without joining `usage_events`.

For the breakdown — which model, how many calls, how many tokens, how much each line item cost — call `GET /v1/jobs/{job_id}/usage`. Each row is one `usage_events` entry with `provider`, `model`, `input_tokens`, `output_tokens`, and `billed_micros`. The sum of `billed_micros` over those rows equals the job's `cost_micros`.
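The invariant in the last sentence is easy to verify on the client side. The sketch below takes a job dict and the rows returned by the usage endpoint, using only the field names mentioned above; `reconcile_job_cost` is a hypothetical helper.

```python
def reconcile_job_cost(job: dict, usage_rows: list[dict]) -> dict:
    """Compare a job's cost_micros with the sum of its usage rows."""
    total = sum(row["billed_micros"] for row in usage_rows)
    by_model: dict[tuple[str, str], int] = {}
    for row in usage_rows:
        # Group spend per (provider, model) pair for a quick breakdown.
        key = (row["provider"], row["model"])
        by_model[key] = by_model.get(key, 0) + row["billed_micros"]
    return {
        "matches": total == job["cost_micros"],
        "total_micros": total,
        "by_model": by_model,
    }
```

A `matches` value of `False` on a job that is still running is expected, because charges keep landing until the job terminates.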

See [[sdk-media]] for the media generation surface and how its pricing flows through.

---

# MCP tools

Source: https://puras.co/docs/mcp-tools
Category: Reference

> Every tool the purasbackend MCP server exposes, grouped by area.

The MCP server is a thin stdio wrapper over the Puras HTTP API. It loads `~/.purasbackend/config.json` once per call and authenticates with the stored API key. Anything you can do here you can do over HTTP with the same key — the MCP is purely a convenience for AI coding agents.

## Config

- `configure(api_base, api_key, project_id) -> str` — persist credentials to `~/.purasbackend/config.json` (chmod 600). Call this once per machine.
- `show_config() -> dict` — return current config. Only the **prefix** of the API key is included, never the secret.
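As an illustration of what these two tools imply, the sketch below writes a config file with owner-only permissions and reads it back with the secret removed. The JSON field names mirror the `configure` arguments and are an assumption, as is treating everything before the dot as the "prefix"; the function names are hypothetical.

```python
import json
import os
from pathlib import Path


def write_config(path, api_base, api_key, project_id):
    """Persist credentials as JSON readable only by the owner (mode 600)."""
    path = Path(path).expanduser()
    path.parent.mkdir(parents=True, exist_ok=True)
    # Create with 0o600 from the start so the key is never world-readable.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as fh:
        json.dump({"api_base": api_base, "api_key": api_key, "project_id": project_id}, fh)
    os.chmod(path, 0o600)  # also tighten a file that already existed


def masked_config(path):
    """Return the stored config with the secret half of the key dropped."""
    cfg = json.loads(Path(path).expanduser().read_text())
    cfg["api_key"] = cfg["api_key"].partition(".")[0]
    return cfg
```

The permission bits apply on POSIX systems; Windows ignores most of them.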

## Deployments

- `push(project_dir, notes="", activate=true) -> dict` — zip the directory (excluding `.git`, `.venv`, `node_modules`, etc.) and upload as a new deployment. The directory must contain a `skills/` folder with at least one `skills/<name>/skill.yaml` inside; no root manifest is needed. Activates immediately by default; in-flight jobs keep running on the previous version.
- `list_deployments() -> list[dict]` — newest first.
- `activate_deployment(deployment_id) -> dict` — rolling switch to a specific deployment.
- `delete_deployment(deployment_id) -> str` — cannot delete the currently active one.

## Jobs

- `submit_job(skill, inputs={}, wait=false, timeout=30) -> dict` — submit a job against the named skill in the active deployment. The worker reads `skill.yaml` and dispatches to the agent loop (`.md` entrypoint) or the deterministic Python runner (`.py:func` entrypoint) — same submission shape either way. Use `wait=true` only for skills you expect to finish in seconds; the hard ceiling is 60s and the row may still be non-terminal on return.
- `get_job(job_id) -> dict` — current status + result.
- `list_jobs(status="", limit=25) -> list[dict]` — filter by `queued|running|succeeded|failed|cancelled` or empty for all.
- `tail_job(job_id, max_seconds=60) -> dict` — polls the events endpoint until the job terminates or the deadline hits. Returns `{ job, events, timed_out? }`.
- `cancel_job(job_id) -> dict` — marks the job cancelled. The worker checks for cancellation between agent steps; an in-flight tool call still completes.

The legacy `submit_agentic_job` and `submit_function_job` tools are kept as aliases for backwards compatibility. Both now resolve to `submit_job`: the worker no longer needs a `type` hint because the skill's manifest is the source of truth. Prefer `submit_job`.

## Feedback

End-user (or owner) thumbs-up / down + optional comment on a job result. One row per `(job, end_user_id)` — calling again with the same id overwrites in place. The dashboard renders this on the job detail page; your own frontend can hit the same endpoints with the project API key.

- `submit_job_feedback(job_id, rating=0, comment="", end_user_id="") -> dict` — upsert. `rating` is `-1` (down) / `0` (no thumb) / `+1` (up). Either a non-zero rating or a non-empty comment is required.
- `list_job_feedback(job_id) -> list[dict]` — newest first.
- `job_feedback_stats(job_id) -> dict` — `{up, down, count, score}`.
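The argument rules for `submit_job_feedback` can be checked before the call goes out. This is a client-side sketch of the stated constraints; `build_feedback` is a hypothetical helper and the server remains the authority.

```python
def build_feedback(rating: int = 0, comment: str = "", end_user_id: str = "") -> dict:
    """Validate and assemble feedback arguments before calling the tool or API."""
    if rating not in (-1, 0, 1):
        raise ValueError("rating must be -1, 0, or 1")
    if rating == 0 and not comment:
        raise ValueError("give a non-zero rating or a non-empty comment")
    return {"rating": rating, "comment": comment, "end_user_id": end_user_id}
```

Since there is one row per `(job, end_user_id)`, sending a second payload with the same `end_user_id` replaces the first.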

## Secrets

- `list_secrets() -> list[dict]` — names only. Values are never returned by the API.
- `set_secret(name, value) -> dict` — create or overwrite. Name must match `^[A-Z_][A-Z0-9_]*$`.
- `delete_secret(name) -> str`.

## Drive

- `drive_sign(path, ttl=3600) -> dict` — mint a signed URL for a file under the project's drive. `ttl` in seconds.

Uploading and listing are HTTP-only (no MCP wrappers — multipart over stdio is awkward and AI agents push files via the API directly):

- `POST /v1/drive/upload` — multipart upload, returns `{drive_path, signed_url, ...}`.
- `GET /v1/drive/list?prefix=...` — list folders + files under a project subpath.

See [[inputs-and-drive]] for the upload + read pipeline.

## Docs

- `list_docs() -> list[dict]` — every doc page available locally to this MCP install (the ones you're reading).
- `read_doc(slug) -> str` — full markdown body of a single page.
- `search_docs(query, limit=5) -> list[dict]` — ranked hits with snippets.

## Conventions an AI agent should follow

- **Always** pass the full API key including the dot separator: `puras_live_<prefix>.<secret>`.
- **Prefer** `from puras import media` inside skill code over raw HTTP to `/v1/media/generate` — same backend, but the SDK handles auth, drive paths, and billing context for you. See [[sdk-media]].
- **Don't** push partial bundles; a deployment is the whole project. If you want to ship one fix, that's still a new full deployment.
- **Don't** assume `wait=true` returns terminal state. Long jobs need `tail_job` or repeated `get_job`.

---

# Media SDK

Source: https://puras.co/docs/sdk-media
Category: Reference

> media.run() — call a registered media model with any inputs, billed to your project.

There is one knob: `media.run(model, inputs)`. `model` is one of our registered slugs (e.g. `openai/gpt-image-2`, `kuaishou/kling-v3-i2v`). Anything the model accepts as arguments, you pass as `inputs`. We don't validate the input shape; the model does. Cost is debited from your project's credit balance at the registered rate.

Most families ship as multiple slugs — one per model variant (text-to-X, image-to-X, reference-to-X, fast vs. standard). Pick the slug that matches what you're doing; the catalog at `GET /v1/pricing` lists them all. A handful of slugs also have **input-conditional pricing** (e.g. audio on/off, with/without a video reference) — the registered `rate_table` is the source of truth and we bill exactly the row your inputs hit.

The full catalog of slugs and their rates is on the [pricing page](/pricing) and available from `GET /v1/pricing`. Unknown slugs are rejected with a 400.

## Surface

```python
from puras import media, secret

media.run(
    model: str,
    inputs: dict | None = None,
    *,
    output_path: str | None = None,
    output_url_path: str | None = None,
    kind: str = "auto",       # "image" | "video" | "audio" | "auto"
    **kwargs,                 # merged into inputs (kwargs win)
) -> dict

secret(name: str) -> str      # read a project secret
```

Returns:

```python
{
  "model": "kuaishou/kling-v3-i2v",
  "kind": "video",                              # resolved from "auto" or echoed
  "drive_path": "media/12b5e4d5...mp4",         # path inside your drive
  "output_url": "https://...supabase.../mp4",   # signed URL, TTL ~1h
  "request_id": "...",
  "billed_micros": 672000,
  "billed_usd": 0.672,
  "meta": {"metrics": {"inference_time": 12.4}, ...},
}
```
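The "kwargs win" rule in the signature means a keyword argument overrides the same key in `inputs`. A one-line sketch of that merge follows; a shallow merge is an assumption, and `merge_inputs` is a hypothetical stand-in rather than the SDK's code.

```python
def merge_inputs(inputs=None, **kwargs) -> dict:
    """Combine an inputs dict with keyword arguments; keyword arguments win."""
    return {**(inputs or {}), **kwargs}


print(merge_inputs({"prompt": "rainy alley", "duration": 5}, duration=8))
# {'prompt': 'rainy alley', 'duration': 8}
```

This is why `media.run(slug, prompt="...", duration=5)` and `media.run(slug, {"prompt": "...", "duration": 5})` describe the same call.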

## Patterns

### Text-to-image

```python
img = media.run(
    "openai/gpt-image-2",
    {"prompt": "a vintage red bicycle", "size": "1024x1024", "quality": "high"},
)

# Edit / composite reference images
edited = media.run(
    "bytedance/seedream-v4-edit",
    {"image_url": img["output_url"], "prompt": "give it neon trim"},
)
```

### Image-to-video

```python
vid = media.run(
    "bytedance/seedance-2-i2v",
    {
        "image_url": "https://...",
        "prompt": "make it spin slowly",
        "duration": 8,
    },
    output_path="renders/spin.mp4",
)
```

### Reference-to-video (Seedance r2v)

```python
clip = media.run(
    "bytedance/seedance-2-r2v",
    {
        "prompt": "match the style of the reference clip",
        "image_urls": ["https://..."],
        "video_url": "https://...",   # triggers the with-reference rate
        "duration": 6,
    },
)
```

### Audio + voice control (Kling v3, Veo 3)

```python
# Audio off — cheapest tier
clip = media.run("kuaishou/kling-v3-t2v", prompt="rainy alley", duration=5)

# Audio on — billed at the audio-on per-second rate
clip = media.run(
    "google/veo-3-t2v",
    prompt="thunder rolling over hills",
    duration=4,
    generate_audio=True,
)
```

### Fast tiers

Where a model has a fast variant, it's a separate slug (`-fast-`) at a lower per-second rate:

```python
quick = media.run("bytedance/seedance-2-fast-t2v", prompt="...", duration=5)
quick = media.run("google/veo-3-fast-i2v", image_url="...", duration=4)
```

### A model with an unusual response shape

If we can't find the output URL automatically, point at it with `output_url_path` (jq-style):

```python
weird = media.run(
    "kuaishou/kling-v3-image",
    {...},
    output_url_path="outputs[0].asset.url",
)
```
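To show what a path like `outputs[0].asset.url` selects, here is a small resolver for dotted keys and `[n]` list indexes. How the SDK parses `output_url_path` internally is not documented here, so treat this as an illustration of the notation only; `resolve_path` is a hypothetical name.

```python
import re

# One token is either a bare key or a bracketed list index.
_TOKEN_RE = re.compile(r"([^.\[\]]+)|\[(\d+)\]")


def resolve_path(obj, path: str):
    """Walk a dotted path with [n] list indexes, e.g. "outputs[0].asset.url"."""
    current = obj
    for key, index in _TOKEN_RE.findall(path):
        current = current[int(index)] if index else current[key]
    return current
```

A missing key or out-of-range index raises `KeyError` or `IndexError`, which is a reasonable signal that the path does not match the response.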

## Inside a deterministic skill (or a per-skill tool)

```python
from puras import media

def run(inputs: dict) -> dict:
    img = media.run(
        "openai/gpt-image-2",
        {"prompt": inputs["prompt"]},
    )
    return {"drive_path": img["drive_path"], "billed_usd": img["billed_usd"]}
```

The same import works from any Python callable the worker dispatches: a deterministic skill's `entrypoint`, or one of an agentic skill's declared `tools:`.

## As an agent tool (built-in)

Agentic skills automatically get a `media` tool exposed to the model (same surface as `media.run()`). The agent picks a model slug and inputs at runtime — you don't declare it in `skill.yaml`. See [[concepts]] for skill setup; the `tools:` list on a skill is for your own Python helpers, not for the built-in `media` tool.

## How billing resolves

Every successful call is priced from the registry — there is no live lookup and no fallback. Each slug carries one of:

- **per-call** (most image models)
- **per-second** of output (video / audio)
- **per-megapixel** (some image models)
- **input-conditional** — a rate table indexed by inputs (audio on/off, with/without video reference, quality × size). The bill is computed from the inputs you actually sent.

The exact amount lands in `billed_micros` and is also written to a `usage_events` row you can audit.
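
As a rough illustration of the input-conditional shape, the sketch below picks a per-second rate from a small table keyed on whether a video reference was sent, then bills in integer micros. The rates are the Seedance r2v figures from the media model reference. The function name and the rule for detecting a reference (a non-empty `video_urls` list) are assumptions, not the platform's code.

```python
MICROS_PER_USD = 1_000_000

# Per-second rates in micros for bytedance/seedance-2-r2v.
R2V_RATES = {
    False: 363_000,  # without video reference: $0.363/s
    True: 218_000,   # with video reference:    $0.218/s
}

def r2v_cost_micros(inputs: dict) -> int:
    """Bill from the inputs actually sent (assumed rule, see the note above)."""
    has_video_ref = bool(inputs.get("video_urls"))
    seconds = int(inputs["duration"])  # a numeric duration is assumed, not "auto"
    return R2V_RATES[has_video_ref] * seconds

plain = r2v_cost_micros({"prompt": "...", "duration": 6})
with_ref = r2v_cost_micros(
    {"prompt": "...", "duration": 6, "video_urls": ["https://..."]}
)
print(plain / MICROS_PER_USD, with_ref / MICROS_PER_USD)  # 2.178 1.308
```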

## Conventions

- **Always** prefer `media.run` over hitting `/v1/media/generate` with raw `httpx`. The SDK injects the worker's service token and job context for you — a raw call won't bill correctly.
- **Don't** rely on the file extension matching the `kind`. Some models return `.webp` for `kind="image"`; the SDK detects extension from the URL and saves accordingly. `drive_path` is authoritative.
- **Don't** open the returned `output_url` from server code to "verify" the file — it's a signed URL meant for the client. Use `drive_path` server-side; mint a fresh signed URL with the `drive_sign` MCP tool when you need to share.
- **Don't** retry on a failed call without inspecting `error`. Most model errors are deterministic (bad params, NSFW filter, model down) — a blind retry just burns more credit. Fix the inputs first.
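
The last rule can be expressed as a small guard. This is only a sketch: it assumes a failed call surfaces as a result dict with an `error` string, and the `TRANSIENT_MARKERS` tuple and `run_with_guard` helper are hypothetical names, not part of the SDK. The `call` argument stands in for a function that wraps `media.run`.

```python
from typing import Callable

# Hypothetical substrings that mark an error as worth retrying.
TRANSIENT_MARKERS = ("timeout", "temporarily unavailable")

def run_with_guard(call: Callable[[], dict], max_attempts: int = 2) -> dict:
    """Retry only when the error looks transient; otherwise return at once."""
    result: dict = {}
    for _ in range(max_attempts):
        result = call()
        error = result.get("error")
        if not error:
            return result  # success
        if not any(marker in error.lower() for marker in TRANSIENT_MARKERS):
            return result  # deterministic failure: fix the inputs instead
    return result

# Stand-in for a wrapped media.run call that always fails on bad params.
attempts = []

def fake_call() -> dict:
    attempts.append(1)
    return {"error": "invalid parameter: duration"}

outcome = run_with_guard(fake_call)
print(len(attempts), outcome["error"])  # 1 invalid parameter: duration
```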

---

# Models and pricing

Source: https://puras.co/docs/models
Category: Reference

> Every model you can put in skill.yaml, plus how media generation is billed.

Two surfaces are billed: **agentic skills** (per token, by the model in `skill.yaml`) and **media generation** (per call / second / megapixel, by the model slug you pass to `media.run()`). Deterministic Python skills are free at the platform level — you only pay for what they call into.

Money is tracked in **MICROS** (`1 USD = 1,000,000 micros`); the dashboard and API responses convert back to dollars.
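
If you need to reconcile amounts yourself, a conversion along these lines keeps everything in integers. This is a minimal sketch: the helper names are made up for illustration, and `Decimal` is used so that dollar strings do not pick up float rounding.

```python
from decimal import Decimal

MICROS_PER_USD = 1_000_000

def usd_to_micros(usd: str) -> int:
    """Parse a dollar amount given as a string, avoiding float rounding."""
    return int(Decimal(usd) * MICROS_PER_USD)

def micros_to_usd(micros: int) -> Decimal:
    """Convert integer micros back to an exact dollar amount."""
    return Decimal(micros) / MICROS_PER_USD

print(usd_to_micros("0.253"))    # 253000
print(micros_to_usd(2_904_000))  # 2.904
```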

## Agentic skill models

The `model:` field in `skill.yaml` is a public slug in `family/variant` form. Use one of the slugs listed in the tables below, for example:

```yaml
# skills/my-skill/skill.yaml
model: claude/sonnet-4-7
```

Three families are available — **Claude**, **GPT**, and **Gemini**. Pick by need; you don't need to know how each one is served.

### Claude — prices

Per **1 million tokens**, rounded to the cent.

| Slug | Family | Input | Output | Notes |
|---|---|---|---|---|
| `claude/opus-4-7` | Opus 4.7 | $18.00 | $90.00 | Highest reasoning, 1M context tier available |
| `claude/opus-4-6` | Opus 4.6 | $18.00 | $90.00 | |
| `claude/opus-4-5` | Opus 4.5 | $18.00 | $90.00 | |
| `claude/sonnet-4-7` | Sonnet 4.7 | $3.60 | $18.00 | Balanced — recommended default |
| `claude/sonnet-4-6` | Sonnet 4.6 | $3.60 | $18.00 | |
| `claude/sonnet-4-5` | Sonnet 4.5 | $3.60 | $18.00 | |
| `claude/haiku-4-5` | Haiku 4.5 | $0.30 | $1.50 | Fastest, cheapest — good for narrow tools |

Vision and PDF attachments are supported on every Claude slug above. See [[agent-attachments]] for how attachments flow through.

### GPT — prices

Per **1 million tokens**, rounded to the cent.

| Slug | Family | Input | Output | Notes |
|---|---|---|---|---|
| `gpt/5` | GPT-5 | $1.50 | $12.00 | Strong general reasoning |
| `gpt/5-mini` | GPT-5 mini | $0.30 | $2.40 | Cheap; good for narrow tools |
| `gpt/4o` | GPT-4o | $3.00 | $12.00 | Mature, multimodal |
| `gpt/4o-mini` | GPT-4o mini | $0.18 | $0.72 | Cheapest GPT tier |

Vision (image) attachments supported. PDF attachments are not supported — convert pages to images first.

### Gemini — prices

Per **1 million tokens**, rounded to the cent.

| Slug | Family | Input | Output | Notes |
|---|---|---|---|---|
| `gemini/2.5-pro` | 2.5 Pro | $1.50 | $12.00 | Long-context reasoning |
| `gemini/2.5-flash` | 2.5 Flash | $0.09 | $0.36 | Fast, very cheap |
| `gemini/2.0-flash` | 2.0 Flash | $0.12 | $0.48 | Older fast tier |

Vision (image) attachments supported. PDF attachments are not supported — convert pages to images first.

> Indicative rates. Final cost per job is shown on the job detail page based on the actual tokens used.
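
For a back-of-the-envelope estimate before running a job, the per-million rates can be applied to expected token counts. The sketch below copies a few rows from the tables above. It assumes only plain input and output tokens are billed and rounds to the nearest micro, neither of which is specified here; the job detail page remains the source of truth.

```python
# (input, output) price per 1M tokens in micros, copied from the tables above.
PRICES = {
    "claude/sonnet-4-6": (3_600_000, 18_000_000),
    "claude/haiku-4-5": (300_000, 1_500_000),
    "gpt/5-mini": (300_000, 2_400_000),
    "gemini/2.5-flash": (90_000, 360_000),
}

def job_cost_micros(slug: str, input_tokens: int, output_tokens: int) -> int:
    """Estimate a job's token cost in micros (rounding rule is an assumption)."""
    in_rate, out_rate = PRICES[slug]
    return round((input_tokens * in_rate + output_tokens * out_rate) / 1_000_000)

# 12k input tokens and 1.5k output tokens on Sonnet: about $0.07.
print(job_cost_micros("claude/sonnet-4-6", 12_000, 1_500))  # 70200
```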

### Defaults and fallbacks

- If `model:` is **omitted** from `skill.yaml`, the platform default is used (`claude/sonnet-4-6` today). The fallback is stable per deployment; you'll see it on the `agent_start` event.
- **Unknown slugs** are rejected at deploy time with a clear error listing the available slugs.
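
The two rules above amount to a small resolution step. The following sketch mirrors them locally, for example in a pre-deploy check. The `resolve_model` function is hypothetical, and the slug set is copied from the tables on this page, so it will go stale when the catalog changes.

```python
AVAILABLE = {
    "claude/opus-4-7", "claude/opus-4-6", "claude/opus-4-5",
    "claude/sonnet-4-7", "claude/sonnet-4-6", "claude/sonnet-4-5",
    "claude/haiku-4-5",
    "gpt/5", "gpt/5-mini", "gpt/4o", "gpt/4o-mini",
    "gemini/2.5-pro", "gemini/2.5-flash", "gemini/2.0-flash",
}
PLATFORM_DEFAULT = "claude/sonnet-4-6"  # the default "today", per the text above

def resolve_model(skill_config: dict) -> str:
    """Return the slug a skill would run with, or raise on an unknown slug."""
    slug = skill_config.get("model")
    if slug is None:
        return PLATFORM_DEFAULT
    if slug not in AVAILABLE:
        raise ValueError(f"unknown model {slug!r}; available: {sorted(AVAILABLE)}")
    return slug

print(resolve_model({}))                       # claude/sonnet-4-6
print(resolve_model({"model": "gpt/5-mini"}))  # gpt/5-mini
```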

## Media generation

`media.run(model, inputs)` invokes any model we have a registered cost for. The full catalog of slugs and their rates is on the [pricing page](/pricing) and available from `GET /v1/pricing`. Cost is **static per model**; unknown slugs are rejected with HTTP 400, so a request can't slip through unpriced.

There are four pricing shapes, and each slug uses exactly one:

| Shape | Formula | Used for |
|---|---|---|
| Per call | flat fee | Image generators billed per image |
| Per second | rate × `inference_time` (or input duration for video/audio) | Video and audio generation |
| Per megapixel | rate × (width × height ÷ 1,000,000) | Upscalers, some image models |
| Input-conditional | rate table indexed by inputs (audio on/off, quality × size, with/without video reference) | GPT Image 2, Kling v3 video, Veo 3, Seedance r2v |

Cost is debited at job-completion time and shown on the job detail page.
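
The first three formulas in the table are simple enough to write down directly. The sketch below works in integer micros; the function names are illustrative, the rounding rule is an assumption, and the per-megapixel rate is a made-up value because no per-megapixel slug is priced on this page.

```python
def per_call(rate_micros: int) -> int:
    """Flat fee per call."""
    return rate_micros

def per_second(rate_micros: int, seconds: float) -> int:
    """Rate times the billed duration in seconds."""
    return round(rate_micros * seconds)

def per_megapixel(rate_micros: int, width: int, height: int) -> int:
    """Rate times (width x height / 1,000,000)."""
    return round(rate_micros * (width * height) / 1_000_000)

print(per_call(36_000))                   # 36000   (Seedream v4, $0.036)
print(per_second(290_000, 5))             # 1450000 (Seedance fast, 5 s)
print(per_megapixel(10_000, 2048, 2048))  # 41943   (hypothetical $0.01/MP rate)
```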

Most model families ship multiple slugs — one per variant. The naming convention:

- `-t2v`, `-i2v`, `-r2v` → text-to-video, image-to-video, reference-to-video
- `-edit` → image-to-image edit
- `-fast-` segment → the fast tier of that variant (lower per-second rate, slightly lower quality)
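
Because the convention is purely textual, a slug can be classified without a catalog lookup. The helper below is a sketch of that idea. `describe_slug` is a hypothetical name, and it only knows the suffixes listed above.

```python
MODES = {
    "t2v": "text-to-video",
    "i2v": "image-to-video",
    "r2v": "reference-to-video",
    "edit": "image-to-image edit",
}

def describe_slug(slug: str) -> dict:
    """Split a media slug into vendor, mode suffix, and fast-tier flag."""
    vendor, name = slug.split("/", 1)
    parts = name.split("-")
    return {
        "vendor": vendor,
        "mode": MODES.get(parts[-1]),   # None when the suffix is not a known mode
        "fast": "fast" in parts[:-1],   # "-fast-" appears as an inner segment
    }

print(describe_slug("bytedance/seedance-2-fast-i2v"))
print(describe_slug("openai/gpt-image-2-edit"))
```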

### Example

```python
from puras import media

# Per-call image, default high@1024x1024
img = media.run("openai/gpt-image-2", {"prompt": "a red bicycle"})

# Cheap edit
cheap = media.run("bytedance/seedream-v4-edit", {
    "image_urls": [img["output_url"]],
    "prompt": "make it neon",
})

# Per-second video with audio on (billed at the audio-on rate)
clip = media.run("google/veo-3-t2v", {"prompt": "...", "duration": 5, "generate_audio": True})

# Image-to-video, fast tier
vid = media.run(
    "bytedance/seedance-2-fast-i2v",
    {"image_url": img["output_url"], "prompt": "spin slowly", "duration": 8},
)
```

The dashboard's Usage tab breaks down spend by model for every billable call so you can see exactly what cost what.

## Free things

- Function execution (deterministic Python skills) and worker time.
- Drive uploads / signed URLs / list calls.
- MCP server, dashboard, API key checks.
- `web_search`, `image_search`, `web_fetch`, `download_url`, `bash`, `file_read` agent tools at the platform level (you only pay for what they trigger — e.g. a `media.run` inside a tool is billed normally).

## Conventions

- **Default to Sonnet** unless you have a reason. Opus's premium pays off for genuinely hard reasoning; Haiku and the mini tiers shine for narrow, structured tools where you can keep the prompt tight.
- **Budget caps live on the project, not the skill.** A runaway agent will burn the project's balance, not just one skill's. Set per-project balance limits in the dashboard.

See [[concepts]] for how usage rolls up into the billing surface, and [[sdk-media]] for the full `media.run` contract.

---

# Media model reference

Source: https://puras.co/docs/media-models-reference
Category: Reference

> Per-slug input schema and pricing for every model usable via `media.run()`. Auto-generated from the catalog.

Every slug here is callable via `media.run(slug, inputs)`. The input fields map 1:1 to the model itself — we don't validate shape, the model does. Cost is debited from your project balance at the listed rate, computed from the inputs you send.

For the SDK signature and patterns, see [[sdk-media]]. For the broader catalog and policy, see [[models]].

## Index

| Slug | Kind | Price |
|---|---|---|
| `openai/gpt-image-2` | image | $0.253 / call |
| `openai/gpt-image-2-edit` | image | $0.263 / call |
| `bytedance/seedream-v4` | image | $0.036 / call |
| `bytedance/seedream-v4-edit` | image | $0.036 / call |
| `google/imagen-4` | image | $0.060 / call |
| `kuaishou/kling-v3-image` | image | $0.034 / call |
| `bytedance/seedance-2-t2v` | video | $0.364 / s |
| `bytedance/seedance-2-i2v` | video | $0.363 / s |
| `bytedance/seedance-2-r2v` | video | $0.363 / s |
| `bytedance/seedance-2-fast-t2v` | video | $0.290 / s |
| `bytedance/seedance-2-fast-i2v` | video | $0.290 / s |
| `bytedance/seedance-2-fast-r2v` | video | $0.290 / s |
| `kuaishou/kling-v3-t2v` | video | $0.134 / s |
| `kuaishou/kling-v3-i2v` | video | $0.134 / s |
| `google/veo-3-t2v` | video | $0.600 / s |
| `google/veo-3-i2v` | video | $0.240 / s |
| `google/veo-3-fast-t2v` | video | $0.300 / s |
| `google/veo-3-fast-i2v` | video | $0.300 / s |

---

## Images

### `openai/gpt-image-2`

**GPT Image 2 · text → image**

Quality × size matrix. Set `quality` ∈ low|medium|high and `size` (e.g. 1024x1024).

**Pricing**

| Variant | Price |
|---|---|
| low · 1024x768 | $0.0060/image |
| low · 1024x1024 | $0.0072/image |
| low · 1024x1536 | $0.0060/image |
| low · 1920x1080 | $0.0060/image |
| low · 2560x1440 | $0.0084/image |
| low · 3840x2160 | $0.014/image |
| medium · 1024x768 | $0.044/image |
| medium · 1024x1024 | $0.064/image |
| medium · 1024x1536 | $0.050/image |
| medium · 1920x1080 | $0.048/image |
| medium · 2560x1440 | $0.067/image |
| medium · 3840x2160 | $0.121/image |
| high · 1024x768 | $0.174/image |
| high · 1024x1024 | $0.253/image |
| high · 1024x1536 | $0.198/image |
| high · 1920x1080 | $0.190/image |
| high · 2560x1440 | $0.266/image |
| high · 3840x2160 | $0.481/image |
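
For budgeting a batch, the matrix above can be turned into a lookup. The sketch below stores the listed prices in micros. `image_price_micros` is a hypothetical helper, and multiplying by `num_images` is an assumption based on the "/image" unit rather than a documented rule.

```python
SIZES = ["1024x768", "1024x1024", "1024x1536", "1920x1080", "2560x1440", "3840x2160"]

# Price per image in micros, one row per quality, columns in SIZES order.
PRICE_ROWS = {
    "low":    [6_000, 7_200, 6_000, 6_000, 8_400, 14_000],
    "medium": [44_000, 64_000, 50_000, 48_000, 67_000, 121_000],
    "high":   [174_000, 253_000, 198_000, 190_000, 266_000, 481_000],
}

def image_price_micros(quality: str, size: str, num_images: int = 1) -> int:
    """Look up the listed per-image price and scale by the image count."""
    per_image = PRICE_ROWS[quality][SIZES.index(size)]
    return per_image * num_images

print(image_price_micros("high", "1024x1024"))    # 253000
print(image_price_micros("low", "3840x2160", 4))  # 56000
```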

**Inputs**

| Field | Type | Required | Default | Notes |
|---|---|:---:|---|---|
| `prompt` | string | ✓ | — | The prompt for image generation. (max 32,000 chars, min 2 chars) |
| `image_size` | enum |  | `"landscape_4_3"` | The size of the generated image. Supports preset names, explicit {width, height}, or 'auto' to let the model pick the best size. Concrete sizes must have both dimensions as multiples of 16, max edge 3840px, aspect ratio <= 3:1, total pixels between 655,360 and 8,294,400. Values: `"square_hd"` \| `"square"` \| `"portrait_4_3"` \| `"portrait_16_9"` \| `"landscape_4_3"` \| `"landscape_16_9"` \| `"auto"` |
| `num_images` | integer |  | `1` | Number of images to generate. (≥ 1, ≤ 4) |
| `output_format` | enum |  | `"png"` | Output format for the images. Values: `"jpeg"` \| `"png"` \| `"webp"` |
| `quality` | enum |  | `"high"` | Quality for the generated image. Use 'auto' to let the model pick the best quality for the prompt. Values: `"auto"` \| `"low"` \| `"medium"` \| `"high"` |
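
The constraints on explicit `{width, height}` sizes listed for `image_size` can be checked locally before spending a call. This is a sketch only: `check_image_size` is a hypothetical helper, and treating the pixel bounds as inclusive is an assumption.

```python
def check_image_size(width: int, height: int) -> list[str]:
    """Return the violated constraints; an empty list means the size looks valid."""
    problems = []
    if width % 16 or height % 16:
        problems.append("both dimensions must be multiples of 16")
    if max(width, height) > 3840:
        problems.append("max edge is 3840px")
    if max(width, height) > 3 * min(width, height):
        problems.append("aspect ratio must be <= 3:1")
    if not 655_360 <= width * height <= 8_294_400:
        problems.append("total pixels must be between 655,360 and 8,294,400")
    return problems

print(check_image_size(1024, 1024))       # []
print(check_image_size(3840, 2160))       # []
print(len(check_image_size(4000, 1000)))  # 3
```

The last call fails three checks at once: 1000 is not a multiple of 16, the long edge exceeds 3840, and the ratio is 4:1.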

**Output**

Saved to your project drive at `drive_path`; a signed `output_url` (TTL ~1h) is returned.

**Example**

```python
result = media.run(
    "openai/gpt-image-2",
    prompt="...",
    image_size="square_hd",
    quality="high",
)
```

### `openai/gpt-image-2-edit`

**GPT Image 2 (edit) · image → image (edit)**

Image-to-image edit. Quality × size matrix; provide `image_urls`.

**Pricing**

| Variant | Price |
|---|---|
| low · 1024x768 | $0.013/image |
| low · 1024x1024 | $0.018/image |
| low · 1024x1536 | $0.022/image |
| low · 1920x1080 | $0.020/image |
| low · 2560x1440 | $0.023/image |
| low · 3840x2160 | $0.029/image |
| medium · 1024x768 | $0.052/image |
| medium · 1024x1024 | $0.073/image |
| medium · 1024x1536 | $0.065/image |
| medium · 1920x1080 | $0.064/image |
| medium · 2560x1440 | $0.082/image |
| medium · 3840x2160 | $0.136/image |
| high · 1024x768 | $0.181/image |
| high · 1024x1024 | $0.263/image |
| high · 1024x1536 | $0.214/image |
| high · 1920x1080 | $0.190/image |
| high · 2560x1440 | $0.281/image |
| high · 3840x2160 | $0.496/image |

**Inputs**

| Field | Type | Required | Default | Notes |
|---|---|:---:|---|---|
| `image_urls` | array<string> | ✓ | — | The URLs of the images to use as a reference for the generation |
| `prompt` | string | ✓ | — | The prompt for image generation. (max 32,000 chars, min 2 chars) |
| `image_size` | enum |  | `"auto"` | The size of the generated image. Use 'auto' to infer from input images. Values: `"square_hd"` \| `"square"` \| `"portrait_4_3"` \| `"portrait_16_9"` \| `"landscape_4_3"` \| `"landscape_16_9"` \| `"auto"` |
| `mask_url` | string |  | — | The URL of the mask image to use for the generation. This indicates what part of the image to edit |
| `num_images` | integer |  | `1` | Number of images to generate. (≥ 1, ≤ 4) |
| `output_format` | enum |  | `"png"` | Output format for the images. Values: `"jpeg"` \| `"png"` \| `"webp"` |
| `quality` | enum |  | `"high"` | Quality for the generated image. Use 'auto' to let the model pick the best quality for the prompt. Values: `"auto"` \| `"low"` \| `"medium"` \| `"high"` |

**Output**

Saved to your project drive at `drive_path`; a signed `output_url` (TTL ~1h) is returned.

**Example**

```python
result = media.run(
    "openai/gpt-image-2-edit",
    prompt="...",
    image_urls=["https://..."],
    image_size="square_hd",
    quality="high",
)
```

### `bytedance/seedream-v4`

**Seedream v4 · text → image**

**Pricing**

$0.036 per call

**Inputs**

| Field | Type | Required | Default | Notes |
|---|---|:---:|---|---|
| `prompt` | string | ✓ | — | The text prompt used to generate the image |
| `enable_safety_checker` | boolean |  | `true` | If set to true, the safety checker will be enabled |
| `enhance_prompt_mode` | enum |  | `"standard"` | The mode to use for prompt enhancement. Standard mode provides higher quality results but takes longer to generate. Fast mode provides average quality results but takes less time to generate. Values: `"standard"` \| `"fast"` |
| `image_size` | enum |  | `{height: 2048, width: 2048}` | The size of the generated image. Total pixels must be between 960x960 and 4096x4096. Values: `"square_hd"` \| `"square"` \| `"portrait_4_3"` \| `"portrait_16_9"` \| `"landscape_4_3"` \| `"landscape_16_9"` \| `"auto"` \| `"auto_2K"` \| `"auto_4K"` |
| `max_images` | integer |  | `1` | If set to a number greater than one, enables multi-image generation. The model will potentially return up to `max_images` images every generation, and in total, `num_images` generations will be carried out. In total, the number of images generated will be between `num_images` and `max_images*num_images`. (≥ 1, ≤ 6) |
| `num_images` | integer |  | `1` | Number of separate model generations to be run with the prompt. (≥ 1, ≤ 6) |
| `seed` | integer |  | — | Random seed to control the stochasticity of image generation |
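
The interaction between `num_images` and `max_images` is easier to see as arithmetic. The sketch below computes the range of images a call may return, following the description in the table. `image_count_bounds` is a hypothetical helper, not an SDK function.

```python
def image_count_bounds(num_images: int = 1, max_images: int = 1) -> tuple[int, int]:
    """Return (minimum, maximum) images for a call, per the field notes above."""
    for name, value in (("num_images", num_images), ("max_images", max_images)):
        if not 1 <= value <= 6:
            raise ValueError(f"{name} must be between 1 and 6")
    return num_images, num_images * max_images

print(image_count_bounds())      # (1, 1)
print(image_count_bounds(2, 3))  # (2, 6)
```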

**Output**

Saved to your project drive at `drive_path`; a signed `output_url` (TTL ~1h) is returned. Additional metadata available under `meta` (`seed`).

**Example**

```python
result = media.run(
    "bytedance/seedream-v4",
    prompt="...",
    image_size="square_hd",
)
```

### `bytedance/seedream-v4-edit`

**Seedream v4 (edit) · image → image (edit)**

Edit/composite reference images. Provide `image_urls` (a list of one or more image URLs) plus a prompt.

**Pricing**

$0.036 per call

**Inputs**

| Field | Type | Required | Default | Notes |
|---|---|:---:|---|---|
| `image_urls` | array<string> | ✓ | — | List of URLs of input images for editing. Presently, up to 10 image inputs are allowed. If over 10 images are sent, only the last 10 will be used |
| `prompt` | string | ✓ | — | The text prompt used to edit the image |
| `enable_safety_checker` | boolean |  | `true` | If set to true, the safety checker will be enabled |
| `enhance_prompt_mode` | enum |  | `"standard"` | The mode to use for prompt enhancement. Standard mode provides higher quality results but takes longer to generate. Fast mode provides average quality results but takes less time to generate. Values: `"standard"` \| `"fast"` |
| `image_size` | enum |  | `{height: 2048, width: 2048}` | The size of the generated image. The minimum total image area is 921600 pixels. If the requested size is smaller, it is scaled up to meet the minimum while maintaining the aspect ratio. Values: `"square_hd"` \| `"square"` \| `"portrait_4_3"` \| `"portrait_16_9"` \| `"landscape_4_3"` \| `"landscape_16_9"` \| `"auto"` \| `"auto_2K"` \| `"auto_4K"` |
| `max_images` | integer |  | `1` | If set to a number greater than one, enables multi-image generation. The model will potentially return up to `max_images` images every generation, and in total, `num_images` generations will be carried out. In total, the number of images generated will be between `num_images` and `max_images*num_images`. The total number of images (image inputs + image outputs) must not exceed 15. (≥ 1, ≤ 6) |
| `num_images` | integer |  | `1` | Number of separate model generations to be run with the prompt. (≥ 1, ≤ 6) |
| `seed` | integer |  | — | Random seed to control the stochasticity of image generation |
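
Two of the limits in this table interact: only the last 10 input images are used, and inputs plus outputs must not exceed 15. A pre-flight check might look like the sketch below. `plan_edit` is a hypothetical helper, and counting the worst-case output (`num_images * max_images`) against the limit is an assumption about how the model applies it.

```python
def plan_edit(image_urls: list[str], num_images: int = 1, max_images: int = 1) -> dict:
    """Summarise which inputs are used and whether the 15-image cap holds."""
    used_inputs = image_urls[-10:]               # only the last 10 are used
    worst_case_outputs = num_images * max_images
    total = len(used_inputs) + worst_case_outputs
    return {
        "used_inputs": used_inputs,
        "worst_case_total": total,
        "within_limit": total <= 15,
    }

urls = [f"https://example.com/{i}.png" for i in range(12)]
plan = plan_edit(urls, num_images=2, max_images=3)
print(len(plan["used_inputs"]), plan["worst_case_total"], plan["within_limit"])  # 10 16 False
```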

**Output**

Saved to your project drive at `drive_path`; a signed `output_url` (TTL ~1h) is returned. Additional metadata available under `meta` (`seed`).

**Example**

```python
result = media.run(
    "bytedance/seedream-v4-edit",
    prompt="...",
    image_urls=["https://..."],
    image_size="square_hd",
)
```

### `google/imagen-4`

**Imagen 4 · text → image**

**Pricing**

$0.060 per call

**Inputs**

| Field | Type | Required | Default | Notes |
|---|---|:---:|---|---|
| `prompt` | string | ✓ | — | The text prompt to generate an image from. (max 5,000 chars, min 3 chars) |
| `aspect_ratio` | enum |  | `"1:1"` | The aspect ratio of the generated image. Values: `"1:1"` \| `"16:9"` \| `"9:16"` \| `"4:3"` \| `"3:4"` |
| `num_images` | integer |  | `1` | The number of images to generate. (≥ 1, ≤ 4) |
| `output_format` | enum |  | `"png"` | The format of the generated image. Values: `"jpeg"` \| `"png"` \| `"webp"` |
| `resolution` | enum |  | `"1K"` | The resolution of the generated image. Values: `"1K"` \| `"2K"` |
| `safety_tolerance` | enum |  | `"4"` | The safety tolerance level for content moderation. 1 is the most strict (blocks most content), 6 is the least strict. Values: `"1"` \| `"2"` \| `"3"` \| `"4"` \| `"5"` \| `"6"` |
| `seed` | integer |  | — | The seed for the random number generator |

**Output**

Saved to your project drive at `drive_path`; a signed `output_url` (TTL ~1h) is returned.

**Example**

```python
result = media.run(
    "google/imagen-4",
    prompt="...",
)
```

### `kuaishou/kling-v3-image`

**Kling v3 (image) · text → image**

**Pricing**

$0.034 per call

**Inputs**

| Field | Type | Required | Default | Notes |
|---|---|:---:|---|---|
| `prompt` | string | ✓ | — | Text prompt for image generation. (max 2,500 chars) |
| `aspect_ratio` | enum |  | `"16:9"` | Aspect ratio of generated images. Values: `"16:9"` \| `"9:16"` \| `"1:1"` \| `"4:3"` \| `"3:4"` \| `"3:2"` \| `"2:3"` \| `"21:9"` |
| `elements` | array<elementinput> |  | — | Optional: Elements (characters/objects) to include in the image for face control. Each element can have a frontal image and optionally reference images |
| `negative_prompt` | string |  | — | Negative text prompt. It is recommended to supplement negative prompt information through negative sentences directly within positive prompts |
| `num_images` | integer |  | `1` | Number of images to generate. (≥ 1, ≤ 9) |
| `output_format` | enum |  | `"png"` | The format of the generated image. Values: `"jpeg"` \| `"png"` \| `"webp"` |
| `resolution` | enum |  | `"1K"` | Image generation resolution. 1K: standard, 2K: high-res. Values: `"1K"` \| `"2K"` |

**Output**

Saved to your project drive at `drive_path`; a signed `output_url` (TTL ~1h) is returned.

**Example**

```python
result = media.run(
    "kuaishou/kling-v3-image",
    prompt="...",
)
```


---

## Videos

### `bytedance/seedance-2-t2v`

**Seedance 2.0 — text→video**

720p–1080p text-to-video. Audio included.

**Pricing**

$0.364 per second of output

**Inputs**

| Field | Type | Required | Default | Notes |
|---|---|:---:|---|---|
| `prompt` | string | ✓ | — | The text prompt used to generate the video |
| `aspect_ratio` | enum |  | `"auto"` | The aspect ratio of the generated video. Use 16:9 for landscape, 9:16 for portrait/vertical, 1:1 for square, 21:9 for ultrawide cinematic, or auto to let the model decide. Values: `"auto"` \| `"21:9"` \| `"16:9"` \| `"4:3"` \| `"1:1"` \| `"3:4"` \| `"9:16"` |
| `duration` | enum |  | `"auto"` | Duration of the video in seconds. Supports 4 to 15 seconds, or auto to let the model decide based on the prompt. Values: `"auto"` \| `"4"` \| `"5"` \| `"6"` \| `"7"` \| `"8"` \| `"9"` \| `"10"` \| `"11"` \| `"12"` \| `"13"` \| `"14"` \| `"15"` |
| `generate_audio` | boolean |  | `true` | Whether to generate synchronized audio for the video, including sound effects, ambient sounds, and lip-synced speech. The cost of video generation is the same regardless of whether audio is generated or not |
| `resolution` | enum |  | `"720p"` | Video resolution - 480p for faster generation, 720p for balance, 1080p for highest quality. Values: `"480p"` \| `"720p"` \| `"1080p"` |
| `seed` | integer |  | — | Random seed for reproducibility. Note that results may still vary slightly even with the same seed |

**Output**

Saved to your project drive at `drive_path`; a signed `output_url` (TTL ~1h) is returned. Additional metadata available under `meta` (`seed`).

**Example**

```python
result = media.run(
    "bytedance/seedance-2-t2v",
    prompt="...",
    duration="auto",
)
```

### `bytedance/seedance-2-i2v`

**Seedance 2.0 — image→video**

**Pricing**

$0.363 per second of output

**Inputs**

| Field | Type | Required | Default | Notes |
|---|---|:---:|---|---|
| `image_url` | string | ✓ | — | The URL of the starting frame image to animate. Supported formats: JPEG, PNG, WebP. Max 30 MB |
| `prompt` | string | ✓ | — | The text prompt describing the desired motion and action for the video |
| `aspect_ratio` | enum |  | `"auto"` | The aspect ratio of the generated video. Use 16:9 for landscape, 9:16 for portrait/vertical, 1:1 for square, 21:9 for ultrawide cinematic, or auto to infer from the input image. Values: `"auto"` \| `"21:9"` \| `"16:9"` \| `"4:3"` \| `"1:1"` \| `"3:4"` \| `"9:16"` |
| `duration` | enum |  | `"auto"` | Duration of the video in seconds. Supports 4 to 15 seconds, or auto to let the model decide based on the prompt. Values: `"auto"` \| `"4"` \| `"5"` \| `"6"` \| `"7"` \| `"8"` \| `"9"` \| `"10"` \| `"11"` \| `"12"` \| `"13"` \| `"14"` \| `"15"` |
| `end_image_url` | string |  | — | The URL of the image to use as the last frame of the video. When provided, the generated video will transition from the starting image to this ending image. Supported formats: JPEG, PNG, WebP. Max 30 MB |
| `generate_audio` | boolean |  | `true` | Whether to generate synchronized audio for the video, including sound effects, ambient sounds, and lip-synced speech. The cost of video generation is the same regardless of whether audio is generated or not |
| `resolution` | enum |  | `"720p"` | Video resolution - 480p for faster generation, 720p for balance, 1080p for highest quality. Values: `"480p"` \| `"720p"` \| `"1080p"` |
| `seed` | integer |  | — | Random seed for reproducibility. Note that results may still vary slightly even with the same seed |

**Output**

Saved to your project drive at `drive_path`; a signed `output_url` (TTL ~1h) is returned. Additional metadata available under `meta` (`seed`).

**Example**

```python
result = media.run(
    "bytedance/seedance-2-i2v",
    prompt="...",
    image_url="https://...",
    duration="auto",
)
```

### `bytedance/seedance-2-r2v`

**Seedance 2.0 — reference→video**

Up to 9 image / 3 video / 3 audio references. Per-second drops 40% when a video reference is passed.

**Pricing**

| Variant | Price |
|---|---|
| without video reference | $0.363/s |
| with video reference | $0.218/s |

**Inputs**

| Field | Type | Required | Default | Notes |
|---|---|:---:|---|---|
| `prompt` | string | ✓ | — | The text prompt used to generate the video |
| `aspect_ratio` | enum |  | `"auto"` | The aspect ratio of the generated video. Use 16:9 for landscape, 9:16 for portrait/vertical, 1:1 for square, 21:9 for ultrawide cinematic, or auto to let the model decide. Values: `"auto"` \| `"21:9"` \| `"16:9"` \| `"4:3"` \| `"1:1"` \| `"3:4"` \| `"9:16"` |
| `audio_urls` | array<string> |  | — | Reference audio to guide video generation. Refer to them in the prompt as @Audio1, @Audio2, etc. Supported formats: MP3, WAV. Up to 3 files, combined duration must not exceed 15 seconds. Max 15 MB per file. If audio is provided, at least one reference image or video is required |
| `duration` | enum |  | `"auto"` | Duration of the video in seconds. Supports 4 to 15 seconds, or auto to let the model decide based on the prompt. Values: `"auto"` \| `"4"` \| `"5"` \| `"6"` \| `"7"` \| `"8"` \| `"9"` \| `"10"` \| `"11"` \| `"12"` \| `"13"` \| `"14"` \| `"15"` |
| `generate_audio` | boolean |  | `true` | Whether to generate synchronized audio for the video, including sound effects, ambient sounds, and lip-synced speech. The cost of video generation is the same regardless of whether audio is generated or not |
| `image_urls` | array<string> |  | — | Reference images to guide video generation. Refer to them in the prompt as @Image1, @Image2, etc. Supported formats: JPEG, PNG, WebP. Max 30 MB per image. Up to 9 images. Total files across all modalities must not exceed 12 |
| `resolution` | enum |  | `"720p"` | Video resolution - 480p for faster generation, 720p for balance, 1080p for highest quality. Values: `"480p"` \| `"720p"` \| `"1080p"` |
| `seed` | integer |  | — | Random seed for reproducibility. Note that results may still vary slightly even with the same seed |
| `video_urls` | array<string> |  | — | Reference videos to guide video generation. Refer to them in the prompt as @Video1, @Video2, etc. Supported formats: MP4, MOV. Up to 3 videos, combined duration must be between 2 and 15 seconds, total size under 50 MB. Each video must be between ~480p (640x640) and ~720p (834x1112) in resolution |
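
The count limits on references can be checked before submitting. The sketch below covers only the counting rules from the table (per-modality caps, the total of 12, and the rule that audio needs an image or video). It does not inspect durations, file sizes, or resolutions, and `check_references` is a hypothetical helper.

```python
def check_references(inputs: dict) -> list[str]:
    """Return violated reference-count rules; an empty list means none found."""
    images = inputs.get("image_urls") or []
    videos = inputs.get("video_urls") or []
    audio = inputs.get("audio_urls") or []
    problems = []
    if len(images) > 9:
        problems.append("at most 9 reference images")
    if len(videos) > 3:
        problems.append("at most 3 reference videos")
    if len(audio) > 3:
        problems.append("at most 3 reference audio files")
    if len(images) + len(videos) + len(audio) > 12:
        problems.append("at most 12 reference files in total")
    if audio and not (images or videos):
        problems.append("audio needs at least one reference image or video")
    return problems

print(check_references({"prompt": "...", "audio_urls": ["https://example.com/a.mp3"]}))
```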

**Output**

Saved to your project drive at `drive_path`; a signed `output_url` (TTL ~1h) is returned. Additional metadata available under `meta` (`seed`).

**Example**

```python
result = media.run(
    "bytedance/seedance-2-r2v",
    prompt="...",
    image_urls=["https://..."],
    duration="auto",
)
```

### `bytedance/seedance-2-fast-t2v`

**Seedance 2.0 Fast — text→video**

**Pricing**

$0.290 per second of output

**Inputs**

| Field | Type | Required | Default | Notes |
|---|---|:---:|---|---|
| `prompt` | string | ✓ | — | The text prompt used to generate the video |
| `aspect_ratio` | enum |  | `"auto"` | The aspect ratio of the generated video. Use 16:9 for landscape, 9:16 for portrait/vertical, 1:1 for square, 21:9 for ultrawide cinematic, or auto to let the model decide. Values: `"auto"` \| `"21:9"` \| `"16:9"` \| `"4:3"` \| `"1:1"` \| `"3:4"` \| `"9:16"` |
| `duration` | enum |  | `"auto"` | Duration of the video in seconds. Supports 4 to 15 seconds, or auto to let the model decide based on the prompt. Values: `"auto"` \| `"4"` \| `"5"` \| `"6"` \| `"7"` \| `"8"` \| `"9"` \| `"10"` \| `"11"` \| `"12"` \| `"13"` \| `"14"` \| `"15"` |
| `generate_audio` | boolean |  | `true` | Whether to generate synchronized audio for the video, including sound effects, ambient sounds, and lip-synced speech. The cost of video generation is the same regardless of whether audio is generated or not |
| `resolution` | enum |  | `"720p"` | Video resolution - 480p for faster generation, 720p for balance. Values: `"480p"` \| `"720p"` |
| `seed` | integer |  | — | Random seed for reproducibility. Note that results may still vary slightly even with the same seed |

**Output**

Saved to your project drive at `drive_path`; a signed `output_url` (TTL ~1h) is returned. Additional metadata available under `meta` (`seed`).

**Example**

```python
result = media.run(
    "bytedance/seedance-2-fast-t2v",
    prompt="...",
    duration="auto",
)
```

### `bytedance/seedance-2-fast-i2v`

**Seedance 2.0 Fast — image→video**

**Pricing**

$0.290 per second of output

**Inputs**

| Field | Type | Required | Default | Notes |
|---|---|:---:|---|---|
| `image_url` | string | ✓ | — | The URL of the starting frame image to animate. Supported formats: JPEG, PNG, WebP. Max 30 MB |
| `prompt` | string | ✓ | — | The text prompt describing the desired motion and action for the video |
| `aspect_ratio` | enum |  | `"auto"` | The aspect ratio of the generated video. Use 16:9 for landscape, 9:16 for portrait/vertical, 1:1 for square, 21:9 for ultrawide cinematic, or auto to infer from the input image. Values: `"auto"` \| `"21:9"` \| `"16:9"` \| `"4:3"` \| `"1:1"` \| `"3:4"` \| `"9:16"` |
| `duration` | enum |  | `"auto"` | Duration of the video in seconds. Supports 4 to 15 seconds, or auto to let the model decide based on the prompt. Values: `"auto"` \| `"4"` \| `"5"` \| `"6"` \| `"7"` \| `"8"` \| `"9"` \| `"10"` \| `"11"` \| `"12"` \| `"13"` \| `"14"` \| `"15"` |
| `end_image_url` | string |  | — | The URL of the image to use as the last frame of the video. When provided, the generated video will transition from the starting image to this ending image. Supported formats: JPEG, PNG, WebP. Max 30 MB |
| `generate_audio` | boolean |  | `true` | Whether to generate synchronized audio for the video, including sound effects, ambient sounds, and lip-synced speech. The cost of video generation is the same regardless of whether audio is generated or not |
| `resolution` | enum |  | `"720p"` | Video resolution - 480p for faster generation, 720p for balance. Values: `"480p"` \| `"720p"` |
| `seed` | integer |  | — | Random seed for reproducibility. Note that results may still vary slightly even with the same seed |

**Output**

Saved to your project drive at `drive_path`; a signed `output_url` (TTL ~1h) is returned. Additional metadata available under `meta` (`seed`).

**Example**

```python
result = media.run(
    "bytedance/seedance-2-fast-i2v",
    prompt="...",
    image_url="https://...",
    duration="auto",
)
```

### `bytedance/seedance-2-fast-r2v`

**Seedance 2.0 Fast — reference→video**

**Pricing**

| Variant | Price |
|---|---|
| without video reference | $0.290/s |
| with video reference | $0.174/s |

**Inputs**

| Field | Type | Required | Default | Notes |
|---|---|:---:|---|---|
| `prompt` | string | ✓ | — | The text prompt used to generate the video |
| `aspect_ratio` | enum |  | `"auto"` | The aspect ratio of the generated video. Use 16:9 for landscape, 9:16 for portrait/vertical, 1:1 for square, 21:9 for ultrawide cinematic, or auto to let the model decide. Values: `"auto"` \| `"21:9"` \| `"16:9"` \| `"4:3"` \| `"1:1"` \| `"3:4"` \| `"9:16"` |
| `audio_urls` | array<string> |  | — | Reference audio to guide video generation. Refer to them in the prompt as @Audio1, @Audio2, etc. Supported formats: MP3, WAV. Up to 3 files, combined duration must not exceed 15 seconds. Max 15 MB per file. If audio is provided, at least one reference image or video is required |
| `duration` | enum |  | `"auto"` | Duration of the video in seconds. Supports 4 to 15 seconds, or auto to let the model decide based on the prompt. Values: `"auto"` \| `"4"` \| `"5"` \| `"6"` \| `"7"` \| `"8"` \| `"9"` \| `"10"` \| `"11"` \| `"12"` \| `"13"` \| `"14"` \| `"15"` |
| `generate_audio` | boolean |  | `true` | Whether to generate synchronized audio for the video, including sound effects, ambient sounds, and lip-synced speech. The cost of video generation is the same regardless of whether audio is generated or not |
| `image_urls` | array<string> |  | — | Reference images to guide video generation. Refer to them in the prompt as @Image1, @Image2, etc. Supported formats: JPEG, PNG, WebP. Max 30 MB per image. Up to 9 images. Total files across all modalities must not exceed 12 |
| `resolution` | enum |  | `"720p"` | Video resolution - 480p for faster generation, 720p for balance. Values: `"480p"` \| `"720p"` |
| `seed` | integer |  | — | Random seed for reproducibility. Note that results may still vary slightly even with the same seed |
| `video_urls` | array<string> |  | — | Reference videos to guide video generation. Refer to them in the prompt as @Video1, @Video2, etc. Supported formats: MP4, MOV. Up to 3 videos, combined duration must be between 2 and 15 seconds, total size under 50 MB. Each video must be between ~480p (640x640) and ~720p (834x1112) in resolution |

**Output**

Saved to your project drive at `drive_path`; a signed `output_url` (TTL ~1h) is returned. Additional metadata available under `meta` (`seed`).

**Example**

```python
result = media.run(
    "bytedance/seedance-2-fast-r2v",
    prompt="...",
    image_urls=["https://..."],
    duration="auto",
)
```
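
The two-row pricing table above can be turned into a rough cost estimate. The sketch below assumes billing is linear in the output duration and that `duration` was set explicitly (not `"auto"`); `estimate_r2v_cost` is a hypothetical helper, not part of the SDK.

```python
# Per-second rates copied from the pricing table above (USD).
R2V_RATE_NO_VIDEO_REF = 0.290
R2V_RATE_WITH_VIDEO_REF = 0.174


def estimate_r2v_cost(seconds: int, has_video_reference: bool) -> float:
    """Rough cost of one seedance-2-fast-r2v call (hypothetical helper)."""
    # The inputs table documents 4 to 15 seconds as the supported range.
    if not 4 <= seconds <= 15:
        raise ValueError("duration must be between 4 and 15 seconds")
    rate = R2V_RATE_WITH_VIDEO_REF if has_video_reference else R2V_RATE_NO_VIDEO_REF
    return round(seconds * rate, 3)


print(estimate_r2v_cost(10, has_video_reference=False))
print(estimate_r2v_cost(10, has_video_reference=True))
```

Actual charges are whatever the platform records for the job; treat this as a budgeting aid only.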

### `kuaishou/kling-v3-t2v`

**Kling v3 Pro — text→video**

Set `generate_audio: true` to enable audio, `voice_control: true` for voice.

**Pricing**

| Variant | Price |
|---|---|
| audio off | $0.134/s |
| audio on | $0.202/s |
| audio + voice control | $0.235/s |

**Inputs**

| Field | Type | Required | Default | Notes |
|---|---|:---:|---|---|
| `aspect_ratio` | enum |  | `"16:9"` | The aspect ratio of the generated video frame. Values: `"16:9"` \| `"9:16"` \| `"1:1"` |
| `cfg_scale` | number |  | `0.5` | The CFG (Classifier Free Guidance) scale is a measure of how close you want the model to stick to your prompt. (≥ 0, ≤ 1) |
| `duration` | enum |  | `"5"` | The duration of the generated video in seconds. Values: `"3"` \| `"4"` \| `"5"` \| `"6"` \| `"7"` \| `"8"` \| `"9"` \| `"10"` \| `"11"` \| `"12"` \| `"13"` \| `"14"` \| `"15"` |
| `generate_audio` | boolean |  | `true` | Whether to generate native audio for the video. Supports Chinese and English voice output. Other languages are automatically translated to English. For English speech, use lowercase letters; for acronyms or proper nouns, use uppercase |
| `multi_prompt` | array<klingv3multipromptelement> |  | — | List of prompts for multi-shot video generation. If provided, overrides the single prompt and divides the video into multiple shots with specified prompts and durations |
| `negative_prompt` | string |  | `"blur, distort, and low quality"` | (max 2,500 chars) |
| `prompt` | string |  | — | Text prompt for video generation. Either prompt or multi_prompt must be provided, but not both |
| `shot_type` | enum |  | `"customize"` | The type of multi-shot video generation. 'intelligent' lets the model automatically determine shot structure. Values: `"customize"` \| `"intelligent"` |

**Output**

Saved to your project drive at `drive_path`; a signed `output_url` (TTL ~1h) is returned.

**Example**

```python
result = media.run(
    "kuaishou/kling-v3-t2v",
    prompt="...",
    duration="5",
)
```
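
Because `generate_audio` defaults to `true`, a call that omits it presumably lands on the "audio on" rate. The sketch below maps inputs to a pricing variant under that assumption. The mapping itself, and the treatment of `voice_control` (mentioned in the note above but not listed in the inputs table), are guesses rather than documented billing rules.

```python
# Per-second rates from the Kling v3 Pro pricing table above (USD).
KLING_RATES = {
    "audio off": 0.134,
    "audio on": 0.202,
    "audio + voice control": 0.235,
}


def kling_variant(inputs: dict) -> str:
    """Guess which pricing variant a set of inputs falls under (assumed mapping)."""
    # generate_audio defaults to true in the inputs table, so leaving it
    # out is treated here as "audio on".
    if not inputs.get("generate_audio", True):
        return "audio off"
    if inputs.get("voice_control", False):
        return "audio + voice control"
    return "audio on"


def estimate_kling_cost(inputs: dict) -> float:
    """Rough cost, assuming linear per-second billing (hypothetical helper)."""
    seconds = int(inputs.get("duration", "5"))  # documented default is "5"
    return round(seconds * KLING_RATES[kling_variant(inputs)], 3)
```

Passing `generate_audio=False` explicitly is the only way, under this reading, to get the cheapest rate.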

### `kuaishou/kling-v3-i2v`

**Kling v3 Pro — image→video**

**Pricing**

| Variant | Price |
|---|---|
| audio off | $0.134/s |
| audio on | $0.202/s |
| audio + voice control | $0.235/s |

**Inputs**

| Field | Type | Required | Default | Notes |
|---|---|:---:|---|---|
| `start_image_url` | string | ✓ | — | URL of the image to be used for the video |
| `cfg_scale` | number |  | `0.5` | The CFG (Classifier Free Guidance) scale is a measure of how close you want the model to stick to your prompt. (≥ 0, ≤ 1) |
| `duration` | enum |  | `"5"` | The duration of the generated video in seconds. Values: `"3"` \| `"4"` \| `"5"` \| `"6"` \| `"7"` \| `"8"` \| `"9"` \| `"10"` \| `"11"` \| `"12"` \| `"13"` \| `"14"` \| `"15"` |
| `elements` | array<klingv3comboelementinput> |  | — | Elements (characters/objects) to include in the video. Each example can either be an image set (frontal + reference images) or a video. Reference in prompt as @Element1, @Element2, etc |
| `end_image_url` | string |  | — | URL of the image to be used for the end of the video |
| `generate_audio` | boolean |  | `true` | Whether to generate native audio for the video. Supports Chinese and English voice output. Other languages are automatically translated to English. For English speech, use lowercase letters; for acronyms or proper nouns, use uppercase |
| `multi_prompt` | array<klingv3multipromptelement> |  | — | List of prompts for multi-shot video generation. If provided, divides the video into multiple shots |
| `negative_prompt` | string |  | `"blur, distort, and low quality"` | (max 2,500 chars) |
| `prompt` | string |  | — | Text prompt for video generation. Either prompt or multi_prompt must be provided, but not both |
| `shot_type` | enum |  | `"customize"` | The type of multi-shot video generation. 'intelligent' lets the model automatically determine shot structure. Values: `"customize"` \| `"intelligent"` |

**Output**

Saved to your project drive at `drive_path`; a signed `output_url` (TTL ~1h) is returned.

**Example**

```python
result = media.run(
    "kuaishou/kling-v3-i2v",
    start_image_url="https://...",
    prompt="...",
    duration="5",
)
```
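
The inputs table says that either `prompt` or `multi_prompt` must be provided, but not both. A pre-flight check along these lines can catch the mistake before a job is billed. `check_kling_prompts` is a hypothetical helper, and the shape of each `multi_prompt` element is not shown in this reference, so the check only looks at presence.

```python
def check_kling_prompts(inputs: dict) -> None:
    """Raise ValueError unless exactly one of prompt / multi_prompt is set."""
    has_prompt = bool(inputs.get("prompt"))
    has_multi = bool(inputs.get("multi_prompt"))
    # Both set or both missing are equally invalid.
    if has_prompt == has_multi:
        raise ValueError("provide either prompt or multi_prompt, but not both")


check_kling_prompts({"prompt": "a slow pan across a harbor"})  # passes silently
```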

### `google/veo-3-t2v`

**Veo 3 — text→video**

**Pricing**

| Variant | Price |
|---|---|
| audio off | $0.600/s |
| audio on | $0.900/s |

**Inputs**

| Field | Type | Required | Default | Notes |
|---|---|:---:|---|---|
| `prompt` | string | ✓ | — | The text prompt describing the video you want to generate. (max 20,000 chars) |
| `aspect_ratio` | enum |  | `"16:9"` | The aspect ratio of the generated video. Values: `"16:9"` \| `"9:16"` |
| `auto_fix` | boolean |  | `true` | Whether to automatically attempt to fix prompts that fail content policy or other validation checks by rewriting them |
| `duration` | enum |  | `"8s"` | The duration of the generated video. Values: `"4s"` \| `"6s"` \| `"8s"` |
| `generate_audio` | boolean |  | `true` | Whether to generate audio for the video |
| `negative_prompt` | string |  | — | A negative prompt to guide the video generation |
| `resolution` | enum |  | `"720p"` | The resolution of the generated video. Values: `"720p"` \| `"1080p"` |
| `safety_tolerance` | enum |  | `"4"` | The safety tolerance level for content moderation. 1 is the most strict (blocks most content), 6 is the least strict. Values: `"1"` \| `"2"` \| `"3"` \| `"4"` \| `"5"` \| `"6"` |
| `seed` | integer |  | — | The seed for the random number generator |

**Output**

Saved to your project drive at `drive_path`; a signed `output_url` (TTL ~1h) is returned.

**Example**

```python
result = media.run(
    "google/veo-3-t2v",
    prompt="...",
    duration="8s",
)
```
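
Enum values differ between model families: Veo durations carry an `s` suffix (`"8s"`), while Kling uses bare numbers (`"5"`). The sketch below checks a set of inputs against the values listed in the table above before submitting. It is a hypothetical client-side helper; the platform's own validation may differ.

```python
# Allowed values copied from the google/veo-3-t2v inputs table above.
VEO_T2V_ENUMS = {
    "aspect_ratio": {"16:9", "9:16"},
    "duration": {"4s", "6s", "8s"},
    "resolution": {"720p", "1080p"},
    "safety_tolerance": {"1", "2", "3", "4", "5", "6"},
}


def check_veo_t2v_inputs(inputs: dict) -> list[str]:
    """Return a list of problems; an empty list means the inputs look valid."""
    problems = []
    prompt = inputs.get("prompt")
    if not isinstance(prompt, str) or not prompt:
        problems.append("prompt is required")
    elif len(prompt) > 20_000:
        problems.append("prompt exceeds 20,000 chars")
    for field, allowed in VEO_T2V_ENUMS.items():
        if field in inputs and inputs[field] not in allowed:
            problems.append(f"{field} must be one of {sorted(allowed)}")
    return problems


print(check_veo_t2v_inputs({"prompt": "a paper boat in the rain", "duration": "8"}))
```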

### `google/veo-3-i2v`

**Veo 3 — image→video**

**Pricing**

| Variant | Price |
|---|---|
| audio off | $0.240/s |
| audio on | $0.480/s |

**Inputs**

| Field | Type | Required | Default | Notes |
|---|---|:---:|---|---|
| `image_url` | string | ✓ | — | URL of the input image to animate. Should be 720p or higher resolution in 16:9 or 9:16 aspect ratio. If the image is not in 16:9 or 9:16 aspect ratio, it will be cropped to fit |
| `prompt` | string | ✓ | — | The text prompt describing how the image should be animated. (max 20,000 chars) |
| `aspect_ratio` | enum |  | `"auto"` | The aspect ratio of the generated video. Values: `"auto"` \| `"16:9"` \| `"9:16"` |
| `auto_fix` | boolean |  | `false` | Whether to automatically attempt to fix prompts that fail content policy or other validation checks by rewriting them |
| `duration` | enum |  | `"8s"` | The duration of the generated video. Values: `"4s"` \| `"6s"` \| `"8s"` |
| `generate_audio` | boolean |  | `true` | Whether to generate audio for the video |
| `negative_prompt` | string |  | — | A negative prompt to guide the video generation |
| `resolution` | enum |  | `"720p"` | The resolution of the generated video. Values: `"720p"` \| `"1080p"` |
| `safety_tolerance` | enum |  | `"4"` | The safety tolerance level for content moderation. 1 is the most strict (blocks most content), 6 is the least strict. Values: `"1"` \| `"2"` \| `"3"` \| `"4"` \| `"5"` \| `"6"` |
| `seed` | integer |  | — | The seed for the random number generator |

**Output**

Saved to your project drive at `drive_path`; a signed `output_url` (TTL ~1h) is returned.

**Example**

```python
result = media.run(
    "google/veo-3-i2v",
    prompt="...",
    image_url="https://...",
    duration="8s",
)
```

### `google/veo-3-fast-t2v`

**Veo 3 Fast — text→video**

**Pricing**

| Variant | Price |
|---|---|
| audio off | $0.300/s |
| audio on | $0.480/s |

**Inputs**

| Field | Type | Required | Default | Notes |
|---|---|:---:|---|---|
| `prompt` | string | ✓ | — | The text prompt describing the video you want to generate. (max 20,000 chars) |
| `aspect_ratio` | enum |  | `"16:9"` | The aspect ratio of the generated video. Values: `"16:9"` \| `"9:16"` |
| `auto_fix` | boolean |  | `true` | Whether to automatically attempt to fix prompts that fail content policy or other validation checks by rewriting them |
| `duration` | enum |  | `"8s"` | The duration of the generated video. Values: `"4s"` \| `"6s"` \| `"8s"` |
| `generate_audio` | boolean |  | `true` | Whether to generate audio for the video |
| `negative_prompt` | string |  | — | A negative prompt to guide the video generation |
| `resolution` | enum |  | `"720p"` | The resolution of the generated video. Values: `"720p"` \| `"1080p"` |
| `safety_tolerance` | enum |  | `"4"` | The safety tolerance level for content moderation. 1 is the most strict (blocks most content), 6 is the least strict. Values: `"1"` \| `"2"` \| `"3"` \| `"4"` \| `"5"` \| `"6"` |
| `seed` | integer |  | — | The seed for the random number generator |

**Output**

Saved to your project drive at `drive_path`; a signed `output_url` (TTL ~1h) is returned.

**Example**

```python
result = media.run(
    "google/veo-3-fast-t2v",
    prompt="...",
    duration="8s",
)
```

### `google/veo-3-fast-i2v`

**Veo 3 Fast — image→video**

**Pricing**

| Variant | Price |
|---|---|
| audio off | $0.300/s |
| audio on | $0.480/s |

**Inputs**

| Field | Type | Required | Default | Notes |
|---|---|:---:|---|---|
| `image_url` | string | ✓ | — | URL of the input image to animate. Should be 720p or higher resolution in 16:9 or 9:16 aspect ratio. If the image is not in 16:9 or 9:16 aspect ratio, it will be cropped to fit |
| `prompt` | string | ✓ | — | The text prompt describing how the image should be animated. (max 20,000 chars) |
| `aspect_ratio` | enum |  | `"auto"` | The aspect ratio of the generated video. Values: `"auto"` \| `"16:9"` \| `"9:16"` |
| `auto_fix` | boolean |  | `false` | Whether to automatically attempt to fix prompts that fail content policy or other validation checks by rewriting them |
| `duration` | enum |  | `"8s"` | The duration of the generated video. Values: `"4s"` \| `"6s"` \| `"8s"` |
| `generate_audio` | boolean |  | `true` | Whether to generate audio for the video |
| `negative_prompt` | string |  | — | A negative prompt to guide the video generation |
| `resolution` | enum |  | `"720p"` | The resolution of the generated video. Values: `"720p"` \| `"1080p"` |
| `safety_tolerance` | enum |  | `"4"` | The safety tolerance level for content moderation. 1 is the most strict (blocks most content), 6 is the least strict. Values: `"1"` \| `"2"` \| `"3"` \| `"4"` \| `"5"` \| `"6"` |
| `seed` | integer |  | — | The seed for the random number generator |

**Output**

Saved to your project drive at `drive_path`; a signed `output_url` (TTL ~1h) is returned.

**Example**

```python
result = media.run(
    "google/veo-3-fast-i2v",
    prompt="...",
    image_url="https://...",
    duration="8s",
)
```

---

# Agent tools reference

Source: https://puras.co/docs/agent-tools-reference
Category: Reference

> Per-tool spec for every built-in tool the skill agent sees at runtime (bash, file_read, media, web_*). Auto-generated from the worker tool specs.

Every agentic skill (one with a markdown entrypoint) runs an Anthropic Messages loop with this set of built-in tools available, plus any user tools the skill declares under `tools:` and the auto-injected `set_output` tool when an `output_schema` is set.

The built-ins are platform-provided — you don't declare them in `skill.yaml`. Only `bash` can be turned off (via `disable_bash: true`); the rest are always on.

For when the agent should reach for which tool — and for the broader attachment model — see [[agent-attachments]]. For media model slugs and pricing, see [[media-models-reference]]. For the deterministic-skill equivalents (`puras.media.run`, etc.), see [[sdk-media]] and [[inputs-and-drive]].

## Index

| Tool | Kind |
|---|---|
| `bash` | shell |
| `media` | generate |
| `web_search` | web |
| `image_search` | web |
| `web_fetch` | web |
| `file_read` | file → context |
| `download_url` | url → drive |
| `set_output` | lifecycle |

---

## Built-in tools

### `bash`

Run a shell command in the job's working directory. The current dir contains a `drive/` folder for files that should persist across jobs (synced to project storage). Anything written elsewhere is ephemeral. Returns combined stdout+stderr (last 8KB) and the exit code.

**Inputs**

| Field | Type | Required | Default | Notes |
|---|---|:---:|---|---|
| `command` | string | ✓ | — | Shell command to execute |
| `timeout` | integer |  | `60` | Max seconds (default 60, hard ceiling 600) |

**Environment**

- `cwd` = the job's working directory. A `drive/` subdirectory is synced to project storage — anything written there persists across jobs; everything else is ephemeral.
- `$PURAS_DEPLOYMENT_ROOT` points at the deployment bundle.
- `$PYTHONPATH` includes the deployment root and the workdir, so `python -c "import <your_module>"` works against bundled code.
- Project secrets are injected as env vars (see [[concepts]]).
- The skill's venv `bin/` is on `$PATH`, so installed CLIs work.

**Output**

Combined stdout+stderr, last 8KB only. Use redirection (`> drive/log.txt`) for larger output.
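
To make the two limits concrete, here is a small model of the documented behavior. It assumes that "8KB" means 8192 bytes and that a `timeout` above the ceiling is clamped rather than rejected; neither detail is confirmed by this page.

```python
DEFAULT_TIMEOUT_S = 60        # documented default
MAX_TIMEOUT_S = 600           # documented hard ceiling
OUTPUT_TAIL_BYTES = 8 * 1024  # "last 8KB", assumed here to mean 8192 bytes


def effective_timeout(requested=None) -> int:
    """Assumed rule: fall back to the default, cap at the ceiling."""
    if requested is None:
        return DEFAULT_TIMEOUT_S
    return min(requested, MAX_TIMEOUT_S)


def tail(output: bytes, limit: int = OUTPUT_TAIL_BYTES) -> bytes:
    """Keep only the last `limit` bytes, mirroring the truncated tool result."""
    return output[-limit:]


noisy = b"x" * 10_000 + b"DONE"
print(len(tail(noisy)), tail(noisy)[-4:])
```

The start of a long log is what gets dropped, which is why redirecting to a file under `drive/` is the safer pattern for verbose commands.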

**Disable**

Set `disable_bash: true` in `skill.yaml` to remove `bash` from the agent's tool list — useful for skills that should only call user-defined tools.

### `media`

Generate media (image, video, audio) by calling a registered model. Pass a model slug (e.g. 'openai/gpt-image-2', 'bytedance/seedream-v4-edit', 'kuaishou/kling-v3-i2v', 'bytedance/seedance-2-fast-t2v') and the inputs that model expects — we pass them through unchanged. Returns the drive path where the file was saved plus a fresh signed URL.

**Inputs**

| Field | Type | Required | Default | Notes |
|---|---|:---:|---|---|
| `inputs` | object | ✓ | — | Inputs sent straight to the upstream model — bring whatever the chosen model wants: 'prompt', 'image_url', 'duration', 'aspect_ratio', 'num_images', etc |
| `model` | string | ✓ | — | Registered model slug. Images: 'openai/gpt-image-2' (+ '-edit'), 'bytedance/seedream-v4' (+ '-edit'), 'google/imagen-4', 'kuaishou/kling-v3-image'. Video families with t2v/i2v/r2v + fast variants: 'bytedance/seedance-2[-fast]-{t2v,i2v,r2v}', 'kuaishou/kling-v3-{t2v,i2v}', 'google/veo-3[-fast]-{t2v,i2v}'. Some slugs are input-conditional: for example, Kling/Veo charge more with `generate_audio: true`, and Seedance r2v drops 40% when a video reference is supplied |
| `kind` | enum |  | `"auto"` | Hint for content-type/extension. Values: `"image"` \| `"video"` \| `"audio"` \| `"auto"` |
| `output_path` | string |  | — | Optional drive subpath (e.g. 'renders/spin.mp4') |
| `output_url_path` | string |  | — | Optional jq-style path to the output URL in the upstream response (e.g. 'video.url', 'images[0].url'). Only set this if auto-detect picks the wrong field |

The output file is saved to the project drive and a fresh signed URL is returned. The agent typically passes the `drive_path` back to the user or follows up with `file_read` to look at the result itself.
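
For `output_url_path`, it can help to see what a path such as `images[0].url` selects. The resolver below is a minimal sketch of dotted-path lookup with `[n]` list indexes; it is not the worker's implementation, and the sample responses are made up for illustration.

```python
import re

# One segment is either a key (no dots or brackets) or a [n] list index.
_SEGMENT = re.compile(r"([^.\[\]]+)|\[(\d+)\]")


def resolve_path(response, path: str):
    """Walk a dotted path with [n] list indexes, e.g. 'images[0].url'."""
    current = response
    for key, index in _SEGMENT.findall(path):
        current = current[key] if key else current[int(index)]
    return current


# Hypothetical upstream responses, shaped like the examples in the table.
image_response = {"images": [{"url": "https://example.com/a.png"}]}
video_response = {"video": {"url": "https://example.com/b.mp4"}}

print(resolve_path(image_response, "images[0].url"))
print(resolve_path(video_response, "video.url"))
```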

**See also**

- [[media-models-reference]] — per-slug input schemas.
- [[sdk-media]] — the deterministic-skill equivalent, `puras.media.run(slug, **inputs)`. Same upstream call — just from Python rather than as an agent tool.

### `web_search`

Search the web via the platform's search provider. Returns a list of results with title, url, and a short snippet.

**Inputs**

| Field | Type | Required | Default | Notes |
|---|---|:---:|---|---|
| `query` | string | ✓ | — | Search query |
| `max_results` | integer |  | `5` | Max results to return (1-20, default 5) |

Returns a list of `{title, url, snippet}`. To load a result's full text, follow up with `web_fetch`.

### `image_search`

Search for images on the web via the platform's search provider. Returns image URLs, thumbnails, dimensions, and source pages.

**Inputs**

| Field | Type | Required | Default | Notes |
|---|---|:---:|---|---|
| `query` | string | ✓ | — | Image search query |
| `max_results` | integer |  | `5` | Max results to return (1-20, default 5) |

To look at one of the returned images, follow up with `download_url` + `file_read`; see [[agent-attachments]] for the canonical search → download → look pattern.

### `web_fetch`

Fetch a web page (HTTP GET) and return its plain text content with scripts/styles stripped. Does NOT execute JavaScript — for SPAs that render client-side this will be mostly empty. Returns the final URL (after redirects), page title, and extracted text (truncated to `max_chars`).

**Inputs**

| Field | Type | Required | Default | Notes |
|---|---|:---:|---|---|
| `url` | string | ✓ | — | The URL to fetch (http:// or https://) |
| `max_chars` | integer |  | `20000` | Max chars of body text to return (500-200000, default 20000) |

**Does NOT execute JavaScript.** For SPAs that render client-side the body will be mostly empty. Returns the final URL (after redirects), the page title, and extracted text truncated to `max_chars`.
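
The sketch below illustrates what script/style stripping without JavaScript execution means in practice, using only the standard library. It is an approximation for intuition, not the worker's extractor, and `extract_text` is a hypothetical name.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.chunks = []
        self._skip_depth = 0
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1
        elif tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._skip_depth:
            return  # inside <script> or <style>
        if self._in_title:
            self.title += data.strip()
        elif data.strip():
            self.chunks.append(data.strip())


def extract_text(html: str, max_chars: int = 20000) -> dict:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return {"title": parser.title, "text": " ".join(parser.chunks)[:max_chars]}


# A client-rendered page: the body is an empty mount point plus a script.
spa = "<html><head><title>App</title></head><body><div id='root'></div><script>render()</script></body></html>"
print(extract_text(spa))
```

For the client-rendered page the extracted text is empty, which matches the warning above.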

### `file_read`

Read one or more files from the project's drive and attach them to the conversation. Images (jpg/png/gif/webp) and PDFs come back as vision/document blocks you can look at directly. Text files (code, markdown, JSON, CSV, etc.) are inlined as text. Use this when you need to actually inspect contents — for listing files, use bash `ls drive/` instead. Hard cap: 5MB per file, 10 files per call.

**Inputs**

| Field | Type | Required | Default | Notes |
|---|---|:---:|---|---|
| `paths` | array<string> | ✓ | — | Drive paths relative to the project drive root (e.g. ['uploads/photo.jpg', 'data/report.pdf']). A leading 'drive/' is accepted and stripped. (min 1 items, max 10 items) |

**Returns** a block list — one labeled header per file, then the content. Images/PDFs become vision/document blocks the model looks at directly; text files are inlined as text.

**Constraints**

- Drive paths only. A leading `drive/` is accepted and stripped. For arbitrary URLs, the agent should `download_url` first, then `file_read`.
- Hard caps: 5 MB per file, 10 paths per call, text inlined up to 100k chars then truncated.
- On non-vision models, image/document files in the path list are skipped with an error in the result; text files still come through.

See [[agent-attachments]] for the broader attachment model (including `inputs.attachments` at submit time).
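
The path rules above are simple enough to model. This sketch applies the documented `drive/` prefix stripping and the 1 to 10 path limit; `normalize_paths` is a hypothetical helper, and the real tool may apply additional checks that are not shown here.

```python
MAX_PATHS = 10  # documented per-call cap


def normalize_paths(paths: list[str]) -> list[str]:
    """Strip a leading 'drive/' and enforce the per-call path count."""
    if not 1 <= len(paths) <= MAX_PATHS:
        raise ValueError(f"expected 1 to {MAX_PATHS} paths, got {len(paths)}")
    return [p.removeprefix("drive/") for p in paths]


print(normalize_paths(["drive/uploads/photo.jpg", "data/report.pdf"]))
```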

### `download_url`

Download a file from a URL via plain HTTP GET and save it to the project's drive bucket. Use this for images, PDFs, CSVs, etc. Returns the drive path and a fresh signed URL. Does NOT resolve share links (Google Drive/Dropbox/YouTube) — only direct HTTP(S) URLs work. Hard cap: 50MB per file.

**Inputs**

| Field | Type | Required | Default | Notes |
|---|---|:---:|---|---|
| `path` | string | ✓ | — | Where to save in the drive. Either a full path with filename ('data/report.pdf') or a directory ending in '/' ('downloads/') — in the latter case the filename is inferred from the URL |
| `url` | string | ✓ | — | Direct file URL (http:// or https://) |

Returns the resolved `drive_path` and a fresh signed `output_url` (TTL ~1h).

**Constraints**

- Plain HTTP(S) only. Share links (Google Drive, Dropbox, YouTube) are **not** resolved.
- 50 MB hard cap per file.
- If `path` ends in `/`, the filename is inferred from the URL's last segment.
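
The filename inference in the last bullet can be sketched as follows. This is an illustration under assumptions: `resolve_drive_path` is a hypothetical helper, and raising an error when the URL has no usable last segment is a guess at behavior the page does not describe.

```python
from posixpath import basename
from urllib.parse import unquote, urlparse


def resolve_drive_path(path: str, url: str) -> str:
    """Return the final drive path for a download (assumed rules)."""
    if not path.endswith("/"):
        return path  # caller supplied a full path with filename
    # Use the URL's last path segment; query string and fragment are ignored.
    filename = basename(unquote(urlparse(url).path))
    if not filename:
        raise ValueError("cannot infer a filename from the URL")
    return path + filename


print(resolve_drive_path("downloads/", "https://example.com/files/report.pdf?v=2"))
```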

---

## Lifecycle tool

### `set_output` (auto-injected, conditional)

Record the final structured output for this job. Calling this ends the run. The argument must match the schema below.

**When it's exposed**

Only when the skill declares an `output_schema` in `skill.yaml`. The tool's `input_schema` *is* the skill's `output_schema` verbatim — so the agent gets a strongly-typed slot to fill before the run ends.

**Semantics**

- Must be called **exactly once**.
- Calling it ends the run; any subsequent tool calls in the same model response are ignored.
- If the run ends without `set_output` being called, the job fails with `agent finished without calling set_output`.
- Validation: the input is enforced against the declared `output_schema` before the job's final `output` is recorded.

**Example skill.yaml fragment**

```yaml
entrypoint: SKILL.md
output_schema:
  type: object
  properties:
    caption: { type: string }
  required: [caption]
```

Inside the agent the model then calls `set_output({"caption": "…"})` and the run terminates.
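
To see what "enforced against the declared `output_schema`" implies for the fragment above, here is a deliberately tiny checker covering only `required` and a few basic `type` values. It is far narrower than real JSON Schema validation and is not the worker's validator; `check_output` is a hypothetical name.

```python
# The output_schema from the skill.yaml fragment above, as a Python dict.
OUTPUT_SCHEMA = {
    "type": "object",
    "properties": {"caption": {"type": "string"}},
    "required": ["caption"],
}

# Only the JSON Schema types this sketch understands.
_PY_TYPES = {"string": str, "object": dict, "boolean": bool}


def check_output(value, schema=OUTPUT_SCHEMA) -> list[str]:
    """Return validation errors; an empty list means the value passes."""
    if not isinstance(value, dict):
        return ["output must be an object"]
    errors = []
    for key in schema.get("required", []):
        if key not in value:
            errors.append(f"missing required field: {key}")
    for key, spec in schema.get("properties", {}).items():
        expected = _PY_TYPES.get(spec.get("type"))
        if key in value and expected and not isinstance(value[key], expected):
            errors.append(f"{key} must be of type {spec['type']}")
    return errors


print(check_output({"caption": "Sunset over the bay"}))
print(check_output({}))
```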

---

# Inputs and drive

Source: https://puras.co/docs/inputs-and-drive
Category: Guides

> How apps send files to skills — upload, public URL, or base64 inline — and how skills read them.

Jobs accept JSON inputs only — there is no multipart job submission. Files come in via one of three shapes, and the SDK normalizes all of them on the skill side so you write the skill once. This applies to both deterministic (`.py`) and agentic (`.md`) skills.

## The three input shapes

A frontend (or any caller with a project API key) can pass an image (or any binary) in three ways. Pick by file size and where the file already lives.

```jsonc
// 1. drive_path — file already uploaded to this project's drive
{ "image": { "drive_path": "uploads/abc123.jpg" } }

// 2. url — public HTTPS URL the worker will fetch
{ "image": { "url": "https://example.com/photo.jpg" } }

// 3. data — base64 or full dataURL, inline in the job inputs
{ "image": { "data": "data:image/jpeg;base64,/9j/4AAQSkZJRg..." } }
```

You can also pass any of these as a bare string (`"https://..."`, `"data:..."`, or a relative drive path like `"uploads/abc.jpg"`); the SDK detects the shape.
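
One plausible way to tell the shapes apart is sketched below. The SDK's actual detection rules are not shown on this page, so treat `detect_shape` as a hypothetical illustration of the idea rather than a description of `load_bytes`.

```python
def detect_shape(value) -> str:
    """Classify a file input as 'drive_path', 'url', or 'data' (assumed rules)."""
    if isinstance(value, dict):
        for key in ("drive_path", "url", "data"):
            if key in value:
                return key
        raise ValueError("expected one of drive_path, url, data")
    if isinstance(value, str):
        if value.startswith(("http://", "https://")):
            return "url"
        if value.startswith("data:"):
            return "data"
        # Anything else is treated as a relative drive path.
        return "drive_path"
    raise TypeError("file input must be a dict or a string")


print(detect_shape("uploads/abc.jpg"))
print(detect_shape({"url": "https://example.com/photo.jpg"}))
```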

Which to use:

- **drive_path** for anything > ~200 KB or anything the user uploads from your app. One round-trip uploads it once and keeps it in project storage (so the same file can feed multiple jobs).
- **url** when the file already lives somewhere fetchable (a CDN, S3 bucket, etc.). The worker downloads it fresh on each job.
- **data** for tiny inline payloads (icons, signatures, generated SVGs). Job inputs land in Postgres as `jsonb`; keep base64 under ~1 MB or you'll bloat the jobs table.
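
Base64 inflates a payload by about a third, so the ~1 MB guideline for inline data corresponds to roughly 750 KB of raw bytes. The sketch below computes the encoded size and applies the rule-of-thumb thresholds from the list above; `suggest_shape` is a hypothetical helper, and the thresholds are guidance rather than hard limits.

```python
import base64
import math


def base64_len(raw_bytes: int) -> int:
    """Length of the padded base64 encoding of `raw_bytes` bytes."""
    return 4 * math.ceil(raw_bytes / 3)


def suggest_shape(raw_bytes: int, already_hosted: bool = False) -> str:
    """Pick an input shape using the rules of thumb above."""
    if already_hosted:
        return "url"
    # ~200 KB is the threshold suggested for switching to an upload.
    return "drive_path" if raw_bytes > 200 * 1024 else "data"


# Sanity check of the formula against the standard library encoder.
assert len(base64.b64encode(b"x" * 1000)) == base64_len(1000)

print(base64_len(750_000))       # encoded characters for 750,000 raw bytes
print(suggest_shape(5_000_000))
```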

## Uploading from an app

```js
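// Assumes API_BASE (your Puras API base URL) and `file` (a browser File / Blob) are already defined.
const apiKey = "puras_live_AbCdEfGh.SecretSecretSecretSecretSecre32";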

// 1) Upload the file
const fd = new FormData();
fd.append("file", file);                  // a browser File / Blob
// Optional: fd.append("path", "uploads/"); to choose a subfolder
const up = await fetch(`${API_BASE}/v1/drive/upload`, {
  method: "POST",
  headers: { Authorization: `Bearer ${apiKey}` },
  body: fd,
}).then(r => r.json());
// → { drive_path: "uploads/<uuid>.jpg", full_path, signed_url, bytes, content_type }

// 2) Submit a job that references the upload
await fetch(`${API_BASE}/v1/jobs`, {
  method: "POST",
  headers: { Authorization: `Bearer ${apiKey}`, "Content-Type": "application/json" },
  body: JSON.stringify({
    skill: "analyze_image",
    inputs: { image: { drive_path: up.drive_path } },
  }),
});
```

The same submit body shape works whether `analyze_image` is a deterministic (`.py`) or agentic (`.md`) skill — the worker reads the skill's manifest and dispatches accordingly.
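
For server-side callers, the job submission step looks much the same in Python. This sketch only builds the request so it runs offline; the base URL is a placeholder (the real API host is not given here), and `build_job_request` is a hypothetical helper.

```python
import json
import urllib.request


def build_job_request(api_base: str, api_key: str, skill: str, inputs: dict):
    """Build (but do not send) a POST /v1/jobs request."""
    body = json.dumps({"skill": skill, "inputs": inputs}).encode("utf-8")
    return urllib.request.Request(
        f"{api_base}/v1/jobs",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


req = build_job_request(
    "https://api.example.invalid",  # placeholder, not the real API base
    "puras_live_...",               # your project API key
    "analyze_image",
    {"image": {"drive_path": "uploads/abc123.jpg"}},
)
# urllib.request.urlopen(req) would send it; omitted so the sketch runs offline.
print(req.get_method(), req.full_url)
```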

## Reading file inputs in a deterministic skill

```python
from puras import load_bytes, load_path

def run(inputs: dict) -> dict:
    # Same call handles drive_path, url, data, or a bare string.
    img_bytes = load_bytes(inputs["image"])

    # Or — get a local filesystem path you can hand to PIL, ffmpeg, etc.
    img_path = load_path(inputs["image"], suffix=".jpg")
    from PIL import Image
    with Image.open(img_path) as im:
        return {"width": im.width, "height": im.height, "format": im.format}
```

The same `load_bytes` / `load_path` helpers are available inside per-skill **tools** declared on an agentic skill — anywhere the worker dispatches a Python callable.

What `load_path` returns:

- `drive_path` inputs → the live symlink under `./drive/...` (lazy read).
- `url` inputs → a temp file downloaded on the spot.
- `data` / base64 inputs → a temp file written from the decoded bytes.

Temp files are cleaned up when the job teardown removes the workdir.

## Reading file inputs in an agentic skill

Agentic skills have two ways to see file inputs:

1. **`inputs.attachments`** — submit a list of files alongside the prompt. The worker resolves each one (drive_path / url / base64) and attaches it to the first user message as a vision or document block the model can look at directly. This is the right path for images and PDFs.

2. **`bash` over the drive symlink** — the project's drive is mounted at `./drive/` inside the job. For files the agent only needs to manipulate (resize, transcode, parse) — not visually understand — bash is faster and cheaper than burning vision tokens.

```
bash: file ./drive/uploads/abc123.jpg
bash: convert ./drive/uploads/abc123.jpg -resize 512x512 ./drive/thumbs/abc123.jpg
```

When the skill needs to pull a drive file into the model's context mid-run (e.g. after `download_url` saved a reference image), it calls the platform-provided `file_read` tool. See [[agent-attachments]] for both routes in detail, including the supported MIME types and the model requirement for vision.

## Saving outputs back to the drive

Anything a skill (or a per-skill tool) writes under `./drive/...` persists in the project's drive bucket and survives the job. Return the relative path so the caller can mint a fresh signed URL with `drive_sign(...)` (or `GET /v1/drive/sign?path=...`):

```python
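from pathlib import Path

from PIL import Image
from puras import load_path

def run(inputs: dict) -> dict:
    # Path() is a no-op if load_path already returns a Path object.
    img_path = Path(load_path(inputs["image"], suffix=".jpg"))
    out_rel = f"thumbs/{img_path.stem}.jpg"
    # Make sure the target folder exists before saving into it.
    Path("drive/thumbs").mkdir(parents=True, exist_ok=True)
    with Image.open(img_path) as im:
        im.thumbnail((512, 512))
        im.save(f"drive/{out_rel}", "JPEG", quality=85)
    return {"thumb": {"drive_path": out_rel}}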
```

The app can then mint a signed URL for the returned `drive_path` with `drive_sign(...)` (or `GET /v1/drive/sign?path=...`) whenever it needs to display the file.

## Drive HTTP API (for apps)

All endpoints accept either a project JWT (dashboard) or a project API key (your app).

| Endpoint | What it does |
|---|---|
| `POST /v1/drive/upload` (multipart `file`, optional `path`) | Writes bytes into `<project_id>/<path>`. Auto-generates `uploads/<uuid><ext>` when `path` is empty or a directory. 50 MB cap. Returns `{drive_path, full_path, signed_url, bytes, content_type}`. |
| `GET /v1/drive/list?prefix=...` | Lists direct children (folders + files) under a project subpath. |
| `GET /v1/drive/sign?path=...&ttl=...` | Mints a short-lived signed URL for a file in the drive. |
| `DELETE /v1/drive/object?path=...` | Deletes a file. If `path` ends in `/` or names an extensionless folder, every file underneath is removed recursively. Returns `{deleted}`. |
| `GET /v1/drive/zip?prefix=...` | Streams every file under `prefix` as a single zip download. The archive's root folder matches the prefix's last segment. |
| `GET /v1/drive/origin?path=...` | Looks up which job produced a file. Returns `{job_id, skill_name, tool, created_at}` for media/download outputs, or all-null for files that were uploaded directly. |
| `POST /v1/drive/share?path=...` | Mints a long-lived (10y) signed URL safe to share publicly. Use this for "copy public link" — the URL is effectively permanent and revocable in aggregate by rotating the bucket secret. Returns `{url, expires_at}`. |

For JWT callers the `project_id` is a required query/form field; for API-key callers it's inferred from the key.
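
Drive paths contain slashes and sometimes spaces, so query parameters should be URL-encoded when calling these endpoints. The sketch below builds such URLs with the standard library; the base URL is a placeholder and `drive_url` is a hypothetical helper.

```python
from urllib.parse import urlencode


def drive_url(api_base: str, endpoint: str, **params) -> str:
    """Build a /v1/drive/<endpoint> URL with encoded query parameters."""
    query = urlencode({k: v for k, v in params.items() if v is not None})
    return f"{api_base}/v1/drive/{endpoint}" + (f"?{query}" if query else "")


base = "https://api.example.invalid"  # placeholder, not the real API base
print(drive_url(base, "sign", path="thumbs/my photo.jpg"))
print(drive_url(base, "list", prefix="uploads/"))
```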

## Conventions

- **Always** keep inline base64 small. Anything large should go through `/v1/drive/upload` so the bytes don't sit in `jobs.inputs` forever.
- **Always** use `puras.load_bytes` / `puras.load_path` instead of branching on input shape inside your skill code — it future-proofs you against new input forms.
- **Don't** trust an arbitrary `url` from end-user input without checking who the caller is; the worker will dutifully fetch whatever you point it at (subject to the 50 MB cap and a 60s timeout).
- **Don't** hard-code `<project_id>/` prefixes in skill code. Inside a job, drive lives at `./drive/`; the upload/list/sign endpoints prepend the project prefix for you.

See [[example-project]] for a complete project (deterministic skill + agentic skill + app snippet) that exercises this end-to-end. See [[sdk-media]] for generating *new* media.

---

# Agent attachments

Source: https://puras.co/docs/agent-attachments
Category: Guides

> How to feed images, PDFs, and text files to an agentic skill — via inputs.attachments at submit time, or the file_read tool mid-run.

Agentic skills can see images and documents — not just text. There are two routes:

1. **`inputs.attachments`** — submit files together with the prompt. They become part of the first user message the model sees.
2. **`file_read` tool** — the agent calls it mid-run to attach drive files to its own context.

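Both routes produce the same content blocks (`image` / `document` / `text`) for the model. With `inputs.attachments`, a model without vision/document support fails the job upfront with a clear error; with `file_read`, unsupported files are skipped with an error in the tool result.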

## inputs.attachments — submitting files with the prompt

```json
{
  "skill": "ad-creative",
  "inputs": {
    "prompt": "Write a 60-word product blurb from this photo.",
    "attachments": [
      { "drive_path": "uploads/shoe.jpg" },
      { "url": "https://cdn.example.com/spec.pdf" },
      { "base64": "iVBORw0KGgo...", "media_type": "image/png" }
    ]
  }
}
```

(The submit body has no `type` field — agentic vs deterministic is decided by the skill's `entrypoint` in `skill.yaml`.)

Each entry is one of three shapes (same convention as [[inputs-and-drive]]'s function inputs, but as an explicit list so the worker knows to attach instead of inline-as-text):

- **`drive_path`** — file already in this project's drive. Resolved with path-traversal protection. Best for anything the user uploaded.
- **`url`** — public HTTPS URL. For images and PDFs we pass the URL straight to the model — the worker doesn't download. For text/* we fetch first.
- **`base64`** — raw base64 (no `data:...,` prefix). Set `media_type` explicitly. Best for tiny generated payloads.

Optional `media_type` overrides MIME sniffing on any of the three shapes.
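
Note that the `base64` shape takes raw base64 with no `data:` prefix, unlike the `data` shape used for function inputs. The helpers below sketch how a caller might build such an entry from bytes or convert an existing data URL; both function names are hypothetical.

```python
import base64


def base64_attachment(data: bytes, media_type: str) -> dict:
    """Build an attachments entry from raw bytes."""
    return {"base64": base64.b64encode(data).decode("ascii"), "media_type": media_type}


def from_data_url(data_url: str) -> dict:
    """Convert 'data:<type>;base64,<payload>' into the attachments shape."""
    header, _, payload = data_url.partition(",")
    if not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("expected a base64 data URL")
    media_type = header[len("data:"):-len(";base64")]
    return {"base64": payload, "media_type": media_type}


print(base64_attachment(b"hi", "text/plain"))
print(from_data_url("data:image/png;base64,iVBORw0KGgo="))
```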

## Supported file types

| Kind | MIME | How the model sees it |
|---|---|---|
| Image | `image/jpeg`, `image/png`, `image/gif`, `image/webp` | Vision block — model literally looks at the pixels. |
| Document | `application/pdf` | PDF block — text + page images together. Supported on the Claude family only. |
| Text | `text/*`, `application/json`, `application/xml`, `application/yaml` | Inlined as a text block (utf-8). |

Anything else is rejected with a clear error. Hard limit: **5 MB per file**. Text files are inlined up to 100k characters, then truncated.
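
The table and limit above amount to a small decision procedure, sketched below. It assumes "5 MB" means 5 × 1024 × 1024 bytes, and `classify` is a hypothetical helper rather than the worker's code.

```python
IMAGE_TYPES = {"image/jpeg", "image/png", "image/gif", "image/webp"}
TEXT_TYPES = {"application/json", "application/xml", "application/yaml"}
MAX_FILE_BYTES = 5 * 1024 * 1024  # "5 MB", assumed to mean 5 MiB


def classify(media_type: str, size_bytes: int) -> str:
    """Return 'image', 'document', or 'text'; raise for anything unsupported."""
    if size_bytes > MAX_FILE_BYTES:
        raise ValueError("file exceeds the 5 MB per-file limit")
    if media_type in IMAGE_TYPES:
        return "image"
    if media_type == "application/pdf":
        return "document"
    if media_type.startswith("text/") or media_type in TEXT_TYPES:
        return "text"
    raise ValueError(f"unsupported media type: {media_type}")


print(classify("image/webp", 120_000))
print(classify("text/markdown", 12_400))
```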

## file_read — letting the agent pull files into its own context

When the agent needs to look at something it didn't get up front (or wants to inspect a file it just wrote), it calls `file_read`:

```jsonc
file_read({
  "paths": ["uploads/photo.jpg", "docs/spec.md", "renders/output.pdf"]
})
```

The tool result is a block list — one labeled header per file, then the content:

```
=== uploads/photo.jpg (image/jpeg, 234.1KB) ===
<image attached, model sees it>

=== docs/spec.md (text/markdown, 12.4KB) ===
# Product spec
...

=== renders/output.pdf (application/pdf, 1.2MB) ===
<document attached, model sees it>
```

Constraints:

- **Drive paths only.** A leading `drive/` is accepted and stripped. For arbitrary URLs, the agent should call `download_url` first, then `file_read`.
- **Max 10 paths per call**, same 5 MB / 100k-char per-file limit as `inputs.attachments`.
- On non-vision models, image/document files in the path list are skipped with an error in the result; text files still come through.
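
To make the first two constraints concrete, here is a rough sketch of how a path list might be normalized before reading. `normalize_paths` is a hypothetical name, and rejecting anything containing `://` is an assumption about how URLs would be detected.

```python
MAX_PATHS = 10


def normalize_paths(paths: list) -> list:
    """Strip a leading 'drive/' and enforce the per-call path limit."""
    if len(paths) > MAX_PATHS:
        raise ValueError(f"file_read accepts at most {MAX_PATHS} paths per call")
    cleaned = []
    for path in paths:
        if "://" in path:
            # URLs are not drive paths; the agent should download_url first.
            raise ValueError(f"not a drive path: {path}")
        if path.startswith("drive/"):
            path = path[len("drive/"):]
        cleaned.append(path)
    return cleaned
```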

`file_read` is always exposed; you don't declare it in `skill.yaml`. It joins the platform-provided agent tools (`bash`, `media`, `web_search`, `image_search`, `web_fetch`, `download_url`). Note that [[mcp-tools]] documents the tools an *MCP client* invokes, not the tools the skill agent has at runtime.

## Worked example — image-in / text-out skill

`SKILL.md`:

```markdown
You are a product copywriter. The user will attach a single product photo.
Look at the photo and write a 60-word marketing blurb. Reply with just the blurb.
```

App submits:

```js
const up = await uploadFile(file);   // → { drive_path: "uploads/<uuid>.jpg" }
await fetch(`${API_BASE}/v1/jobs`, {
  method: "POST",
  headers: { Authorization: `Bearer ${apiKey}`, "Content-Type": "application/json" },
  body: JSON.stringify({
    skill: "product-copywriter",
    inputs: {
      prompt: "Write the blurb.",
      attachments: [{ drive_path: up.drive_path }],
    },
  }),
});
```

The agent's first message arrives as a multimodal block list (`text` + `image`). No `bash` cat, no function tool. The model sees the photo directly and replies.
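
For a Python backend, the same submit call can be assembled with the standard library. This sketch only builds the request; `build_job_request` is a hypothetical helper, the API base URL is a placeholder you supply, and the response shape is not covered here.

```python
import json
import urllib.request


def build_job_request(api_base, api_key, skill, prompt, drive_paths):
    """Build (but do not send) the POST /v1/jobs request shown above."""
    body = {
        "skill": skill,
        "inputs": {
            "prompt": prompt,
            # One attachment entry per file already uploaded to the drive.
            "attachments": [{"drive_path": p} for p in drive_paths],
        },
    }
    return urllib.request.Request(
        url=f"{api_base}/v1/jobs",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
```

Pass the result to `urllib.request.urlopen(...)` to actually submit the job.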

## Mid-run pattern — search, download, look

When the agent needs to find an image online and then study it:

```
1. image_search("vintage red bicycle")          → list of URLs
2. download_url(url, path="research/ref.jpg")   → saved to drive
3. file_read(paths=["research/ref.jpg"])        → attached to context
4. (model now reasons over the actual image)
```

This is the canonical way to bring external visual material into the agent's working memory.

## Choosing a model

Vision/PDF features require a vision-capable model. The platform fails the job upfront with a clear error when the chosen model can't process attached images or documents. Safe defaults:

- `claude/opus-4-7` — supports both images and PDFs.
- `claude/sonnet-4-7` — supports both.
- `gpt/4o`, `gemini/2.5-pro` — images only (no native PDF). The job fails if a PDF is attached.
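
If you want to catch a mismatch before submitting, a client-side pre-flight check along these lines may help. The capability table is copied from the list above; `unsupported_attachments` is a hypothetical helper, and treating unknown model slugs as supporting nothing is an assumption.

```python
# Capabilities as listed in this section; extend as the platform adds models.
CAPABILITIES = {
    "claude/opus-4-7": {"image", "document"},
    "claude/sonnet-4-7": {"image", "document"},
    "gpt/4o": {"image"},
    "gemini/2.5-pro": {"image"},
}


def unsupported_attachments(model: str, media_types: list) -> list:
    """Return the media types the given model cannot process."""
    supported = CAPABILITIES.get(model, set())  # unknown slug: assume none
    rejected = []
    for media_type in media_types:
        if media_type.startswith("image/") and "image" not in supported:
            rejected.append(media_type)
        elif media_type == "application/pdf" and "document" not in supported:
            rejected.append(media_type)
    return rejected
```

Text attachments are never flagged, since they are inlined as plain text blocks.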

Set the model in the skill's `skill.yaml`:

```yaml
# skills/product-copywriter/skill.yaml
description: Write product blurbs from a photo.
entrypoint: SKILL.md
model: claude/sonnet-4-7
input_schema: { ... }
output_schema: { ... }
```

## Conventions

- **Use `drive_path` for anything > ~200 KB.** Inline base64 lives in `jobs.inputs` forever (Postgres `jsonb`); large blobs there bloat the table.
- **Prefer `inputs.attachments` over `file_read`** when the file is known at submit time. The model sees it without burning a tool round-trip.
- **Use `file_read` for files the agent decides to look at**, like outputs of `download_url`, files the user dropped into the drive between jobs, or one of several candidates picked at runtime.
- **Don't paste image URLs into the prompt text** expecting the model to "browse" them. Models don't auto-fetch — the URL is just text. Put it in `attachments` or call `download_url` + `file_read`.
- **Don't pass a PDF to a non-Claude model.** GPT and Gemini slugs accept images but not documents; convert pages to images upstream if you need them.
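
The first convention can be encoded as a small helper. This is a sketch under stated assumptions: `to_attachment` is hypothetical, the threshold reads "~200 KB" as 200 × 1024 bytes, and `upload` is a caller-supplied function that stores the bytes in the drive and returns the resulting `drive_path`.

```python
import base64
from typing import Callable

INLINE_LIMIT_BYTES = 200 * 1024  # the "~200 KB" rule of thumb


def to_attachment(data: bytes, media_type: str,
                  upload: Callable[[bytes, str], str]) -> dict:
    """Inline small payloads as base64; send larger ones through the drive."""
    if len(data) > INLINE_LIMIT_BYTES:
        # upload() is supplied by the caller and returns a drive path.
        return {"drive_path": upload(data, media_type)}
    return {
        "base64": base64.b64encode(data).decode("ascii"),
        "media_type": media_type,
    }
```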

See [[inputs-and-drive]] for the deterministic-skill side of file handling (how a Python skill or a per-skill tool reads the same input shapes), and [[example-project]] for a complete starter project.

---

# Playground schema conventions

Source: https://puras.co/docs/playground
Category: Guides

> How input_schema and output_schema drive auto-generated UI forms (contentMediaType + x-puras extensions).

A "playground" is any UI that renders a form for a skill's inputs from its
declared `input_schema` and shows the resulting `output_schema`. The dashboard
ships one; you can build your own with the same key.

A bare `type: string` carries no UX hint. To tell a playground "this is an
image upload" or "this should be a dropdown," skill authors add two layers of
metadata that JSON Schema validators silently ignore:

1. **`contentMediaType`** — standard JSON Schema (Draft 2020-12). Tells the
   playground what kind of content the string carries.
2. **`x-puras`** — a Puras-specific extension block with richer widget hints
   (JSON Schema treats unknown keywords as non-validating annotations; the `x-` prefix is the usual convention for extension keys).

Skills don't need any of this to *run* — the worker only enforces the type
constraints. The metadata is purely for the rendering layer.

## The `x-puras` block

```yaml
x-puras:
  widget: image           # see widget vocabulary below
  accept: ["image/jpeg", "image/png", "image/webp"]   # widget-specific
  max_size_mb: 5
  upload: drive           # how the playground should hand the file back
  placeholder: "drag photo here"
  help: "a full-length, clear photo is recommended"
```

All fields are optional. Unknown fields are ignored by the playground (so
you can prototype your own).

## Widget vocabulary

| `widget` | What the playground renders | Schema shape it pairs with |
|---|---|---|
| `image` | Drop-zone + URL field + preview | `string` (URL) **or** the polymorphic file object |
| `video` | Same as image, video preview | `string` (URL) or file object |
| `audio` | File picker, audio preview | `string` (URL) or file object |
| `file` | Generic file picker | `string` (URL) or file object |
| `drive_file` | Picker over the project's drive | `string` (drive_path) |
| `attachments` | Multi-file drop-zone | `array` of file objects |
| `text` | Single-line `<input>` | `string` |
| `long_text` | `<textarea>` | `string`, usually `maxLength >= 500` |
| `code` | Monospace editor (lang from `x-puras.lang`) | `string` |
| `select` | Dropdown | `string` with `enum` |
| `multiselect` | Multi-select | `array` of `string` with `items.enum` |
| `switch` | Toggle | `boolean` |
| `slider` | Range slider | `number` with `minimum` + `maximum` |

Default widget if unset:

- `boolean` → `switch`
- `string` with `enum` → `select`
- `string` with `contentMediaType` matching `image/*` → `image`
- `string` with `contentMediaType` matching `video/*` → `video`
- `string` with `contentMediaType` matching `audio/*` → `audio`
- `string` with `maxLength >= 500` → `long_text`
- `array` of strings with `items.enum` → `multiselect`
- `array` of file objects → `attachments`
- everything else `string` → `text`
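
For anyone building their own playground, the fallback rules might be implemented roughly as follows. `default_widget` is a hypothetical function; it assumes the rules are checked in the order listed, that an explicit `x-puras.widget` always wins, and that "file objects" can be detected as `items.type: object`. Types the list does not cover return `None`.

```python
from typing import Optional


def default_widget(schema: dict) -> Optional[str]:
    """Pick a widget for one input property, following the defaults above."""
    explicit = schema.get("x-puras", {}).get("widget")
    if explicit:
        return explicit
    kind = schema.get("type")
    if kind == "boolean":
        return "switch"
    if kind == "string":
        if "enum" in schema:
            return "select"
        media = schema.get("contentMediaType", "")
        for prefix in ("image", "video", "audio"):
            if media.startswith(prefix + "/"):
                return prefix
        if schema.get("maxLength", 0) >= 500:
            return "long_text"
        return "text"
    if kind == "array":
        items = schema.get("items", {})
        if items.get("type") == "string" and "enum" in items:
            return "multiselect"
        if items.get("type") == "object":
            return "attachments"
    return None  # not covered by the documented defaults
```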

## File inputs — two shapes

A skill that needs a binary input can declare it two ways:

### A. As a URL string (simplest)

```yaml
user_image_url:
  type: string
  format: uri
  contentMediaType: image/*
  x-puras:
    widget: image
    upload: drive
    accept: ["image/jpeg", "image/png", "image/webp"]
    max_size_mb: 5
  description: User's photo URL (HTTPS).
```

The playground uploads to `POST /v1/drive/upload`, takes the response's
`signed_url`, and passes that string into `inputs.user_image_url`. The skill
just sees a URL.

### B. As a polymorphic file object (Puras-native)

```yaml
user_image:
  type: object
  x-puras:
    widget: image
    accept: ["image/jpeg", "image/png", "image/webp"]
  oneOf:
    - { required: [drive_path], properties: { drive_path: {type: string} } }
    - { required: [url],        properties: { url: {type: string, format: uri} } }
    - { required: [base64],     properties: { base64: {type: string}, media_type: {type: string} } }
```

The skill reads it with `puras.load_bytes(inputs["user_image"])` or
`puras.load_path(...)` — these helpers accept all three shapes transparently.
This form is more flexible (frontend can send a drive_path, an external URL,
or inline base64) at the cost of more verbose validation.

Pick **A** when the skill itself only needs a URL (e.g. handing off to another model).
Pick **B** when the skill code wants raw bytes or a local path.

## `upload` modes

For widgets that produce a file (`image`, `video`, `audio`, `file`):

- `drive` (default) — playground POSTs to `/v1/drive/upload` and sends the
  signed URL back into the input. Persists across jobs; recommended for
  anything > 200 KB.
- `inline` — playground base64-encodes and sends inline. Use for tiny
  payloads only (job inputs live forever in Postgres `jsonb`).
- `url-only` — no upload widget; user pastes a URL. For skills that should
  only consume public URLs.

## Long text and code

```yaml
prompt:
  type: string
  minLength: 1
  maxLength: 4000
  x-puras:
    widget: long_text
    placeholder: "Tell the agent what to do"
```

```yaml
sql:
  type: string
  x-puras:
    widget: code
    lang: sql
```

## Enums

`enum` alone gives you a select. Add labels with `x-puras.options`:

```yaml
tone:
  type: string
  enum: [playful, bold, trustworthy]
  x-puras:
    widget: select
    options:
      - { value: playful, label: "🎉 Playful" }
      - { value: bold, label: "🔥 Bold" }
      - { value: trustworthy, label: "🛡 Trustworthy" }
```
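
A playground could merge `enum` and `x-puras.options` along these lines. `select_options` is a hypothetical helper; falling back to the raw value when no label is given, and ignoring options that are not in `enum`, are assumptions.

```python
def select_options(schema: dict) -> list:
    """Return (value, label) pairs for a select, in enum order."""
    options = schema.get("x-puras", {}).get("options", [])
    labels = {opt["value"]: opt["label"] for opt in options}
    # enum stays the source of truth; labels only decorate it.
    return [(value, labels.get(value, value)) for value in schema.get("enum", [])]
```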

## Arrays

For an array of attachments (e.g. an agentic skill that accepts vision inputs):

```yaml
attachments:
  type: array
  minItems: 1
  maxItems: 4
  x-puras:
    widget: attachments
    accept: ["image/jpeg", "image/png", "image/webp", "application/pdf"]
    max_size_mb: 5
  items:
    type: object
    properties:
      drive_path: { type: string }
      url:        { type: string, format: uri }
      base64:     { type: string }
      media_type: { type: string }
    additionalProperties: true
```

The playground renders one drop-zone that accepts multiple files and submits
them as the array.

## Output rendering

`output_schema` needs no widget hints: by default, playgrounds render it as
read-only structured data. The same `contentMediaType` convention still
applies, so an output field with `contentMediaType: image/*` should be
rendered as an `<img>`, audio as `<audio>`, etc. A playground can
optionally honor `x-puras.widget` on outputs when you want a specific
renderer (e.g. `widget: code` to syntax-highlight a returned string).
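
One possible way to choose an output renderer is sketched below. `output_renderer` is a hypothetical name, and mapping `video/*` to a `video` element is an assumption, since the text above names only images and audio.

```python
def output_renderer(schema: dict) -> str:
    """Pick a renderer name for one output field."""
    widget = schema.get("x-puras", {}).get("widget")
    if widget == "code":
        return "code"  # syntax-highlighted string
    media = schema.get("contentMediaType", "")
    # video is an assumed extension of the documented image/audio cases
    for prefix, tag in (("image/", "img"), ("audio/", "audio"), ("video/", "video")):
        if media.startswith(prefix):
            return tag
    return "structured"  # default: read-only structured data
```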

## What playgrounds MUST NOT do

- Hide a field because it has no `x-puras` block — fall back to defaults.
- Reject unknown `widget` values — render a placeholder warning, keep
  the form usable.
- Strip `x-puras` from the request body. Send `inputs` exactly as the
  schema describes; the worker only cares about validated values.

See [[concepts]] for how skills, inputs, and outputs are stored. See the
shipped [[example-project]] for a working skill that uses these hints.

---