Example project

A minimal but realistic project: an app uploads an image, a deterministic skill returns its dimensions and a downscaled thumbnail, and an agentic skill writes a one-line caption by looking at the photo directly via vision. Use this as the shape to copy when you start a new Puras project — push it as-is and it works.

Layout

image-tools/
  requirements.txt              # optional; worker pip-installs into a per-deployment venv
  skills/
    image-info/                 # deterministic skill (Python entrypoint)
      skill.yaml
      main.py
    caption/                    # agentic skill (markdown entrypoint = system prompt)
      skill.yaml
      SKILL.md

No root manifest. Each skills/<name>/skill.yaml is auto-discovered; the directory name is the skill name.
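
The discovery rule is easy to mirror locally, for example to check which skills a push would pick up. The sketch below is only an illustration of the rule as stated above; `discover_skills` is a hypothetical helper, not part of puras or the worker.

```python
from pathlib import Path


def discover_skills(project_dir: str) -> dict[str, Path]:
    """Map each skill name (its directory name) to its skill.yaml path.

    Hypothetical helper: it mirrors the documented rule
    "skills/<name>/skill.yaml is auto-discovered", nothing more.
    """
    skills_root = Path(project_dir) / "skills"
    found = {}
    # Only direct children of skills/ that contain a skill.yaml count.
    for manifest in sorted(skills_root.glob("*/skill.yaml")):
        found[manifest.parent.name] = manifest
    return found
```

For the layout above, `discover_skills("image-tools")` would list `caption` and `image-info`; a directory under skills/ without a skill.yaml is skipped.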

requirements.txt

pillow>=10

The worker reads requirements.txt from the bundle root and installs it into the deployment's venv before any job runs. Skip the file if your skills need only the standard library and puras.

skills/image-info/skill.yaml

yaml

description: Return image dimensions and write a 512px thumbnail back into the drive.
entrypoint: main.py:run
input_schema:
  type: object
  properties:
    image:
      oneOf:
        - { type: string }
        - { type: object }
  required: [image]
output_schema:
  type: object
  properties:
    width:  { type: integer }
    height: { type: integer }
    format: { type: string }
    thumb:
      type: object
      properties:
        drive_path: { type: string }
      required: [drive_path]
  required: [width, height, format, thumb]

An entrypoint of the form file.py:func (here main.py:run) tells the worker this skill is deterministic: no LLM in the loop. The function runs in an isolated subprocess.
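
For a quick local smoke test it can help to resolve a file.py:func string to a callable yourself. The sketch below is one way to do that with the standard library. `load_entrypoint` is a hypothetical helper and not the worker's loader; in particular it imports the file in the current process rather than in an isolated subprocess.

```python
import importlib.util
from pathlib import Path


def load_entrypoint(skill_dir: str, entrypoint: str):
    """Resolve a 'file.py:func' entrypoint string to a callable.

    Hypothetical local helper; runs in-process, unlike the worker.
    """
    file_name, _, func_name = entrypoint.partition(":")
    if not file_name.endswith(".py") or not func_name:
        raise ValueError(f"not a file.py:func entrypoint: {entrypoint!r}")
    path = Path(skill_dir) / file_name
    spec = importlib.util.spec_from_file_location(path.stem, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # executes the file's top-level code
    return getattr(module, func_name)
```

With the skill's dependencies installed locally, `load_entrypoint("skills/image-info", "main.py:run")` would return the `run` function shown in the next section.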

skills/image-info/main.py

python

"""Return image dimensions and write a 512px thumbnail back into the drive."""
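import os

from PIL import Image

from puras import load_path


def run(inputs: dict) -> dict:
    # load_path resolves a drive path, URL, data URI, or bare string to a local file.
    src = load_path(inputs["image"], suffix=".img")
    with Image.open(src) as im:
        # Record the original size and format before thumbnail() resizes in place.
        width, height, fmt = im.width, im.height, im.format

        im.thumbnail((512, 512))  # fits within 512x512, keeps aspect ratio
        out_rel = f"thumbs/{src.stem}.jpg"
        # save() does not create missing directories, so make drive/thumbs/ first.
        os.makedirs("drive/thumbs", exist_ok=True)
        # JPEG has no alpha channel, hence the RGB conversion.
        im.convert("RGB").save(f"drive/{out_rel}", "JPEG", quality=85)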

    return {
        "width": width,
        "height": height,
        "format": fmt,
        "thumb": {"drive_path": out_rel},
    }

The skill works whether the caller sent {image: {drive_path}}, {image: {url}}, {image: {data: "data:..."}}, or a bare string. load_path does the routing; you write the file logic once. See inputs-and-drive for the full taxonomy.
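
To make the four accepted shapes concrete, here is a small classifier over them. It is a sketch for illustration only: `classify_image_input` is a hypothetical name, and the way bare strings are sniffed by prefix is an assumption, not a description of what load_path does internally.

```python
def classify_image_input(value) -> str:
    """Return 'drive_path', 'url', or 'data' for a caller-supplied image input.

    Hypothetical helper that mirrors the shapes listed above.
    """
    if isinstance(value, dict):
        for key in ("drive_path", "url", "data"):
            if key in value:
                return key
        raise ValueError(f"unrecognised image object keys: {sorted(value)}")
    if isinstance(value, str):
        # Assumption: a bare string is told apart by its prefix.
        if value.startswith("data:"):
            return "data"
        if value.startswith(("http://", "https://")):
            return "url"
        return "drive_path"
    raise TypeError(f"image must be a string or object, got {type(value).__name__}")
```

A helper like this is mainly useful in tests, to assert that a caller builds one of the supported shapes before a job is submitted.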

skills/caption/skill.yaml

yaml

description: Write a short caption for a product photo.
entrypoint: SKILL.md
model: claude/sonnet-4-7
input_schema:
  type: object
  properties:
    prompt: { type: string }
    attachments:
      type: array
      items: { type: object }
      minItems: 1
      maxItems: 1
  required: [attachments]
output_schema:
  type: object
  properties:
    caption: { type: string }
  required: [caption]

The .md entrypoint tells the worker this skill is agentic — the file's contents become the system prompt and the LLM tool-use loop runs. Because output_schema is set, the agent gets an auto-injected set_output tool and must call it once with { "caption": "..." } to finish.
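
The payload the agent passes to set_output has to fit the output_schema above. How the platform validates that call is not described here, so the following is only a flat sanity check you might use in your own tests; `check_output` is a hypothetical helper and does not implement full JSON Schema.

```python
# Mirrors the output_schema from skills/caption/skill.yaml.
CAPTION_OUTPUT_SCHEMA = {
    "type": "object",
    "properties": {"caption": {"type": "string"}},
    "required": ["caption"],
}

# Top-level JSON types this sketch knows how to check.
_JSON_TYPES = {"string": str, "integer": int, "object": dict, "array": list}


def check_output(schema: dict, payload: dict) -> list[str]:
    """Return a list of problems; an empty list means the flat checks pass."""
    problems = []
    for key in schema.get("required", []):
        if key not in payload:
            problems.append(f"missing required key: {key}")
    for key, spec in schema.get("properties", {}).items():
        expected = _JSON_TYPES.get(spec.get("type"))
        if key in payload and expected and not isinstance(payload[key], expected):
            problems.append(f"{key} should be {spec['type']}")
    return problems
```

A payload such as `{"caption": "Soft light, clean lines"}` passes, while an empty object or a non-string caption is reported.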

skills/caption/SKILL.md

markdown

You write short, punchy captions for product photos.

You'll receive the photo as an attachment in the first user message (you can
see it directly — no tool calls needed to "open" it) and a `tone` hint in the
prompt text (e.g. "playful", "minimal", "luxury"). Default tone is "minimal".

Return exactly one caption, 8–14 words, in the requested tone, by calling
`set_output`. No preamble, no quotes, no markdown.

That's the whole skill. The agent natively sees the image via the attachments mechanism — see agent-attachments for the wire format and the supported file types.

Pushing it

push(project_dir="/abs/path/to/image-tools", notes="starter", activate=true)

Calling it from an app

const API_BASE = "https://puras-api.fly.dev";
const KEY = "puras_live_AbCdEfGh.SecretSecretSecretSecretSecre32";

async function analyze(file /* a browser File */) {
  // 1) Upload once.
  const fd = new FormData(); fd.append("file", file);
  const up = await fetch(`${API_BASE}/v1/drive/upload`, {
    method: "POST",
    headers: { Authorization: `Bearer ${KEY}` },
    body: fd,
  }).then(r => r.json());
  // up: { drive_path, full_path, signed_url, bytes, content_type }

  // 2) Fast deterministic skill — wait inline.
  const info = await fetch(`${API_BASE}/v1/jobs?wait=true&timeout=20`, {
    method: "POST",
    headers: { Authorization: `Bearer ${KEY}`, "Content-Type": "application/json" },
    body: JSON.stringify({
      skill: "image-info",
      inputs: { image: { drive_path: up.drive_path } },
    }),
  }).then(r => r.json());
  // info.result → { width, height, format, thumb: { drive_path } }

  // 3) Kick off the agentic caption skill — attach the photo natively.
  const cap = await fetch(`${API_BASE}/v1/jobs`, {
    method: "POST",
    headers: { Authorization: `Bearer ${KEY}`, "Content-Type": "application/json" },
    body: JSON.stringify({
      skill: "caption",
      inputs: {
        prompt: "Tone: playful. Write the caption.",
        attachments: [{ drive_path: up.drive_path }],
      },
    }),
  }).then(r => r.json());
  // Poll GET /v1/jobs/{cap.id} or stream GET /v1/jobs/{cap.id}/stream

  return { info: info.result, captionJobId: cap.id };
}

Same submit body shape for both skills — no type field, no distinction at the call site. The worker reads each skill's entrypoint and dispatches accordingly.
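
If the caller is a Python backend rather than a browser, the polling step for the caption job could look like the sketch below. The fields of a job record beyond `id` and `result` are not described here, so the "finished" test is left to the caller; `wait_for_job`, `get_job`, and `is_finished` are all hypothetical names.

```python
import time
from typing import Callable


def wait_for_job(
    get_job: Callable[[str], dict],
    job_id: str,
    is_finished: Callable[[dict], bool],
    interval: float = 1.0,
    timeout: float = 60.0,
) -> dict:
    """Call get_job(job_id) until is_finished(job) is true or the timeout passes.

    get_job is expected to wrap GET /v1/jobs/{id}; is_finished is supplied by
    the caller because the job record's status fields are an assumption here.
    """
    deadline = time.monotonic() + timeout
    while True:
        job = get_job(job_id)
        if is_finished(job):
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} not finished after {timeout}s")
        time.sleep(interval)
```

Injecting `get_job` keeps the loop testable without network access; in an app it would issue the GET with the same `Authorization: Bearer` header used for submission.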

To display the thumbnail in the UI, mint a signed URL for it:

GET /v1/drive/sign?path=<thumb-drive-path>&ttl=3600
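
Drive paths contain slashes and may contain spaces, so the `path` value should be URL-encoded when the request is built. A minimal sketch with the standard library follows; `sign_url_request` is a hypothetical helper, and the shape of the response is not shown here, so only the request URL is constructed.

```python
from urllib.parse import urlencode

API_BASE = "https://puras-api.fly.dev"  # same base URL as the app example


def sign_url_request(drive_path: str, ttl: int = 3600) -> str:
    """Build the GET /v1/drive/sign URL for a drive path (hypothetical helper)."""
    # urlencode percent-encodes "/" and turns spaces into "+".
    query = urlencode({"path": drive_path, "ttl": ttl})
    return f"{API_BASE}/v1/drive/sign?{query}"
```

Passing the `thumb.drive_path` returned by image-info gives the URL to request; sending it with the same Bearer header as the other calls is an assumption.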

What this project demonstrates

Two skill styles, one API — a deterministic Python skill for fast structured work, an agentic skill for multi-step LLM work, both submitted with the same POST /v1/jobs shape.
One source of truth for files — the upload happens once; both skills reference it by drive_path.
Polymorphic file inputs in a deterministic skill — the same code works for uploads, URLs, and inline base64 (inputs-and-drive).
Native vision in an agentic skill — the agent looks at the image directly via inputs.attachments; no bash cat, no media tool round-trip, no manual URL signing (agent-attachments).

Where to take it next

Add a tools: list to caption/skill.yaml if you want the agent to call your own Python helpers mid-run (e.g. a query_dimensions tool that wraps the same Pillow logic).
Add more skills under skills/ for follow-up ops (crop, OCR, classify) — agentic skills can chain them via tool-use.
Set project secrets (set_secret) for any third-party keys your skill code needs; they're injected as env vars at run time.
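
Since secrets arrive as environment variables, skill code can read them with `os.environ`. The sketch below fails fast with a readable message when one is missing. `require_secret` is a hypothetical helper, and it assumes the environment variable carries the same name the secret was stored under.

```python
import os


def require_secret(name: str) -> str:
    """Return the secret's value from the environment or raise a clear error.

    Assumption: the env var name equals the name given to set_secret.
    """
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"secret {name} is not set; add it with set_secret")
    return value
```

Calling it at the top of `run` surfaces a missing key as one clear error instead of a failure deep inside a third-party client.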