Puras SDK Runtime Reference

These helpers run inside a deployed skill — a deterministic skill's entrypoint or an agentic skill's Python tool. The worker injects the job context (workspace, drive, billing, service token), so there is no API key: calls are billed to the skill's workspace automatically. To call a deployed skill from your own app instead, see the SDK Client Reference.

For the per-model input schemas, see the Media model reference; this page is the generated API surface.

`media` — create media + transcribe speech

`media.generate_image(prompt: 'str', *, model: 'str' = 'auto', refs: 'list[str] | None' = None, aspect_ratio: 'str | None' = None, resolution: 'str | None' = None, n: 'int | None' = None, output_path: 'str | None' = None) -> 'dict[str, Any]'`

Generate an image from a prompt — model-portable.

Pass refs (image URLs) to run an edit/compose instead of text-to-image. model is a family (google/nano-banana, openai/gpt-image, bytedance/seedream, google/imagen, …) or "auto"; the platform picks the concrete model and adapts the inputs. Returns the same shape as run.

To bind part of the prompt to a specific reference, write @Image1, @Image2, … (1-indexed, in refs order). The platform normalizes them per model — kept verbatim for models that read them, rewritten to prose ("the first reference image") for the rest — so the same prompt is portable.

Args:

- prompt: what to draw / how to edit. Use @Image1/@Image2 to point at a specific refs entry.
- refs: optional reference image URLs; passing them switches to edit mode. They must be URLs, not drive paths (sign drive paths with puras.drive.url first). Address them in the prompt as @Image1, @Image2, … in this order.
- aspect_ratio: e.g. "1:1", "16:9", "9:16".
- resolution: "1K" | "2K" | "4K" (honored where the model supports it).
- n: number of images.
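The reference rules above (refs must be URLs, tags are 1-indexed in refs order) can be sketched as a small wrapper. This is a minimal illustration, not the SDK's own code: `media` and `drive` stand for the runtime helpers documented on this page and are passed in as parameters so the sketch assumes no import path, while `check_image_tags` and `edit_with_refs` are hypothetical helper names.

```python
import re
from typing import Any


def check_image_tags(prompt: str, refs: list[str]) -> None:
    """Raise ValueError if the prompt uses an @ImageN tag with no matching ref."""
    for match in re.finditer(r"@Image(\d+)", prompt):
        index = int(match.group(1))
        # Tags are 1-indexed: @Image1 is refs[0], @Image2 is refs[1], ...
        if not 1 <= index <= len(refs):
            raise ValueError(f"{match.group(0)} has no matching entry in refs")


def edit_with_refs(media: Any, drive: Any, prompt: str,
                   drive_paths: list[str]) -> dict[str, Any]:
    """Sign drive files, validate the tags, then run an edit/compose."""
    # refs must be URLs, so every drive path is signed before it is passed on.
    refs = [drive.url(path) for path in drive_paths]
    check_image_tags(prompt, refs)
    return media.generate_image(prompt, refs=refs, aspect_ratio="1:1")
```

Inside a skill you would pass the real helpers, for example `edit_with_refs(media, drive, "Place the logo from @Image2 on @Image1", ["media/a.png", "media/b.png"])`; the drive paths here are made up.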

`media.generate_video(prompt: 'str' = '', *, model: 'str' = 'auto', image: 'str | None' = None, last_frame: 'str | None' = None, refs: 'list[str] | None' = None, lipsync_audio: 'str | None' = None, duration: 'int | None' = None, aspect_ratio: 'str | None' = None, resolution: 'str | None' = None, audio: 'bool' = False, output_path: 'str | None' = None) -> 'dict[str, Any]'`

Generate a video — model-portable. The inputs you pass fix the mode:

lipsync_audio (+ image) → lip-synced talking head (avatar);
refs → reference-to-video (keep referenced subjects on-model);
image → image-to-video (animate a still);
none of these → text-to-video.

model is a family (bytedance/seedance, kuaishou/kling, google/veo, bytedance/seedance-fast, google/veo-fast) or "auto". If the chosen family can't do the inferred mode you get a clear error (e.g. only Kling does lip-sync). Reference/image inputs must be URLs (sign drive paths first). Returns the same shape as run.

Reference images: bind a description to one with @Image1, @Image2, … (1-indexed, in refs order). Portable — kept verbatim for models that read the tags (Seedance / Kling r2v), rewritten to prose for the rest (Veo).

Native audio: with audio=True the chosen model voices dialogue, SFX and ambience in the SAME pass (all families support it). Write a spoken line in the prompt as Speaker says: "<line>" (a colon, not quotes around the speaker) and append (no subtitles) so it isn't stamped on the frame. Only reach for lipsync_audio for a fixed verbatim read — it looks static.

Args:

- prompt: the scene/action (a short delivery note for lip-sync). Label multi-shot beats Shot 1: / Cut to:; use @Image1/@Image2 for refs.
- image: a still to animate (i2v) or the portrait (lip-sync).
- last_frame: optional URL of a still the clip should END on (tail keyframe). Pair with image for first→last interpolation, or use alone to land a t2v/r2v clip on a set frame (e.g. an end card). Seedance / Kling only; ignored with a warning on other families.
- refs: reference image URLs (r2v); address them as @Image1, @Image2, ….
- lipsync_audio: narration audio URL → lip-sync mode.
- duration: seconds; snapped to the model's allowed set (e.g. Veo 4/6/8) with a warning. Ignored for lip-sync (length follows the audio).
- aspect_ratio / resolution: honored where the model supports them.
- audio: generate native audio (t2v/i2v/r2v), with dialogue and SFX in one pass.
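The native-audio prompt convention described above (a `Speaker says: "<line>"` line followed by `(no subtitles)`) can be sketched as a small prompt builder. This is an illustrative sketch only: `media` stands for the runtime helper and is passed in as a parameter, and `dialogue_prompt` and `talking_clip` are hypothetical names.

```python
from typing import Any, Optional


def dialogue_prompt(scene: str, speaker: str, line: str) -> str:
    """Format a spoken line with a colon after the speaker and no subtitles."""
    return f'{scene} {speaker} says: "{line}" (no subtitles)'


def talking_clip(media: Any, scene: str, speaker: str, line: str, *,
                 image_url: Optional[str] = None) -> dict[str, Any]:
    """Generate a clip whose dialogue is voiced in the same pass as the video."""
    prompt = dialogue_prompt(scene, speaker, line)
    # With image_url left as None this is text-to-video; passing a still
    # switches the call to image-to-video. audio=True requests native audio.
    return media.generate_video(prompt, image=image_url, audio=True)
```

Keeping the formatting in one helper makes it harder to forget the `(no subtitles)` suffix when several shots share the same speaker.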

`media.generate_audio(text: 'str', *, model: 'str' = 'auto', voice: 'str | None' = None, language: 'str | None' = None, stability: 'float | None' = None, similarity_boost: 'float | None' = None, style: 'float | None' = None, speed: 'float | None' = None, output_path: 'str | None' = None) -> 'dict[str, Any]'`

Generate speech from text (text-to-speech) — model-portable.

model is a family (elevenlabs/tts) or "auto", both of which resolve to ElevenLabs v3 — the expressive tier that reads inline audio tags. Wrap a delivery cue in square brackets and v3 acts it out without speaking it: "[excited] It's finally here! [whispers] Don't tell anyone." Common tags: [excited] [sad] [angry] [whispers] [laughs] [sighs] [sarcastic] [British accent]. Lower stability (~0.3) makes tags land harder; near 1.0 they stop landing.

v3 shapes delivery through audio tags + stability only. The style / speed / similarity_boost knobs are v2-only and are ignored on v3 — pass model="elevenlabs/tts-v2" if you need them (you then lose audio-tag support). Returns the same shape as run.

Args:

- text: the words to speak (billed per character). May contain v3 audio tags in [brackets].
- voice: a voice name/persona (e.g. "Aria", "Roger") or a voice id.
- language: ISO 639-1 code to lock pronunciation; omit to auto-detect.
- stability: 0–1. Lower = more expressive / tag-responsive, higher = steadier. The primary v3 delivery knob.
- similarity_boost / style / speed: extra delivery controls, v2 only (ignored on the default v3 model).
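As a sketch of the audio-tag convention above, the helper below joins (cue, line) pairs into a single tagged string and passes a low stability so the tags land. `media` is the runtime helper passed in as a parameter; `speak` is a hypothetical wrapper name, and the range check is an addition of this sketch, not documented platform behaviour.

```python
from typing import Any


def speak(media: Any, parts: list[tuple[str, str]], *,
          voice: str = "Aria", stability: float = 0.3) -> dict[str, Any]:
    """Build '[cue] line' text from (cue, line) pairs and synthesize it."""
    if not 0.0 <= stability <= 1.0:
        raise ValueError("stability must be between 0 and 1")
    # Each cue goes in square brackets so v3 acts it out instead of reading it.
    text = " ".join(f"[{cue}] {line}" for cue, line in parts)
    return media.generate_audio(text, voice=voice, stability=stability)
```

For example, `speak(media, [("excited", "It's finally here!"), ("whispers", "Don't tell anyone.")])` sends the same text as the example in the description above.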

`media.transcribe(audio: 'str', *, keyterms: 'list[str] | None' = None, language_code: 'str | None' = None, model: 'str' = 'elevenlabs/scribe-v2') -> 'dict[str, Any]'`

Transcribe speech to text. Returns {text, words, language_code, ...} — words carries per-word start/end seconds. There is NO file (drive_path is empty). Billed per second of audio.

Args:

- audio: an audio URL the model can fetch (https:// or a data: URI).
- keyterms: optional brand / proper-noun spellings to bias transcription.
- language_code: optional ISO 639-1 hint; omit to auto-detect.
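Because words carries per-word timings, a transcript can be regrouped into caption cues without another model call. The sketch below assumes each entry in words is a dict with "text", "start" and "end" keys; the page only states that start/end seconds are present, so the key names are an assumption, and `group_words` is a hypothetical helper.

```python
from typing import Any


def _cue(words: list[dict[str, Any]]) -> dict[str, Any]:
    """Collapse a run of words into one cue spanning first start to last end."""
    return {
        "start": words[0]["start"],
        "end": words[-1]["end"],
        "text": " ".join(w["text"].strip() for w in words),
    }


def group_words(words: list[dict[str, Any]],
                max_seconds: float = 3.0) -> list[dict[str, Any]]:
    """Group per-word timings into cues no longer than max_seconds."""
    cues: list[dict[str, Any]] = []
    current: list[dict[str, Any]] = []
    for word in words:
        if not word["text"].strip():
            continue  # skip whitespace-only entries, if the model emits any
        # Start a new cue when adding this word would exceed the length limit.
        if current and word["end"] - current[0]["start"] > max_seconds:
            cues.append(_cue(current))
            current = []
        current.append(word)
    if current:
        cues.append(_cue(current))
    return cues
```

A typical call would be `group_words(media.transcribe(audio_url)["words"])`, where `audio_url` is a URL the model can fetch.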

`drive` — mint a URL for a drive file

`drive.url(path: 'str', *, ttl: 'int' = 3600) -> 'str'`

Convenience wrapper around sign that returns just the signed URL.

`drive.sign(path: 'str', *, ttl: 'int' = 3600) -> 'dict'`

Sign a drive file and return the full {path, signed_url, expires_in}.

path is workspace-relative (e.g. 'media/abc.png'); a leading '<workspace_id>/' is accepted and kept as-is. ttl is the number of seconds until the URL expires (minimum 30 seconds, maximum 30 days, default 3600, i.e. 1 hour).
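A minimal sketch of signing with a bounded ttl, reading the stated range as 30 seconds to 30 days. Whether the platform rejects or clamps an out-of-range ttl is not stated here, so the client-side clamp is purely an assumption of this sketch. `drive` is the runtime helper passed in as a parameter; `clamp_ttl` and `signed_ref` are hypothetical names.

```python
from typing import Any

MIN_TTL = 30                    # seconds (assumed lower bound)
MAX_TTL = 30 * 24 * 60 * 60     # 30 days expressed in seconds


def clamp_ttl(seconds: int) -> int:
    """Keep a requested ttl inside the assumed allowed range."""
    return max(MIN_TTL, min(seconds, MAX_TTL))


def signed_ref(drive: Any, path: str, *, ttl: int = 3600) -> tuple[str, int]:
    """Sign a drive file and return (signed_url, expires_in)."""
    signed = drive.sign(path, ttl=clamp_ttl(ttl))
    return signed["signed_url"], signed["expires_in"]
```

When only the URL is needed, `drive.url(path, ttl=...)` is the shorter route; `drive.sign` is useful when the expiry should be stored alongside the link.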

`secret` — read a skillpack secret

`secret(name: 'str', default: 'str | None' = None) -> 'str'`

Read a skillpack secret by NAME. Skillpack secrets are injected as env vars into your function subprocess at run time (see Secrets tab in the dashboard). They travel with the skillpack code — when another workspace runs your public skillpack, your secrets are still what the skill sees.

Raises SecretError if missing and no default is provided.
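A short sketch of the required-versus-optional pattern this implies: omit default for a secret the skill cannot run without, and pass one for anything optional. `secret` is the runtime helper passed in as a parameter; the secret names, the fallback URL and `read_config` are hypothetical.

```python
from typing import Callable


def read_config(secret: Callable[..., str]) -> dict[str, str]:
    """Collect the settings a skill needs from skillpack secrets."""
    return {
        # Required: no default, so a missing secret raises instead of
        # letting the skill continue with an empty key.
        "api_key": secret("VENDOR_API_KEY"),
        # Optional: falls back to the default when the secret is not set.
        "base_url": secret("VENDOR_BASE_URL", "https://api.example.com"),
    }
```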

`inputs` — resolve file inputs to local bytes/paths

`load_bytes(value: 'Any') -> 'bytes'`

Resolve an input value to raw bytes.

Accepts dicts ({drive_path|url|data: ...}) or bare strings (URL, dataURL, base64, or drive path).

`load_path(value: 'Any', *, suffix: 'str | None' = None) -> 'Path'`

Resolve an input value to a local filesystem path.

For drive_path inputs we return the live symlink path so the caller can read it lazily. For URL / base64 / dataURL inputs we download/decode to a temp file and return that. The temp file is left on disk for the duration of the job — the workdir is cleaned up on job teardown.
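The two resolvers can be used together on the same input value, as in the sketch below. `load_bytes` and `load_path` are the helpers documented above, passed in as parameters; `describe_input` is a hypothetical name, and treating suffix as the extension of the temp file is an assumption, since the page does not describe that parameter.

```python
import hashlib
from typing import Any, Callable


def describe_input(load_bytes: Callable[..., bytes],
                   load_path: Callable[..., Any],
                   value: Any) -> dict[str, Any]:
    """Resolve one input value both ways and summarize it."""
    data = load_bytes(value)                 # raw bytes, whatever the input form
    path = load_path(value, suffix=".png")   # suffix meaning is assumed
    return {
        "size": len(data),
        "sha256": hashlib.sha256(data).hexdigest(),
        "path": str(path),
    }
```

The value can be any of the accepted forms, for instance `{"drive_path": "media/abc.png"}`, `{"url": "https://example.com/a.png"}` or a bare string.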

`subagent` — run an isolated subagent

`subagent.run(target: 'str | None' = None, inputs: 'dict[str, Any] | None' = None, *, prompt: 'str | None' = None, version: 'int | None' = None, timeout: 'int' = 600) -> 'Any'`

Run a subagent synchronously and return its output.

Pass exactly one of target or prompt.

Args:

- target: what to run. One of:
  - "references/foo.md" (any bundle-relative *.md path): run that markdown file as an isolated subagent in the caller's own skillpack. No manifest entry or schema needed; the path is relative to the caller's skill directory. Inputs are passed verbatim and any file-shaped values (URLs / drive paths) are staged for the child.
  - "skill_name" (same skillpack),
  - "skillpack_slug/skill_name" (another skillpack in the caller's workspace), or
  - "workspace_slug/skillpack_slug/skill_name" (a public skillpack in another workspace).
- inputs: dict passed straight to the subagent. For a declared skill it is validated against the skill's input_schema before the job is queued; for a .md or inline-prompt subagent it is passed verbatim.
- prompt: an inline system prompt to run as a one-off subagent (mutually exclusive with target). The subagent runs in the caller's bundle context with the built-in tools and a free-form set_output.
- version: pin a declared skill target to a specific deployment version of its skillpack (e.g. 3). Omit to use the active deployment. Only valid for a skill-ref target; passing it with an inline prompt or a *.md path is an error (those always run against the caller's own deployment).
- timeout: max seconds to wait for the child to reach a terminal state. The platform also enforces its own per-job timeouts; this is just how long the calling side will block.

Returns: The child's result value. For deterministic skills that's whatever their :run function returned. For agentic / ad-hoc / inline subagents it's whatever was passed to set_output.

Raises:

- SubagentRunError: child failed, was cancelled, or didn't finish in time.
- ValueError: neither or both of target / prompt were given.

`SubagentRunError`

Raised when a subagent run ended in failed or cancelled status, or did not reach a terminal status within timeout seconds.

Attributes:

- job_id: The child job's id, useful for fetching events / logs.
- status: One of failed, cancelled, queued, running.
- message: Human-readable error from the child (or a timeout note).
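A minimal sketch of calling a subagent and turning a failure into data instead of an exception. `subagent` is the runtime helper and `error_type` would be SubagentRunError; both are passed in as parameters so the sketch assumes no import path. `run_child` is a hypothetical name. A status of queued or running is read here as a timeout, since those are the non-terminal states listed above.

```python
from typing import Any


def run_child(subagent: Any, error_type: type[Exception], target: str,
              inputs: dict[str, Any], *, timeout: int = 600) -> dict[str, Any]:
    """Run a subagent and fold failure details into a plain dict."""
    try:
        output = subagent.run(target, inputs, timeout=timeout)
    except error_type as exc:
        return {
            "ok": False,
            "job_id": exc.job_id,
            "status": exc.status,
            "message": exc.message,
            # queued / running means the child never reached a terminal
            # state, so the caller stopped waiting after `timeout` seconds.
            "timed_out": exc.status in ("queued", "running"),
        }
    return {"ok": True, "output": output}
```

Keeping job_id in the result lets the caller look up the child's events or logs later, which is the use the attribute description suggests.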
