Puras SDK Runtime Reference
Auto-generated reference for the in-skill puras runtime (media, drive, secret, inputs, subagent).
These helpers run inside a deployed skill — a deterministic skill's entrypoint or an agentic skill's Python tool. The worker injects the job context (workspace, drive, billing, service token), so there is no API key: calls are billed to the skill's workspace automatically. To call a deployed skill from your own app instead, see the SDK Client Reference.
For the per-model input schemas, see the Media model reference; this page is the generated API surface.
media — create media + transcribe speech
media.generate_image(prompt: 'str', *, model: 'str' = 'auto', refs: 'list[str] | None' = None, aspect_ratio: 'str | None' = None, resolution: 'str | None' = None, n: 'int | None' = None, output_path: 'str | None' = None) -> 'dict[str, Any]'
Generate an image from a prompt — model-portable.
Pass refs (image URLs) to run an edit/compose instead of text-to-image.
model is a family (google/nano-banana, openai/gpt-image,
bytedance/seedream, google/imagen, …) or "auto"; the platform picks
the concrete model and adapts the inputs. Returns the same shape as run.
To bind part of the prompt to a specific reference, write @Image1,
@Image2, … (1-indexed, in refs order). The platform normalizes them per
model — kept verbatim for models that read them, rewritten to prose ("the
first reference image") for the rest — so the same prompt is portable.
Args:
prompt: what to draw / how to edit. Use @Image1/@Image2 to point at a
specific refs entry.
refs: optional reference image URLs → edit mode (must be URLs, not drive
paths — sign drive paths with puras.drive.url first). Address them in
the prompt as @Image1, @Image2, … in this order.
aspect_ratio: e.g. "1:1", "16:9", "9:16".
resolution: "1K" | "2K" | "4K" (honored where the model supports it).
n: number of images.
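A minimal sketch of an edit call with two references. The media and drive modules are passed in as parameters so the sketch runs outside the worker; inside a skill they would presumably come from the puras package (the import path is an assumption, since this page only names media.generate_image and puras.drive.url). ref_tag and edit_with_refs are hypothetical helper names.

```python
from typing import Any


def ref_tag(position: int) -> str:
    """Tag for the refs entry at 0-based list position (tags are 1-indexed)."""
    if position < 0:
        raise ValueError("position must be >= 0")
    return f"@Image{position + 1}"


def edit_with_refs(media: Any, drive: Any, person_path: str, jacket_path: str) -> dict[str, Any]:
    """Sign two drive files and run an edit that addresses each one by tag."""
    # refs must be URLs, not drive paths, so each path is signed first.
    refs = [drive.url(person_path), drive.url(jacket_path)]
    # Tags follow refs order: person is @Image1, jacket is @Image2.
    prompt = f"Dress the person in {ref_tag(0)} in the jacket from {ref_tag(1)}."
    return media.generate_image(prompt, refs=refs, aspect_ratio="1:1")
```

Deriving the tags from list positions keeps the prompt and the refs order in step if the list is later reordered.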
media.generate_video(prompt: 'str' = '', *, model: 'str' = 'auto', image: 'str | None' = None, last_frame: 'str | None' = None, refs: 'list[str] | None' = None, lipsync_audio: 'str | None' = None, duration: 'int | None' = None, aspect_ratio: 'str | None' = None, resolution: 'str | None' = None, audio: 'bool' = False, output_path: 'str | None' = None) -> 'dict[str, Any]'
Generate a video — model-portable. The inputs you pass fix the mode:
- lipsync_audio (+ image) → lip-synced talking head (avatar);
- refs → reference-to-video (keep referenced subjects on-model);
- image → image-to-video (animate a still);
- neither → text-to-video.
model is a family (bytedance/seedance, kuaishou/kling, google/veo,
bytedance/seedance-fast, google/veo-fast) or "auto". If the chosen
family can't do the inferred mode you get a clear error (e.g. only Kling
does lip-sync). Reference/image inputs must be URLs (sign drive paths
first). Returns the same shape as run.
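The mode inference described above can be sketched as a small pure function. This is an illustration only: it assumes the modes are checked in the order they are listed (lipsync_audio, then refs, then image), which this page implies but does not state, and infer_video_mode is a hypothetical name, not part of the runtime.

```python
from typing import Optional, Sequence


def infer_video_mode(
    image: Optional[str] = None,
    refs: Optional[Sequence[str]] = None,
    lipsync_audio: Optional[str] = None,
) -> str:
    """Guess which mode a set of generate_video inputs selects.

    Assumed precedence (listing order): lip-sync, reference-to-video,
    image-to-video, then text-to-video when none of the inputs is set.
    """
    if lipsync_audio:
        return "lip-sync"
    if refs:
        return "reference-to-video"
    if image:
        return "image-to-video"
    return "text-to-video"
```

Checking the mode before the call makes it possible to reject an unsupported family and mode pairing early instead of waiting for the platform error.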
Reference images: bind a description to one with @Image1, @Image2, …
(1-indexed, in refs order). Portable — kept verbatim for models that read
the tags (Seedance / Kling r2v), rewritten to prose for the rest (Veo).
Native audio: with audio=True the chosen model voices dialogue, SFX and
ambience in the SAME pass (all families support it). Write a spoken line in
the prompt as Speaker says: "<line>" (a colon, not quotes around the
speaker) and append (no subtitles) so it isn't stamped on the frame. Only
reach for lipsync_audio for a fixed verbatim read — it looks static.
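The dialogue convention above (Speaker says: "<line>" plus a trailing (no subtitles)) is easy to get wrong by hand, so a small formatter can help. A minimal sketch; spoken_line and dialogue_prompt are hypothetical helpers, not runtime functions.

```python
def spoken_line(speaker: str, line: str) -> str:
    """Format one spoken line: a colon after 'says', quotes around the line only."""
    return f'{speaker} says: "{line}"'


def dialogue_prompt(scene: str, lines: list[tuple[str, str]]) -> str:
    """Join a scene description and (speaker, line) pairs into one prompt."""
    parts = [scene.strip()]
    parts += [spoken_line(speaker, line) for speaker, line in lines]
    # Appended so the spoken text is not stamped onto the frame.
    parts.append("(no subtitles)")
    return " ".join(parts)
```

The resulting string would be passed as prompt to media.generate_video together with audio=True.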
Args:
prompt: the scene/action (a short delivery note for lip-sync). Label
multi-shot beats Shot 1: / Cut to:; use @Image1/@Image2 for refs.
image: a still to animate (i2v) or the portrait (lip-sync).
last_frame: optional URL of a still the clip should END on (tail keyframe).
Pair with image for first→last interpolation, or use alone to land a
t2v/r2v clip on a set frame (e.g. an end card). Seedance / Kling only;
ignored with a warning on other families.
refs: reference image URLs (r2v); address them as @Image1, @Image2, ….
lipsync_audio: narration audio URL → lip-sync mode.
duration: seconds; snapped to the model's allowed set (e.g. Veo 4/6/8) with
a warning. Ignored for lip-sync (length follows the audio).
aspect_ratio / resolution: honored where the model supports them.
audio: generate native audio (t2v/i2v/r2v) — dialogue + SFX in one pass.
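To avoid the snapping warning, a caller can pick an allowed duration up front. The sketch below assumes nearest-value snapping with ties going to the shorter clip; the platform's actual rule is not stated on this page, and only the Veo set 4/6/8 comes from the text above. snap_duration is a hypothetical helper.

```python
def snap_duration(requested: int, allowed: list[int]) -> int:
    """Return the allowed duration closest to `requested`.

    Assumption: ties resolve to the shorter duration. The platform may
    snap differently; this only pre-selects a value it will accept as-is.
    """
    if not allowed:
        raise ValueError("allowed must not be empty")
    # Sort key: distance first, then the duration itself to break ties.
    return min(allowed, key=lambda d: (abs(d - requested), d))


VEO_DURATIONS = [4, 6, 8]  # from the example in the duration argument above
```

For instance, snap_duration(10, VEO_DURATIONS) selects 8, which can then be passed as duration without triggering a warning.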
media.generate_audio(text: 'str', *, model: 'str' = 'auto', voice: 'str | None' = None, language: 'str | None' = None, stability: 'float | None' = None, similarity_boost: 'float | None' = None, style: 'float | None' = None, speed: 'float | None' = None, output_path: 'str | None' = None) -> 'dict[str, Any]'
Generate speech from text (text-to-speech) — model-portable.
model is a family (elevenlabs/tts) or "auto", both of which resolve to
ElevenLabs v3 — the expressive tier that reads inline audio tags.
Wrap a delivery cue in square brackets and v3 acts it out without speaking
it: "[excited] It's finally here! [whispers] Don't tell anyone." Common
tags: [excited] [sad] [angry] [whispers] [laughs] [sighs]
[sarcastic] [British accent]. Lower stability (~0.3) makes tags land
harder; near 1.0 they stop landing.
v3 shapes delivery through audio tags + stability only. The
style / speed / similarity_boost knobs are v2-only and are ignored
on v3 — pass model="elevenlabs/tts-v2" if you need them (you then lose
audio-tag support). Returns the same shape as run.
Args:
text: the words to speak (billed per character). May contain v3 audio tags in [brackets].
voice: a voice name/persona (e.g. "Aria", "Roger") or a voice id.
language: ISO 639-1 code to lock pronunciation; omit to auto-detect.
stability: 0–1. Lower = more expressive / tag-responsive, higher = steadier. The primary v3 delivery knob.
similarity_boost / style / speed: extra delivery controls, v2 only (ignored on the default v3 model).
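A minimal sketch of building a tagged script for the default v3 model. The media module is passed in as a parameter so the sketch runs outside the worker; tagged and speak_script are hypothetical helpers, and the 0.3 stability value is the figure suggested above for making tags land harder.

```python
from typing import Any


def tagged(tag: str, text: str) -> str:
    """Prefix a line with a v3 audio tag, e.g. '[excited] Hello'."""
    return f"[{tag}] {text}"


def speak_script(media: Any, segments: list[tuple[str, str]], voice: str = "Aria") -> dict[str, Any]:
    """Join (tag, line) segments into one text and synthesize it."""
    text = " ".join(tagged(tag, line) for tag, line in segments)
    # style / speed / similarity_boost are left out: they are ignored on v3.
    return media.generate_audio(text, voice=voice, stability=0.3)
```

Because text is billed per character, the bracketed tags presumably count toward the bill as well; that is an inference from the text argument above, not something this page confirms.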
media.transcribe(audio: 'str', *, keyterms: 'list[str] | None' = None, language_code: 'str | None' = None, model: 'str' = 'elevenlabs/scribe-v2') -> 'dict[str, Any]'
Transcribe speech to text. Returns {text, words, language_code, ...} —
words carries per-word start/end seconds. There is NO file
(drive_path is empty). Billed per second of audio.
Args:
audio: an audio URL the model can fetch (https:// or a data: URI).
keyterms: optional brand / proper-noun spellings to bias transcription.
language_code: optional ISO 639-1 hint; omit to auto-detect.
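The per-word timings in words can be grouped into caption cues. The sketch below assumes each entry is a dict with text, start and end keys; this page only says words carries per-word start/end seconds, so the exact key names are an assumption, and to_cues is a hypothetical helper.

```python
from typing import Any


def to_cues(words: list[dict[str, Any]], max_words: int = 4) -> list[tuple[float, float, str]]:
    """Group word timings into (start, end, text) caption cues.

    Assumes each word looks like {"text": str, "start": float, "end": float}.
    """
    if max_words < 1:
        raise ValueError("max_words must be >= 1")
    cues = []
    for i in range(0, len(words), max_words):
        chunk = words[i:i + max_words]
        text = " ".join(w["text"] for w in chunk)
        # A cue spans from its first word's start to its last word's end.
        cues.append((chunk[0]["start"], chunk[-1]["end"], text))
    return cues
```

The input would be the words field of the dict returned by media.transcribe.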
drive — mint a URL for a drive file
drive.url(path: 'str', *, ttl: 'int' = 3600) -> 'str'
Convenience wrapper around sign that returns just the signed URL.
drive.sign(path: 'str', *, ttl: 'int' = 3600) -> 'dict'
Sign a drive file and return the full {path, signed_url, expires_in}.
path is workspace-relative (e.g. 'media/abc.png'); a leading
'<workspace_id>/' is accepted and kept as-is. ttl is the number of seconds until the
URL expires (30 seconds to 30 days; default 3600, i.e. 1 hour).
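A minimal sketch of signing with a bounded ttl. It reads the lower bound as 30 seconds (an assumption; the unit is not spelled out), takes the drive module as a parameter so it runs outside the worker, and uses hypothetical helper names clamp_ttl and sign_for.

```python
from typing import Any

MIN_TTL = 30                 # assumed to be seconds
MAX_TTL = 30 * 24 * 60 * 60  # 30 days expressed in seconds


def clamp_ttl(ttl: int) -> int:
    """Clamp a requested ttl into the documented range."""
    return max(MIN_TTL, min(MAX_TTL, ttl))


def sign_for(drive: Any, path: str, ttl: int = 3600) -> str:
    """Sign a workspace-relative path and return only the URL."""
    signed = drive.sign(path, ttl=clamp_ttl(ttl))
    # sign returns {path, signed_url, expires_in}; url would return just this field.
    return signed["signed_url"]
```

Whether the platform rejects or clamps an out-of-range ttl is not documented here, so clamping on the caller side sidesteps the question.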
secret — read a skillpack secret
secret(name: 'str', default: 'str | None' = None) -> 'str'
Read a skillpack secret by NAME. Skillpack secrets are injected as env vars into your function subprocess at run time (see Secrets tab in the dashboard). They travel with the skillpack code — when another workspace runs your public skillpack, your secrets are still what the skill sees.
Raises SecretError if missing and no default is provided.
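A minimal sketch of reading one required and one optional secret. The secret callable is passed in so the sketch runs outside the worker; the secret names VENDOR_API_KEY and VENDOR_BASE_URL and the fallback URL are hypothetical.

```python
from typing import Callable


def vendor_settings(secret: Callable[..., str]) -> dict[str, str]:
    """Collect the settings a skill needs from skillpack secrets."""
    return {
        # Required: no default, so a missing secret raises (SecretError in the runtime).
        "api_key": secret("VENDOR_API_KEY"),
        # Optional: falls back to the default when the secret is not set.
        "base_url": secret("VENDOR_BASE_URL", "https://api.example.com"),
    }
```

Reading every secret in one place at the start of a run makes a missing required secret fail fast, before any billed media call is made.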
inputs — resolve file inputs to local bytes/paths
load_bytes(value: 'Any') -> 'bytes'
Resolve an input value to raw bytes.
Accepts dicts ({drive_path|url|data: ...}) or bare strings (URL,
dataURL, base64, or drive path).
load_path(value: 'Any', *, suffix: 'str | None' = None) -> 'Path'
Resolve an input value to a local filesystem path.
For drive_path inputs we return the live symlink path so the caller can
read it lazily. For URL / base64 / dataURL inputs we download/decode to a
temp file and return that. The temp file is left on disk for the duration
of the job — the workdir is cleaned up on job teardown.
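A minimal sketch of when each resolver fits: load_bytes for small payloads handled in memory, load_path for files that are better read lazily. The resolvers are passed in as parameters so the sketch runs outside the worker; fingerprint and line_count are hypothetical helpers.

```python
import hashlib
from pathlib import Path
from typing import Any, Callable


def fingerprint(load_bytes: Callable[[Any], bytes], value: Any) -> str:
    """SHA-256 hex digest of an input, whatever form it arrived in."""
    return hashlib.sha256(load_bytes(value)).hexdigest()


def line_count(load_path: Callable[..., Path], value: Any) -> int:
    """Count lines by streaming from the resolved path instead of loading it whole."""
    path = load_path(value, suffix=".txt")
    with path.open("r", encoding="utf-8") as handle:
        return sum(1 for _ in handle)
```

Neither helper deletes anything afterwards, since the temp files are removed with the workdir on job teardown.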
subagent — run an isolated subagent
subagent.run(target: 'str | None' = None, inputs: 'dict[str, Any] | None' = None, *, prompt: 'str | None' = None, version: 'int | None' = None, timeout: 'int' = 600) -> 'Any'
Run a subagent synchronously and return its output.
Pass exactly one of target or prompt.
Args:
target: what to run —
- "references/foo.md" (any bundle-relative *.md path) — run that
markdown file as an isolated subagent in the caller's own skillpack.
No manifest entry or schema needed; the path is relative to the
caller's skill directory. Inputs are passed verbatim and any
file-shaped values (URLs / drive paths) are staged for the child.
- "skill_name" (same skillpack),
- "skillpack_slug/skill_name" (another skillpack in the caller's
workspace), or
- "workspace_slug/skillpack_slug/skill_name" (a public skillpack in
another workspace).
inputs: dict passed straight to the subagent. For a declared skill it is
validated against the skill's input_schema before the job is queued;
for a .md or inline-prompt subagent it is passed verbatim.
prompt: an inline system prompt to run as a one-off subagent (mutually
exclusive with target). The subagent runs in the caller's bundle
context with the built-in tools and a free-form set_output.
version: pin a declared skill target to a specific deployment version
of its skillpack (e.g. 3). Omit to use the active deployment. Only
valid for a skill-ref target — passing it with an inline prompt or
a *.md path is an error (those always run against the caller's own
deployment).
timeout: max seconds to wait for the child to reach a terminal state.
The platform also enforces its own per-job timeouts; this is just
how long the calling side will block.
Returns:
The child's result value. For deterministic skills that's whatever
their :run function returned. For agentic / ad-hoc / inline subagents
it's whatever was passed to set_output.
Raises:
SubagentRunError: child failed, was cancelled, or didn't finish in time.
ValueError: neither or both of target / prompt were given.
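The target forms and argument rules above can be sketched as a pre-flight check. This is an illustration, not the runtime's own validation: classify_target and check_run_args are hypothetical names, and raising ValueError for a misplaced version is an assumption (this page only calls it an error).

```python
from typing import Optional


def classify_target(target: str) -> str:
    """Label a target by form: a *.md path, or a 1-, 2- or 3-part skill ref."""
    # Check the suffix first: "references/foo.md" also contains a slash.
    if target.endswith(".md"):
        return "markdown"
    depth = target.count("/")
    if depth == 0:
        return "skill"
    if depth == 1:
        return "skillpack/skill"
    if depth == 2:
        return "workspace/skillpack/skill"
    raise ValueError(f"unrecognized target form: {target!r}")


def check_run_args(
    target: Optional[str] = None,
    prompt: Optional[str] = None,
    version: Optional[int] = None,
) -> None:
    """Raise ValueError unless the arguments satisfy the documented rules."""
    if (target is None) == (prompt is None):
        raise ValueError("pass exactly one of target or prompt")
    if version is not None:
        if prompt is not None or classify_target(target) == "markdown":
            raise ValueError("version is only valid for a skill-ref target")
```

Running the check before subagent.run surfaces argument mistakes without queuing a job.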
SubagentRunError
Raised when a subagent run ended in failed or cancelled status,
or did not reach a terminal status within timeout seconds.
Attributes:
job_id: The child job's id, useful for fetching events / logs.
status: One of failed, cancelled, queued, running.
message: Human-readable error from the child (or a timeout note).
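Since failed and cancelled are terminal, a status of queued or running on the exception means the caller stopped waiting before the child finished. A minimal sketch of telling the two cases apart; it works on any object with the three attributes above, and timed_out and summarize are hypothetical helpers.

```python
from typing import Any

# Non-terminal statuses: the child was still pending when the timeout elapsed.
TIMEOUT_STATUSES = {"queued", "running"}


def timed_out(err: Any) -> bool:
    """True when the error reflects a caller-side timeout rather than a child failure."""
    return getattr(err, "status", None) in TIMEOUT_STATUSES


def summarize(err: Any) -> str:
    """One-line description suitable for a log entry."""
    kind = "timed out" if timed_out(err) else "ended"
    return f"subagent job {err.job_id} {kind} (status={err.status}): {err.message}"
```

In a skill these would be called from an except SubagentRunError block; a timeout may justify a longer timeout on the next attempt, while a failed or cancelled child usually needs its logs inspected via job_id.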