Media SDK
media.run() — call a registered media model with any inputs, billed to your project.
There is one entry point: media.run(model, inputs). model is one of our registered slugs (e.g. openai/gpt-image-2, kuaishou/kling-v3-i2v). Anything the model accepts as arguments, you pass as inputs; we don't validate input shape, the model does. Cost is debited from your project's credit balance at the registered rate.
Most families ship as multiple slugs — one per model variant (text-to-X, image-to-X, reference-to-X, fast vs. standard). Pick the slug that matches what you're doing; the catalog at GET /v1/pricing lists them all. A handful of slugs also have input-conditional pricing (e.g. audio on/off, with/without a video reference) — the registered rate_table is the source of truth and we bill exactly the row your inputs hit.
The full catalog of slugs and their rates is on the pricing page or at GET /v1/pricing. Unknown slugs are rejected with a 400.
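As a rough illustration of the one-slug-per-variant naming, the sketch below filters a list of slugs by family and flags fast tiers. The list is hard-coded from slugs mentioned on this page; in practice it would come from GET /v1/pricing, whose response shape is not shown here. The helper names variants and is_fast are hypothetical and not part of the SDK.

```python
# Slugs copied from this page; a real list would come from GET /v1/pricing.
CATALOG_SLUGS = [
    "openai/gpt-image-2",
    "kuaishou/kling-v3-i2v",
    "kuaishou/kling-v3-t2v",
    "bytedance/seedance-2-i2v",
    "bytedance/seedance-2-r2v",
    "bytedance/seedance-2-fast-t2v",
    "google/veo-3-t2v",
    "google/veo-3-fast-i2v",
]


def variants(slugs: list, family: str) -> list:
    """Hypothetical helper: every slug that belongs to a family such as 'bytedance/seedance-2'."""
    prefix = family + "-"
    return [slug for slug in slugs if slug.startswith(prefix)]


def is_fast(slug: str) -> bool:
    """Hypothetical helper: fast tiers carry '-fast-' in the slug."""
    return "-fast-" in slug


seedance = variants(CATALOG_SLUGS, "bytedance/seedance-2")
print(seedance)
print([slug for slug in seedance if is_fast(slug)])
```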
Surface
from puras import media, secret
media.run(
model: str,
inputs: dict | None = None,
*,
output_path: str | None = None,
output_url_path: str | None = None,
kind: str = "auto", # "image" | "video" | "audio" | "auto"
**kwargs, # merged into inputs (kwargs win)
) -> dict
secret(name: str) -> str # read a project secret
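The "kwargs win" rule in the signature can be pictured with a plain dictionary merge. This is a minimal sketch of the documented behaviour, not the SDK's actual code; merge_inputs is a hypothetical name.

```python
from __future__ import annotations


def merge_inputs(inputs: dict | None = None, **kwargs) -> dict:
    """Combine an inputs dict with keyword arguments; on a key clash the kwarg wins."""
    merged = dict(inputs or {})  # copy so the caller's dict is not mutated
    merged.update(kwargs)        # keyword arguments overwrite matching keys
    return merged


# duration appears in both places; the keyword value (8) is the one kept.
print(merge_inputs({"prompt": "rainy alley", "duration": 5}, duration=8))
```

Under this reading, media.run(slug, {"duration": 5}, duration=8) would send duration=8 to the model.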
Returns:
{
"model": "kuaishou/kling-v3-i2v",
"kind": "video", # resolved from "auto" or echoed
"drive_path": "media/12b5e4d5...mp4", # path inside your drive
"output_url": "https://...supabase.../mp4", # signed URL, TTL ~1h
"request_id": "...",
"billed_micros": 672000,
"billed_usd": 0.672,
"meta": {"metrics": {"inference_time": 12.4}, ...},
}
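The sample above pairs billed_micros 672000 with billed_usd 0.672, which suggests one micro is one millionth of a dollar. Assuming that ratio holds, the sketch below totals several results in integer micros and converts once at the end, which avoids accumulating float rounding. The helper names and the second result are made up for illustration.

```python
MICROS_PER_USD = 1_000_000  # inferred from the sample: 672000 micros <-> 0.672 USD


def micros_to_usd(billed_micros: int) -> float:
    """Convert integer micros to dollars."""
    return billed_micros / MICROS_PER_USD


def total_usd(results: list) -> float:
    """Sum the integer billed_micros of several results, then convert once."""
    return micros_to_usd(sum(r["billed_micros"] for r in results))


results = [{"billed_micros": 672000}, {"billed_micros": 40000}]  # second value is made up
print(micros_to_usd(672000))  # 0.672
print(total_usd(results))     # 0.712
```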
Patterns
Text-to-image
img = media.run(
"openai/gpt-image-2",
{"prompt": "a vintage red bicycle", "size": "1024x1024", "quality": "high"},
)
# Edit / composite reference images
edited = media.run(
"bytedance/seedream-v4-edit",
{"image_url": img["output_url"], "prompt": "give it neon trim"},
)
Image-to-video
vid = media.run(
"bytedance/seedance-2-i2v",
{
"image_url": "https://...",
"prompt": "make it spin slowly",
"duration": 8,
},
output_path="renders/spin.mp4",
)
Reference-to-video (Seedance r2v)
clip = media.run(
"bytedance/seedance-2-r2v",
{
"prompt": "match the style of the reference clip",
"image_urls": ["https://..."],
"video_url": "https://...", # triggers the with-reference rate
"duration": 6,
},
)
Audio + voice control (Kling v3, Veo 3)
# Audio off — cheapest tier
clip = media.run("kuaishou/kling-v3-t2v", prompt="rainy alley", duration=5)
# Audio on — billed at the audio-on per-second rate
clip = media.run(
"google/veo-3-t2v",
prompt="thunder rolling over hills",
duration=4,
generate_audio=True,
)
Fast tiers
Where a model has a fast variant, it's a separate slug (-fast-) at a lower per-second rate:
quick = media.run("bytedance/seedance-2-fast-t2v", prompt="...", duration=5)
quick = media.run("google/veo-3-fast-i2v", image_url="...", duration=4)
A model with an unusual response shape
If we can't find the output URL automatically, point at it with output_url_path (jq-style):
weird = media.run(
"kuaishou/kling-v3-image",
{...},
output_url_path="outputs[0].asset.url",
)
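To make the path syntax concrete, here is a small resolver that walks a path like outputs[0].asset.url through nested dicts and lists. It only illustrates how such a path could be read; the SDK's own resolver may behave differently, and resolve_path and the sample response are hypothetical.

```python
import re

# A path segment is either a key name or a bracketed list index.
_SEGMENT = re.compile(r"([^.\[\]]+)|\[(\d+)\]")


def resolve_path(payload, path: str):
    """Follow a path such as 'outputs[0].asset.url' through nested dicts and lists."""
    current = payload
    for key, index in _SEGMENT.findall(path):
        if key:
            current = current[key]         # dictionary lookup
        else:
            current = current[int(index)]  # list index
    return current


# Made-up response shape, for illustration only.
response = {"outputs": [{"asset": {"url": "https://example.com/out.png"}}]}
print(resolve_path(response, "outputs[0].asset.url"))  # https://example.com/out.png
```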
Inside a deterministic skill (or a per-skill tool)
from puras import media
def run(inputs: dict) -> dict:
img = media.run(
"openai/gpt-image-2",
{"prompt": inputs["prompt"]},
)
return {"drive_path": img["drive_path"], "billed_usd": img["billed_usd"]}
The same import works from any Python callable the worker dispatches: a deterministic skill's entrypoint, or one of the helpers listed under an agentic skill's tools: key.
As an agent tool (built-in)
Agentic skills automatically get a media tool exposed to the model (same surface as media.run()). The agent picks a model slug and inputs at runtime — you don't declare it in skill.yaml. See concepts for skill setup; the tools: list on a skill is for your own Python helpers, not for the built-in media tool.
How billing resolves
Every successful call is priced from the registry — there is no live lookup and no fallback. Each slug carries one of:
- per-call (most image models)
- per-second of output (video / audio)
- per-megapixel (some image models)
- input-conditional — a rate table indexed by inputs (audio on/off, with/without video reference, quality × size). The bill is computed from the inputs you actually sent.
The exact amount lands in billed_micros and is also written to a usage_events row you can audit.
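The four pricing modes listed above can be sketched as a single function. Everything here is an assumption for illustration: the pricing dict layout, the mode names, and the rates are invented, and the input-conditional rows are treated as per-second rates. The registered rate_table remains the source of truth.

```python
def bill_micros(pricing: dict, inputs: dict, *, output_seconds: float = 0.0,
                megapixels: float = 0.0) -> int:
    """Hypothetical pricing function; returns the charge in integer micros."""
    mode = pricing["mode"]
    if mode == "per_call":
        return pricing["rate_micros"]
    if mode == "per_second":
        return round(pricing["rate_micros"] * output_seconds)
    if mode == "per_megapixel":
        return round(pricing["rate_micros"] * megapixels)
    if mode == "input_conditional":
        # Pick the table row from the inputs actually sent (missing flag means off).
        row = tuple(bool(inputs.get(name, False)) for name in pricing["keys"])
        return round(pricing["rate_table"][row] * output_seconds)
    raise ValueError(f"unknown pricing mode: {mode!r}")


# Invented rates: 100_000 micros/s with audio off, 150_000 micros/s with audio on.
VIDEO_WITH_AUDIO = {
    "mode": "input_conditional",
    "keys": ["generate_audio"],
    "rate_table": {(False,): 100_000, (True,): 150_000},
}

print(bill_micros(VIDEO_WITH_AUDIO, {"generate_audio": True}, output_seconds=4))  # 600000
print(bill_micros(VIDEO_WITH_AUDIO, {}, output_seconds=4))                        # 400000
```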
Conventions
- Always prefer media.run over hitting /v1/media/generate with raw httpx. The SDK injects the worker's service token and job context for you; a raw call won't bill correctly.
- Don't rely on the file extension matching the kind. Some models return .webp for kind="image"; the SDK detects the extension from the URL and saves accordingly. drive_path is authoritative.
- Don't open the returned output_url from server code to "verify" the file; it's a signed URL meant for the client. Use drive_path server-side; mint a fresh signed URL with the drive_sign MCP tool when you need to share.
- Don't retry on a failed call without inspecting error. Most model errors are deterministic (bad params, NSFW filter, model down), so a blind retry just burns more credit. Fix the inputs first.
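On the file-extension point, signed URLs carry a query string, so the extension has to be read from the URL path rather than from the end of the string. The sketch below shows one way to do that with the standard library; it is not the SDK's implementation, and extension_from_url is a hypothetical helper.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def extension_from_url(url: str) -> str:
    """Return the lowercase extension of the URL path, ignoring query and fragment."""
    return PurePosixPath(urlparse(url).path).suffix.lower()


# The query string (for example a signature token) does not affect the result.
print(extension_from_url("https://example.com/media/abc.webp?token=xyz"))  # .webp
```

A caller that needs the real file type would still check the extension of drive_path rather than assume one from kind.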