Whisper Transcriptions

Tools provides a queue-based Whisper transcription service for media URLs and uploaded audio/video files.

This guide focuses on the public contract: what the feature does, how users interact with it, which endpoints exist, and what clients should expect in requests and responses.

What the feature does

Whisper in Tools can:

queue media for transcription
show live queue/job progress
store the finished transcript
optionally generate transcript analysis
optionally generate transcript translations
optionally attach estimated speaker labels when available
create a public transcript share page

Access models

Signed-in web/JWT API

The ordinary Whisper UI and authenticated API use:

signed-in web session auth, or
JWT bearer auth from POST /api/account/login

User permission requirements:

whisper.use for ordinary queue access
whisper.manage for full-queue/admin actions such as run-now and all-user visibility
provider_openai when transcript analysis/translations should run for a non-admin user

Token-authenticated transcribe API

Tools also exposes a separate server-to-server transcription API for token-based integrations.

Auth requirements:

an active personal token with the access scope whisper.api (the built-in generator still creates a provider_whisper_api token row for convenience)
recommended transport: Authorization: Bearer YOUR_API_TOKEN
token owner must have whisper.api
token owner must also have normal Whisper access (whisper.use)
admin users bypass ordinary permission checks

Legacy X-Api-Key or apikey transport may still exist for backwards compatibility, but new integrations should use the Authorization header.

Queue behavior

Whisper jobs are processed asynchronously.

Job statuses:

queued
downloading
transcribing
finalizing
completed
failed
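
The status list above can be modelled on the client side as a small enumeration. This is a minimal sketch, not part of Tools itself: the names JobStatus, TERMINAL_STATUSES and is_terminal are hypothetical, and treating completed and failed as the only terminal states follows the callback contract described later in this guide.

```python
from enum import Enum


class JobStatus(str, Enum):
    """Job statuses as listed in this guide."""
    QUEUED = "queued"
    DOWNLOADING = "downloading"
    TRANSCRIBING = "transcribing"
    FINALIZING = "finalizing"
    COMPLETED = "completed"
    FAILED = "failed"


# Assumption: completed and failed are the only terminal states.
TERMINAL_STATUSES = frozenset({JobStatus.COMPLETED, JobStatus.FAILED})


def is_terminal(status: str) -> bool:
    """Return True when the job will not change state any further."""
    try:
        return JobStatus(status) in TERMINAL_STATUSES
    except ValueError:
        # Unknown status strings are treated as non-terminal (client-side choice).
        return False
```

A polling client could stop as soon as is_terminal(job["status"]) returns True and keep waiting otherwise.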

Jobs can also expose a queue origin:

queue_channel="web"
queue_channel="api"

Signed-in queue/detail views and authenticated /api/whisper/jobs* payloads can therefore show whether a job came from the regular user queue or the token-authenticated API queue.

Admin-owned jobs are prioritized ahead of non-admin jobs when queued work is claimed.

Web UI

`/whisper`

The signed-in queue UI lets users:

submit a media URL
upload a media file
choose model and language hints
set an optional title/label and free-text note
select analysis/translation language preferences
follow live queue progress
open job detail pages

`/whisper/jobs/{jobId}`

The signed-in detail page shows:

current status and progress
source title/description
runtime log
transcript
transcript analysis
transcript translations
speaker-aware transcript when available
public share status

Completed jobs can create a public transcript share page.

Public transcript share page

Completed transcripts can be exposed through a tokenized public page under:

/shared/whisper/transcript/{token}

The share page is intended for reading/transcript sharing, not queue administration.

For token-authenticated API submissions, Tools can create the share automatically when the transcript completes successfully. The callback payload then includes the direct share URL.

Authenticated Whisper API (`/api/whisper/*`)

These endpoints use signed-in web/JWT auth, not the dedicated Whisper API token.

`GET /api/whisper/status`

Returns queue counters and capability flags.

Typical response shape:

{
  "ok": true,
  "summary": {
    "queued": 3,
    "processing": 1,
    "completed": 21,
    "failed": 2
  },
  "can_manage_all": false,
  "config": {
    "enabled": true,
    "default_model": "large",
    "upload_max_mb": 64,
    "upload_limit": {
      "configured_mb": 200,
      "php_upload_max_mb": 64,
      "php_post_max_mb": 128,
      "effective_max_mb": 64,
      "effective_max_label": "64 MB",
      "limited_by_php": true
    },
    "ytdlp_configured": true
  }
}

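upload_max_mb reports the effective upload limit for Whisper media on the current host. The additive config.upload_limit object explains when PHP upload or POST body limits are lower than Whisper's own configured cap. In the example above, configured_mb is 200 but php_upload_max_mb is 64, so effective_max_mb is 64 and limited_by_php is true.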
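
A client can use these fields to reject oversized files before uploading. The sketch below is one possible pre-check, not the server's own logic: check_upload_size is a hypothetical helper, and the conversion of 1 MB to 1024 * 1024 bytes is an assumption, since the guide does not state which unit definition the server uses.

```python
def check_upload_size(status_payload: dict, file_size_bytes: int) -> tuple:
    """Compare a local file size with the limit reported by GET /api/whisper/status.

    Returns (allowed, explanation).
    """
    config = status_payload.get("config") or {}
    limit = config.get("upload_limit") or {}
    # Prefer the detailed field, fall back to the top-level effective limit.
    max_mb = limit.get("effective_max_mb", config.get("upload_max_mb"))
    if max_mb is None:
        return True, "no upload limit reported"

    # Assumption: 1 MB is counted as 1024 * 1024 bytes.
    max_bytes = max_mb * 1024 * 1024
    if file_size_bytes <= max_bytes:
        return True, f"within the {max_mb} MB limit"

    source = "PHP limits" if limit.get("limited_by_php") else "the configured Whisper cap"
    return False, f"file exceeds the {max_mb} MB limit set by {source}"
```

The server remains authoritative: a file that passes this check can still be rejected with a 422 under media_file.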

`GET /api/whisper/jobs?limit=100`

Returns visible Whisper jobs for the authenticated user.

`POST /api/whisper/jobs`

Queues a new Whisper job.

Supported request styles:

JSON/form body with source_url
multipart/form-data with media_file

Important rules:

send either source_url or media_file, not both
if the uploaded file is too large for the current host, was only partially uploaded, or is blocked by temporary-storage or PHP upload errors, the endpoint returns a 422 validation error under media_file rather than only a generic “failed to upload” message

Example JSON body:

{
  "source_url": "https://example.com/audio.mp3",
  "source_label": "Interview with customer",
  "source_note": "Recorded support follow-up call.",
  "model": "large",
  "language": "sv",
  "analysis_language": "sv",
  "translation_target_languages": ["en"]
}
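
The either/or rule can be checked on the client before a request is sent. This is a minimal sketch under stated assumptions: validate_submission is a hypothetical helper, the "one of the two is required" check is inferred from the rule above, and the server's own validation remains authoritative.

```python
def validate_submission(fields: dict, has_media_file: bool) -> list:
    """Return a list of client-side problems for a POST /api/whisper/jobs request.

    fields is the JSON/form body; has_media_file tells whether a multipart
    media_file part will be attached.
    """
    errors = []
    has_url = bool(fields.get("source_url"))

    if has_url and has_media_file:
        errors.append("send either source_url or media_file, not both")
    elif not has_url and not has_media_file:
        # Assumption: at least one source must be present.
        errors.append("one of source_url or media_file is required")

    targets = fields.get("translation_target_languages", [])
    if not isinstance(targets, list):
        errors.append("translation_target_languages must be a list")

    return errors
```

An empty list means the request passes these local checks and can be submitted.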

`GET /api/whisper/jobs/{jobId}`

Returns one visible Whisper job.

Additive job fields now include:

queue_channel
queue_channel_label
source_type
source_label
source_note
source_mime
source_size_bytes
source_duration_seconds
source_duration_human
stage_label
stage_detail
runtime_log[]
liveness
analysis
translations[]
diarization
share
callback (primarily relevant for API-queue jobs)

`POST /api/whisper/jobs/{jobId}/analyze`

Runs transcript analysis for a completed transcript.

Guardrails:

transcript must already exist
non-admin users must have OpenAI access

`POST /api/whisper/jobs/{jobId}/cancel`

Requests cooperative cancellation for an actively processing job.

`POST /api/whisper/jobs/{jobId}/restart`

Queues a failed/queued job for retry.

`DELETE /api/whisper/jobs/{jobId}`

Deletes a non-processing job.

`POST /api/whisper/run-now`

Admin/manager helper endpoint.

Request body can include:

{
  "limit": 1,
  "reset_failed": true
}

Token-authenticated transcribe API (`/api/whisper/transcribe/*`)

This is the dedicated server-to-server callback API.

`GET /api/whisper/transcribe/status`

Returns queue counters for the token-authenticated API queue channel.

`GET /api/whisper/transcribe/jobs?limit=100`

Returns visible jobs from the API queue channel.

`GET /api/whisper/transcribe/jobs/{jobId}`

Returns one visible API-queue job.

`POST /api/whisper/transcribe`

Queues a new token-authenticated Whisper job.

Required field:

callback_url

Supported submission styles:

URL jobs using source_url
multipart file jobs using media_file

Upload validation guidance:

when the uploaded file is larger than the current practical host limit, the API can now return 422 with errors.media_file[] explaining the effective Whisper upload limit
the same media_file validation path is also used for partial uploads, missing temp-folder failures, write failures, and other PHP upload transport errors before the job is queued
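
A client can surface these messages directly to the operator. The sketch below assumes the 422 body is JSON with an errors object keyed by field name, as the errors.media_file[] notation above suggests; media_file_errors is a hypothetical helper name.

```python
import json


def media_file_errors(status_code: int, body: str) -> list:
    """Extract media_file validation messages from a 422 response body."""
    if status_code != 422:
        return []
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return []

    errors = payload.get("errors") if isinstance(payload, dict) else None
    if not isinstance(errors, dict):
        return []

    messages = errors.get("media_file") or []
    if isinstance(messages, str):
        # Tolerate a single string instead of a list of strings.
        return [messages]
    if isinstance(messages, list):
        return [str(message) for message in messages]
    return []
```

Any response that is not a 422, or that carries no media_file entry, yields an empty list.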

The token API accepts the same additive metadata as the ordinary queue endpoint, including:

source_label
source_note
model
language
analysis_language
translation_target_languages[]
disable_diarization

Example JSON body:

{
  "source_url": "https://example.com/audio.mp3",
  "callback_url": "https://api.example.test/whisper/callback",
  "source_label": "Customer interview",
  "source_note": "Transcribe and send the final result back to our integration.",
  "model": "large",
  "language": "en",
  "analysis_language": "en",
  "translation_target_languages": ["sv"]
}
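
A URL submission like the one above could be prepared with Python's standard library as follows. This is a sketch, not an official client: build_transcribe_request is a hypothetical helper, the base URL is a placeholder, and the Accept header is an assumption rather than a documented requirement. The request is only constructed here; sending it is left to the caller.

```python
import json
import urllib.request


def build_transcribe_request(base_url, api_token, source_url, callback_url, **metadata):
    """Build (but do not send) a POST /api/whisper/transcribe request."""
    body = {"source_url": source_url, "callback_url": callback_url}
    # Optional fields such as source_label, model or language.
    body.update(metadata)

    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/whisper/transcribe",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": "Bearer " + api_token,  # recommended transport
            "Content-Type": "application/json",
            "Accept": "application/json",  # assumption, not a documented requirement
        },
    )
```

The returned object can be passed to urllib.request.urlopen, ideally with a timeout and with handling for the error classes listed under Error handling.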

Example success response:

{
  "ok": true,
  "message": "Whisper API job queued. A callback will be sent when the job reaches a terminal state.",
  "job": {
    "id": 123,
    "queue_channel": "api",
    "queue_channel_label": "API queue",
    "status": "queued",
    "callback": {
      "url": "https://api.example.test/whisper/callback",
      "status": "pending",
      "http_status": null,
      "last_attempt_at": null,
      "delivered_at": null,
      "error": null
    },
    "share": null
  }
}

Callback contract

When a token-authenticated Whisper API job reaches a terminal state (completed or failed), Tools sends one JSON POST to the submitted callback_url.

Callback envelope:

{
  "ok": true,
  "event": "whisper.job.completed",
  "job": {
    "job_id": 123,
    "status": "completed",
    "status_label": "Completed",
    "queue_channel": "api",
    "queue_channel_label": "API queue",
    "source": "Customer interview",
    "model": "large",
    "language": "en",
    "job_url": "https://tools.example.test/whisper/jobs/123",
    "share_url": "https://tools.example.test/shared/whisper/transcript/example-token-redacted",
    "transcript_text": "...",
    "analysis_text": "...",
    "translations": [],
    "share": {
      "url": "https://tools.example.test/shared/whisper/transcript/example-token-redacted"
    }
  }
}

Failure callbacks use event="whisper.job.failed" and can include failure_error instead of transcript/share data.

Client guidance:

treat callbacks as asynchronous terminal-state notifications
store them idempotently by job.job_id
do not assume a share URL exists on failed jobs
do not assume transcript analysis/translations are always present for every account
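
The guidance above can be sketched as a small receiver function. Everything here is illustrative: handle_callback and the in-memory store are hypothetical, a real integration would use a database, and the exact location of failure_error in the payload is an assumption, so the sketch looks both inside job and at the top level.

```python
def handle_callback(payload: dict, store: dict) -> bool:
    """Record a terminal-state callback once, keyed by job.job_id.

    Returns True when the callback was stored, False for a repeat delivery.
    """
    job = payload.get("job") or {}
    job_id = job.get("job_id")
    if job_id is None:
        raise ValueError("callback payload has no job.job_id")

    if job_id in store:
        # Already processed: ignore the repeat so handling stays idempotent.
        return False

    event = payload.get("event")
    record = {"event": event, "status": job.get("status")}

    if event == "whisper.job.completed":
        record["transcript_text"] = job.get("transcript_text")
        # Optional data: may be absent depending on account and settings.
        record["share_url"] = job.get("share_url")
        record["analysis_text"] = job.get("analysis_text")
        record["translations"] = job.get("translations") or []
    else:
        # Assumption: failure_error sits inside job; fall back to the top level.
        record["failure_error"] = job.get("failure_error", payload.get("failure_error"))

    store[job_id] = record
    return True
```

Reading every optional field with get keeps the receiver working when analysis, translations or the share URL are missing.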

Error handling

Typical error classes:

401 unauthenticated / token rejected
403 missing permission
404 job not found or not visible to the caller
422 validation or business-rule failure
429 throttled
5xx temporary backend/provider failure
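
One way for a client to act on these classes is a simple mapping from status code to next step. The action names below are hypothetical labels chosen for this sketch, not values returned by Tools.

```python
def classify_status(status_code: int) -> str:
    """Map an HTTP status code to a suggested client action."""
    if 200 <= status_code <= 299:
        return "ok"
    if status_code == 401:
        return "reauthenticate"      # unauthenticated or token rejected
    if status_code == 403:
        return "check_permissions"   # missing permission
    if status_code == 404:
        return "not_found"           # job missing or not visible to the caller
    if status_code == 422:
        return "fix_request"         # validation or business-rule failure
    if status_code == 429 or 500 <= status_code <= 599:
        return "retry_later"         # throttled or temporary backend failure
    return "unexpected"
```

Only the retry_later class is worth retrying automatically; the others need a changed request, token or permission.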

Rate limiting

Whisper API routes use a general throttle:120,1 policy, which in Laravel-style throttle notation means up to 120 requests per minute.

Clients should still implement normal backoff for repeated polling or transient failures.
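
A capped exponential backoff is one common way to do this. The sketch below only computes the wait times; backoff_delays and its default values are hypothetical and should be tuned per integration.

```python
def backoff_delays(attempts: int, base_seconds: float = 1.0, cap_seconds: float = 60.0) -> list:
    """Return the wait time before each retry, doubling up to a cap."""
    return [min(cap_seconds, base_seconds * 2 ** attempt) for attempt in range(attempts)]
```

With the defaults, backoff_delays(4) gives 1, 2, 4 and 8 seconds. Adding a small random jitter to each delay helps avoid several clients retrying at the same moment.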

Safe client recommendations

Prefer Authorization: Bearer YOUR_API_TOKEN
Treat callback_url as required for token-authenticated submissions
Expect jobs to finish asynchronously
Surface queue_channel and queue_channel_label in operator/debug UIs
Treat transcript_text as the primary result and speaker_aware_transcript as additive helper output
Treat share.url as public access and handle it carefully

Tornevall Networks