
The Struggle: Transcribe stuff for free with Whisper and WSL/Linux – With a GTX 1060

Published: 2025-12-23 11:26:07
Discovered: 2026-03-19 13:50:20
Hash: b0fbb9c4287dd26aa452f1adc93e224e681051e1
https://www.tornevalls.se/the-struggle-transcribe-stuff-for-free-with-whisper-and-wsl-linux-with-a-gtx-1060/
I’ve been struggling with transcription issues for quite some time, for a variety of reasons. For example, I sometimes need text transcribed so I can paste it into Suno, when that text only exists as an .m4a file (i.e. music, which sometimes has hardcoded subtitles that would otherwise have to be transcribed manually). And so on.

I first found a Samsung app that could handle transcription, but it quickly became clear that it was limited to its own ecosystem. In practice, you could only transcribe audio that had been recorded inside that specific app.

Since then, I’ve been looking around on and off, and more recently I picked it up again as the need increased – partly to get correct transcriptions, but also to be able to process any audio files I download or record. Samsung’s app is decent, but the quality varies. Right after recording, it performs a quick transcription, but the result is noticeably worse than if you re-run the transcription once the audio file is fully finalized.

At that point I came across “Whisper Transcribe” for Windows. It works, but it requires an account and, of course, paid credits to continue transcribing. You get a small number of free credits at first, but once those run out, you’re expected to pay quite a bit just to keep going.

I was already sure there had to be software capable of doing this completely locally. I had previously discovered that Whisper also exists in open-source form (I’m not even sure whether the Windows application actually builds on it or not). So today I decided to finally figure out how to do it properly myself.

The end result was the following (thanks to ChatGPT):

A Whisper installer for WSL/Linux, with explicit support for the NVIDIA GTX 1060 – something newer versions of the Python libraries involved (PyTorch and NumPy) clearly no longer handle well.

A Whisper runner for WSL/Linux: run whisper <input-file> and get a .txt transcript generated from the audio file.

A Windows Registry file that allows transcription to be executed directly from Windows Explorer via right-click.

A batch file that bridges Windows and WSL so everything runs cleanly, including proper handling of spaces and non-ASCII characters in file names.

The result is a fully local, offline transcription setup that works on any audio file, without accounts, credits, or vendor lock-in.

WSL uses Python and pip…

whisper.bat

@echo off
setlocal EnableExtensions

REM Force UTF-8 codepage (fixes å ä ö)
chcp 65001 >nul

REM File passed from Explorer
set "WIN_FILE=%~1"

REM Convert Windows path to WSL path (UTF-8 safe now)
for /f "delims=" %%i in ('wsl wslpath "%WIN_FILE%"') do set "WSL_FILE=%%i"

REM Run whisper on that file
wsl bash -lc "/usr/local/tornevall/whisper \"%WSL_FILE%\""

endlocal
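The `wsl wslpath` call in the batch file turns a Windows path into the path WSL sees. The helper below approximates that conversion in plain Bash so you can try it without WSL. It is hypothetical: `win_to_wsl_path` is not part of the post or of WSL.

It assumes:
- WSL's default automount root `/mnt/`.
- An ordinary drive-letter path.

It does not handle UNC paths or a custom `[automount]` root in `/etc/wsl.conf`. On a real system, `wslpath` remains the tool to use.

```shell
#!/usr/bin/env bash
# Hypothetical approximation of `wslpath` for drive-letter paths only.
# Assumes the default WSL automount root (/mnt/); the real wslpath also
# honours /etc/wsl.conf and handles UNC paths, which this does not.
win_to_wsl_path() {
  local win="$1"
  # Drive letter, colon, then a backslash or forward slash.
  local re='^([A-Za-z]):[\/](.*)$'
  if [[ ! $win =~ $re ]]; then
    echo "Not a drive-letter path: $win" >&2
    return 1
  fi
  local drive="${BASH_REMATCH[1],,}"    # lower-case the drive letter
  local rest="${BASH_REMATCH[2]//\\//}" # backslashes -> forward slashes
  printf '/mnt/%s/%s\n' "$drive" "$rest"
}
```

For example, `win_to_wsl_path 'C:\Users\me\My Music\song.m4a'` prints `/mnt/c/Users/me/My Music/song.m4a`. The space survives because the path stays a single quoted argument throughout. That is also why whisper.bat wraps `%WIN_FILE%` and `%WSL_FILE%` in quotes.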

whisper.reg (Explorer right-click menu)

Windows Registry Editor Version 5.00

[HKEY_CLASSES_ROOT\*\shell\WhisperWSL]
@="Transcribe with Whisper (WSL)"
"Icon"="wsl.exe"

[HKEY_CLASSES_ROOT\*\shell\WhisperWSL\command]
@="\"F:\\viktigt\\Private\\Linux-Scripts\\Whisper.bat\" \"%1\""

Installer for WSL/Linux (with GTX 1060 compatibility and pre-uninstaller)

To make sure things are removed properly before reinstalling, the script has a -u switch. If you get it wrong the first time, running the script with -u first lets you reinstall without conflicts. It only uninstalls torch, torchvision, torchaudio and numpy from the venv; nothing else is touched.

#!/usr/bin/env bash
set -euo pipefail

VENV_DIR="${VENV_DIR:-$HOME/.venvs/whisper}"
MODE="install"

# --- Parse args ---
while getopts ":u" opt; do
  case "$opt" in
    u) MODE="uninstall" ;;
    *)
      echo "Usage: $0 [-u]"
      exit 1
      ;;
  esac
done

echo "==> Whisper installer (GTX 1060 compatible)"
echo "==> Mode: $MODE"

# --- Sanity ---
if [[ ! -d "$VENV_DIR" ]]; then
  echo "Error: venv not found: $VENV_DIR"
  exit 1
fi

# shellcheck disable=SC1090
source "$VENV_DIR/bin/activate"

python -m pip install --upgrade pip setuptools wheel

# ==================================================
# UNINSTALL MODE (-u)
# ==================================================
if [[ "$MODE" == "uninstall" ]]; then
  echo "==> Uninstalling incompatible packages ONLY (-u)"

  pip uninstall -y torch torchvision torchaudio || true
  pip uninstall -y numpy || true

  echo ""
  echo "Done."
  echo "Uninstall completed. Nothing else touched."
  exit 0
fi

# ==================================================
# INSTALL MODE (DEFAULT)
# ==================================================

echo "==> Installing compatible stack (no forced uninstall)"

pip install \
  numpy==1.26.4 \
  torch==1.13.1+cu116 \
  torchvision==0.14.1+cu116 \
  torchaudio==0.13.1 \
  --extra-index-url https://download.pytorch.org/whl/cu116

# --- Verify ---
# The heredoc body is Python, so the lines under "if" must stay indented.
echo "==> Verifying environment"
python - << 'EOF'
import torch, numpy
print("Torch:", torch.__version__)
print("NumPy:", numpy.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
    print("Capability:", torch.cuda.get_device_capability(0))
EOF

echo ""
echo "Done."
echo "Install completed without destructive actions."
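The installer above leaves three things to the reader:
- It exits unless the venv at `$HOME/.venvs/whisper` already exists.
- It only manages torch, torchvision, torchaudio and numpy. It never installs Whisper itself.
- The runner still has to end up at `/usr/local/tornevall/whisper`, the path whisper.bat calls.

The post doesn't show these steps, so the function below is only a sketch of one possible order. It rests on these assumptions:
- A Debian/Ubuntu-based WSL distro (hence `apt-get`).
- The `openai-whisper` package from PyPI, whose CLI accepts the flags the runner passes.
- A hypothetical file name, `whisper-install.sh`, for the installer.

Whisper also needs the `ffmpeg` binary to decode audio. There is one more catch: torch 1.13.1 has no wheels for Python 3.12, the default `python3` on Ubuntu 24.04. On such a distro the venv probably has to be created with an older interpreter, for example `python3.10 -m venv` if that is installed.

```shell
#!/usr/bin/env bash
# Hypothetical one-time setup; none of these steps appear in the post.
# Assumptions: Debian/Ubuntu-based WSL distro, the installer saved as
# ./whisper-install.sh (hypothetical name) and the runner as ./whisper-run.sh.
# The body runs in a subshell ( ... ) so `set -e` does not leak into the
# calling shell when this file is sourced.
setup_whisper() (
  set -euo pipefail
  venv="${VENV_DIR:-$HOME/.venvs/whisper}"

  # Whisper calls ffmpeg to decode audio; python3-venv provides `venv`.
  sudo apt-get update
  sudo apt-get install -y ffmpeg python3-venv

  # Create the venv that the installer's sanity check requires.
  python3 -m venv "$venv"

  # Install the Whisper CLI. This may pull in a torch build that, per the
  # post, does not work well with the GTX 1060; the next steps replace it.
  "$venv/bin/pip" install openai-whisper

  # Remove torch/numpy (-u), then install the pinned versions.
  VENV_DIR="$venv" bash ./whisper-install.sh -u
  VENV_DIR="$venv" bash ./whisper-install.sh

  # Put the runner where whisper.bat calls it.
  sudo install -D -m 755 ./whisper-run.sh /usr/local/tornevall/whisper
)
```

Usage would be along the lines of `source setup-whisper.sh && setup_whisper`, where `setup-whisper.sh` is again a hypothetical file name. Running the installer with `-u` before the plain install follows the post's advice for reinstalling without conflicts.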

The script itself

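The script can be run with nothing but the audio file to transcribe, but as you can see it can do a bit more: an optional second argument selects the Whisper model (default: small), and an optional third forces the language (default: auto-detect).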

#!/usr/bin/env bash
set -euo pipefail

# whisper-run.sh
# Usage:
# whisper <input.extension> [model] [language]
#
# Output:
# <input-filename>.txt (same directory)
#
# Behaviour:
# - Refuses to overwrite existing .txt
# - Stops execution if output exists

if [[ $# -lt 1 ]]; then
  echo "Usage: whisper <input.extension> [model] [language]"
  exit 1
fi

INPUT="$1"
MODEL="${2:-small}"
LANGUAGE="${3:-}"

if [[ ! -f "$INPUT" ]]; then
  echo "Error: Input file not found: $INPUT"
  exit 1
fi

BASENAME="$(basename "$INPUT")"
STEM="${BASENAME%.*}"
OUTDIR="$(dirname "$INPUT")"
OUTPUT="$OUTDIR/$STEM.txt"

# --- Refuse overwrite ---
if [[ -f "$OUTPUT" ]]; then
  echo "Error: Output file already exists:"
  echo " $OUTPUT"
  echo "Aborting to avoid overwrite."
  exit 1
fi

# Prefer venv whisper if installed via install script
WHISPER_VENV="${WHISPER_VENV:-$HOME/.venvs/whisper}"
WHISPER_BIN="whisper"
if [[ -x "$WHISPER_VENV/bin/whisper" ]]; then
  WHISPER_BIN="$WHISPER_VENV/bin/whisper"
fi

if [[ "$WHISPER_BIN" == "whisper" ]]; then
  # If this runner is itself on PATH as "whisper", the PATH lookup would
  # find the runner again and recurse forever; treat that as "not found".
  RESOLVED="$(command -v whisper || true)"
  if [[ -z "$RESOLVED" || "$(readlink -f "$RESOLVED")" == "$(readlink -f "$0")" ]]; then
    echo "Error: whisper not found in PATH or venv."
    exit 1
  fi
fi

TMPDIR="$(mktemp -d)"
cleanup() { rm -rf "$TMPDIR"; }
trap cleanup EXIT

echo "==> Transcribing:"
echo " input: $INPUT"
echo " output: $OUTPUT"
echo " model: $MODEL"
echo " lang: ${LANGUAGE:-auto}"

ARGS=(
  "$INPUT"
  --model "$MODEL"
  --output_dir "$TMPDIR"
  --output_format txt
  --task transcribe
  --verbose False
  --fp16 False
)

if [[ -n "$LANGUAGE" ]]; then
  ARGS+=( --language "$LANGUAGE" )
fi

"$WHISPER_BIN" "${ARGS[@]}"

GENERATED_TXT="$TMPDIR/$STEM.txt"
if [[ ! -f "$GENERATED_TXT" ]]; then
  FOUND_TXT="$(find "$TMPDIR" -maxdepth 1 -type f -name "*.txt" | head -n 1 || true)"
  if [[ -z "${FOUND_TXT:-}" ]]; then
    echo "Error: No .txt output produced."
    exit 1
  fi
  GENERATED_TXT="$FOUND_TXT"
fi

# --- Final move (no overwrite possible due to earlier check) ---
mv "$GENERATED_TXT" "$OUTPUT"

echo "==> Done:"
echo " $OUTPUT"
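The runner names the transcript by stripping only the last extension from the file name and writing it next to the input. The helper below isolates that rule so it can be checked without running Whisper. It is hypothetical: `transcript_path` is not part of the post. It mirrors the `basename`, `${BASENAME%.*}` and `dirname` lines in the script.

```shell
#!/usr/bin/env bash
# Hypothetical helper mirroring how the runner names its output file:
# same directory as the input, last extension replaced by .txt.
transcript_path() {
  local input="$1"
  local base stem dir
  base="$(basename "$input")"
  stem="${base%.*}"          # strips only the final ".ext", if any
  dir="$(dirname "$input")"  # "." for a bare file name
  printf '%s/%s.txt\n' "$dir" "$stem"
}
```

So `whisper "/mnt/c/Music/My Song.m4a" medium sv` would write `/mnt/c/Music/My Song.txt`, using the `medium` model with Swedish forced instead of auto-detection. This rule has one consequence. `song.m4a` and `song.mp3` in the same folder map to the same `song.txt`, so the second run aborts instead of overwriting the first transcript.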
