Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

Subtitle OCR

Bitmap subtitle formats (PGS/VobSub/DVD) are not directly compatible with MP4 Direct Play in many client stacks. direct_play_nice can OCR bitmap subtitles into text tracks using AI OCR backends (PP-OCR/Tesseract). This path is meant for bitmap subtitle streams; text subtitles are muxed directly when compatible.

For official GPU architecture/provider references and compatibility links, see Hardware Acceleration.

Defaults

  • --sub-mode auto
  • --ocr-engine auto
  • --ocr-format srt

Common overrides

  • --sub-mode skip disable subtitle processing
  • --sub-mode force force subtitle processing
  • --ocr-engine pp-ocr-v4 force PP-OCR v4 pipeline
  • --ocr-engine pp-ocr-v3 fallback for older GPU/runtime combinations
  • --ocr-format ass request ASS (may be downgraded in MP4)
  • --ocr-write-srt-sidecar write .srt sidecars in addition to embedded output

GPU behavior

The OCR runtime attempts provider fallback when available (for example CUDA, DirectML, CoreML, then CPU). You can force behavior with:

  • DPN_OCR_REQUIRE_GPU=1
  • DPN_OCR_FORCE_CPU=1

ONNX engines:

  • --ocr-engine pp-ocr-v4 for modern GPU/runtime stacks
  • --ocr-engine pp-ocr-v3 for legacy/older GPU compatibility cases

Linux runtime notes:

  • Ensure CUDA/cuDNN and ONNX Runtime are version-compatible.
  • ORT_DYLIB_PATH=/path/to/libonnxruntime.so can be used if ONNX Runtime is not discoverable on default library paths.
  • For older NVIDIA stacks, --ocr-engine pp-ocr-v3 can be more stable than pp-ocr-v4.
  • Use scripts/ocr-tools/check_gpu_env.sh to inspect runtime/library setup.
  • Containerized workloads may need NVIDIA Container Toolkit and exposed runtime libraries.

Model location

Models are downloaded to a default model directory unless DPN_OCR_MODEL_DIR is set.

Default model filenames:

  • v4: ch_PP-OCRv4_det_infer.onnx, ch_ppocr_mobile_v2.0_cls_infer.onnx, en_PP-OCRv4_rec_infer.onnx
  • v3: ch_PP-OCRv3_det_infer.onnx, ch_ppocr_mobile_v2.0_cls_train.onnx, en_PP-OCRv3_rec_infer.onnx

Optional profile rec models are also auto-provisioned (downloaded on first use if missing in the model directory):

  • latin_PP-OCRv3_rec_mobile.onnx
  • japan_PP-OCRv4_rec_mobile.onnx
  • korean_PP-OCRv4_rec_mobile.onnx
  • chinese_cht_PP-OCRv3_rec_mobile.onnx

Override paths for these optional profiles with:

  • DPN_OCR_REC_LATIN_MODEL
  • DPN_OCR_REC_MULTILINGUAL_MODEL
  • DPN_OCR_REC_JAPANESE_MODEL
  • DPN_OCR_REC_KOREAN_MODEL
  • DPN_OCR_REC_CJK_MODEL

DPN_OCR_REC_MULTILINGUAL_MODEL is local-first: if unset, OCR auto-detects a compatible multilingual recognizer already present in the model directory (for example multilingual_PP-OCRv4_rec_infer.onnx) and uses it when script routing targets multilingual coverage. Unlike latin/japanese/korean/cjk profiles, this profile is not downloaded automatically.

Override recognition profile routing (language -> profile) with:

  • DPN_OCR_REC_PROFILE_OVERRIDES Example: spa=latin,rus=multilingual,sr-Latn=latin Script tags are also recognized automatically (for example zh-Hant, sr-Cyrl, sr-Latn).
  • DPN_OCR_LANGUAGE_SCRIPT_HINTS Example: rus=Cyrl,ara=Arab,srp=Cyrl
  • DPN_OCR_ROUTING_MANIFEST Path to custom TOML routing manifest (default: config/ocr-routing.toml in the repo source tree).

Config-file example

sub_mode = "auto"           # auto | force | skip
ocr_default_language = "eng"
ocr_engine = "auto"         # auto | tesseract | pp-ocr-v3 | pp-ocr-v4 | external
ocr_format = "srt"          # srt | ass
ocr_write_srt_sidecar = false
ocr_external_command = "python3 /opt/ocr/run.py"