Subtitle OCR
Bitmap subtitle formats (PGS/VobSub/DVD) are not directly compatible with MP4
Direct Play in many client stacks. direct_play_nice can OCR bitmap subtitles
into text tracks using AI OCR backends (PP-OCR/Tesseract). This path is meant
for bitmap subtitle streams; text subtitles are muxed directly when compatible.
For official GPU architecture/provider references and compatibility links, see Hardware Acceleration.
Defaults
- --sub-mode auto
- --ocr-engine auto
- --ocr-format srt
Common overrides
- --sub-mode skip: disable subtitle processing
- --sub-mode force: force subtitle processing
- --ocr-engine pp-ocr-v4: force PP-OCR v4 pipeline
- --ocr-engine pp-ocr-v3: fallback for older GPU/runtime combinations
- --ocr-format ass: request ASS (may be downgraded in MP4)
- --ocr-write-srt-sidecar: write .srt sidecars in addition to embedded output
GPU behavior
The OCR runtime attempts provider fallback when available (for example CUDA, DirectML, CoreML, then CPU). You can force behavior with:
- DPN_OCR_REQUIRE_GPU=1
- DPN_OCR_FORCE_CPU=1
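The fallback order and the two environment switches can be pictured with a minimal Python sketch. It is an illustration only, not the tool's actual implementation: the function name provider_order, the rule that only the value "1" enables a switch, and the error raised when both switches are set are all assumptions.

```python
import os

# Order taken from the example in the text; the real runtime may differ.
PREFERRED_PROVIDERS = ["CUDA", "DirectML", "CoreML", "CPU"]


def _flag(env, name):
    """Assumption: a switch counts as enabled only when its value is "1"."""
    return env.get(name, "") == "1"


def provider_order(available, env=None):
    """Return the providers to try, in order, for a set of available providers.

    Hypothetical helper: it models the documented behavior, not the real code.
    """
    env = os.environ if env is None else env
    force_cpu = _flag(env, "DPN_OCR_FORCE_CPU")
    require_gpu = _flag(env, "DPN_OCR_REQUIRE_GPU")

    if force_cpu and require_gpu:
        # Assumption: the two switches contradict each other, so fail loudly.
        raise ValueError("DPN_OCR_FORCE_CPU and DPN_OCR_REQUIRE_GPU are both set")
    if force_cpu:
        return ["CPU"]

    # Keep the preferred order, dropping providers that are not available.
    order = [p for p in PREFERRED_PROVIDERS if p in available]
    if require_gpu:
        order = [p for p in order if p != "CPU"]
        if not order:
            raise RuntimeError("DPN_OCR_REQUIRE_GPU=1 but no GPU provider is available")
    return order
```

With CUDA and CPU available and neither switch set, the sketch returns CUDA first and CPU as the last resort.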
ONNX engines:
- --ocr-engine pp-ocr-v4 for modern GPU/runtime stacks
- --ocr-engine pp-ocr-v3 for legacy/older GPU compatibility cases
Linux runtime notes:
- Ensure CUDA/cuDNN and ONNX Runtime are version-compatible.
- ORT_DYLIB_PATH=/path/to/libonnxruntime.so can be used if ONNX Runtime is not discoverable on default library paths.
- For older NVIDIA stacks, --ocr-engine pp-ocr-v3 can be more stable than pp-ocr-v4.
- Use scripts/ocr-tools/check_gpu_env.sh to inspect runtime/library setup.
- Containerized workloads may need NVIDIA Container Toolkit and exposed runtime libraries.
Model location
Models are downloaded to a default model directory unless DPN_OCR_MODEL_DIR
is set.
Default model filenames:
- v4: ch_PP-OCRv4_det_infer.onnx, ch_ppocr_mobile_v2.0_cls_infer.onnx, en_PP-OCRv4_rec_infer.onnx
- v3: ch_PP-OCRv3_det_infer.onnx, ch_ppocr_mobile_v2.0_cls_train.onnx, en_PP-OCRv3_rec_infer.onnx
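To check whether a model directory already holds the default files before a first run, a small script along these lines may help. It is a sketch under assumptions: the default model directory is not stated here, so it is passed in as the hypothetical fallback_dir parameter, and both helper names are invented for illustration.

```python
import os
from pathlib import Path

# Default filenames as listed above, keyed by --ocr-engine value.
DEFAULT_MODEL_FILES = {
    "pp-ocr-v4": [
        "ch_PP-OCRv4_det_infer.onnx",
        "ch_ppocr_mobile_v2.0_cls_infer.onnx",
        "en_PP-OCRv4_rec_infer.onnx",
    ],
    "pp-ocr-v3": [
        "ch_PP-OCRv3_det_infer.onnx",
        "ch_ppocr_mobile_v2.0_cls_train.onnx",
        "en_PP-OCRv3_rec_infer.onnx",
    ],
}


def resolve_model_dir(fallback_dir, env=None):
    """Prefer DPN_OCR_MODEL_DIR; otherwise use the caller-supplied fallback.

    fallback_dir is a placeholder for the tool's default directory,
    which this document does not specify.
    """
    env = os.environ if env is None else env
    override = env.get("DPN_OCR_MODEL_DIR")
    return Path(override) if override else Path(fallback_dir)


def missing_models(engine, model_dir):
    """List the default files for an engine that are absent from model_dir."""
    return [
        name
        for name in DEFAULT_MODEL_FILES[engine]
        if not (Path(model_dir) / name).is_file()
    ]
```

An empty result from missing_models("pp-ocr-v4", resolve_model_dir("/some/dir")) suggests that no download is needed for that engine.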
Optional profile rec models are also auto-provisioned (downloaded on first use if missing in the model directory):
- latin_PP-OCRv3_rec_mobile.onnx
- japan_PP-OCRv4_rec_mobile.onnx
- korean_PP-OCRv4_rec_mobile.onnx
- chinese_cht_PP-OCRv3_rec_mobile.onnx
Override paths for these optional profiles with:
- DPN_OCR_REC_LATIN_MODEL
- DPN_OCR_REC_MULTILINGUAL_MODEL
- DPN_OCR_REC_JAPANESE_MODEL
- DPN_OCR_REC_KOREAN_MODEL
- DPN_OCR_REC_CJK_MODEL
DPN_OCR_REC_MULTILINGUAL_MODEL is local-first: if unset, OCR auto-detects a
compatible multilingual recognizer already present in the model directory
(for example multilingual_PP-OCRv4_rec_infer.onnx) and uses it when script
routing targets multilingual coverage. Unlike latin/japanese/korean/cjk
profiles, this profile is not downloaded automatically.
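The local-first rule can be sketched as follows. The real detection logic is not documented here, so the glob pattern multilingual_*_rec_*.onnx is an assumption modeled on the example filename, and pick_multilingual_model is a hypothetical helper name.

```python
import os
from pathlib import Path


def pick_multilingual_model(model_dir, env=None):
    """Return the multilingual recognizer to use, or None if there is none.

    Order: explicit override, then a compatible file already present in the
    model directory. Nothing is downloaded, matching the local-first rule.
    """
    env = os.environ if env is None else env
    override = env.get("DPN_OCR_REC_MULTILINGUAL_MODEL")
    if override:
        return Path(override)

    # Assumed naming pattern, based on multilingual_PP-OCRv4_rec_infer.onnx.
    candidates = sorted(Path(model_dir).glob("multilingual_*_rec_*.onnx"))
    return candidates[0] if candidates else None
```

A None result means routing that targets multilingual coverage has no local recognizer to use until one is placed in the model directory or the override is set.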
Override recognition profile routing (language -> profile) with:
- DPN_OCR_REC_PROFILE_OVERRIDES
  Example: spa=latin,rus=multilingual,sr-Latn=latin
  Script tags are also recognized automatically (for example zh-Hant, sr-Cyrl, sr-Latn).
- DPN_OCR_LANGUAGE_SCRIPT_HINTS
  Example: rus=Cyrl,ara=Arab,srp=Cyrl
- DPN_OCR_ROUTING_MANIFEST
  Path to custom TOML routing manifest (default: config/ocr-routing.toml in the repo source tree).
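Both override variables use the same comma-separated key=value shape, which a short Python sketch can read. The helper name parse_pairs is hypothetical, and rejecting malformed entries and keeping keys case-sensitive are assumptions; the tool's own parser may be more lenient.

```python
def parse_pairs(value):
    """Parse "key=value,key=value" into a dict, preserving insertion order.

    Whitespace around entries is ignored and empty entries are skipped.
    Keys are kept exactly as written (case-sensitivity is an assumption).
    """
    pairs = {}
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        key, sep, val = item.partition("=")
        key, val = key.strip(), val.strip()
        if not sep or not key or not val:
            raise ValueError(f"expected key=value, got {item!r}")
        pairs[key] = val
    return pairs


# The two examples from the text:
profile_overrides = parse_pairs("spa=latin,rus=multilingual,sr-Latn=latin")
script_hints = parse_pairs("rus=Cyrl,ara=Arab,srp=Cyrl")
```

Here profile_overrides maps language tags to recognition profiles and script_hints maps language tags to scripts.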
Config-file example
sub_mode = "auto" # auto | force | skip
ocr_default_language = "eng"
ocr_engine = "auto" # auto | tesseract | pp-ocr-v3 | pp-ocr-v4 | external
ocr_format = "srt" # srt | ass
ocr_write_srt_sidecar = false
ocr_external_command = "python3 /opt/ocr/run.py"
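A config like the one above can be sanity-checked before use. The following Python sketch takes the allowed values from the comments in the example; the helper name config_problems is hypothetical, and the rule that ocr_engine = "external" needs a non-empty ocr_external_command is an assumption rather than documented behavior.

```python
# Allowed values, copied from the comments in the config example.
ALLOWED = {
    "sub_mode": {"auto", "force", "skip"},
    "ocr_engine": {"auto", "tesseract", "pp-ocr-v3", "pp-ocr-v4", "external"},
    "ocr_format": {"srt", "ass"},
}


def config_problems(cfg):
    """Return a list of human-readable problems found in a parsed config dict."""
    problems = []
    for key, allowed in ALLOWED.items():
        if key in cfg and cfg[key] not in allowed:
            problems.append(f"{key}: {cfg[key]!r} is not one of {sorted(allowed)}")

    sidecar = cfg.get("ocr_write_srt_sidecar", False)
    if not isinstance(sidecar, bool):
        problems.append("ocr_write_srt_sidecar must be true or false")

    # Assumption: the external engine is unusable without a command to run.
    if cfg.get("ocr_engine") == "external" and not cfg.get("ocr_external_command"):
        problems.append('ocr_engine = "external" requires ocr_external_command')
    return problems
```

The dict can come from any TOML parser, for example tomllib.load from the Python 3.11+ standard library; an empty list means that none of these checks failed.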