@zhongzh How large a context are you running?
oMLX - LLM inference, optimized for your Mac
https://github.com/jundot/omlx
Benchmark Model: Qwen3.6-27B-MLX-VL-oQ8-fp16 (DFlash)
================================================================================
Single Request Results
--------------------------------------------------------------------------------
Test           TTFT(ms)  TPOT(ms)       pp TPS      tg TPS   E2E(s)   Throughput  Peak Mem
pp1024/tg128     9841.9     24.30  104.0 tok/s  41.5 tok/s   12.927   89.1 tok/s  31.94 GB
pp4096/tg128    38659.6     23.87  106.0 tok/s  42.2 tok/s   41.691  101.3 tok/s  34.03 GB
pp8192/tg128    77367.7     24.89  105.9 tok/s  40.5 tok/s   80.529  103.3 tok/s  35.27 GB
pp16384/tg128  160222.9     25.85  102.3 tok/s  39.0 tok/s  163.506  101.0 tok/s  37.61 GB
pp32768/tg128  349855.4     49.53   93.7 tok/s  20.3 tok/s  356.146   92.4 tok/s  42.01 GB
pp65536/tg128  801931.3     51.50   81.7 tok/s  19.6 tok/s  808.472   81.2 tok/s  47.38 GB
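
For anyone reading the columns: the derived numbers seem to follow from just TTFT and E2E. The sketch below recomputes them for each row. The formulas are my inference from the figures above, not taken from the oMLX source, and `derived_metrics` is a hypothetical helper name. Results can differ from the table in the last digit because the inputs are already rounded.

```python
# Sketch: recompute the derived columns from TTFT and E2E.
# Formulas are inferred from the posted numbers, not from oMLX's code.

def derived_metrics(pp_tokens, tg_tokens, ttft_ms, e2e_s):
    """Return pp TPS, tg TPS, TPOT (ms) and overall throughput for one run."""
    ttft_s = ttft_ms / 1000.0
    decode_s = e2e_s - ttft_s  # time spent after the first token arrived
    return {
        "pp_tps": pp_tokens / ttft_s,                    # prompt tokens per second
        "tg_tps": tg_tokens / decode_s,                  # generated tokens per second
        "tpot_ms": decode_s / (tg_tokens - 1) * 1000.0,  # time per token after the first
        "throughput": (pp_tokens + tg_tokens) / e2e_s,   # all tokens over wall time
    }

# (prompt tokens, generated tokens, TTFT in ms, E2E in s) copied from the table
rows = [
    (1024, 128, 9841.9, 12.927),
    (4096, 128, 38659.6, 41.691),
    (8192, 128, 77367.7, 80.529),
    (16384, 128, 160222.9, 163.506),
    (32768, 128, 349855.4, 356.146),
    (65536, 128, 801931.3, 808.472),
]

for pp, tg, ttft_ms, e2e_s in rows:
    m = derived_metrics(pp, tg, ttft_ms, e2e_s)
    print(
        f"pp{pp}/tg{tg}: "
        f"pp {m['pp_tps']:.1f} tok/s, tg {m['tg_tps']:.1f} tok/s, "
        f"TPOT {m['tpot_ms']:.2f} ms, total {m['throughput']:.1f} tok/s"
    )
```

Read this way, the table shows prefill staying near 100 tok/s up to 16K and dropping to about 82 tok/s at 64K, while decode speed roughly halves between the 16K and 32K rows (TPOT goes from about 25 ms to about 50 ms).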