All articles

// Popular Articles

#quantization

#296 2025-07-23

Unsloth sweeps 22/22: Gemma 4 26B-A4B GGUFs are now SOTA

An independent benchmark ranked 80 GGUF quantizations of Google's new Gemma 4 26B-A4B from 6 uploaders. Unsloth's Dynamic 2.0 GGUFs placed #1 on mean KL divergence at all 22 tested quant sizes, the cleanest sweep we've seen in open-model quantization.

gemma-4 unsloth gguf

6 min read

#122 2025-04-26

FlashDrive: a reasoning VLA for self-driving cars that runs in real time, 716 ms down to 159 ms with zero accuracy loss

Z Lab has just announced FlashDrive, a co-design framework that cuts Vision-Language-Action model latency from 716 ms to 159 ms on an RTX PRO 6000 (up to 5.7× on an RTX 4090) while preserving accuracy. It combines four techniques: streaming inference, DFlash speculative reasoning, adaptive-step flow matching, and ParoQuant W4A8.

flashdrive vision-language-action autonomous-driving

7 min read