All articles

// Popular Articles

#qwen3

#853 2026-04-20

Li Auto's Mind DeepResearch 30B beats Gemini 3.1 on a deep research benchmark

MindDR, a multi-agent deep research framework of only ~30B parameters developed by Li Auto, scores 51.8 RACE on MindDR Bench, ahead of both Gemini 3.1 and Gemini 2.5 Pro. The recipe: 3 specialized agents plus a 4-stage training pipeline, at a cost of only ~6,000 GPU card-hours.

mind-deepresearch li-auto deep-research-agent

7 min read

#572 2025-12-08

DFlash now runs on llama.cpp: block-diffusion draft, up to 8× speedup for Qwen3

spiritbuun has just pushed an implementation of DFlash, a block-diffusion style of speculative decoding, to the buun-llama-cpp fork. It takes a single command-line flag, --spec-type dflash, and uses a 5-layer draft model that drafts a 16-token block per forward pass, running 6–8× faster than regular decoding and about 2.5× faster than EAGLE-3.

dflash llama-cpp speculative-decoding

6 min read

#267 2025-07-08

Maximal Brain Damage: 2 Bit-Flips Can Wipe Out ResNet-50 and Qwen3-30B

Researchers from NVIDIA, Technion and IBM introduce Deep Neural Lesion (DNL) — a data-free, optimization-free attack that flips just 1–2 sign bits to drop ResNet-50 accuracy by 99.8% and crush Qwen3-30B reasoning from 78% to 0%.

deep-neural-lesion bit-flip-attack ai-security

7 min read

#206 2025-06-07

Qwen3-8B-OpusReasoning: Claude Opus-style thinking on an 8GB GPU for $52

TeichAI distilled 250 Claude Opus 4.5 high-reasoning traces into an 8B Qwen3 model for $52.3. The result: step-by-step Opus-style thinking that runs on consumer hardware via llama.cpp or Ollama.

qwen3 claude-opus distillation

6 min read