All articles

// Popular Articles

#small-language-models

#782 2026-03-20

AVB drops a 50-minute GRPO + RLVR deep dive — and you watch logits move in real time

Avishek Biswas (@neural_avb) shipped a 50-minute long-form tutorial that walks through GRPO low-level mechanics, trains sub-1B SmolLM and Qwen3 models on text-based RLVR gym envs, and animates PPO updates so you literally see the policy logits shift. Code included.

grpo rlvr reinforcement-learning

7 min read

#362 2025-08-25

Pioneer launches: the first AI agent to fine-tune and deploy LLMs from a single prompt

Fastino Labs has just launched Pioneer, the world's first agent for fine-tuning SLMs/LLMs. A run takes 6 hours, costs about $35, and improves accuracy by up to +83.8 points over the base model. It also introduces 'adaptive inference' for the first time: once deployed, the model retrains itself from live traffic.

pioneer-ai fastino-labs fine-tuning

7 min read