Khoi Nguyen shares his experience as a Backend Engineer at Dwarves, highlighting the continuous learning opportunities, challenging projects, and knowledge-sharing culture.

Khoi Nguyen

#25 Khoi Nguyen on continuous learning

Cultivating a proactive and questioning mindset by embracing simple, even "naive," questions when interacting with artificial intelligence.

Learning with AI

Memo is where we share everything we learn, build, and think about product craftsmanship, engineering, and our culture. It's our commitment to learning in public.

Memo

Memo handbook

You will always grow by learning and playing with new and cool technologies. From books to conferences, you’ll get a yearly budget for your learning and development goals.

Continuing education allowance

Leading our labs team to explore, assess, and share the newest technologies across the organization

Learning chair

Design sprints are mostly applied in the Exploration phase. Fridays are usually reserved for Education events or Lab projects. At Dwarves Design, learning and continuous professional and personal development are at the core of our DNA. No one wants to settle; everyone wants to take the next step forward.

Design sprint

machine-learning

Gradient descent is a fundamental optimization algorithm in machine learning. It lets a model learn from data and improve its accuracy by gradually adjusting its internal parameters. Think of it like carefully descending a hill to find the lowest point: each small step you take brings you closer to the bottom.

Explaining gradient descent in machine learning with a simple analogy
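
To make the hill-descent analogy concrete, here is a minimal sketch in plain Python. The function name `gradient_descent`, the learning rate, the step count, and the example function f(x) = (x - 3)^2 are all illustrative assumptions and are not taken from the linked article.

```python
def gradient_descent(grad, start, learning_rate=0.1, steps=100):
    """Repeatedly take a small step against the gradient, starting at `start`."""
    x = start
    for _ in range(steps):
        # Move downhill: the gradient points uphill, so subtract it.
        x = x - learning_rate * grad(x)
    return x


# Hypothetical example: f(x) = (x - 3)^2 has gradient 2 * (x - 3)
# and its lowest point at x = 3.
minimum = gradient_descent(lambda x: 2 * (x - 3), start=0.0)
```

Each iteration here shrinks the distance to the minimum by a constant factor, which is the "small steps" behaviour the analogy describes. A learning rate that is too large would overshoot the valley instead.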

An exploration of the cyclical nature of AI development, tracing the rise and fall of new technologies within the field, and how this pattern has repeated throughout history.

A grand unified theory of the AI hype cycle

Learn how to fine-tune LLaMA large language models efficiently using PEFT LoRA for cost-effective, private AI customization with step-by-step guidance and open-source tools.

Exploring machine learning approaches for fine tuning Llama models

Proximal policy optimization (PPO) is an algorithm that aims to improve the stability of training by avoiding overly large policy updates. It is a popular and effective method used for training reinforcement learning models in complex environments. To achieve this, PPO uses a ratio that indicates the difference between the current policy and the old policy and clips this ratio within a specific range, ensuring that the policy updates are not too large and the training process is more stable...

Proximal policy optimization
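
The ratio clipping described above can be sketched for a single sample as follows. This is a simplified illustration of the commonly used clipped surrogate objective. The function name `clipped_surrogate` and the default `epsilon=0.2` are assumptions made for the example, and a real implementation would average this value over a batch and maximize it.

```python
def clipped_surrogate(ratio, advantage, epsilon=0.2):
    """Clipped surrogate objective for one sample.

    ratio     -- new_policy_prob / old_policy_prob for the action taken
    advantage -- estimate of how much better the action was than average
    epsilon   -- clip range (assumed value, chosen for illustration)
    """
    # Keep the ratio inside [1 - epsilon, 1 + epsilon].
    clipped_ratio = max(1.0 - epsilon, min(ratio, 1.0 + epsilon))
    # Taking the minimum removes any incentive to push the ratio
    # outside the clip range.
    return min(ratio * advantage, clipped_ratio * advantage)


# A ratio of 1.5 with a positive advantage is capped at 1.2 * advantage.
capped = clipped_surrogate(1.5, 1.0)
```

Because the objective stops improving once the ratio leaves the clip range, a single update cannot move the policy far from the old one, which is the source of the stability described above.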

An introduction to Q-learning, a model-free reinforcement learning algorithm used to learn optimal policies in Markov Decision Processes.

Q learning
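
As a rough illustration of what "model-free" means here, the standard tabular Q-learning update can be sketched as below. The function name `q_update`, the state and action labels, and the values of `alpha` and `gamma` are hypothetical and chosen only for the example.

```python
from collections import defaultdict


def q_update(q, state, action, reward, next_state, actions,
             alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning update and return the new Q-value."""
    # Value of the best action available in the next state.
    best_next = max(q[(next_state, a)] for a in actions)
    # Temporal-difference target: immediate reward plus discounted future value.
    td_target = reward + gamma * best_next
    # Move the current estimate a fraction alpha toward the target.
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


# Hypothetical transition: taking "right" in state "s0" yields reward 1.0
# and lands in state "s1". Unseen Q-values default to 0.0.
q = defaultdict(float)
value = q_update(q, "s0", "right", 1.0, "s1", ["left", "right"])
```

The update uses only an observed transition (state, action, reward, next state), so no model of the environment's dynamics is needed.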

An introduction to Reinforcement Learning (RL), a machine learning method where an agent learns to make decisions by interacting with an environment. This article covers the basics of RL, including how it works, common algorithms, and its application in training models with Large Language Models (LLMs).

Introduction to reinforcement learning and its application with LLMs

A reward model is a critical component in Reinforcement Learning for Large Language Models (LLMs), designed to evaluate and score the quality of generated responses. It plays a key role in aligning LLMs with human values and improving their output through iterative refinement.

Reward model

An overview of Open Assistant, an open-source chat-based AI assistant, and its implementation of Reinforcement Learning from Human Feedback (RLHF). This article covers the three-step process of RLHF, system requirements, and detailed setup instructions for training the model using Supervised Fine-Tuning, Reward Modeling, and Reinforcement Learning.

RLHF with Open Assistant

RFC proposing a novel approach to track concept evolution using TimescaleDB with pgvector/pgvectorscale, enabling historical semantic analysis and preventing catastrophic forgetting in continual learning systems.

Append-only concept embedding log

A collection of both our internal and external events, including the things we do with the Labs team, Consulting team, Operations team, and the community.

#learning
