Part 21: Generative AI Fundamentals - LLMs, Embeddings & Vector Spaces

Understand transformer architecture, tokenization, temperature, embeddings, cosine similarity, and vector math behind large language models.

1. Why Generative AI in 2026?

Generative AI is one of the most in-demand skill sets in tech. Engineers with GenAI expertise are often quoted as commanding 50-100% higher salaries than traditional developers, though the premium varies by market and role.

Key Areas

  • LLM Engineering: Model integration, prompt engineering
  • RAG Systems: Retrieval-augmented generation
  • Vector Databases: Embedding storage and search
  • Agent Systems: Autonomous AI workflows

2. Transformer Architecture

Attention Mechanism

import torch
import torch.nn as nn
import math

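class MultiHeadAttention(nn.Module):
    def __init__(self, d_model, num_heads):
        super().__init__()
        assert d_model % num_heads == 0, "d_model must be divisible by num_heads"
        self.d_model = d_model
        self.num_heads = num_heads
        self.d_k = d_model // num_heads
        
        self.w_q = nn.Linear(d_model, d_model)
        self.w_k = nn.Linear(d_model, d_model)
        self.w_v = nn.Linear(d_model, d_model)
        self.w_o = nn.Linear(d_model, d_model)
    
    def _split_heads(self, x):
        # (batch, seq_len, d_model) -> (batch, num_heads, seq_len, d_k)
        batch_size, seq_len, _ = x.shape
        return x.view(batch_size, seq_len, self.num_heads, self.d_k).transpose(1, 2)
    
    def forward(self, q, k, v, mask=None):
        # Inputs are (batch, seq_len, d_model); project, then split into heads
        q = self._split_heads(self.w_q(q))
        k = self._split_heads(self.w_k(k))
        v = self._split_heads(self.w_v(v))
        
        # Scaled dot-product attention; scores are (batch, num_heads, seq_q, seq_k)
        scores = torch.matmul(q, k.transpose(-2, -1)) / math.sqrt(self.d_k)
        
        if mask is not None:
            # mask must broadcast to the scores shape; 0 marks blocked positions
            scores = scores.masked_fill(mask == 0, -1e9)
        
        attention = torch.softmax(scores, dim=-1)
        out = torch.matmul(attention, v)
        
        # Merge heads back: (batch, seq_len, d_model)
        batch_size, _, seq_len, _ = out.shape
        out = out.transpose(1, 2).contiguous().view(batch_size, seq_len, self.d_model)
        return self.w_o(out)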
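
The scaled dot-product step inside the attention module can be checked on a toy input. The following is a minimal NumPy sketch, assuming a single head and a causal (lower-triangular) mask; the function name `scaled_dot_product_attention` and the toy shapes are illustrative choices, not part of the original code.

```python
import numpy as np

def scaled_dot_product_attention(q, k, v, mask=None):
    """Single-head attention: softmax(q @ k.T / sqrt(d_k)) @ v."""
    d_k = q.shape[-1]
    scores = q @ k.T / np.sqrt(d_k)                  # (seq_q, seq_k)
    if mask is not None:
        scores = np.where(mask == 0, -1e9, scores)   # block masked positions
    # Numerically stable softmax over the key dimension
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights

rng = np.random.default_rng(0)
seq_len, d_k = 4, 8
x = rng.normal(size=(seq_len, d_k))                  # toy token representations
causal_mask = np.tril(np.ones((seq_len, seq_len)))   # token i sees tokens 0..i

out, weights = scaled_dot_product_attention(x, x, x, mask=causal_mask)
print(weights.round(2))
```

Each row of `weights` sums to 1, and every entry above the diagonal is 0 because the causal mask stops a token from attending to later positions.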

Tokenization

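# Simplified subword tokenization: vocabulary lookup with a subword fallback.
# Real BPE (Byte Pair Encoding) learns merge rules from a corpus; this only
# illustrates the lookup step.
def break_into_subwords(word, vocab):
    # Greedy longest-match split, used here as a stand-in for learned BPE merges
    subwords, start = [], 0
    while start < len(word):
        end = len(word)
        while end > start + 1 and word[start:end] not in vocab:
            end -= 1
        subwords.append(word[start:end])
        start = end
    return subwords

def tokenize(text, vocab):
    tokens = []
    words = text.lower().split()
    
    for word in words:
        if word in vocab:
            tokens.append(vocab[word])
        else:
            # Subword tokenization; pieces missing from vocab map to <unk>
            subwords = break_into_subwords(word, vocab)
            tokens.extend([vocab.get(sw, vocab['<unk>']) for sw in subwords])
    
    return tokens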

# Special tokens
SPECIAL_TOKENS = {
    '<bos>': 0,  # Beginning of sequence
    '<eos>': 1,  # End of sequence
    '<pad>': 2,  # Padding
    '<unk>': 3,  # Unknown
}
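
The training side of BPE, where merge rules are learned from a corpus, is not shown above. A minimal sketch of that step follows, assuming a whitespace-split toy corpus and character-level starting symbols; the helper names `get_pair_counts`, `merge_pair`, and `learn_bpe` are hypothetical, and production tokenizers add byte-level handling and end-of-word markers that are omitted here.

```python
from collections import Counter

def get_pair_counts(words):
    """Count adjacent symbol pairs across all words, weighted by word frequency."""
    pairs = Counter()
    for symbols, freq in words.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs

def merge_pair(words, pair):
    """Replace every occurrence of `pair` with a single merged symbol."""
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        merged[tuple(out)] = merged.get(tuple(out), 0) + freq
    return merged

def learn_bpe(corpus, num_merges):
    """Learn up to `num_merges` merge rules from a whitespace-split corpus."""
    words = Counter(tuple(word) for word in corpus.split())
    merges = []
    for _ in range(num_merges):
        pairs = get_pair_counts(words)
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # ties go to the first pair seen
        merges.append(best)
        words = merge_pair(words, best)
    return merges, words

merges, words = learn_bpe("low low low lower lowest", num_merges=2)
print(merges)  # [('l', 'o'), ('lo', 'w')]
```

After two merges the frequent word "low" has become a single symbol, while "lower" and "lowest" are represented as "low" plus leftover characters.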

3. Embeddings & Vector Mathematics

Word Embeddings

import numpy as np

class EmbeddingLayer:
    def __init__(self, vocab_size, embedding_dim):
        self.embedding_matrix = np.random.randn(vocab_size, embedding_dim) * 0.02
    
    def forward(self, indices):
        return self.embedding_matrix[indices]

# Cosine similarity
def cosine_similarity(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Example usage
embedding = EmbeddingLayer(vocab_size=10000, embedding_dim=384)
vec1 = embedding.forward([1, 2, 3])
vec2 = embedding.forward([4, 5, 6])
similarity = cosine_similarity(vec1[0], vec2[0])
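
Randomly initialised embeddings like the ones above carry no meaning until they are trained, so their similarities are close to arbitrary. The sketch below uses hand-picked 3-dimensional toy vectors (not learned embeddings) to show how cosine similarity and vector arithmetic behave when directions do encode meaning; the `nearest` helper and the vector values are assumptions made for illustration.

```python
import numpy as np

def cosine_similarity(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Hand-picked toy vectors; axes are roughly [royalty, maleness, femaleness]
vectors = {
    "king":   np.array([0.9, 0.8, 0.1]),
    "queen":  np.array([0.9, 0.1, 0.8]),
    "prince": np.array([0.7, 0.8, 0.1]),
    "man":    np.array([0.1, 0.9, 0.1]),
    "woman":  np.array([0.1, 0.1, 0.9]),
}

def nearest(query, vectors, exclude=()):
    """Return the word whose vector has the highest cosine similarity to `query`."""
    candidates = {w: v for w, v in vectors.items() if w not in exclude}
    return max(candidates, key=lambda w: cosine_similarity(query, candidates[w]))

# Vector arithmetic: king - man + woman
analogy = vectors["king"] - vectors["man"] + vectors["woman"]
result = nearest(analogy, vectors, exclude={"king", "man", "woman"})
print(result)  # queen
```

Excluding the three input words from the search is the usual convention for analogy queries; otherwise one of the inputs tends to be its own nearest neighbour.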

Sentence Embeddings

from transformers import AutoTokenizer, AutoModel
import torch

class SentenceEmbedder:
    def __init__(self, model_name='sentence-transformers/all-MiniLM-L6-v2'):
        self.tokenizer = AutoTokenizer.from_pretrained(model_name)
        self.model = AutoModel.from_pretrained(model_name)
    
    def embed(self, sentences):
        inputs = self.tokenizer(
            sentences,
            padding=True,
            truncation=True,
            return_tensors='pt'
        )
        
        with torch.no_grad():
            outputs = self.model(**inputs)
        
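        # Mean pooling over real tokens only (padding positions are masked out)
        mask = inputs['attention_mask'].unsqueeze(-1).float()
        summed = (outputs.last_hidden_state * mask).sum(dim=1)
        embeddings = summed / mask.sum(dim=1).clamp(min=1e-9)
        return embeddings.numpy()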

# Usage
embedder = SentenceEmbedder()
sentences = ["Hello world", "Hi there"]
embeddings = embedder.embed(sentences)
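
Once sentences are embedded, they are typically compared with cosine similarity. A minimal sketch of a pairwise similarity matrix follows; it uses a small hand-written array as a stand-in for the output of `embedder.embed(sentences)` so that it runs without downloading a model, and the name `pairwise_cosine` is an illustrative choice.

```python
import numpy as np

def pairwise_cosine(embeddings):
    """Cosine similarity between every pair of rows in an (n, dim) array."""
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.clip(norms, 1e-12, None)  # guard against zero vectors
    return unit @ unit.T

# Stand-in for real sentence embeddings: three toy 4-d vectors
toy = np.array([
    [1.0, 2.0, 0.0, 1.0],
    [2.0, 4.0, 0.0, 2.0],   # same direction as row 0
    [0.0, 0.0, 3.0, 0.0],   # orthogonal to rows 0 and 1
])
sim = pairwise_cosine(toy)
print(sim.round(3))
```

Normalising the rows first turns the whole comparison into one matrix product, which is also how many vector search libraries compute cosine similarity in bulk.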

4. LLM Fundamentals

Temperature Parameter

import torch
import torch.nn.functional as F

def apply_temperature(logits, temperature=1.0):
    # Temperature must be positive; values below 1 sharpen the distribution
    return logits / max(temperature, 1e-8)

def sample_next_token(logits, temperature=1.0, top_k=None, top_p=None):
    # logits: (vocab_size,) or (batch, vocab_size)
    logits = apply_temperature(logits, temperature)
    
    if top_k is not None:
        # Top-k sampling: keep the k largest logits in each row
        kth_value = torch.topk(logits, top_k, dim=-1).values[..., -1:]
        logits = logits.masked_fill(logits < kth_value, float('-inf'))
    
    if top_p is not None:
        # Top-p (nucleus) sampling
        sorted_logits, sorted_indices = torch.sort(logits, descending=True, dim=-1)
        cumulative_probs = torch.cumsum(F.softmax(sorted_logits, dim=-1), dim=-1)
        sorted_remove = cumulative_probs > top_p
        # Shift right so the token that crosses the threshold is kept
        sorted_remove[..., 1:] = sorted_remove[..., :-1].clone()
        sorted_remove[..., 0] = False
        # Map the mask from sorted order back to vocabulary order
        remove = torch.zeros_like(sorted_remove).scatter(-1, sorted_indices, sorted_remove)
        logits = logits.masked_fill(remove, float('-inf'))
    
    probs = F.softmax(logits, dim=-1)
    next_token = torch.multinomial(probs, num_samples=1)
    return next_token
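
The effect of dividing logits by the temperature is easiest to see on a small example. The sketch below uses only the standard library and made-up logits for three candidate tokens; `softmax_with_temperature` is a hypothetical helper name.

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Convert logits to probabilities after dividing by `temperature`."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)                          # subtract max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]  # made-up scores for three candidate tokens
for t in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

Lower temperatures concentrate probability on the highest-scoring token, while higher temperatures flatten the distribution and make less likely tokens more probable.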

Prompt Engineering

# Zero-shot prompting
prompt = """
Classify the sentiment of the following text:
Text: "I love this product! It's amazing."
Sentiment:
"""

# Few-shot prompting
prompt = """
Classify the sentiment of the following text:
Text: "I love this product! It's amazing."
Sentiment: Positive

Text: "This is terrible. I hate it."
Sentiment: Negative

Text: "The product is okay, nothing special."
Sentiment: Neutral

Text: "I'm really disappointed with the quality."
Sentiment:
"""

# Chain-of-thought prompting (a worked exemplar; in practice, append a new question after it)
prompt = """
Question: If a train travels 60 mph for 2 hours, then 40 mph for 3 hours, what is the total distance?
Let's think step by step:
First, calculate distance for first part: 60 mph * 2 hours = 120 miles
Second, calculate distance for second part: 40 mph * 3 hours = 120 miles
Total distance = 120 + 120 = 240 miles
Answer: 240 miles
"""
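
Few-shot prompts like the one above are usually assembled from data rather than typed by hand. A minimal sketch of such a builder follows, assuming the same "Text / Sentiment" layout; the function name `build_few_shot_prompt` is hypothetical.

```python
def build_few_shot_prompt(instruction, examples, query):
    """Assemble an instruction, labeled examples, and an unlabeled query."""
    lines = [instruction]
    for text, label in examples:
        lines.append(f'Text: "{text}"')
        lines.append(f"Sentiment: {label}")
        lines.append("")  # blank line between examples
    lines.append(f'Text: "{query}"')
    lines.append("Sentiment:")  # left open for the model to complete
    return "\n".join(lines)

examples = [
    ("I love this product! It's amazing.", "Positive"),
    ("This is terrible. I hate it.", "Negative"),
    ("The product is okay, nothing special.", "Neutral"),
]
prompt = build_few_shot_prompt(
    "Classify the sentiment of the following text:",
    examples,
    "I'm really disappointed with the quality.",
)
print(prompt)
```

Keeping the examples in a list makes it straightforward to swap them per task or to select the most relevant ones for each query.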

5. Resource Directory: Generative AI

Best Books

| Book | Author | Price | Key Topics |
|---|---|---|---|
| Natural Language Processing with Transformers | Tunstall & von Werra | Paid | Hugging Face |
| Building Generative AI Applications | O'Reilly | Paid | LLM engineering |
| Hands-On Machine Learning | Aurélien Géron | Paid | ML fundamentals |
| Deep Learning | Ian Goodfellow | Paid | Deep learning theory |

Best Udemy Courses

| Course | Instructor | Price (INR) | Key Topics |
|---|---|---|---|
| NLP & NLP Projects | Jose Portilla | ₹2,999-3,999 | NLP with Python |
| ChatGPT & GPT-4 API | Colt Steele | ₹1,999-2,999 | OpenAI API |
| LangChain & LLMs | Instructor | ₹1,999-2,999 | LangChain |
| Vector Databases | Instructor | ₹1,499-2,299 | Pinecone, Chroma |

Best O'Reilly Resources

| Resource | Publisher | Access |
|---|---|---|
| Building Generative AI Applications | O'Reilly | Paid |
| Learning Hugging Face | O'Reilly | Paid |
| Natural Language Processing | O'Reilly | Paid |

Best LinkedIn Learning Courses

| Course | Instructor | Access |
|---|---|---|
| Generative AI Fundamentals | Instructor | Paid |
| Working with LLMs | Instructor | Paid |
| AI Prompt Engineering | Instructor | Paid |

Free Resources

| Platform | Resource | Link |
|---|---|---|
| Hugging Face Course | Free course | huggingface.co/learn |
| DeepLearning.AI | Free courses | deeplearning.ai |
| LLM Zoomcamp | Free course | github.com/alexeygrigorev/llm-zoomcamp |
| Awesome LLM | GitHub | github.com/StellarCK/awesome-llm |

6. Common GenAI Interview Questions

| Question | Answer |
|---|---|
| What are embeddings? | Dense vector representations of text or other data that ML models can compare and compute with. |
| What is the difference between fine-tuning and prompt engineering? | Fine-tuning modifies the model's weights; prompt engineering guides behavior through the input alone. |
| What is RAG? | Retrieval-Augmented Generation combines an LLM with external knowledge retrieved at query time. |
| How do you handle hallucinations? | Use factual prompts, provide sources, and implement fact-checking. |
| What is temperature in LLMs? | It controls randomness: lower values are more deterministic, higher values more creative. |
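
The RAG answer above can be made concrete with a toy retrieval step. The sketch below is only an illustration under strong assumptions: word-count vectors stand in for a real embedding model, the documents are made up, and the helper names (`bow_vector`, `retrieve`, `build_rag_prompt`) are hypothetical. No LLM is called; the code stops at building the prompt.

```python
import math
from collections import Counter

def bow_vector(text):
    """Toy 'embedding': lowercase word counts (stand-in for a real embedding model)."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)  # Counter returns 0 for missing words
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(question, documents, top_k=1):
    """Return the `top_k` documents most similar to the question."""
    q = bow_vector(question)
    ranked = sorted(documents, key=lambda d: cosine(q, bow_vector(d)), reverse=True)
    return ranked[:top_k]

def build_rag_prompt(question, documents):
    """Put the retrieved context in front of the question."""
    context = "\n".join(retrieve(question, documents))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )

documents = [
    "Temperature controls randomness when sampling tokens.",
    "Embeddings are dense vector representations of text.",
    "Fine-tuning modifies model weights.",
]
prompt = build_rag_prompt("what are embeddings", documents)
print(prompt)
```

In a real system the word-count vectors would be replaced by sentence embeddings and the linear scan by a vector database lookup.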

7. Part Navigation

Previous Parts

Part 20: Frontend Development

Next Parts

Part 22: Vector Databases · Part 23: RAG Architectures

