Part 7: Relational Database Engineering - PostgreSQL Mastery

Master indexing strategies (B-Tree, GIN, GiST), query planning (EXPLAIN ANALYZE), transactions, isolation levels, and migrations in PostgreSQL.

Part 7: Relational Database Engineering - PostgreSQL Mastery

← Back to Master Index


1. Why PostgreSQL in 2026?

PostgreSQL is the world's most advanced open-source relational database, powering major tech companies and offering features comparable to commercial databases at zero cost.

Key Advantages

  • Advanced indexing: B-tree, GiST, GIN, BRIN, Hash
  • JSON support: JSONB for document storage
  • ACID compliance: Full transaction support
  • Extensibility: Custom types, functions, extensions
  • Concurrency: MVCC for high-performance reads

Salary Impact

Backend engineers with PostgreSQL expertise command 15-25% higher salaries than those without advanced database skills.


2. PostgreSQL Fundamentals

Basic Concepts

-- Creating tables with constraints
CREATE TABLE users (
    id SERIAL PRIMARY KEY,
    email VARCHAR(255) UNIQUE NOT NULL,
    username VARCHAR(50) UNIQUE NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    is_active BOOLEAN DEFAULT TRUE
);

-- Indexes for performance
CREATE INDEX idx_users_email ON users(email);
CREATE INDEX idx_users_created_at ON users(created_at DESC);

Data Types

TypeDescriptionExample
SERIALAuto-incrementing integerid SERIAL PRIMARY KEY
VARCHAR(n)Variable-length stringname VARCHAR(100)
TEXTUnlimited length stringdescription TEXT
JSONBBinary JSON storagemetadata JSONB
TIMESTAMPDate and timecreated_at TIMESTAMP
NUMERICExact precision decimalprice NUMERIC(10,2)

3. Indexing Strategies

B-Tree Indexes (Default)

-- Basic B-tree index
CREATE INDEX idx_users_email ON users(email);

-- Composite index (order matters!)
CREATE INDEX idx_users_active_created ON users(is_active, created_at DESC);

-- Unique index
CREATE UNIQUE INDEX idx_users_username ON users(username);

GiST (Generalized Search Tree)

-- For geometric data, text patterns
CREATE INDEX idx_posts_title_gist ON posts USING GiST(title gin_trgm_ops);

-- Full-text search
ALTER TABLE articles ADD COLUMN search_vector TSVECTOR;
CREATE INDEX idx_articles_search ON articles USING GiST(search_vector);

GIN (Generalized Inverted Index)

-- For JSONB columns
CREATE INDEX idx_users_metadata_gin ON users USING GIN(metadata);

-- Full-text search with TSVECTOR
CREATE INDEX idx_articles_search_gin ON articles USING GIN(search_vector);

BRIN (Block Range Index)

-- For very large tables with physical ordering
CREATE INDEX idx_large_table_created_brin ON large_table(created_at) USING BRIN;

Index Performance Analysis

-- Analyze query performance
EXPLAIN ANALYZE SELECT * FROM users WHERE email = '[email protected]';

-- Show index usage
SELECT schemaname, tablename, indexname, idx_tup_read, idx_tup_fetch 
FROM pg_stat_user_indexes 
WHERE tablename = 'users';

4. Query Optimization

EXPLAIN and EXPLAIN ANALYZE

-- Basic explain
EXPLAIN SELECT * FROM users WHERE email = '[email protected]';

-- Detailed analysis with timing
EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON) 
SELECT u.*, COUNT(o.id) as order_count
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
WHERE u.created_at > '2024-01-01'
GROUP BY u.id;

Common Optimization Techniques

-- Avoid SELECT *
SELECT id, email, username FROM users;

-- Use LIMIT for pagination
SELECT * FROM users ORDER BY created_at DESC LIMIT 10 OFFSET 20;

-- Use JOINs instead of subqueries
SELECT u.email, o.total
FROM users u
JOIN orders o ON u.id = o.user_id
WHERE o.created_at > '2024-01-01';

-- Instead of:
SELECT email FROM users WHERE id IN (
    SELECT user_id FROM orders WHERE created_at > '2024-01-01'
);

Materialized Views

-- Create materialized view for complex queries
CREATE MATERIALIZED VIEW user_stats AS
SELECT 
    user_id,
    COUNT(*) as total_orders,
    SUM(total) as total_spent
FROM orders
GROUP BY user_id;

-- Index the materialized view
CREATE INDEX idx_user_stats_user_id ON user_stats(user_id);

-- Refresh periodically
REFRESH MATERIALIZED VIEW CONCURRENTLY user_stats;

5. Transactions and Concurrency

Transaction Isolation Levels

-- Read committed (default)
BEGIN ISOLATION LEVEL READ COMMITTED;
SELECT * FROM users WHERE id = 1;
UPDATE users SET last_login = NOW() WHERE id = 1;
COMMIT;

-- Repeatable read
BEGIN ISOLATION LEVEL REPEATABLE READ;
SELECT COUNT(*) FROM orders;  -- Consistent snapshot
-- Other transactions' changes won't affect this query
COMMIT;

Locking Strategies

-- Explicit locking
BEGIN;
SELECT * FROM accounts WHERE id = 1 FOR UPDATE;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
COMMIT;

-- Advisory locks for application-level locking
SELECT pg_advisory_lock(12345);
-- Critical section
SELECT pg_advisory_unlock(12345);

Deadlock Prevention

-- Always lock rows in consistent order
-- Bad:
SELECT * FROM orders WHERE user_id = 1 FOR UPDATE;  -- Locks user
SELECT * FROM users WHERE id = 1 FOR UPDATE;        -- Locks order

-- Good:
SELECT * FROM users WHERE id = 1 FOR UPDATE;        -- Locks user first
SELECT * FROM orders WHERE user_id = 1 FOR UPDATE;  -- Then order

6. Connection Pooling and Performance

Connection Pool Configuration

# Using psycopg2 with connection pooling
import psycopg2
from psycopg2 import pool

# Create connection pool
connection_pool = psycopg2.pool.ThreadedConnectionPool(
    minconn=5,
    maxconn=20,
    host="localhost",
    database="myapp",
    user="postgres",
    password="password"
)

# Get connection from pool
conn = connection_pool.getconn()
try:
    cur = conn.cursor()
    cur.execute("SELECT * FROM users")
    users = cur.fetchall()
finally:
    connection_pool.putconn(conn)  # Return to pool

Query Performance Monitoring

-- Monitor slow queries
SELECT query, mean_time, calls
FROM pg_stat_statements
ORDER BY mean_time DESC
LIMIT 10;

-- Reset statistics
SELECT pg_stat_statements_reset();

-- Check table sizes
SELECT 
    schemaname,
    tablename,
    pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)) as size
FROM pg_tables
ORDER BY pg_total_relation_size(schemaname||'.'||tablename) DESC;

7. Resource Directory: PostgreSQL

Best Books

BookAuthorPriceKey Topics
PostgreSQL: Up and RunningRegina ObePaidSetup, configuration
Effective PostgreSQLHans-Jürgen SchönigPaidOptimization, performance
PostgreSQL Developer's GuideJohn D. MitchellPaidProgramming with PostgreSQL
The Art of PostgreSQLDimitri FontainePaidAdvanced techniques

Best Udemy Courses

CourseInstructorPrice (INR)Key Topics
PostgreSQL MasterclassStephane Maarek₹1,999-2,999Administration, optimization
PostgreSQL for DevelopersJose Portilla₹999-1,499Query optimization, indexing
Database Design & ManagementColt Steele₹999-1,499Design patterns, normalization
Advanced PostgreSQLSijmen Schoon₹1,499-1,999Advanced features

Best O'Reilly Resources

ResourceTopicAccess
PostgreSQL 14 High PerformanceO'ReillyPaid
Learning PostgreSQLO'ReillyPaid
PostgreSQL Up and RunningO'ReillyPaid

Best LinkedIn Learning Courses

CourseInstructorAccess
PostgreSQL Essential TrainingErin AllardPaid
SQL for ProfessionalsBarron StonePaid
Database DesignAlicia CwankarPaid

Free Resources

PlatformResourceLink
PostgreSQL DocsOfficial documentationpostgresql.org/docs
Explain.depesz.comQuery explain analyzerexplain.depesz.com
PGMustelidaeExtensions listpgmustelidae.com
Awesome PostgreSQLGitHub collectiongithub.com/davidwiley/awesome-postgresql

8. Common PostgreSQL Interview Questions

QuestionAnswer
Difference between UUID and SERIAL?UUID is globally unique identifier, SERIAL is auto-incrementing integer
What is VACUUM?Recovers space and updates statistics, VACUUM ANALYZE does both
How to optimize slow queries?Use EXPLAIN ANALYZE, add indexes, rewrite query
What is MVCC?Multi-Version Concurrency Control allows simultaneous reads/writes
Difference between JSONB and JSON?JSONB is binary, faster to process, supports indexing

9. Part Navigation

Previous Parts

Part 6: TypeScript & Node.js

Next Parts

Part 8: MongoDB Deep Dive · Part 9: Redis & Caching


Proceed to Part 8: MongoDB Deep Dive →

Comments

Comments are powered by giscus. Set PUBLIC_GISCUS_REPO_ID and PUBLIC_GISCUS_CATEGORY_ID in your environment to enable them.