PostgreSQL

PostgreSQL Interview Questions and Answers

Section 1: PostgreSQL Architecture & Internals (20+ Questions)

1. What is the architecture of PostgreSQL?
PostgreSQL follows a client-server model. It consists of the following key components:

  • Postmaster (master process)

  • Backend Server Processes (handle queries)

  • Shared Buffers

  • WAL (Write-Ahead Logging)

  • Background Writer and Checkpointer

2. What is a WAL (Write-Ahead Log)?
WAL ensures durability. All changes are first written to WAL before the actual database. In case of a crash, WAL can be used to redo the operations.

3. Explain MVCC in PostgreSQL.
MVCC (Multi-Version Concurrency Control) allows concurrent access by maintaining multiple versions of a row. It uses xmin/xmax system columns.

4. What is the purpose of vacuuming in PostgreSQL?
VACUUM reclaims storage by removing obsolete row versions created by MVCC.

5. Difference between VACUUM and VACUUM FULL?

  • VACUUM cleans dead tuples.

  • VACUUM FULL compacts tables and releases space back to the OS.

6. What are shared buffers in PostgreSQL?
They are memory buffers used to cache data blocks, reducing disk I/O.

7. How does PostgreSQL handle crash recovery?
Using WAL, checkpointing, and the crash recovery process that replays WAL entries.

8. What is a Checkpointer?
Flushes all dirty pages from shared buffers to disk periodically to reduce recovery time.

9. What are background writer and autovacuum daemons?

  • Background Writer writes dirty pages.

  • Autovacuum triggers VACUUM and ANALYZE automatically.

10. What is a visibility map?
It helps VACUUM skip pages where all tuples are visible (no updates/deletes).

11. How is a query processed in PostgreSQL?
Query stages: Parsing → Rewriting → Planning → Optimization → Execution

12. What is the System Catalog?
PostgreSQL maintains metadata in system tables (e.g., pg_class, pg_attribute).

13. Explain the role of pg_stat_activity.
It tracks all current server activity including running queries.

14. How does PostgreSQL implement isolation levels?
Using MVCC and transaction snapshots.

15. What is pg_wal directory?
Stores WAL segment files.

16. Difference between logical and physical replication?

  • Physical: Byte-level replication of WAL.

  • Logical: Row-level replication using logical decoding.

17. What is a CLOG (Commit Log)?
It tracks the commit status of transactions.

18. What is the purpose of transaction ID wraparound?
To prevent transaction ID exhaustion; requires VACUUM.

19. How is memory managed in PostgreSQL?
Memory areas:

  • Shared Buffers

  • Work Mem

  • Maintenance Work Mem

  • Temp Buffers

20. What is the role of pg_control?
Stores metadata about the database system state.

21. What is a WAL segment size and how can you modify it?
Default is 16MB; change requires recompilation with --with-wal-segsize.


Section 2: Advanced SQL Features in PostgreSQL (20+ Questions)

1. What are Common Table Expressions (CTEs)?
CTEs allow writing modular queries. Syntax:

WITH cte AS (
  SELECT * FROM users WHERE age > 30
)
SELECT * FROM cte;

2. What is a window function?
Window functions compute values across a set of rows related to the current row:

SELECT name, salary, RANK() OVER (ORDER BY salary DESC) FROM employees;

3. How do you use UPSERT in PostgreSQL?

INSERT INTO users(id, name) VALUES (1, 'John')
ON CONFLICT (id) DO UPDATE SET name = EXCLUDED.name;

4. How do you implement full-text search in PostgreSQL?
Use tsvector and tsquery:

SELECT * FROM articles WHERE to_tsvector('english', body) @@ to_tsquery('postgres');

5. What is JSONB and how is it different from JSON?

  • JSON: Text-based

  • JSONB: Binary format with indexing capabilities

6. How to query JSONB data?

SELECT data->'name' FROM users WHERE data @> '{"active": true}';

7. How to create and use partial indexes?

CREATE INDEX idx_active_users ON users(id) WHERE active = true;

8. What is a materialized view?
A precomputed result set stored on disk.

CREATE MATERIALIZED VIEW mv_users AS SELECT * FROM users;
REFRESH MATERIALIZED VIEW mv_users;

9. What are generated columns?
Automatically computed:

CREATE TABLE t (a int, b int GENERATED ALWAYS AS (a * 2) STORED);

10. How do you use array functions?

SELECT * FROM users WHERE 3 = ANY(numbers);

11. Explain the DISTINCT ON clause.

SELECT DISTINCT ON (category) * FROM products ORDER BY category, price;

12. What are set-returning functions?
Functions that return a set of rows.

13. How does PostgreSQL handle NULLs in comparison?
NULL IS NULL is true, but NULL = NULL is false.

14. How to create custom data types?

CREATE TYPE mood AS ENUM ('sad', 'ok', 'happy');

15. What is TABLESAMPLE?
Used to get a random sample:

SELECT * FROM users TABLESAMPLE SYSTEM (10);

16. How do you create a function in PostgreSQL?

CREATE FUNCTION add(a int, b int) RETURNS int AS $$
BEGIN
  RETURN a + b;
END;
$$ LANGUAGE plpgsql;

17. What is lateral join?
Allows subqueries to refer to outer query columns:

SELECT u.name, p.* FROM users u, LATERAL (SELECT * FROM posts p WHERE p.user_id = u.id) p;

18. How do you use COALESCE?
Returns the first non-null value:

SELECT COALESCE(name, 'Unknown') FROM users;

19. What is the difference between ROW_NUMBER, RANK, and DENSE_RANK?

  • ROW_NUMBER - unique sequence

  • RANK - allows gaps

  • DENSE_RANK - no gaps

20. What is the purpose of the FILTER clause in aggregates?

SELECT COUNT(*) FILTER (WHERE active = true) FROM users;

21. How do you use ILIKE?
Case-insensitive LIKE:

SELECT * FROM users WHERE name ILIKE 'john%';

Section 3: Indexing & Performance Tuning (20+ Questions)

1. What are the different types of indexes in PostgreSQL?

  • B-Tree (default)

  • Hash

  • GIN (Generalized Inverted Index)

  • GiST (Generalized Search Tree)

  • SP-GiST

  • BRIN (Block Range INdexes)

2. When should you use GIN indexes?
For full-text search and array operations.

3. How do you create a GIN index?

CREATE INDEX idx_gin ON documents USING GIN(to_tsvector('english', content));

4. What is the use of EXPLAIN and EXPLAIN ANALYZE?
They show the execution plan and actual run time of a query.

5. How do you identify slow queries?
Using pg_stat_statements, auto_explain, and query logs.

6. What is the role of work_mem?
Defines the memory used for internal sort operations and hash joins.

7. What is shared_buffers and how should it be set?
Typically 25%-40% of system memory; caches data blocks.

8. How does PostgreSQL use multicolumn indexes?
Supports queries using a left-prefix of indexed columns.

9. What are covering indexes?
Include all columns used by a query so PostgreSQL doesn’t need to read the table.

10. How do you monitor index usage?
Using pg_stat_user_indexes and pg_stat_all_indexes.

11. What is index-only scan?
When PostgreSQL can return data just from the index without reading the table.

12. How do you detect and avoid sequential scans?
Use EXPLAIN, optimize queries, and add proper indexes.

13. What are HOT updates?
Heap-only tuples: updates that don’t modify indexed columns and don’t require index updates.

14. How to monitor and manage bloat?
Use pgstattuple, pg_repack, and autovacuum tuning.

15. How do you profile a query?
Enable auto_explain and use EXPLAIN ANALYZE.

16. How do indexes impact write performance?
Each insert/update/delete may also update one or more indexes.

17. What is sequential scan?
When PostgreSQL reads every row in the table.

18. How can VACUUM help performance?
Reclaims dead tuples and updates visibility.

19. What is parallel query execution?
PostgreSQL supports parallel scans, joins, and aggregates in supported queries.

20. How can you force PostgreSQL to use an index?
Use SET enable_seqscan = off; for testing.

21. What is random_page_cost and how does it affect planner decisions?
Affects cost estimation of non-sequential I/O; lower it if you use SSDs.


Section 4: Backup, Security & Administration (20+ Questions)

1. How do you take a full backup in PostgreSQL?
Use pg_basebackup or pg_dumpall.

2. What is the difference between pg_dump and pg_basebackup?

  • pg_dump: Logical backup

  • pg_basebackup: Physical binary backup

3. How do you restore a PostgreSQL backup?

psql -f backup.sql dbname

4. How do you implement role-based access control?
Use CREATE ROLE, GRANT, and REVOKE.

5. How to enable SSL for PostgreSQL?
Update postgresql.conf and provide certificates.

6. How to check database size?

SELECT pg_size_pretty(pg_database_size('dbname'));

7. How do you rotate logs in PostgreSQL?
Use log_rotation_age and log_rotation_size settings.

8. What is the purpose of pg_hba.conf?
Controls host-based authentication.

9. How do you implement encryption at rest?
Use file system encryption or third-party tools.

10. How do you secure PostgreSQL from SQL injection?
Use prepared statements and parameterized queries.

11. How do you monitor PostgreSQL?
Use pg_stat_* views, Prometheus exporters, and pgAdmin.

12. How do you limit user connections?
Set max_connections and use connection poolers like PgBouncer.

13. How to find and kill long-running queries?

SELECT pg_terminate_backend(pid) FROM pg_stat_activity WHERE state='active' AND now() - query_start > interval '5 minutes';

14. How do you check current locks?

SELECT * FROM pg_locks;

15. What is logical decoding?
Feature for change-data-capture; needed for logical replication.

16. How to schedule periodic backups?
Use cron with pg_dump.

17. How to grant superuser privileges?

ALTER ROLE username WITH SUPERUSER;

18. How do you perform point-in-time recovery (PITR)?
Use restore_command, recovery.signal, and WAL archive.

19. How do you list all active users?

SELECT usename FROM pg_user;

20. How do you create an encrypted password for a user?

CREATE USER user1 WITH ENCRYPTED PASSWORD 'securepass';

Section 5: Replication & Clustering (20+ Questions)

1. What are the types of replication supported?

  • Streaming replication

  • Logical replication

2. How does streaming replication work?
Standby servers stream WAL from the primary server.

3. What is replication slot?
Prevents WAL from being recycled until replicas consume it.

4. How do you set up logical replication?
Use CREATE PUBLICATION and CREATE SUBSCRIPTION.

5. What is the role of wal_level in replication?
Must be set to replica or logical for respective replication types.

6. How do you monitor replication lag?
Use pg_stat_replication view.

7. How do you promote a standby server?

touch $PGDATA/standby.signal
pg_ctl promote

8. What is cascading replication?
A standby server acts as a source for another standby.

9. What is synchronous replication?
WAL is written to all synchronous standbys before commit completes.

10. What are the failover and switchover strategies?
Use tools like Patroni, repmgr, or manual scripts.

11. What is BDR (Bi-Directional Replication)?
Allows multi-master replication setup.

12. What are the advantages of logical replication?
Allows row-level filtering and schema evolution.

13. Can you replicate partial data?
Yes, with logical replication using row filters.

14. What is archive_mode?
Enables WAL archiving.

15. How do you implement load balancing in a replicated setup?
Use PgBouncer or HAProxy with read/write splitting.

16. How does PostgreSQL ensure data consistency in replication?
Through WAL streaming and synchronous commit options.

17. What is the role of primary_conninfo?
Defines how the standby connects to the primary server.

18. What are replication conflicts?
Occur in logical replication due to concurrent updates.

19. How do you monitor WAL file size and usage?
Use pg_current_wal_lsn() and pg_wal_lsn_diff().

20. How to re-synchronize a failed replica?
Recreate using base backup or use tools like pgBackRest.


Scroll to Top