Database Modeling

Think of database modeling like figuring out the best way to structure and organize our information in a database. It invloves -

  • Identifying what data needs to be stored
  • How data relate to each other
  • And how to stucture it for efficiency and consistency

There are three level of data models -

  1. Conceptual Data Model - A high level view of what entities would exist in the database and how they relate to each other.

    Example : For a blog platform

    • Entities - User, Blog, Comment
    • Relationships - 1. A user writes blogs 2. A blog has comments 3. A comment belongs to one user and one blog

  1. Logical Data Model - At this stage, we add more structure by defining fields, data types, constraints, relationships etc. to conceptual data model. In this satge we should stay database-agnostic

    1. User
      • id : integer → Primary key
      • name : string
      • email : string → Unique
    2. Blog
      • id : integer → Primary key
      • user_id : integer → Foreign Key(User)
      • title : string
      • content : text
    3. Comment
      • id : integer → Primary key
      • user_id : integer → Foreign Key(User)
      • blog_id : integer → Foreign Key(Blog)
      • content : text

  1. Physical Data Model - This is stage, where the blueprint is turned into an actual database implementation with tables, indexes, constraints, and DBMS-specific settings

        -- User Table
        CREATE TABLE users (
        id SERIAL PRIMARY KEY,
        name VARCHAR(100),
        email VARCHAR(150) UNIQUE
        );
    
        -- Blog Table
        CREATE TABLE blogs (
        id SERIAL PRIMARY KEY,
        user_id INTEGER REFERENCES users(id),
        title VARCHAR(255),
        content TEXT
        );
    
        -- Comment Table
        CREATE TABLE comments (
        id SERIAL PRIMARY KEY,
        user_id INTEGER REFERENCES users(id),
        blog_id INTEGER REFERENCES blogs(id),
        content TEXT
        );
        -- Indexes for performance optimization
        CREATE INDEX idx_user_id_on_blogs ON blogs(user_id);
        CREATE INDEX idx_user_id_on_comments ON comments(user_id);
        CREATE INDEX idx_blog_id_on_comments ON comments(blog_id);
    


Common Jargon and usecase

For better understanding, we will use Intergalactic Pet Shop example. In Intergalactic Pet Shop we sell exotic alien creatures from different planets to space-traveling people. In our pet shop we'll sell creatures to space traveling customers and track thier purchases.

1. Entity or Relation

An entity is basically just anything from the real world that we want to store data about. It becomes a table in the database.

Examples - Creatures, Customers, Purchases


2. Attribute

Properties or characteristics of an entity. It becomes column in a table.

Examples - email in the Customers table. name in the Creatures table.


3. Record or Row or Tuple

Single row in the table

Examples - A single creature's data: id: 1, name: Blorb, origin_planet: Neptune-5


4. Datatypes

A data type defines the kind of data that can be stored in a column of a table. Every attribute in a table must have a specific data type. It helps database to interpret and validata that data.

Common datatypes - VARCHAR, TEXT, INT, DECIMAL, TIMESTAMP, BOOLEAN, ENUM, DATE etc.


5. keys

Keys are set of attributes used to uniquely identify records and define relationships between tables. Keys ensure data integrity and make it possible to relate different tables efficiently.

Type of keys -

  1. Super Key - It a set of attributes that can identify each records uniquely. It is not restricted to have any specific number of attributes. Example -

    creatures(id, name, species, origin_planet, is_dangerous, about, size, price). Here Id and name will be unique

    combinations of super keys -

    • id → already unique
    • name → already unique
    • id, name
    • name, species, origin_planet etc.
  2. Candidate Key - A minimal super key that can identify each records uniquely is called as a candidate key. No extra, unnecessary columns. There can be multiple candidate keys in a table

    Example - id and name

  3. Primary Key - A primary key is a candidate key that we select while desiging database model. A relation is allowed to have only one primary key.

    Example - id or name can be Primary Key

  4. Foreign Key - A foreign key is a column or a set of columns in one table that refers to the primary key of another table. It is used to create a relationship between two tables. It connects related data across tables.

    Example -

    creatures(id, name, species, origin_planet, is_dangerous, about, size, price) and

    purchases(id, creature_id, purchase_date)

    creature_id is a foreign key in purchases table, which references the id which is primary key in the creatures table. It ensures that we can’t have record of purchase for a creature that doesn’t exist.


6. Constraints

Constraints are rules applied to attributes to enforce data integrity and its validity.

  1. Domain Constraint

    It sets rules for what kind of values an attribute can have in a table, like data types, allowed range, format, or specific choices.

    Example 1 : temperament can only be 'playful', 'aggressive', or 'lazy'

    temperament VARCHAR(50) CHECK (temperament IN ('playful', 'aggressive', 'lazy'));
    

    Only the listed temperaments are allowed in the creatures table. We can not insert "angary" or "friendly" as temperament.

    Example 2 : lifespan_years must be a positive integer

    lifespan_years INT CHECK (lifespan_years > 0)
    

    Example 3 : is_dangerous should only be TRUE or FALSE.

    is_dangerous BOOLEAN DEFAULT FALSE
    

  1. Tuple Uniqueness Constraint

    A tuple uniqueness constraint ensures that no two tuples in a table are similar across specified columns.

    Example 1

    CREATE TABLE customers (
    id INT PRIMARY KEY,
    email VARCHAR(100) UNIQUE
    );
    

    This ensure that two customers can't have the same email


  1. Key Constraint

    In any relation key constraint tells that, all the values of candidate key or primary key must be unique and minimal. For example no two creatures can have the same ID.


  1. Entity Integrity Constraint

    In any relation entity integrity constraint tells that, That the choosen primary key column(s) cannot be NULL and must exist for every tuple.

    CREATE TABLE creatures (
    id INT PRIMARY KEY, -- This can never be NULL
    name VARCHAR(100)
    );
    

  1. Referential Integrity Constraint

    When one table is linked to another using a foreign key. This rule make sure that

    • The value in the foreign key column must already exist in the referenced (parent) table.
    • If not, the value can be NULL (only if the column allows it).

    Example -

    -- purchases table
    ...
    FOREIGN KEY (creature_id) REFERENCES customers(id)
    

    In purchases, creature_id must match an existing creature’s ID. We can not purchase a creature that doesn’t exist in creatures table.