DBMS Numericals Practice


Identifying relationships between tables.

Example 1

Table A: Students

StudentID Name Email
1 Alice alice@example.com
2 Bob bob@example.com
3 Charlie charlie@example.com

Table B: Student_ID_Cards

CardID StudentID IssueDate ExpiryDate
101 1 2024-01-01 2028-01-01
102 2 2024-02-15 2028-02-15
103 3 2024-03-10 2028-03-10

1:1 relationship

Relationship table:

StudentID CardID
1 101
2 102
3 103
StudentID Name Email CardID
1 Alice alice@example.com 101
2 Bob bob@example.com 102
3 Charlie charlie@example.com 103

Example 2

Table A: Authors

AuthorID Name
A1 Jane Doe
A2 John Smith
A3 Amy Lin

Table B: Books

BookID Title AuthorID
B1 Learning SQL A1
B2 Advanced Databases A1
B3 Distributed Systems A2
B4 Data Structures in C++ A3
B5 Data Science 101 A1

1:N relationship

AuthorID BookID
A1 B1
A1 B2
A2 B3
A3 B4
A1 B5

Merged table:

BookID Title AuthorID
B1 Learning SQL A1
B2 Advanced Databases A1
B3 Distributed Systems A2
B4 Data Structures in C++ A3
B5 Data Science 101 A1

Example 3 – Many-to-Many Relationship

Table A: Students

StudentID Name
S1 Alice
S2 Bob
S3 Charlie

Table B: Courses

CourseID CourseName
C101 Database Systems
C102 Operating Systems
C103 Artificial Intelligence

N:N relationship since a single student can be enrolled in multiple courses, or multiple students can be enrolled in the same course, or multiple students can be enrolled in multiple courses

A potential relationship table would be:

StudentID CourseID
S1 C101
S2 C101
S2 C102
S3 C101
S3 C103

Since this is a complex N:N relationship with multiple possible relationship tables, it's not possible to merge the two tables and reduce them into a single merged table.


Example 4 – Mixed Relationships

Table A: Teachers

TeacherID Name Department
T1 Dr. Sharma Computer Sci.
T2 Ms. Verma Information Tech
T3 Mr. Rao Electronics

Table B: Courses

CourseID CourseName TeacherID
C101 Database Systems T1
C102 Operating Systems T2
C103 Digital Electronics T3
C104 Artificial Intelligence T1

Table C: Students

StudentID Name CourseID
S1 Alice C101
S2 Bob C102
S3 Charlie C101

Additional Info:


First of all primary keys:

Table A: TeacherID
Table B: CourseID
Table C: StudentID

Now, since each course can only have one teacher, but a teacher can teach multiple courses, have a 1:N relationship for Teacher:Course

We can get a relationship table between tables A and B as :

TeacherID CourseID
T1 C101
T2 C102
T3 C103
T1 C104

So can these two tables be merged? Yes, using the "many" table as the base, we already have a reduced table as Table B itself:

CourseID CourseName TeacherID
C101 Database Systems T1
C102 Operating Systems T2
C103 Digital Electronics T3
C104 Artificial Intelligence T1

Now, for tables B and C:

Since Students can enroll in multiple courses, and each course can have many students,

We have a N:N relationship for Student:Course.

A possible relationship table for these two would be:

StudentID CourseID
S1 C101
S2 C101
S2 C102
S3 C101
S3 C103

And since there are way too many possibilities, we can't merge or reduce these two tables together.

Now, due to the fact that Table A is related to Table B and Table B is related to Table C, using a transitive dependency we can derive this relation:

"Teachers can teach multiple courses, and each course can have multiple students"

So

                - S1
               /
              /
		 - C101 -- S2
		  /    \
		 /      \
		/        - S3
       / 
Teacher -- C102
      \
       \
        \
         - C103

So a single teacher can teach multiple students, i.e, a 1:N relationship between Table A and Table C

A potential relationship table could be:

TeacherID StudentID
T1 S1
T2 S2
T3 S2
T1 S3
T1 S2
T2 S3
T3 S1

Can these two tables be merged?

Yes, using the "many table" as the base and using Course as a conduit in between:

TeacherID CourseID StudentID Name
T1 C101 S1 Alice
T2 C102 S2 Bob
T3 C103 S2 Bob
T1 C104 S3 Charlie
T1 C105 S2 Bob
T2 C104 S3 Charlie
T3 C101 S1 Alice

This part is what we call a derived attribute in the ER model.


Example 4

Table: Employee

EmployeeID Name ManagerID
E1 Alice NULL
E2 Bob E1
E3 Charlie E1
E4 Diana E2
E5 Eva E2

This table's primary key is EmployeeID.

From what I see, there's a null entry in ManagerID for employee E1, possibly indicating that E1 herself is the top manager or just simply doesn't have a manager.

This also indicates that ManagerID is a foreign key to a possible Manager table, which is why it doesn't have a NOT NULL constraint on it.

Employee table also shows a 1:N self-recurring relationship since a single employee can be a manager to multiple other employees, however each single employee can have only 1 manager.

This table already appears to be merged from two possible parent Employee and Manager tables, so I highly doubt that it can be merged and reduced further.


Relational Algebra Practice

Say we have these 3 Schemas:

πŸ“˜ Schema (reused, slightly trimmed for relevance):

Employee(EmpID, Name, DeptID, Salary)

Department(DeptID, DeptName, Location)

Project(ProjectID, ProjectName, DeptID)

And 3 tables based on these schemas

πŸ“‹ Table: Employee

EmpID Name DeptID Salary
E001 Alice D001 60000
E002 Bob D002 48000
E003 Charlie D001 52000
E004 Diana D003 45000
E005 Ethan D004 70000

πŸ“‹ Table: Department

DeptID DeptName Location
D001 IT Delhi
D002 HR Mumbai
D003 Finance Kolkata
D004 IT Kolkata
D005 Sales Mumbai

πŸ“‹ Table: Project

ProjectID ProjectName DeptID
P001 Apollo D001
P002 Mercury D003
P003 Artemis D005
P004 Gemini D002
P005 Skylab D004

And we have the following questions to do:

Firstly we will do relational algebra queries on these then practice on the console.

Employee(EmpID, Name, DeptID, Salary)

Department(DeptID, DeptName, Location)

Project(ProjectID, ProjectName, DeptID)

  1. Ο€NameΒ (ΟƒEmployee.DeptIDΒ =Β Department.DeptID ∧ DeptNameΒ =Β "IT")(EmployeeΒ Γ—Β Department)
  2. Ο€DeptName(ΟƒLocationΒ =Β "Delhi")(Department)
  3. Ο€(EmpID,Β Name)Β (ΟƒSalary>50000)(Employee)
  4. Ο€(EmpID,Β Name)Β (ΟƒEmployee.DeptIDΒ =Β Department.DeptID ∧ LocationΒ =Β "Kolkata")(EmployeeΒ Γ—Β Department)
  5. Ο€(Name,Β DeptName)Β (ΟƒEmployee.DeptIDΒ =Β Department.DeptID)(EmployeeΒ Γ—Β Department)
  6. Ο€EmpIDΒ (ΟƒEmployee.DeptIDΒ =Β Department.DeptID ∧ LocationΒ β‰ Β "Mumbai")(EmployeeΒ Γ—Β Department)
  7. Ο€NameΒ (ΟƒEmployee.DeptIDΒ =Β Department.DeptID ∧ DeptIDΒ β‰ Β "D005")(EmployeeΒ Γ—Β Department)
  8. Ο€DeptNameΒ (ΟƒLocationΒ =Β "Delhi")(Department)Β βˆ©Β Β Ο€DeptNameΒ (ΟƒLocationΒ =Β "Mumbai")(Department)
  9. Ο€DeptName(ΟƒLocationΒ =Β "Delhi"Β Β βˆͺΒ Β LocationΒ =Β "Mumbai")(Department)

Relational Calculus

Some practice questions for Tuple Relational Calculus and Domain Relational Calculus

So we have these two schema again

πŸ“ Schema Reminder:

We have this question:

  1. Retrieve the names of employees who work in departments located in Kolkata.

Using Tuple relational Calculus (TRC):

E.Name ∣ E∈EmployeeΒ βˆ§Β βˆƒDΒ (D∈Department ∧ E.DeptIDΒ =Β D.DeptID ∧ D.LocationΒ =Β "Kolkata")

Using Domain relational Calculus (DRC):

NameΒ βˆ£Β βˆƒEmpIDΒ βˆƒNameΒ βˆƒDeptIDΒ βˆƒSalaryΒ βˆƒLocationΒ (<EmpID,Name,DeptID,Salary>∈Employee) ∧ <DeptID,DeptName,Location>∈Department ∧ LocationΒ =Β "Kolkata"
  1. Retrieve the names of employees with a salary greater than 50000.

Using TRC:

E.Name ∣ E∈Employee ∧ E.Salary>50000

Using DRC:

NameΒ βˆ£Β βˆƒEmpIDΒ βˆƒNameΒ βˆƒDeptIDΒ βˆƒSalary(<EmpID,Name,DepID,Salary>∈Employee ∧ Salary>50000)
  1. Find the names of the employees who work in the "IT" department.

Using TRC:

E.Name ∣ E∈EmployeeΒ βˆ§Β βˆƒDΒ (D∈Department ∧ E.DeptIDΒ =Β D.DeptID ∧ D.DeptNameΒ =Β "IT")
Using DRC:

NameΒ βˆ£Β βˆƒEmpIDΒ βˆƒNameΒ βˆƒDeptIDΒ βˆƒSalaryΒ βˆƒLocationΒ (<EmpID,Name,DeptID,Salary>∈Employee) ∧ <DeptID,DeptName,Location>∈Department ∧ DeptNameΒ =Β "IT"
  1. Get the project names of projects that belong to departments located in Delhi

Using TRC:

P.ProjectName ∣ P∈ProjectΒ βˆ§Β βˆƒDΒ (D∈Department ∧ P.DeptIDΒ =Β D.DeptID ∧ LocationΒ =Β "Delhi")
Using DRC:

ProjectNameΒ βˆ£Β βˆƒProjectIDΒ βˆƒProjectNameΒ βˆƒDeptIDΒ βˆƒDeptNameΒ βˆƒLocationΒ (<ProjectID,ProjectName,DeptID>∈Project) ∧ <DeptID,DeptName,Location>∈Department ∧ LocationΒ =Β "Delhi"
  1. Retrieve employee IDs of employees who do not work in the "HR" department.

Using TRC:

E.EmpID ∣ E∈EmployeeΒ βˆ§Β βˆ€DΒ (D∈Department ∧ E.DeptIDΒ =Β D.DeptID ∧ D.DeptNameΒ β‰ Β "HR")

Using DRC:

EmpIDΒ βˆ£Β βˆƒEmpIDΒ βˆƒNameΒ βˆƒDeptIDΒ βˆƒSalaryΒ βˆ€DeptNameΒ βˆ€LocationΒ (<EmpID,Name,DeptID,Salary>∈Employee) ∧ <DeptID,DeptName,Location>∈Department ∧ DeptNameΒ β‰ Β "HR"

Normalization

πŸ”§ Practice Example 1:

Relation: Student(CourseID, StudentID, InstructorName, CourseName, DeptName)

Functional Dependencies:

βœ… Task: Normalize this relation up to 3NF.

Firstly we would need to convert this to 1NF. And since we don't really have tables here to work with, I will just assume that we are already working with 1NF.

To convert to 2NF first we need to identify the candidate key, and it turn, the prime attributes.

Let's say we have :

AΒ =Β CourseID,Β BΒ =Β StudentID,Β CΒ =Β InstructorName,Β DΒ =Β CourseName,Β EΒ =Β DeptName

And we have the functional dependencies as:

AΒ β†’Β D,Β C,Β E

By decomposition we get:

AΒ β†’Β D
AΒ β†’Β C
AΒ β†’Β E

B,AΒ β†’Β A,Β B,Β C,Β D,Β E

Similarly, by decomposition:

BΒ β†’Β A
BΒ β†’Β B
BΒ β†’Β C
BΒ β†’Β D
BΒ β†’Β E

AΒ β†’Β A
AΒ β†’Β B
AΒ β†’Β C
AΒ β†’Β D
AΒ β†’Β E

So let's say we did a closure of A

A+Β =Β {D,C,E}

And closure of B

B+Β =Β {A,B,C,D,E}

Closure of C:

C+Β =Β {C}

Closure of D:

D+Β =Β {D}

Closure of E:

E+Β =Β {E}

If we were to group the attributes, let's say A and B and take their closure:

AB+Β =Β {A,B,C,D,E}

So AB or {CourseID, StudentID} is the candidate key.

So, the remaining attributes C, D, E or InstructorName, CourseName and DeptName

were not involved in the creation of the candidate key, so they are the non-prime attributes.

Using these non-prime attributes we find the partial dependencies (non-prime attributes which are only linked to a part of the candidate key, not the whole).

We see that:

AΒ β†’Β D or CourseID -> InstructorName
AΒ β†’Β C or CourseID -> CourseName
AΒ β†’Β E or CourseID -> DeptName

These non-prime attributes are only linked to A or CourseID, which is just a part of the candidate key.

Again these non-prime attributes are derived by B or StudentID.

In better terms, C, D and E are proper subsets of the Candidate Key, AB and they are non-prime attributes.

So we found the partial dependencies.

Now to create 2NF we need to separate the schemas on the basis of the partial dependencies:

We do that by usually :

Creating a schema where both LHS and RHS are parts of the Candidate Key.

or/and

Creating a schema where RHS is dependent on LHS.

Over here we had the original schema as:

Student(CourseID, StudentID, InstructorName, CourseName, DeptName)

We will split this into 4 tables:

StudentID CourseID
CourseID InstructorName
CourseID CourseName
CourseID DeptName

And now we have the schema in 2NF.

Or to keep things more clean :

CourseID InstructorName CourseName DeptName

This can be interpreted as a rule of thumb as:

If multiple non-prime attributes are functionally dependent on the same determinant (like a single part of the candidate key), and they don’t introduce new transitive dependencies, you can and should merge them into a single relation.

We saw that:

AΒ β†’Β D or CourseID -> InstructorName
AΒ β†’Β C or CourseID -> CourseName
AΒ β†’Β E or CourseID -> DeptName

And none of the non-prime attributes in between were deriving other non-prime attributes, so this was a suitable merger.

And we have the other table as:

StudentID CourseID

3NF reduction.

Now time to reduce these:

StudentID CourseID
CourseID InstructorName CourseName DeptName

Into 3NF (3rd Normal Form).

Now the pre-requisite to 3NF is that our schema must be in 2NF, which we have achieved.

And to successfully convert into 3NF we must ensure that there are no transitive dependencies amongst the non-prime attributes.

We already covered this before:

AΒ β†’Β D or CourseID -> InstructorName
AΒ β†’Β C or CourseID -> CourseName
AΒ β†’Β E or CourseID -> DeptName

And none of the non-prime attributes derive each other.

So this format:

StudentID CourseID
CourseID InstructorName CourseName DeptName

is already in 3NF.


BCNF Conversion.

For BCNF the condition is that:

By non-trivial dependency, it means that if we have XΒ β†’Β Y, Y ⊊ X or Y is not a subset of X.

We have our functional dependencies as:

AΒ β†’Β D
AΒ β†’Β C
AΒ β†’Β E

And AB is the candidate key.

Key difference between Non-Trivial and Trivial Attributes

Trivial Dependency:

Non-Trivial Dependency:

More Clarification:

So we see that:

AΒ β†’Β D or CourseID -> InstructorName
AΒ β†’Β C or CourseID -> CourseName
AΒ β†’Β E or CourseID -> DeptName

All three are non-trivial functional dependencies and their LHS A is part of a candidate key. Or for this specific schema:

CourseID InstructorName CourseName DeptName

A , CourseID is the super key meaning it can identify (or derive) all other attributes.

So this means that the schemas:

StudentID CourseID
CourseID InstructorName CourseName DeptName

are in BCNF.


Example 2

Student_Course (StudentID, CourseID, Instructor, InstructorPhone, CourseName)

Functional Dependencies:

  1. StudentIDΒ β†’Β Instructor,Β InstructorPhone
  2. CourseIDΒ β†’Β CourseName
  3. StudentID,CourseIDΒ β†’Β Instructor,InstructorPhone,CourseName

{StudentID,Β CourseID} is the primary key.

For 1NF we will just assume that there is no multi-valued attribute.

Now to reduce for 2NF.

Let's name these some alphabets:

AΒ =Β StudentID,Β BΒ =Β CourseIDΒ ,Β CΒ =Β Instructor,Β DΒ =Β InstructorPhone,Β EΒ =Β CourseName

AΒ β†’Β C,Β D
BΒ β†’Β E
A,BΒ β†’Β C,D,E

Also AΒ β†’Β A and BΒ β†’Β B

So

A+Β =Β A,C,DB+Β =Β B,EC+Β =Β CD+Β =Β DE+Β =Β E

So AB+Β =Β A,B,C,D,E

So AB is the candidate key.

The non-prime attributes are C,D,E

Partial dependencies:

AΒ β†’Β C or StudentID -> Instructor
AΒ β†’Β D or StudentID -> InstructorPhone
BΒ β†’Β E or CourseID -> CourseName

So, splitting into 2NF:

StudentID CourseID
StudentID Instructor InstructorPhone

since C and D have no transitive dependencies.

We can merge these tables even further since CourseID has no transitive dependencies with Instructor or InstructorPhone and since StudentID can identify all the other attributes in the above two tables.

StudentID CourseID Instructor InstructorPhone

And lastly :

CourseID CourseName

3NF

For 3NF we see that there are no transitive dependencies among the non-prime attributes:

Instructor, InstructorPhone, CourseName.

So

StudentID CourseID Instructor InstructorPhone

and

CourseID CourseName

is in 3NF


For BCNF

We have the functional dependencies:

StudentID -> Instructor
StudentID -> InstructorPhone
CourseID -> CourseName

First dependency is the candidate key itself.

The other ones are all non-trivial dependencies since the RHS is not the subset of LHS.

However, StudentID alone can't identify all the tuples, if this were true, this would mean that a student can take only 1 course at a time.

Same for CourseID, since multiple students can be present in a single course.

So we need to decompose this further for BCNF satisfaction

StudentID CourseID Instructor InstructorPhone

We split out the dependencies that caused the violation of BCNF rules:

StudentID -> Instructor
StudentID -> InstructorPhone

into a new schema:

StudentID Instructor InstructorPhone

and keep the remaining ones:

StudentID CourseID

Now,

StudentID -> Instructor, InstructorPhone , StudentID is a super key since a student can have only 1 instructor and an instructor can have only one phone number.

StudentID, CourseID is the candidate key.

and we had this previously.

CourseID CourseName

So, finally:

StudentID Instructor InstructorPhone
StudentID CourseID

and

CourseID CourseName

is in BCNF.


Equivalence of Functional Dependency

Example 1

We have two dependencies:

  1. F : {AΒ β†’Β B,Β BΒ β†’Β C}
  2. G: {AΒ β†’Β C,Β AΒ β†’Β B}

From F:

A+Β =Β {ABC}B+Β =Β {BC}

So GΒ βŠ†Β F

Now, from G:

A+Β =Β {ABC}B+Β =Β B

which doesn't match the set of attributes in F.

So F ⊊ G


Example 2

Again, we have two dependencies:

  1. F : {AΒ β†’Β BC,Β BΒ β†’Β D}
  2. G: {AΒ β†’Β B,Β AΒ β†’Β C,Β BΒ β†’Β D}

From F:

A+Β =Β {ABCD}B+Β =Β {BD}

So, GΒ βŠ†Β F

From G:

A+Β =Β {ABCD}B+Β =Β {BD}

So, FΒ βŠ†Β G

Hence, we can say that the dependencies F and G are equivalent to each other.


Dependency Preserving Decomposition (Lossless decomposition)

Example 1

Let’s say we have a relation:
R(A, B, C)

and a set of functional dependencies:
F = {A β†’ B, B β†’ C}

Suppose we decompose R into:

Check whether this decomposition preserves all functional dependencies in F.

Taking the union of R1 and R2:

R1Β βˆͺΒ R2Β =Β {A,B,C}Β =Β R

So this decomposition preserves all the attributes of R.


Example 2:

We have a relation R(A,B,C,D)

And functional dependencies in F : {AΒ β†’Β B,Β CΒ β†’Β D}

This was decomposed into two relation tables:

R1(A,B), where AΒ β†’Β B
R2(C,D) where, CΒ β†’Β D

Well, if we do R1Β βˆͺΒ R2 when we get {AΒ β†’Β B,Β CΒ β†’Β D}

So this decomposition is dependency preserving.

However,

The lossless condition states that:

For a decomposition to be lossless,

R1 ∩ R2Β β†’Β R1 or R1 ∩ R2Β β†’Β R2

Now,

R1 ∩ R2Β β†’Β Ο•

Since there are no common dependencies among R1 and R2

So this decomposition is not lossless.


Joins

Natural Join

πŸ”Ή Example 1

Table A: Students

StudentID Name CourseID
1 Alice C1
2 Bob C2
3 Charlie C3

Table B: Courses

CourseID CourseName
C1 Math
C2 Physics
C4 Chemistry

Task: Perform a natural join between Students and Courses.

Output:

MariaDB [test]> SELECT * from employees, departments WHERE employees.DeptID = departments.DeptID;
+-------+-------+--------+--------+----------+-----------+
| EmpID | Name  | DeptID | DeptID | DeptName | Location  |
+-------+-------+--------+--------+----------+-----------+
|   101 | Ravi  | D1     | D1     | HR       | Mumbai    |
|   102 | Priya | D2     | D2     | IT       | Bangalore |
+-------+-------+--------+--------+----------+-----------+
2 rows in set (0.004 sec)

MariaDB [test]> SELECT * from employees Natural Join departments;
+--------+-------+-------+----------+-----------+
| DeptID | EmpID | Name  | DeptName | Location  |
+--------+-------+-------+----------+-----------+
| D1     |   101 | Ravi  | HR       | Mumbai    |
| D2     |   102 | Priya | IT       | Bangalore |
+--------+-------+-------+----------+-----------+
2 rows in set (0.001 sec)

Example 2

Table A: Students

StudentID Name CourseID
1 Alice C1
2 Bob C2
3 Charlie C3

Table B: Courses

CourseID CourseName
C1 Math
C2 Physics
C4 Chemistry

Task: Perform a natural join between Students and Courses.

Output:

MariaDB [test]> SELECT * FROM STUDENTS, COURSES WHERE STUDENTS.CourseID = COURSES.CourseID;
+-----------+---------+----------+----------+------------+
| StudentID | Name    | CourseID | CourseID | CourseName |
+-----------+---------+----------+----------+------------+
|         1 | Alice   | C1       | C1       | Math       |
|         2 | Bob     | C2       | C2       | Physics    |
|         3 | Charlie | C3       | C3       | Chemistry  |
+-----------+---------+----------+----------+------------+
3 rows in set (0.000 sec)

MariaDB [test]> SELECT * FROM STUDENTS Natural Join COURSES;
+----------+-----------+---------+------------+
| CourseID | StudentID | Name    | CourseName |
+----------+-----------+---------+------------+
| C1       |         1 | Alice   | Math       |
| C2       |         2 | Bob     | Physics    |
| C3       |         3 | Charlie | Chemistry  |
+----------+-----------+---------+------------+
3 rows in set (0.000 sec)

Self Join

Example 1

Table: EMPLOYEES

EmpID Name ManagerID
1 Alice NULL
2 Bob 1
3 Charlie 1
4 David 2
5 Eva 3

πŸ“ Task:

Write a query that shows each employee and their manager’s name.

MariaDB [test]> SELECT E.Name as Employee, M.Name as Manager FROM Employees2 E LEFT JOIN Employees2 M ON E.EmpID = M.ManagerID;
+----------+---------+
| Employee | Manager |
+----------+---------+
| Alice    | Bob     |
| Alice    | Charlie |
| Bob      | David   |
| Charlie  | Eva     |
| David    | NULL    |
| Eva      | NULL    |
+----------+---------+
6 rows in set (0.009 sec)

Now you will notice that we did a LEFT JOIN on top of a SELF JOIN.

Why?

Since we need all the values on the left hand side, i.e the employees, and the RHS can have null values, so it's not a problem.

In this case we need a LEFT JOIN.

And the alias part via select query did the job of SELF JOIN.


Example 2

Table: COURSES

CourseID CourseName PrerequisiteID
C1 Math NULL
C2 Physics C1
C3 Chemistry C2
C4 Biology C1

πŸ“ Task:

Write a query that shows each course name and its prerequisite course name.

Output:

SELECT C.CourseName AS Course , P.CourseName as Prerequisite FROM courses2 C LEFT JOIN courses2 P ON C.CourseID = P.PrerequisiteID;
+-----------+--------------+
| Course    | Prerequisite |
+-----------+--------------+
| Math      | Physics      |
| Physics   | Chemistry    |
| Math      | Biology      |
| Chemistry | NULL         |
| Biology   | NULL         |
+-----------+--------------+
5 rows in set (0.000 sec)

I assumed we will need a LEFT JOIN here too since we need all the course names which are on the left side despite having NULL values on the right meaning that a course can have no prerequisites but we do need all the course names.


Equi Join

Example 1

MariaDB [test]> select * from employees;
+-------+--------+--------+
| EmpID | Name   | DeptID |
+-------+--------+--------+
|   101 | Ravi   | D1     |
|   102 | Priya  | D2     |
|   103 | Nikhil | D4     |
+-------+--------+--------+
3 rows in set (0.002 sec)

MariaDB [test]> select * from departments;
+--------+----------+-----------+
| DeptID | DeptName | Location  |
+--------+----------+-----------+
| D1     | HR       | Mumbai    |
| D2     | IT       | Bangalore |
| D3     | Finance  | Hyderabad |
+--------+----------+-----------+
3 rows in set (0.004 sec)

List employee names along with their department names using an Equi Join.

Output:

SELECT Name, DeptName FROM employees, departments WHERE employees.DeptID = departments.DeptID;
+-------+----------+
| Name  | DeptName |
+-------+----------+
| Ravi  | HR       |
| Priya | IT       |
+-------+----------+
2 rows in set (0.001 sec)

Do note that if we run a Natural Join query here:

SELECT Name, DeptName FROM employees NATURAL JOIN departments;
+-------+----------+
| Name  | DeptName |
+-------+----------+
| Ravi  | HR       |
| Priya | IT       |
+-------+----------+
2 rows in set (0.001 sec)

We do end up getting the same output since in this case both tables have only one common attribute.

In short:

Every Natural Join is an Equi Join, but not every Equi Join is a Natural Join.

If you want full control over join conditions (especially in production queries), prefer Equi Joins.
If you're doing quick joins on clean, well-named schemas β€” Natural Join is fine too.


Example 2

πŸ“˜ Tables

Customers

CustomerID Name
C001 Alice
C002 Bob
C003 Charlie

Orders

OrderID CustomerID Amount
O1 C001 500
O2 C002 750
O3 C004 300

πŸ” Goal:

Show each order’s ID, amount, and the customer’s name.

Output:

SELECT Name, OrderID, Amount FROM Customers, Orders WHERE Customers.CustomerID = Orders.CustomerID;
+-------+---------+--------+
| Name  | OrderID | Amount |
+-------+---------+--------+
| Alice | O1      |    500 |
| Bob   | O2      |    750 |
+-------+---------+--------+
2 rows in set (0.001 sec)

Left Outer Join

Pasted image 20250313184314.png

Left-Outer Join returns the matching rows and the rows which are only present in the left table, but not in the right table.

Example 1

Tables:

Employees3

EmpID Name ProjectID
201 Anil P1
202 Sunita P2
203 Rakesh NULL

Projects

ProjectID ProjectName
P1 Alpha
P2 Beta
P3 Gamma

πŸ” Task:

List all employees along with their project names. If an employee isn't assigned to any project, still show them.

Output:

Select Name, ProjectName FROM Employees3 LEFT OUTER JOIN Projects ON (Employees3.ProjectID = Projects.ProjectID);
+--------+-------------+
| Name   | ProjectName |
+--------+-------------+
| Anil   | Alpha       |
| Sunita | Beta        |
| Rakesh | NULL        |
+--------+-------------+
3 rows in set (0.004 sec)

The ON clause defines the condition used to match rows between two tables when you're performing a JOIN.

Think of it like this:

πŸ”— "ON tells SQL how to join the tables β€” which columns to match rows by."


Example 2:

Tables:

Students2

StudentID Name
1 Alice
2 Bob
3 Charlie

Submissions

StudentID Assignment
1 A1
2 A2

πŸ” Task:

Show all students along with the assignments they submitted. If a student hasn't submitted anything, still include them.

Output:

SELECT Name, Assignment FROM Students LEFT OUTER JOIN Assignments ON (Students.StudentID = Assignments.StudentID);
+---------+------------+
| Name    | Assignment |
+---------+------------+
| Alice   | A1         |
| Bob     | A2         |
| Charlie | NULL       |
+---------+------------+
3 rows in set (0.000 sec)

RIGHT OUTER JOIN

Pasted image 20250313202000.png

Example 1

EmpID Name DeptID
101 Ravi D1
102 Priya D2

departments2

DeptID DeptName
D1 HR
D2 IT
D3 Finance

Perform a right outer join on this table and display the employee names along with their department names.

Output:

SELECT Name, DeptName FROM Employees RIGHT OUTER JOIN Departments ON (Employees.DeptID = Departments.DeptID);
+-------+----------+
| Name  | DeptName |
+-------+----------+
| Ravi  | HR       |
| Priya | IT       |
| NULL  | Finance  |
+-------+----------+
3 rows in set (0.001 sec)

Example 2

students2

StudentID Name
1 Alice
2 Bob

assignments2

StudentID Assignment
1 A1
2 A2
3 A3

Perform a right outer join and display the names of the students and their submitted assignments.

Output:

SELECT Name, Assignment FROM Students RIGHT OUTER JOIN Assignments ON (Students.StudentID = Assignments.StudentID);
+-------+------------+
| Name  | Assignment |
+-------+------------+
| Alice | A1         |
| Bob   | A2         |
| NULL  | A3         |
+-------+------------+
3 rows in set (0.000 sec)

πŸ”— JOIN TYPES COMPARISON

Join Type Returns Matching Required Missing Side Fills With
INNER JOIN Only matching rows from both tables βœ… Required ❌ Not shown
NATURAL JOIN Like INNER JOIN, but auto-matches columns with same name βœ… Required ❌ Not shown
EQUI JOIN INNER JOIN with explicit condition βœ… Required ❌ Not shown
LEFT OUTER JOIN All left table rows + matching right rows ❌ Not always βœ… Right = NULL if missing
RIGHT OUTER JOIN All right table rows + matching left rows ❌ Not always βœ… Left = NULL if missing
SELF JOIN A table joined with itself (can be any join type) Contextual Based on condition
FULL OUTER JOIN All rows from both tables (match when possible) ❌ Not always βœ… Both sides can be NULL

πŸ” Visual Comparison:

Table A:         Table B:

 A1  A2          B1  B2

    INNER JOIN:
     A β‹ˆ B       β†’ Matches Only

    NATURAL JOIN:
     A β‹ˆ B       β†’ Like INNER JOIN, auto-detects join column(s)

    LEFT OUTER JOIN:
     A βŸ• B       β†’ All A, matched B or NULL

    RIGHT OUTER JOIN:
     A βŸ– B       β†’ All B, matched A or NULL

    FULL OUTER JOIN:
     A βŸ— B       β†’ All A + All B, matched or NULL

    SELF JOIN:
     A β‹ˆ A       β†’ A joins to itself (e.g., employee-manager)


🧠 Mnemonics:


Difference between Primary Key, Candidate Key and Super Key.

Among these keys you will hear this phrase quite often:

"Can uniquely identify a row".

So if you are like me who is still confused to as to what this means to this day, here's a clear breakdown which I hope will clear up the meaning of this phrase first.

πŸ” What does β€œuniquely identify a row” mean?

In a table, each row represents one real-world object or entity β€” like one student, one employee, one order, etc.

To uniquely identify a row means:

You should be able to find one and only one row based on the value(s) in certain column(s).


🧠 Why is this important?

Without unique identification:


🧾 Let’s take an example:

πŸ“‹ STUDENTS Table

StudentID Name Email
101 Alice alice@mail.com
102 Bob bob@mail.com
103 Alice alice123@mail.com

Let’s now test some columns:

πŸŸ₯ Case 1: Use only Name to identify a row:

You search for:
"WHERE Name = 'Alice'"

β›” Result: You get 2 rows. So this does NOT uniquely identify a row.


βœ… Case 2: Use StudentID:

You search for:
"WHERE StudentID = 102"

βœ… Result: Only 1 row β€” Bob.
This uniquely identifies the row.


βœ… Case 3: Use Email:

"WHERE Email = 'alice@mail.com'"

βœ… One row. Also uniquely identifies a row.

So:


πŸ”‘ How does this connect with Keys?

If a column (or combo of columns) can always be used to find one and only one row, it qualifies as a key.


πŸ“Œ Real-Life Analogy:

Think of a college student list:

So even if names repeat, student ID is like a fingerprint β€” no two students share it.

That's unique identification.


Summary

🟒 A column (or combination) uniquely identifies a row if:


Now, I hope that has cleared up what do you mean by "uniquely identifying a row".

So now let's proceed to the keys themselves.

πŸ”‘ 1. What is a Key in general?

A key is a column (or a set of columns) used to uniquely identify each row in a table.


πŸ”Ή 2. Super Key – The Superset

A super key is any combination of columns that can uniquely identify rows.

Example:

StudentID Name Email
101 Alice alice@mail.com
102 Bob bob@mail.com
103 Charlie charlie@mail.com

Possible Super Keys:

Note: As long as it uniquely identifies a row, it's a super key. Even if it's got extra unnecessary columns.


πŸ”Ή 3. Candidate Key – The Minimalist

A candidate key is a minimal super key, meaning:
➀ It still uniquely identifies rows,
➀ But contains no unnecessary columns.

From the table above:

So here:


πŸ”Ή 4. Primary Key – The Chosen One

A primary key is one of the candidate keys that you choose to be the main identifier.

Rules:

So from above, you might pick:

The others (like Email) remain candidate keys but not the primary.


πŸ”„ Summary Table:

Term Is Unique? Minimal? Nullable? Count Per Table
Super Key βœ… ❌ βœ… Many
Candidate Key βœ… βœ… βœ… Many
Primary Key βœ… βœ… ❌ Only One

🧠 Mnemonic to Remember: