Module 2-- Database Management Systems


Index

  1. 1. Relational Algebra
  2. 1. Select Operation
  3. 2. Project Operation
  4. 3. Set Union Operation
  5. 4. Set Difference Operation
  6. 5. Cartesian Product Operation
  7. 6. Rename Operation
  8. 2. Relational Calculus
  9. 1. Tuple Relational Calculus
  10. 2. Domain Relational Calculus
  11. 3. SQL basics, SQL3 and it's enhancements
  12. 6.2. Armstrong's Axioms (Inference Rules)
  13. 7. Normalization
  14. 7.1. First Normal Form (1NF)
  15. 7.2 Second Normal Form (2NF)
  16. What is a partial dependency??
  17. How Do We Decide Functional Dependencies (FDs)?
  18. 7.3 Third Normal Form (3NF)
  19. 7.4 Boyce-Codd Normal Form (BCNF)
  20. 11. Join operation and it's types.
  21. 2. Natural Join
  22. 3. Self Join
  23. 4. Equi Join
  24. 5. Left-Outer Join
  25. 6. Right-Outer Join

1. Relational Algebra

Relational algebra is a procedural query language. It consists of a set of operations that take one or more relations (tables) as input and produce a new relation as output. These operations allow you to express queries in a step-by-step manner.

1.1 Fundamental Operators

σsalary>50000(Employee)

(returns all employees with a salary greater than 50,000.

πname, department(Employee) RS

- Example:
- Combining two relations R and S that store similar data.

Pasted image 20250308183917.png

RS

- Example:
- If R is the set of all employees and S is the set of employees in the IT department, then RS gives you employees not in IT.

Pasted image 20250308184738.png

RS=R(RS)

Pasted image 20250308201720.png

R×S

- Example:
- If R has 3 tuples and S has 4 tuples, then

R×S$$willhave$$3×4=12$$tuples.Note:Cartesianproductisthebasisformorerefinedoperationslikejoins.JoinOperations(Advanced,butimportantforcontext)Purpose:Combinesrelatedtuplesfromtworelationsbasedonacondition.Types:Equijoin,naturaljoin,etc.Example:AnaturaljoinbetweenEmployeeandDepartmentonacommonattributelikedepartmentID.Rename(ρ)Purpose:Renamesarelationoritsattributes,whichisusefulforclarityortoperformselfjoins.Notation:$$ρnewName(R)$$or$$ρnewName(attr1,attr2,)(R)

Worked out examples

1. Select Operation

https://www.youtube.com/watch?v=Givp56x6vbg&list=PLdnwl-gHn1DFIbW82OIyO21lke98MAOKk&index=2

Pasted image 20250307202646.png

Pasted image 20250308182937.png

σdept_name = "Finance"(Instructor)

Pasted image 20250308183033.png

σsalary > 87000(Instructor)

2. Project Operation

https://www.youtube.com/watch?v=QY2X_TlLkqo&list=PLdnwl-gHn1DFIbW82OIyO21lke98MAOKk&index=3

Pasted image 20250308183216.png

Pasted image 20250308183200.png

πID, Name, Salary(Instructor)

Pasted image 20250308183417.png

πName(σDept_Name = "Computer Science"(Instructor))

3. Set Union Operation

Pasted image 20250308183846.png

Pasted image 20250308183854.png

πName(Depositor)  πName(Borrower)

Pasted image 20250308184117.png

Pasted image 20250308184126.png

For Fall 2009 Semester :

πCourse_Id(σSemester = "Fall"  Year = 2009(Section))

For Spring 2010 Semester:

πCourse_Id(σSemester = "Spring"  Year = 2010(Section))

To find the set of them both :

πCourse_Id(σSemester = "Fall"  Year = 2009(Section))  πCourse_Id(σSemester = "Spring"  Year = 2010(Section))

4. Set Difference Operation

https://www.youtube.com/watch?v=dS214kONAZA&list=PLdnwl-gHn1DFIbW82OIyO21lke98MAOKk&index=5

Pasted image 20250308184845.png

Pasted image 20250308184851.png

We need to display the customer names which are not in borrower table.

πCu_Name(Depositor)  πCu_Name(Borrower)

Pasted image 20250308184117.png

Pasted image 20250308185023.png

For Fall 2009 Semester :

πCourse_Id(σSemester = "Fall"  Year = 2009(Section))

For Spring 2010 Semester:

πCourse_Id(σSemester = "Spring"  Year = 2010(Section))

For those which were not taught in Spring 2010 :

πCourse_Id(σSemester = "Fall"  Year = 2009(Section))  πCourse_Id(σSemester = "Spring"  Year = 2010(Section))

5. Cartesian Product Operation

Pasted image 20250308185154.png

Pasted image 20250308185159.png

Cartesian Product will map each row of Depositor to each column of Borrower

(Depositor) × (Borrower)

So output will be:

Pasted image 20250308185239.png


6. Rename Operation

Pasted image 20250308201117.png

Pasted image 20250308201140.png

Pasted image 20250308201417.png


2. Relational Calculus

Relational calculus is a non-procedural (declarative) query language. Instead of specifying the steps to obtain the result (as in relational algebra), you specify what you want, and the DBMS figures out how to compute it.

2.1 Tuple Relational Calculus (TRC)

https://www.youtube.com/watch?v=zhf3nkoz4ds&list=PLBlnK6fEyqRiyryTrbKHX1Sh9luYI0dhX&index=41

2.2 Domain Relational Calculus (DRC)

https://www.youtube.com/watch?v=BmoZEHPPBoI&list=PLBlnK6fEyqRiyryTrbKHX1Sh9luYI0dhX&index=43

2.3 Comparing Relational Algebra and Calculus


Worked out Examples

1. Tuple Relational Calculus

Pasted image 20250308183216.png

Pasted image 20250308203551.png

{t | tInstructor  t[salary] > 80000}

Pasted image 20250308203749.png

So we are asked to retrieve the ID only.

In this case, we will use an existential quantifier.

It will be of this format :

 t  r(Q(t))

which reads as "there exists a tuple t in a relation r", then the predicate (Q(t)) should be true.

t is the list of all tuples retrieved from the table.

So our solution will be :

{t |  sinstructor(t[ID] = s[ID] salary > 80000)}

which reads as "there exists a tuple s in the relation, that being ID of t equals ID of s and salary is greater than eighty thousand".


Pasted image 20250308204543.png

Note that the data asked in the question was not in the table so we will proceed with some assumptions.

{t |  sinstructor(t[Name] = s[Name]

And for department we will use another tuple variable, that belongs to the department relation

   udepartment(u[dept-name] = s[dept-name]  u[building] = "Watson"))}

So it will be

{t |  sinstructor(t[Name] = s[Name]   udepartment(u[dept-name] = s[dept-name]  u[building] = "Watson"))}

Pasted image 20250308210023.png

Pasted image 20250308184117.png

For Fall 2009 Semester :

{t |  s section(t[course-id] = s[course-id])  s[Semester] = "Fall" s[Year] = 2009}

For Spring 2010 Semester :

{t |  u section(t[course-id] = u[course-id])  u[Semester] = "Fall" u[Year] = 2009}

Together in the "OR" format :

{t |  s section(t[course-id] = s[course-id])  s[Semester] = "Fall" s[Year] = 2009} {t |  u section(t[course-id] = u[course-id])  u[Semester] = "Fall" u[Year] = 2009}


Pasted image 20250308211500.png

Same just now we need to find both of them together, so we use conjunction (AND) instead of disjunction (OR)

{t |  s section(t[course-id] = s[course-id])  s[Semester] = "Fall" s[Year] = 2009} {t |  u section(t[course-id] = u[course-id])  u[Semester] = "Fall" u[Year] = 2009}


2. Domain Relational Calculus

Pasted image 20250308221437.png

Pasted image 20250308183216.png

In domain relational calculus instead of dealing with entire row tuples, we deal with the column attributes.

So in this case, the attributes are ID, Name, Dept_Name and Salary, which can be represented in short as i, n, d and s

So the solution will be:

{<i,n,d,s> | <i,n,d,s>  Instructor  s>80000}

Pasted image 20250309125031.png

Since we only have to find the ID from this table, we will use an existential quantifier.

In this form:

 x (P1(x))

which says that "there exists a variable x which satisfies the given predicate P1(x)"

So it would be this way:

{<i> | n,d,s (<i,n,d,s>  Instructor  s>80000)}

Pasted image 20250310134658.png
Pasted image 20250308184117.png
Pasted image 20250310135503.png

So we need to work with two tables, Instructor and Teaches.

For first table we only need the Name field.

{<n,c> |  i,a (<i,c,a,s,y>  Teaches  d (<i,n,d,s> Instructor Dept-Name = "Physics"))}
Here a is selected for sec_id and s is not used here to avoid ambiguity for using two same variables one after another.


Pasted image 20250310140159.png

Pasted image 20250310140730.png

{<c> |  s (<c,a,s,y,b,r,t>  Section  s = "Fall"  y = 2009)   u (<c,a,s,y,b,r,t>  Section s = "Spring"  y = 2010)}

3. SQL basics, SQL3 and it's enhancements

Part 1: SQL Basics

SQL (Structured Query Language) is the standard language for interacting with relational databases.

1.1 Overview of SQL


1.2 SQL3 and Its Enhancements


1.3 SQL DDL: Creating and Modifying Database Structures

CREATE Statement

CREATE TABLE Students (
    student_id INT PRIMARY KEY,
    name VARCHAR(50) NOT NULL,
    major VARCHAR(50),
    enrollment_date DATE,
    CHECK (enrollment_date <= CURRENT_DATE)
);

ALTER Statement

ALTER TABLE Students
    ADD COLUMN email VARCHAR(100);

DROP Statement

DROP TABLE Students;

TRUNCATE Statement

TRUNCATE TABLE Students;

1.4 SQL DML: Managing and Querying Data

INSERT Statement

INSERT INTO Students (student_id, name, major, enrollment_date, email)
VALUES (101, 'Alice Johnson', 'Computer Science', '2023-08-20', 'alice@example.com');

UPDATE Statement

UPDATE Students
SET major = 'Software Engineering'
WHERE student_id = 101;

DELETE Statement

DELETE FROM Students
WHERE student_id = 101;

SELECT Statement

SELECT student_id, name, major
FROM Students
WHERE major = 'Computer Science';


5. Types of DBMS software

DBMSs come in various types, each with unique strengths. They generally fall into two broad categories: Open Source and Commercial. Additionally, within these categories, there are differences based on the underlying technology and features.

2.1 Open Source DBMS

2.2 Commercial DBMS

2.3 Key Differences and Considerations


6. Relational Database Design

6.1. Domain and Data Dependency

1.1 Domain

1.2 Data Dependency

studentidname,birthdate

6.2. Armstrong's Axioms (Inference Rules)

Armstrong’s Axioms provide a sound and complete set of rules to infer all functional dependencies from a given set. There are three primary axioms:

2.1 Reflexivity (Trivial Dependency)


2.2 Augmentation


2.3 Transitivity


6.3. Derived Inference Rules

Using Armstrong's basic axioms, we can derive additional useful rules. These are not independent of the three axioms but are convenient shortcuts.

3.1 Union Rule (Additivity)


3.2 Decomposition Rule (Projectivity)


3.3 Pseudotransitivity


6.4. Practical Applications in Database Design

Understanding and applying Armstrong’s axioms is crucial during the normalization process in database design:


Worked-out Example

Imagine a relation R(A,B,C,D) with the following functional dependencies:

  1. A  B
  2. B  C
  3. A,D  C

Step-by-Step Analysis:

  1. Reflexivity check: Since A depends on B this means that BA

  2. Augmentation:

    • From AB , if we augment both sides with an arbitrary attribute, let's say Z, then :

      {A,Z}  {B,Z}
  3. Transitivity:

    Since AB and BC then using transitivity we get: AC

  4. Pseudotransitivity Check:

    Since AB , BC, thus AC.

    So A and B are interchangeable.

    So A,DC, can become B,DC

    Now from pseduotransitivity, we reconfirm that :

    Since AC and B,DC, we can treat D here as an extra attribute (W).

    So by pseduotransitivity we get A,DC.

These steps help confirm the correctness of the given dependencies and also assist in deriving additional dependencies if needed.


6.5. Summary


7. Normalization

Normalization is a systematic approach to organizing data in a database to minimize redundancy and avoid undesirable anomalies during data operations (insertion, update, deletion). It involves decomposing a table into smaller, well-structured tables while preserving data integrity and ensuring that relationships among the data remain clear. Let’s explore normalization in detail, focusing on the most common normal forms:

7.1. First Normal Form (1NF)

Definition

A relation (table) is in 1NF if:

Why 1NF?

Example

Imagine a table of students with a column that stores multiple phone numbers:

student_id name phone_numbers
1 Alice 123-4567, 234-5678
2 Bob 345-6789

To convert this into 1NF, you’d separate the phone numbers so that each cell holds only one value:

student_id name phone_number
1 Alice 123-4567
1 Alice 234-5678
2 Bob 345-6789

7.2 Second Normal Form (2NF)

https://www.youtube.com/watch?v=tkbAA--wKOc&list=PLxCzCOWd7aiFAN6I8CuViBuCdJgiOkT2Y&index=25

Definition

A table is in 2NF if:

Recap of keys :

Super Key

Candidate Key

Primary Key

Composite Primary Key

Summary Comparison


Why 2NF?

Example

Consider a table storing student enrollments:

student_id course_id student_name course_name
1 101 Alice Math
1 102 Alice History
2 101 Bob Math

Here, the composite key is {studentid,courseid}. Notice that:

To remove these partial dependencies, decompose the table into three:

Student Table:

student_id student_name
1 Alice
2 Bob

Course Enrollment Table:

student_id course_id
1 101
1 102
2 101

Course Table:

course_id course_name
101 Math
102 History

Understanding the steps in detail.

Step 1. List all attributes and the functional dependencies.

So in this table we have the attributes as:

Now for the functional dependencies :

How Do We Decide Functional Dependencies (FDs)?

https://www.youtube.com/watch?v=qn5neFBpU40&list=PLxCzCOWd7aiFAN6I8CuViBuCdJgiOkT2Y&index=24

Not solely based on 1:1 relationships.

Key point:
FDs are determined by how the data logically relates—not just by whether relationships are 1:1. Sometimes, even in a one-to-many or many-to-many scenario, you still have FDs for the parts that are 1:1 (e.g., student_id → student_name or course_id → course_name).

Functional dependencies also have the properties of Armstrong's axioms

This is done by first identifying the domain rules

So in our original table we have :

student_id course_id student_name course_name
1 101 Alice Math
1 102 Alice History
2 101 Bob Math

So we identify the functional dependencies first :

student_idcourse_id (not valid in this case since there is a student with a same id enrolling in two different courses)
course_idstudent_name (not valid since there is no point in linking a course id to a student's name)
student_namecourse_name (not valid since it implies that a student can only take one course where already student id points to multiple course ids)
student_idstudent_name (from transitivity)
student_idcourse_name (from transitivity) (not valid in this case since a single student can take multiple courses)
course_idcourse_name (from transitivity)

student_id -> student_name
course_id -> course_name


Step 2: Find candidate key by performing closure on functional dependencies

https://www.youtube.com/watch?v=bSdvM_0hzgc&list=PLxCzCOWd7aiFAN6I8CuViBuCdJgiOkT2Y&index=23

So we have a relation as R(A,B,C,D)

Let A = student_id ; B = course_id ; C = student_name ; D = course_name

So the functional dependencies will be :

AC
BD

So we start by finding the closure of A

A+ = {C,A}

For B :

B+ = { D, B}

So if we group A and B and take closure of AB

AB+ = {C, D, A, B}

Now while finding Candidate keys it is a good practice to check other possible combinations of candidate keys.

As we see here:

C+ = {C}
D+ = {D}

AC+ = {C,A}
AD+ = {A,D}
BD+ ={B,D}
BC+ = {B,D,C}

So we see that the closure of AB results in all the attributes of relation R.

So the candidate key will be AB or {student_id, course_id}


Step 3: Identify non-prime attributes and partial dependencies

Non-prime attributes are the attributes which are not involved in the formation of the candidate key.

In this instance the non-prime attributes are student_name, course_name

Now for the partial dependencies :

What is a partial dependency??

A partial dependency occurs when a non-prime attribute is dependent on only a part of a candidate key, not the whole.

We see from the table that :

student_id course_id student_name course_name
1 101 Alice Math
1 102 Alice History
2 101 Bob Math

student_name relates to student_id only which is just a part of the candidate key.

course_name relates to course_id only which is just a part of the candidate key.

So there are two partial dependencies happening here.

Or a short way of identify partial dependencies are if :

Pasted image 20250312181404.png

LHS of F.D is a proper subset of Candidate Key (i.e part of candidate key, not the whole candidate key itself)

or

RHS of F.D is a non-prime attribute.

Sometimes we need to check for both of these conditions to find out a partial dependency.


Step 4: Separate the table out based on the partial dependencies

To ensure that there are no partial dependencies, we split the original table into three separate tables

Course enrollment table, containing both parts of the candidate key.

student_id course_id
1 101
1 102
2 101

Student table where student_name fully depends on the candidate key student_id .

student_id student_name
1 Alice
1 Alice
2 Bob

Course table where course_name fully depends on candidate key course_id ,

course_id course_name
101 Math
102 History
101 Math

Now this in 2NF form where the tables are already in 1NF form and all non-prime attributes are fully dependent on the candidate key.


7.3 Third Normal Form (3NF)

https://www.youtube.com/watch?v=IeSai2JVm78&list=PLxCzCOWd7aiFAN6I8CuViBuCdJgiOkT2Y&index=26

Definition

A table is in 3NF if:

Why 3NF?

Example

Consider a table that stores student information along with their advisor and the advisor's office location:

student_id student_name advisor advisor_office
1 Alice Dr. Smith Room 101
2 Bob Dr. Jones Room 102

Here, while student_id uniquely determines student_name, it also determines advisor, and advisor determines advisor_office. The dependency $$student_id \rightarrow advisor \rightarrow advisor_office$$

To achieve 3NF, decompose the table:

Student Table:

student_id student_name advisor
1 Alice Dr. Smith
2 Bob Dr. Jones

Advisor Table:

advisor advisor_office
Dr. Smith Room 101
Dr. Jones Room 102

7.4 Boyce-Codd Normal Form (BCNF)

https://www.youtube.com/watch?v=mf_PbWPo7VM&list=PLxCzCOWd7aiFAN6I8CuViBuCdJgiOkT2Y&index=27

Definition

A table is in BCNF if:

What are trivial and non-trivial functional dependencies??

Trivial Dependency

A functional dependency XYis trivial if the set Y is a subset of X. In other words, the dependency does not provide any new information because the dependent attributes are already part of the determinant.

Non-trivial Dependency

A functional dependency XY is non-trivial if Y is not a subset of X. This means that X determines some attribute(s) that are already not included in X, thereby adding new information.

Why BCNF?

Example

Consider a table with the following dependencies:

course instructor room
Math Dr. Smith 101
Math Dr. Jones 102
History Dr. Jones 102

Suppose:

To achieve BCNF, you might need to decompose the table further so that every determinant is a candidate key. The exact decomposition depends on the specific functional dependencies involved.


Summary of all normal forms

Pasted image 20250312181239.png

https://www.youtube.com/watch?v=EGEwkad_llA&list=PLxCzCOWd7aiFAN6I8CuViBuCdJgiOkT2Y&index=30 (Ignore 4NF and 5NF as those are out of syllabus)

https://www.youtube.com/watch?v=wTJjpH2RUcQ&list=PLxCzCOWd7aiFAN6I8CuViBuCdJgiOkT2Y&index=33


7.5. Advantages and Trade-Offs of Normalization

Advantages:

Trade-Offs:


8. Query Processing

1. Overview of Query Processing

When a user submits an SQL query, the DBMS performs several key steps before returning the results. The process generally involves:


9. Relational Algebra and Query Equivalence

Relational algebra provides a formal foundation for query processing. Through a set of operations (like selection, projection, and joins), we can express SQL queries in a mathematical form.


Equivalence of Functional dependencies

https://www.youtube.com/watch?v=eIXC6NfKno4&list=PLxCzCOWd7aiFAN6I8CuViBuCdJgiOkT2Y&index=36 (Equivalence of Functional dependencies)

So basically we check the equivalence of two functional dependencies by checking if let's say we have two relations X and Y, so we check whether these two are equivalent to each other if

X covers Y i.e XY AND Y covers X i.e YX or

X+ = Y+

Example 1 :

Pasted image 20250312190134.png

So we have two relations :

X = {AB, BC}
Y = {AB, BC, AC}

So first we need to check if X covers Y.

So we start with the closure of the attributes of X.

A+ = {A,B,C}
B+ = {B,C}

In the attributes of Y we see that, from X, A derives B and C. B derives C as well.

So X covers Y or XY

Now to check if Y covers X.

A+ = {A,B,C}
B+ = {B,C}

So we see that in the attributes of X, from Y, A derives both B and C. And B derives C as well.

So Y covers X or YX.

So, since both X and Y cover each other, we can say that X=Y or both the functional dependencies are equivalent.


10. Dependency Preserving Decomposition (Lossless decomposition)

https://www.youtube.com/watch?v=0oeap0QDslY&list=PLxCzCOWd7aiFAN6I8CuViBuCdJgiOkT2Y&index=37

So when we normalize relations and break them down (decompose) into further smaller tables and relations, we must ensure that all the original attributes are present in the decomposed tables such that taking a union of all the decomposed tables will result in all the original attributes from the table.

Pasted image 20250312192102.png

So if we have a relation R(ABCD) and it's decomposed into two children tables R1 and R2 then the decomposition should be in such a way that R1  R2 results in the attributes of R.

Pasted image 20250312192452.png

So let's say we have this relation R with the given F.Ds

If we were to splice this into three sub tables R1, R2 and R3

The rules would be as follows :

So Let's say in R1

Pasted image 20250312193034.png

We initially write a bunch of F.Ds

Now immediately we see that in AB and BB, these are trivial attributes.

So these won't be considered.

For AB it's present in R so it's considered in R1.
BA is neither directly present not derivable via transitive property so it's not considered.

Similarly for tables R2 and R3

Pasted image 20250312193326.png

For R2

BC works as it's directly given in R.

CD  ;  DB  ;CB works

For R3

BC  ;  CD  ;  BD works.

DB works as it's directly given in R

And now if we were to take a union, i.e R1  R2  R3

it would result in : {AB,BC,CB,BD,DB}

Now checking our original set R

Pasted image 20250312192452.png

We see if this union covers R (is equivalent to R)

So :

AB present in R, works
BC present in R, works
CB, not directly present in R, but derivable by transitive property, so it works.
B  D, not directly present in R, but derivable by transitive property, so it works.
DB present in R, works.

So the decomposition of R into R1, R2 and R3 was a successful lossless decomposition.

https://www.youtube.com/watch?v=jxENwUU9j7w&list=PLxCzCOWd7aiFAN6I8CuViBuCdJgiOkT2Y&index=38 (Example 2)


11. Join operation and it's types.

https://www.youtube.com/watch?v=zYH-e6tUYbw&list=PLxCzCOWd7aiFAN6I8CuViBuCdJgiOkT2Y&index=39

Joins are fundamental operations in relational databases that allow you to combine rows from two or more tables based on a related column between them. There are several join techniques—each with its own algorithm, performance characteristics, and use cases. Let’s break down the concept and study the major types of joins in detail.

1. What Is a Join?

A join operation merges tuples (rows) from two tables by matching values of one or more common attributes (columns). For example, if you have an Employee table and a Department table, you might join them on a shared attribute like department_id to retrieve each employee along with their department details.

Joins can be categorized in two ways:

For our syllabus we only have the following join types


2. Natural Join

https://www.youtube.com/watch?v=jRxEjmjIIFs&list=PLxCzCOWd7aiFAN6I8CuViBuCdJgiOkT2Y&index=40

A natural join is a standard join operation involving two tables, where we join the two tables via their cross product and fetch some data from the cross product based on some condition.

The cross product is basically the Cartesian Product operation.

Pasted image 20250313174040.png

Let's say we have these two tables here.

Pasted image 20250313174058.png

And we have been tasked to find the employee names who are working in a department.

As the image above says:

Join = Cross Product + Condition

And the second condition for a join to be possible is that both tables must have a common attribute.

So we see that ENo is a common attribute in both the tables.

So we will start with the cross product :

SELECT E_Name from emp, dept

This is the first part of the Join. Now we need the condition.

We do this as follows :

SELECT E_Name from emp, dept WHERE emp.E_Name = dept.E_name

So this way we will fetch the names of all the employees from the employee table which are assigned to a department in the department table.

So the output table will be :

E_No E_Name Dept_No E_No
1 Ram D1 1
2 Varun D2 2
4 Amrit D3 4

Note that :

Pasted image 20250313175712.png

Employee number 3 Ravi was not assigned to any departments so he wasn't fetched in the Join operation.

Now the original way to write this query would be:

SELECT E_Name from emp Natural Join dept

Here we skip writing the condition since we are using the Natural Join operator straightaway.


3. Self Join

https://www.youtube.com/watch?v=6DQpvfdj6EE&list=PLxCzCOWd7aiFAN6I8CuViBuCdJgiOkT2Y&index=41

Now let's say that we have this table right here :

Pasted image 20250313180221.png

And we are given this task :

Pasted image 20250313180328.png

Now we see that there's a problem, that we have only 1 table. And a typical join operation works on a cross product that involves two tables. So how do we solve this dilemma?

Simple, we use an "alias". (pronounced "a-lee-us"). We create two aliases of the same table, essentially resulting in two separate copies of the same table.

So we do that using :

SELECT _ FROM Study as T1, Study as T2

And now for the condition :

SELECT _ FROM Study as T1, Study as T2 WHERE T1.S_id = T2.S_id AND T1.C_id <> T2.C_id

What this condition means is that we look for those tuples whose student ids match, but their course ids don't. This means that the query has to result in those tuples where students have taken more than one course, fulfilling the constraint of getting at least two courses.

So this would result in the output table as :

Sid Cid Since
S1 C1 2016
S1 C3 2017

which shows that student id 1 has enrolled in two separate courses in years 2016 and 2017.


4. Equi Join

https://www.youtube.com/watch?v=lUiPjkOQG9w&list=PLxCzCOWd7aiFAN6I8CuViBuCdJgiOkT2Y&index=42

So let's say we have these two tables now :

Pasted image 20250313182832.png

And we are tasked with this :

Pasted image 20250313182854.png

This might look similar to Natural Join, but there is a key difference between Natural Join and Equi Join: In Natural Join we check for equality between the common attributes of the two tables, but in Equi Join we can also check for equality among other, non-common attributes of the two tables.

So starting off, we would have our Natural Join query as :

SELECT E_Name from emp, dept WHERE emp.E_Name = dept.E_Name

and this is the point the query becomes an Equi Join query. We check whether two non-common attributes, i.e Address from emp table and Location from dept table are equal or not.

So:

SELECT E_Name from emp, dept WHERE emp.E_Name = dept.E_Name AND emp.Address = dept.Location

Now this is an Equi Join query.

So the final output table for this query will be :

ENo EName Address DepNo Location ENo
1 Ram Delhi D1 Delhi 1

Note that although employee 4 Ravi had the address as Delhi but his department location was in Patna so there was a mismatch in the second parameter of the condition and thus he was not included in the output table.


5. Left-Outer Join

https://www.youtube.com/watch?v=unxn0KnzBzk&list=PLxCzCOWd7aiFAN6I8CuViBuCdJgiOkT2Y&index=43

So let's say we again have these two tables here

Pasted image 20250313183642.png

Left-Outer Join returns the matching rows and the rows which are only present in the left table, but not in the right table.

We can clear this concept with the help of a Venn Diagram.

Pasted image 20250313184314.png

So Left-outer-Join works on top of the standard natural join query.

So let's say if we wanted to perform a Left-Outer Join on these two tables, it would be in this way.

SELECT emp_no, e_name, d_name, loc from emp LEFT OUTER JOIN dept on (emp.dept_no = dept.dept_no)

This time dept_no is the common attribute in the two tables so we go for those. The part after on is the natural join part.

Now this query will return us a table :

Eno Ename Dname Loc
E1 Varun IT Delhi
E2 Amrit HR Hyd
E3 Ravi IT Delhi
E4 Nitin __ __

The first three rows were the matching rows as a result of the Natural Join operation.

But since the the actual operation is a Left-Outer Join, the fourth row is added to the table as well since it belongs to the left table despite there being no match for employee 4 in the right table.


6. Right-Outer Join

The reverse of Left-Outer Join.

Pasted image 20250313202000.png

Queries are the same as well, we just replace left outer join with right outer join.

Pasted image 20250313211023.png

SELECT emp_no, e_name, d_name, loc from emp RIGHT OUTER JOIN dept on (emp.dept_no = dept.dept_no)

So output table will be:

Eno Ename Dname Loc
E1 Varun IT Delhi
E2 Amrit HR Hyd
E3 Ravi IT Delhi
__ __ Testing Noida

Note that this output table contains the department name of Testing and it's location Noida since it's on the right table.


. Query Optimization Techniques

1. Cost-Based Optimization

2. Heuristic-Based Rewriting

3. Example Scenario

Consider a query that joins Orders and Customers on customer_id:

The optimizer estimates the cost for each plan and selects the one with the lowest estimated cost.


Connected Pages
On this page
  • Index
  • 1. Relational Algebra
    1. 1.1 Fundamental Operators
    2. Worked out examples
      1. 1. Select Operation
      2. 2. Project Operation
      3. 3. Set Union Operation
      4. 4. Set Difference Operation
      5. 5. Cartesian Product Operation
      6. 6. Rename Operation
  • 2. Relational Calculus
  • 2.1 Tuple Relational Calculus (TRC)
  • 2.2 Domain Relational Calculus (DRC)
  • 2.3 Comparing Relational Algebra and Calculus
  • Worked out Examples
    1. 1. Tuple Relational Calculus
  • {t | ∃ s∈instructor(t[Name] = s[Name] ∧ ∃ u∈department(u[dept-name] = s[dept-name] ∧ u[building] = "Watson"))}
    1. 2. Domain Relational Calculus
  • { | ∃ s ( ∈ Section ∧ s = "Fall" ∧ y = 2009) ∨ ∃ u ( ∈ Section ∧s = "Spring" ∧ y = 2010)}
  • 3. SQL basics, SQL3 and it's enhancements
    1. Part 1: SQL Basics
      1. 1.1 Overview of SQL
      2. 1.2 SQL3 and Its Enhancements
      3. 1.3 SQL DDL: Creating and Modifying Database Structures
        1. CREATE Statement
        2. ALTER Statement
        3. DROP Statement
        4. TRUNCATE Statement
      4. 1.4 SQL DML: Managing and Querying Data
        1. INSERT Statement
        2. UPDATE Statement
        3. DELETE Statement
        4. SELECT Statement
  • 5. Types of DBMS software
    1. 2.1 Open Source DBMS
    2. 2.2 Commercial DBMS
    3. 2.3 Key Differences and Considerations
  • 6. Relational Database Design
    1. 6.1. Domain and Data Dependency
      1. 1.1 Domain
      2. 1.2 Data Dependency
    2. 6.2. Armstrong's Axioms (Inference Rules)
      1. 2.1 Reflexivity (Trivial Dependency)
      2. 2.2 Augmentation
      3. 2.3 Transitivity
    3. 6.3. Derived Inference Rules
      1. 3.1 Union Rule (Additivity)
      2. 3.2 Decomposition Rule (Projectivity)
      3. 3.3 Pseudotransitivity
    4. 6.4. Practical Applications in Database Design
    5. Worked-out Example
    6. 6.5. Summary
  • 7. Normalization
    1. 7.1. First Normal Form (1NF)
      1. Definition
      2. Why 1NF?
      3. Example
    2. 7.2 Second Normal Form (2NF)
      1. Definition
      2. Recap of keys :
        1. Super Key
        2. Candidate Key
        3. Primary Key
        4. Composite Primary Key
      3. Summary Comparison
      4. Why 2NF?
      5. Example
      6. Understanding the steps in detail.
        1. Step 1. List all attributes and the functional dependencies.
        2. How Do We Decide Functional Dependencies (FDs)?
        3. Step 2: Find candidate key by performing closure on functional dependencies
        4. Step 3: Identify non-prime attributes and partial dependencies
          1. What is a partial dependency??
        5. Step 4: Separate the table out based on the partial dependencies
    3. 7.3 Third Normal Form (3NF)
      1. Definition
      2. Why 3NF?
      3. Example
    4. 7.4 Boyce-Codd Normal Form (BCNF)
      1. Definition
        1. Trivial Dependency
        2. Non-trivial Dependency
      2. Why BCNF?
      3. Example
      4. Summary of all normal forms
    5. 7.5. Advantages and Trade-Offs of Normalization
      1. Advantages:
      2. Trade-Offs:
  • 8. Query Processing
    1. 1. Overview of Query Processing
  • 9. Relational Algebra and Query Equivalence
    1. Equivalence of Functional dependencies
  • 10. Dependency Preserving Decomposition (Lossless decomposition)
  • 11. Join operation and it's types.
    1. 1. What Is a Join?
    2. 2. Natural Join
    3. 3. Self Join
    4. 4. Equi Join
    5. 5. Left-Outer Join
    6. 6. Right-Outer Join
  • . Query Optimization Techniques
    1. 1. Cost-Based Optimization
    2. 2. Heuristic-Based Rewriting
    3. 3. Example Scenario