Database Normalization

Shivani Nalawade
5 min readApr 25, 2020

--

This is a gentle introduction about Database Normalization, In the Normalization technique avoid redundant data and duplication data into the database

Redundant data means there is multiple ways of expressing the same thing, usually used for backup purpose .

For Example;Entering a country name like USA, America.

  • Duplicate means exactly the same copy of the data. In the database having multiple copies of the same data called as duplicate, this is also used for backup purpose.
  • Normalization is nothing but ,break the data into the smaller logical pieces(Splitting the table)

The purpose of Normalization:

  • Easier to user access and maintain the data.
  • When a data item is modified or update it is modified (Update)in all relations
    where it appears .
  • When data is deleted accidentally, required data is not deleted because duplicate copies are created.

WHY NORMALIZE ?
Normalize are to ensure that, collection of relations do not
hold duplicate data.

  • Data is easily updated on a relation for example their is parent table and child table connected with each other, where as some modification done on parent table automatically child table is updated.

Example:

There are five normal forms which is used in Database Normalization:

  1. First Normal Form(1 NF)
  2. Second Normal Form(2 NF)
  3. Third Normal Form(3 NF)
  4. Boyce-Code Normal Form(BCNF)
  5. Fourth Normal Form(4 NF)

First Normal Form(1 NF):

  • In First Normal Form(1 NF) should not contain repeating groups.
  • A repeating group is a set of logically related fields or values that occur multiple times in one record.
  • 1st NF talks about atomic values and non-repeating groups(single cell having only single value).

Here, the issue is that in the first row, we are storing 2 courses against Prof. shinde. This isn’t the optimal way since that’s now how SQL databases are designed to be used. A better method would be to store the courses separately. For instance:

This way, if we want to edit some information related to CS101, we do not have to touch the data corresponding to CS154. Also, observe that each row stores unique information. There is no repetition. This is the First Normal Form.

Second Normal Form(2 NF)

  • The table should be in first normal form.
  • in every table there is only one primary key is present(Well, a primary key is a set of columns that uniquely identifies a row).
  • There is no any partial dependency. Database should have only one primary key and each attribute in table depends on the primary key.
  • 2NF is based on concept fully functional dependency.

Full functional dependency indicates that:

  • A and B are attribute of a relation, B is fully depend on A if functionally depend on A but no any proper subset of A.
  • 2NF- A relation that is in 1NF and every non-primary key attribute is fully functionally dependent on the primary key.

Example:

Suppose a school wants to store the data of teachers and the subjects they teach. They create a table that looks like this: Since a teacher can teach more than one subjects, the table can have multiple rows for a same teacher.

Candidate Keys: {teacher_id, subject}
Non prime attribute: teacher_age

  • In this table teacher_age is dependent on teacher_id alone which is a proper subset of candidate key. This is the rule for 2NF as the rule says “no non-prime attribute is dependent on the proper subset of any candidate key of the table”.

teacher_details table:

teacher_subject table:

Third Normal Form(3 NF):

  • The table should be in the second normal form and there is no transitive dependency.
  • Transitive dependency means when a table A is functionally dependent on table B and table Bis functionally dependent on table C, then table C is transitively dependent on table A via table B.

Example:

Third normal form avoids this by breaking this into separate tables:

Here, the third column is the ID of the professor who’s taking the course.

Here, in the above table, we store the details of the professor against his/her ID. This way, whenever we want to reference the professor somewhere, we don’t have to put the other details.We can simply use the ID.

Boyce Codd normal form (BCNF):

  • The table is in BCNF when every determinant in a table is candidate key.
  • Boyce-Codd Normal form is a stronger than(higher version of 3NF) the 3NF, but BCNF and 3NF are not same. However, BCNF also has a additional constraint compared with 3NF
  • A relation is in BCNF , if and only if every determinant is candidate key.

Fourth Normal Form(4 NF):

  • A table is said to be in fourth normal form if there is no two or more, independent and multivalued data dependencies.
  • Although BCNF removes anomalies due to funtional dependencies, another type of dependency called a multivalued dependency
  • Multivalued dependency occurs when two attributes in a table are independent of each other but, both depend on a third attribute.

The various forms of database normalization are useful while designing the schema of a database in such a way that there is no data replication which may possibly lead to inconsistencies. While designing the schema for applications, we should always think about how can we make use of these forms.

To better understand this concept please do watch the below video.It will resolve your all doubts:

--

--

No responses yet