Tree Data Structures

Binary Heaps and Heapsort Algorithm

Hello people…! In this post I will talk about the Binary Heaps and Heapsort Algorithm implemented using structures and not arrays. This is post is the successor to my previous post on Binary Heaps. Using arrays to code Binary Heaps is very comfortable for the programmer. But it is not fit for the day-to-day applications. This is because the Binary Heap is used to implement the Priority Queue and it is meant to grow, not stay of fixed size. But given that it is meant to grow, we cannot use STL Vector for this. This is because the Vector keeps re-allocating memory as it grows. So, it is slow. Here we will learn the proper structure based implementation of the Binary Heaps.

Now, currently we have the knowledge of the Binary Heap, its structure, working and other properties. But we implemented this using arrays. Now if we were to implement using a structure, there is one challenge that we will face. That is to ensure the Structural Property of the heap. Why? For insertion into an array, we created a node at the end which itself ensured the Structural Property is not violated and then we bubbled upwards to ensure that the Heap Property is not violated at any point. In the array, we knew exactly where we are supposed to insert the node. But in a tree coded using structures how will you find where to insert the new element such that the tree remains a complete binary tree..? The same goes when we want to delete the root. How do we know where the last node is. For this, we will use the binary representation of the size of the heap to traverse to the location we want.

Size does matter !

While inserting itself we could keep a track of the heap size. So, this takes constant time. While inserting, we must search for the appropriate location. For this, we will convert the size into its binary representation and begin the traversal from the root. How? Keep reading the bits of the binary representation you just generated, if you encounter ‘0’ go to the left child of the parent, if it is a ‘1’ go to the right child of the parent. I know it sounds really weird, but that’s the best thing we can do. Let’s take up an example to understand this better. Consider the Binary Heap below –

Binary Heap Insertion - 1
Binary Heap Insertion – 1

Now consider the binary representation of the number (HeapSize + 1) = 8, which is “1000”. We omit the first bit (left-to-right) because that bit says “come to the root”.

  • The Binary String left is “000”. The leftmost bit is ‘0’. So, go to the Left Child. You come to Node 4.
  • The Binary String left is “00”. The leftmost bit is ‘0’. So, go to the Left Child. You come to Node 1.
  • The Binary String left is “0”. There is only one bit left. This bit doesn’t tell you to “go left” or “go right”, it tells you to “insert it to left” or “insert it to right”. As this bit is “0”, we place the new node as the left child of Node 1.

This is how you must use the binary representation of the (HeapSize + 1) to get to the location you want to insert the node. This is process is depicted in the sketch below –

Binary Heap Insertion - 2
Binary Heap Insertion – 2

Every time you insert a node, you will go all the way down the tree, you will not perform any insertions in the middle, so the traversal always takes O(log n), once inserted, you travel up, ensuring that the heap property holds good, so you have to travel all the way up again, so it is another O(log n) operation. So, the over all process takes O(log n), which is same as in the case of an array implementation.

But for deletion, we are supposed to find something else. We are supposed to find out the last node. In the sketch above, before the insertion of 8, we are supposed to somehow find out that the last node is Node 5. There are a couple of methods by which we can accomplish this –

  1. BFS or DFS – By using the Breadth First Search or the Depth First Search we can find out which is the last node in the Binary Heap by searching the whole tree for a node at the maximum depth which is encountered as last as possible. The time complexity of BFS and DFS is O(|E| + |V|). We would have N vertices and what about the number the edges? We would have N – 1 edges, because every node is pointed by a child pointer, except the root. So, the overall complexity becomes O(N + N), which is O(N).
  2. Binary Traversal – This is the same traversal as we did in the case of Insertion. I just called it Binary Traversal. But the method which we discussed about gave us the Node to be inserted but not the last node. Can you think of a way of modifying the procedure such that it would give us the last node directly…? Think about this for a couple of minutes and you will figure out the simple logic behind it. For those who can’t, don’t worry I will discuss it, when I do, try to figure how silly a thing you missed…! The logic is simple, you used the binary representation of (HeapSize + 1) to get to the location where you were supposed to insert a node, if you use the binary representation of (HeapSize) itself for traversal, you will end up at the last node…! Try working out this on paper and you will realise the logic behind it. As, we would traverse the whole tree from root to leaf, this takes O(log N), which is way faster than BFS and DFS which was an O(N) operation. This kind of Binary Traversal is depicted in the sketch below –
Binary Heap Insertion - 3
Binary Heap Insertion – 3

The next thing that we must know about the structure implementation is that unlike the array implementation, where we could construct the heap using build heap procedure, we cannot do that here. The only way to construct the heap from a given collection of numbers is to keep inserting them one-by-one. As we discussed that in insertion, we would have to go all the way down and come up, inserting a node in the structure implementation will be slower than inserting into a heap implemented using arrays.

So, now let us plan what we would be writing as our code. First things first..! The structure of our nodes in the heap would be –

struct node
{
	int value;
	struct node * parent;
	struct node * leftChild;
	struct node * rightChild;
};

Apart from the standard variables that we see in many structure related codes, we would be needing the parent pointer here. This is because, without that, we cannot travel up the tree, which is needed in insertion and deletion.

Heapsort – If you have understood the concepts of constructing the Binary Heap, finding the Last Node and Extracting maximum element, then Heapsort becomes obvious. You just have to keep extracting the maximum element and store it in an array or a list. Then, you will have a sorted array. This essentially destroys the entire heap.

Take a break and come back to code the binary heap. If you have already coded the binary search tree or a binary tree before this should be relatively easy for you. If you have not, then this will be challenging, and I’m sure you’ll be proud of your efforts once you have successfully coded this data structure. You will spend 80% of the time debugging your code, and trust me that’s the way programming is done. I can’t discuss the issues that you might encounter while coding but you can always comment them if you need help. Once you are able to code Insertion and Deletion, Heap Sort should be easy. If you get stuck, you can always refer to my code below –

    

The code is big and that is how it should be and moreover, it is well commented with information. If you still have doubts regarding the code or any concept, feel free to comment them. I hope my post has helped you to get a complete idea about the Binary Heap. If it did, let me know by commenting…! Keep practising… Happy Coding…! 🙂

2 thoughts on “Binary Heaps and Heapsort Algorithm

  1. Excellent presentation! One minor request, though. As regards my monitor, the formatting doesn’t work. I need to constantly adjust the scroll bar so that I can read the text. The only way I can follow is to copy_paste everything to notepad and read from notepad.

Leave a Reply