Tree Data Structures

Segment Trees

Hello people…! In this post I will talk about one of the most versatile data structure in competitive programming, the Segment Tree. Almost in every contest that you do in the major coding websites, at least one question is asked which uses a Segment Tree in its solution. This is because it is easy to code, and does some useful operations quickly, in logarithmic time. Segment Trees are used where the problem demands these two queries on a given array arr[1 … n] –

  1. Sum of elements in the range arr[L … R]. Where, 1 <= L <= R <= n.
  2. Updating an element arr[i].

You can think of doing this in the naive approach where you directly operate on the input array itself. You could do the first operation in O(n) time by running a loop calculate the sum. The second operation could be done in O(1) time, because, it’s only simply updating. This is good if the number of sum queries asked are very less and the update queries are more.

The next best thing you could think of is, Prefix Sum Array. The Prefix Sum Array is an array where the element arr[i] holds the sum of values from arr[1] to arr[i]. You can construct the Prefix Sum Array in O(n) time and solve the first query in O(1) time. Prefix Sum Array is one of my favourite techniques to find out the sum in a given range. But the update query takes O(n) time. If you want to update the ith element, you’d have to update all the elements from i to n, which takes O(n) in the worst case.

How about doing both of them in O(log n) time…? That’s really quick..! This is the speciality of the Segment Tree. We can get the sum query and the update query done in logarithmic time. The Segment Tree works on the principle of partial sums. If you have read my post on Fenwick Trees (or) Binary Indexed Trees, or if you already know BIT, Segment Tree should be easy for you. For those who don’t know, don’t worry, you’ll have an easier time learning the Binary Indexed Trees than compared to the others. So, let’s get started..!

As I mentioned earlier, the Segment Tree works on the principle of partial sums. What this principle is that we somewhere hold the sum of a few elements and keep building on it. In a Segment Tree, this “partial” is actually “half”. The Segment Tree structurally is a Binary Tree. Every internal node is associated with a range of numbers, whose sum is stored in that node. Now, what we do is partition the array into two halves. The left child will hold the sum of elements in the left partition and the right child will hold the sum of elements in the right partition. You may not get the idea of Segment Tree in the first reading. That’s okay. Give another reading and ponder upon the sketch below for a better understanding.

Structure of Segment Tree
Structure of Segment Tree

I hope it is clear from the picture how the Segment Tree works on the partial sums, which here is the left partition sum and the right partition sum. The leaf nodes represent the individual elements of the input array.

Note that the array we will not store the array in the nodes of our Segment Tree as was depicted in the sketch. This was done in the sketch only to give you the idea. The array will not be stored in the nodes. However, we will store the range limits on which the sum is defined, i.e, we will store the start index and the end index of the span of elements in the input array whose sum is there in the node. We will use a struct for the node which looks like this –

struct node
{
	int value, low, high;
	struct node * leftChild;
	struct node * rightChild;
};

The variable ‘value’ was to store the sum, ‘low’ and ‘high’ is to store the range. The variables ‘leftChild’ and the ‘rightChild’ are to point to the children.

Building the Segment Tree –

The Segment Tree is built in a recursive manner. We could write it in a step-by-step procedure –

  • Allocate memory.
  • Start with the root.
  • Partition the array you are given into two halves.
  • Create nodes for each half.
  • The value of the parent node is the sum of values on its children.
  • This is calculated by recursively calling the procedure on the children.
  • When the array given reduces to one element. The procedure must create a node and return the value.
  • When the parent node has acquired the sum of its children, it must return its current value to carry on the recursion.

The step-by-step procedure may not be clear in the first reading. You will know what I was talking about when you understand the pseudo-code below –

int buildSegmentTree(Array, low, high, SegmentTree)
    if low == high
        SegmentTree->value = Array[low]
        SegmentTree->low = SegmentTree->high = low
 
        return SegmentTree->value
 
    mid = (low + high) / 2
 
    SegmentTree->low = low
    SegmentTree->high = high
    SegmentTree->leftChild = allocate memory using calloc()
    SegmentTree->rightChild = allocate memory using calloc()
    SegmentTree->value = buildSegmentTree(Array, low, mid, SegmentTree)
                       + buildSegmentTree(Array, mid + 1, high, SegmentTree)
 
struct node * getSegmentTree(Array, ArraySize)
    if ArraySize <= 0
        return NULL
 
    struct node * SegmentTree = allocate memory using calloc()
 
    buildSegmentTree(Array, 1, ArraySize, SegmentTree)
 
    return SegmentTree

I made the pseudo-code a little extra-descriptive so that you can understand the step-by-step procedure and the code simultaneously. If you have any doubts, feel free to comment them.

The time complexity of the code is O(n). This is because in the worst case we would end up building a complete binary tree which will have 2 ✗ n – 1 nodes, and we preform calculations that take O(1) time for every node.

Sum Query –

Now, it is time to discuss the two important queries that we started our discussion with. We will begin with the sum query. We are supposed to tell the sum of all the elements in a given range. Now we will use our ‘low’ and ‘high’ variables in each node of the Segment Tree. The way we do this query mimics the Binary Search. We would be given a well defined range and we will keep comparing that with the range in each node of the Tree. If it fits perfectly, then we simply return the value stored in that node. If it doesn’t we have a few cases to handle. To understand the cases, consider the following notations –

  • NodeLeft – This is the left bound of the range whose sum is stored in the node. We actually named this as ‘low’ inn our struct.
  • NodeRight – This is the right bound of the range whose sum is stored in the node. We actually named this as ‘high’ inn our struct.
  • QueryLeft – This is the left bound of the range whose sum we are supposed to calculate.
  • QueryRight – This is the right bound of the range whose sum we are supposed to calculate.
  • Mid – This is the index upon which the partition occurs to assign values to the children nodes. It is essentially the mean of NodeLeft and NodeRight.
  • Value – This is the value that the current node is holding. Same as the ‘value’ variable in our struct.

Now, let’s talk about the cases we would encounter –

  1. (NodeLeft == QueryLeft && NodeRight == QueryRight) – This is the best thing that can happen. As we discussed a moment ago, we simply return the Value.
  2. (NodeLeft >= QueryLeft && NodeRight >= QueryRight) – This is the situation where among the elements that the current node holds the sum of, a few of the elements are not needed. The range that the current node has covers more than that’s needed. Now, to proceed with the recursion, we use Mid –
    • (QueryLeft <= Mid) → getSumQuery(SegmentTree->leftChild, QueryLeft, Mid) – This condition checks if the query demands the sum of elements which are in the left partition. So, we carry on the recursion with query range concerning only the left partition.
    • (QueryRight > Mid + 1) → getSumQuery(SegmentTree->leftChild, QueryLeft, Mid) – This checks if the query wants elements of the right partition. So, we carry on the recursion with query range concerning only the right partition.

    Note that both the conditions must be checked for the sum query to give accurate results. A special case arises when (QueryLeft == QueryRight), which means that we are looking for one element, which would obviously be either in the left or the right partition. In that case, we don’t have the necessity of  checking the second condition if the first condition is already met. This is because if we know the query asks for an element in the left partition, we have nothing to do with the right partition in this case.

This will be more clear once you put this idea to work by pen-and-paper. Coding this is simple too. Once you are able to code it, you will realise its resemblance to Binary Search. In the worst case, we would have to travel all the way down the tree, which takes O(log n).

Update Query –

What do we do if we were to update an element initially given in the input. How would we have to modify our Segment Tree so that it is still correct? We know that the initial input elements can be found at the leaves of the Segment Tree. So, we should be updating that. How do we traverse down to that leaf? We do this similarly as we did it in Sum Query, where we took a range and went down the tree based on it. If we put it in Sum Query terms, QueryLeft is equal to QueryRight. We search for it in a similar fashion. Then, we update the node which we came searching for. We also keep a note of the difference, and we add the difference to all those nodes which we came by. This is because, when you update a leaf of the Segment Tree, the internal nodes which would get affected are those which contain the partial sum in which the leaf is involved. We already met those nodes in our recursive traversal. When we are “coming out” of recursion, we update them by the difference that was caused by the new value. So, basically we go down the tree and come back. So, this operation takes O(2 log n) which is O(log n).

Take a break for a while and come back to start coding the Segment Tree. You will need to do a lot of pen-and-paper work while coding this data structure. And it is totally worth it. To print your Segment Tree, you could use the Tree Walks, In-order or Pre-order or whichever pleases you. If you are getting stuck, you can always refer to my code below –

    

There are many varieties of questions that can be asked based on this Data Structure. I’m sure you’ll start liking and appreciating this data structure as you keep solving many questions on it. If you have any doubts, feel free to comment them. I hope my post has helped you in understanding the Segment Tree. If it did, let me know by commenting…! Keep practising… Happy Coding…! 🙂

14 thoughts on “Segment Trees

    1. It is simple…. Instead of sum everywhere, you just use the minimum… In every recursion, use the minimum from the left and the right subtree… Try it once, if you face any difficulty let me know 🙂

  1. most beautiful explanation…!! Could understand SEGMENT TREE in one go which appeared me to be scary always 🙂 thanks a lot… keep posting new topics!

  2. First of all, you’re doing a great job making these tutorials. You explain it very nicely for someone reading it for the first time.

    Meanwhile,
    I feel that your logic for getSumQuery is wrong.
    Suppose we have an array with n=8 as [1 2 3 4 5 6 7 8]
    And we call getSumQuery(segmentTree,1,3,0)
    Now since low is not equal to high, we enter the else block and again since lowleftChild,1,4,0) which would return a value 10, whereas answer is 6.
    I think you missed the case where the whole range could either lie in the left half or the right half.
    (P.S. I might be wrong, just curious)

  3. I know I am asking for so much but it’s my personal request if you could explain some of segment tree question from spoj…..because as far i have seen that there is different kind of application of segment tree(using structure and all…) in most of the question…more precisely please teach us how to maintain 2-3 data fields for a node….how to build tree and how to combine children in those scenarios…with the help of question..Thank you bro 🙂

Leave a Reply