Transform and Conquer: Instances and Structuring

Either the problem or algorithm can be transformed in one of three ways:

Instance simplification: the instances of the problem can be transformed into an easier instance to solve.
Representation change: the data structure can be transformed so that it is more efficient.
Problem reduction: the problem can be transformed to an easier problem to solve.

This lecture gives examples of instance simplification and representation change.

Presorting: Instance simplification

"Presorting" is a common example of "instance simplification." Presorting is sorting ahead of time, to make repetitive solutions faster. For example if you wish to find many kth statistics in an array then it might make sense to sort the array ahead of time for so that the cost for determining each statistics is constant time.

Presorting is a form of preconditioning. Preconditioning is manipulating the data to make the algorithm faster.

Another example is the problem to determine the uniqueness of array elements. The brute force algorithm would compare each array element with the rest of the array. The cost would be Θ(n²). If we presort the array then the algorithm only needs to compare adjacent elements for uniqueness. The cost for determining uniqueness (without the sorting cost) is Θ(n).

The total cost is

T(n) = T_sort(n) + T_scan(n) ε Θ(n log n) + Θ(n) = Θ(n log n)

Another example is computing the mode of an array. The mode is the most frequent array element. The brute force cost would be quadratic. If the array is sorted then we only need to count runs of value.

Algorithm PresortMode(A[0...n-1])

// assumes that A is sorted

i ← 0

modefrequency ← 0

while i < n do

runlength ← 1

runvalue ← A[i]

while i+runlength < n and A[i+runlength] = runvalue do

runlength++

if runlength > modefequency then

modefrequency ← runlength

modevalue ← runvalue

return modevalue

The algorithm runs in linear time, so the cost is sorting is presorting the array.

Geometrical problems frequently sort the collection of points before solving the problem.

Also diagraphs algorithms frequently do a topological sort before running.

Heaps: Representation change

I assume that you all are well acquainted with the heap.

A heap is a priority queue that is a complete tree. The textbook has the largest key at the root, so parent keys are larger than their children. The data structure finds the largest priority item in constant time. Removing the highest priority item and adding new item are lg(n) time, using bubble down and bubble up algorithm.

Properties of Heaps

1. The height of a heap is floor(lg n).

2. The root contains the highest priority item.

3. A node and all the descendants is a heap

4. A heap can be implemented using an array and all operations are in-place.

5. if index of the root = 1 then index of left child = 2i and right child = 2i+1

6. Level i of the heap has 2ⁱ elements

7. heap order, the parent value is larger than the children

The naive construction of the heap is by adding the items one at a time. The cost is Θ(n lg n).

There is a faster construction in linear time, bottom-up heap construction.

Bottom-up heap construction starts at the last trivial sub heap, floor(n/2), and checks for heap ordering and correctly. Iteratively the algorithm moves up the heap checking the heap ordering and correcting.

Draw a heap and illustrate

Algorithm HeapBottomUp(H[1...n])

for i ← floor(n/2) down to 1 do

k ← i // parent index

v ← H[k] // proposed parent value

heap ← false

while not heap and 2*k ≤ n do

j ← 2*k // left child index

if j < n then // there are children

if H[j] < H[j+1] then j++ // find the larger child value

if v ≥ H[j] then heap ← true // check heap ordering

else // correct the heap ordering

H[k] ← H[j] // bubble down

k ← j // move down the heap

H[k] ← v // insert the value

To determine the cost of the algorithm, consider that in worst case each item would bubble down to the bottom of the heap.

Let h = floor(lg n) the height of the tree.

So at each level, i, in worst case the item would bubble down to the bottom, a total of h-i levels.

There are two comparisons for each item, one compares the children and the other compares the parent with maximum child. So

C_worst(n) = ∑_i₌₀^h^-1∑_{level
i keys} 2(h-i) = ∑_i₌₀^h^-12(h-i)2ⁱ = 2(n-lg(n+1))

using the formula ∑_i₌₁ⁿ i2ⁱ = (n-1)2ⁿ⁺¹+ 2 (See appendix A). The worst case cost is O(n), less than 2n comparisons.