Divide and Conquer Closest Pair and Convex-Hull Algorithms

Closest Pair Problem

Recall the closest pair problem.

The brute force algorithm checks the distance between every pair of points and keeps track of the minimum. It computes n(n-1)/2 distances, so the cost is quadratic, Θ(n²).
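As a point of reference, the brute force check can be sketched in a few lines of Python. The function name `closest_pair_brute_force` and the representation of points as (x, y) tuples are assumptions made for this illustration.

```python
from itertools import combinations
from math import dist, inf


def closest_pair_brute_force(points):
    """Return (distance, (p, q)) for the closest pair among (x, y) tuples.

    Every one of the n(n-1)/2 unordered pairs is examined once.
    Returns (inf, None) when there are fewer than two points.
    """
    best_d, best_pair = inf, None
    for p, q in combinations(points, 2):
        d = dist(p, q)  # Euclidean distance
        if d < best_d:
            best_d, best_pair = d, (p, q)
    return best_d, best_pair
```

For the three points (0, 0), (3, 4), (1, 1) this returns the pair (0, 0), (1, 1) at distance √2.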

The general approach of a merge-sort-like algorithm is to sort the points along the x-dimension, then recursively divide the array of points and find the minimum in each half. The only trick is that we must also check distances between points from the two halves. This could have quadratic cost if we checked each point against every other. But if there is only a bounded number of points to check for each point, then the cost could be less.

Algorithm Closest Pair

0. Initially sort the n points, P_i = (x_i, y_i), by their x-coordinates.

1. Then recursively divide the n points, S₁ = {P₁,...,P_{n/2}} and S₂ = {P_{n/2+1},...,P_n}

so that S₁ points are to the left of x = x_{n/2} and S₂ are to the right of x = x_{n/2}.

2. Recursively find the closest pair in each set, d₁ for S₁ and d₂ for S₂, and let d = min(d₁, d₂).

3. We must check the S₁ points lying in the vertical strip of width 2d centred on x = x_{n/2} against the S₂ points in that strip, and get the closest distance d_between.

4. To do the above efficiently, we need the points sorted along the y-dimension, which we maintain using a merge sort approach.

5. Then the minimum distance is min(d, d_between).
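The steps above can be put together as the following Python sketch. It is one possible arrangement under stated assumptions, not a definitive implementation: the names `closest_pair`, `_closest` and `_merge_by_y` are hypothetical, points are (x, y) tuples, and each recursive call hands back its points sorted by y so the parent can merge them in linear time.

```python
from math import dist, inf


def _merge_by_y(a, b):
    """Merge two lists that are each sorted by y into one y-sorted list."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i][1] <= b[j][1]:
            out.append(a[i])
            i += 1
        else:
            out.append(b[j])
            j += 1
    out.extend(a[i:])
    out.extend(b[j:])
    return out


def _closest(pts):
    """pts is sorted by x. Return (d, same points sorted by y)."""
    n = len(pts)
    if n <= 1:
        return inf, list(pts)
    mid = n // 2
    x_mid = pts[mid][0]                    # the dividing vertical line
    d1, left = _closest(pts[:mid])         # step 2 on S1
    d2, right = _closest(pts[mid:])        # step 2 on S2
    d = min(d1, d2)
    by_y = _merge_by_y(left, right)        # step 4: linear-time merge
    strip = [p for p in by_y if abs(p[0] - x_mid) < d]
    for i, p in enumerate(strip):          # step 3: scan the strip upward
        for j in range(i + 1, len(strip)):
            q = strip[j]
            if q[1] - p[1] >= d:           # too far above p to improve d
                break
            d = min(d, dist(p, q))
    return d, by_y                         # step 5: d is min(d, d_between)


def closest_pair(points):
    """Smallest distance between any two points (inf if fewer than two)."""
    return _closest(sorted(points))[0]     # step 0: sort by x once
```

The inner loop compares each strip point with the points just above it in y order, whichever side they come from, which is a common simplification of checking S₁ points against S₂ points. The early `break` is what keeps the inner loop to a constant number of iterations.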

Analysis and Cost:

Algorithm Closest Pair

0. Initially sort the n points, P_i = (x_i, y_i), by their x-coordinates.

1. Then recursively divide the n points, S₁ = {P₁,...,P_{n/2}} and S₂ = {P_{n/2+1},...,P_n}

so that S₁ points are to the left of x = x_{n/2} and S₂ are to the right of x = x_{n/2}. Cost is O(1) for each recursive call.

2. Recursively find the closest pair in each set, d₁ for S₁ and d₂ for S₂, and let d = min(d₁, d₂). Cost is O(1) for each recursive call.

Note that d is not necessarily the solution, because the closest pair could be a pair between the sets, meaning one point from each set.

Such points must lie in the vertical strip bounded by x = x_{n/2} - d and x = x_{n/2} + d. Draw the diagram.

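3. We must check each S₁ point lying in this strip against the S₂ points in the strip, and get the closest distance d_between.

Note that for each S₁ point (x_i, y_i) there can be at most 6 S₂ points to check. These points must lie in the strip and also have y in [y_i - d, y_i + d], so they fall in a d by 2d rectangle, and any two points of S₂ are at least d apart. Illustrate the worst case. With at most n/2 points of S₁ in the strip, this step takes at most 6n/2 = 3n distance checks, so its time is O(n). Draw diagram showing the six points in S₂ with respect to the point in S₁.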

4. To accomplish this we also need the points sorted along the y-dimension. We do not want to sort from scratch for each recursive division, so we use a merge sort approach: each call merges the y-sorted lists returned by its two halves, and the cost of maintaining the sort along y is O(n) per call.

5. Then the minimum distance is min(d, d_between).

The recursive relation is

T(n) = 2T(n/2) + M(n), where M(n) is linear in n.

Using the Master Theorem (a = 2, b = 2, d = 1), since a = b^d,

T(n) ∈ O(n lg n)
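Written out, the application of the Master Theorem looks as follows. The general form used here, T(n) = aT(n/b) + f(n) with f(n) ∈ Θ(n^d), is the usual textbook statement and is assumed rather than taken from these notes.

```latex
\begin{aligned}
T(n) &= 2\,T(n/2) + M(n), \qquad M(n) \in \Theta(n^{1}) \\
a &= 2,\quad b = 2,\quad d = 1 \quad\Longrightarrow\quad a = b^{d} \\
T(n) &\in \Theta(n^{d} \lg n) = \Theta(n \lg n)
\end{aligned}
```

The tight bound Θ(n lg n) implies the O(n lg n) bound stated above, and the initial sort by x adds only another O(n lg n).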

Note that a lower bound of Ω(n lg n) has been shown for this problem, so this algorithm is asymptotically one of the best possible.

Convex-Hull Problem

Recall the convex hull is the smallest convex polygon containing all the points in a set, S, of n points P_i = (x_i, y_i). The polygon is defined by its set of vertices, and every vertex is a point of the original set.

Recall the brute force algorithm: form the line through every pair of points, then check whether the rest of the points all lie on the same side of that line. If they do, the pair is an edge of the hull. How much does this cost? There are n(n-1)/2 such lines, and each is checked against the n-2 remaining points, so the cost is cubic.
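A minimal Python sketch of this brute force test follows. The names `side` and `hull_edges_brute_force` are hypothetical, points are (x, y) tuples, and the sketch assumes the points are distinct with no three on one line, so the same-side test never meets a zero.

```python
from itertools import combinations


def side(p, q, r):
    """Positive if r is left of the directed line p -> q, negative if right."""
    return (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])


def hull_edges_brute_force(points):
    """Return the pairs (p, q) whose connecting line is an edge of the hull.

    Assumes distinct points with no three collinear. Each of the n(n-1)/2
    pairs is tested against the other n-2 points, hence cubic time.
    """
    edges = []
    for p, q in combinations(points, 2):
        signs = [side(p, q, r) for r in points if r != p and r != q]
        if all(s > 0 for s in signs) or all(s < 0 for s in signs):
            edges.append((p, q))
    return edges
```

For the corners of a square plus one interior point, only the four sides of the square pass the test.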

Algorithm Quickhull

1. Sort the set of points, S, by the x-dimension with ties resolved by the y-dimension.

2. Identify the first and last points of the sort, P₁ and P_n.

Note P₁ and P_n are vertices of the hull.

The directed line from P₁ to P_n divides the remaining points of S into two sets: the points to its left (S₁) and the points to its right (S₂). The left/right test is defined later.

We need to find the upper and lower hulls. We'll do this recursively.

Note also that S₁ or S₂ could be empty sets.

3. For S₁, find the point P_max that is at the maximum distance from line P₁P_n. Ties can be resolved by choosing the point that maximizes the angle P_maxP₁P_n.

Note that the directed line P₁P_max divides the points of S₁ into left and right sets. The left points are S₁₁.

Similarly, the points of S₁ to the left of the directed line P_maxP_n form S₁₂.

P_max is vertex of the hull

The points inside the triangle P₁P_maxP_n cannot be vertices of the hull

There are no points to the left of both P₁P_max and P_maxP_n

4. Recursively find the upper hull of the union of P₁, S₁₁ and P_max, and the union of P_max, S₁₂, and P_n

5. Do likewise with S₂ to find the lower hull.

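We need to identify if point (x₃, y₃) is left or right of the directed line from (x₁, y₁) to (x₂, y₂). We use the sign of the determinant

│x₁ y₁ 1│
│x₂ y₂ 1│
│x₃ y₃ 1│

= x₁y₂ + x₃y₁ + x₂y₃ − x₃y₂ − x₂y₁ − x₁y₃

Its magnitude is twice the area of the triangle formed by the three points, and its sign is determined by the order of the three points: it is positive exactly when (x₃, y₃) is to the left of the directed line. The sign has the properties we need.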
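The left/right test translates directly into code. In this small sketch the function name `orientation` is an assumption, and the expression is the expansion of the 3 by 3 determinant above.

```python
def orientation(p1, p2, p3):
    """Value of the determinant |x1 y1 1; x2 y2 1; x3 y3 1|.

    Positive: p3 is left of the directed line p1 -> p2.
    Negative: p3 is right of it.
    Zero:     the three points are collinear.
    """
    (x1, y1), (x2, y2), (x3, y3) = p1, p2, p3
    return x1 * y2 + x3 * y1 + x2 * y3 - x3 * y2 - x2 * y1 - x1 * y3
```

With integer coordinates the result is exact, so the sign can be trusted. With floating-point coordinates, values very close to zero may need a tolerance.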

Sorting along the x-dimension costs Θ(n lg n). Finding P_max costs Θ(n). Determining each of the sets S₁, S₂, S₁₁, and S₁₂ costs Θ(n).

How many recursive calls are there in the worst case? O(n).

With O(n) calls each doing at most linear work, the worst case cost is Θ(n²), which beats the brute force O(n³).

We expect the average case to do much better because of the divide and conquer approach, much as quicksort does. In addition, for any reasonable random distribution of points, many points fall inside the triangle and are eliminated. In fact, for randomly chosen points in a circle the average case cost is linear.
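The Quickhull steps described in this section can be sketched in Python as follows. This is an illustration under stated assumptions, not a definitive implementation: the names `cross`, `_along`, `_hull_side` and `quickhull` are hypothetical, points are (x, y) tuples, and ties for P_max are broken by the smallest projection along the base line, which for points at equal distance from the line is the same as maximizing the angle at the first endpoint.

```python
def cross(a, b, p):
    """Same sign and value as the 3x3 determinant for a, b, p.

    Positive when p is to the left of the directed line a -> b.
    """
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def _along(a, b, p):
    """Unnormalised projection of p - a onto b - a (used only for ties)."""
    return (b[0] - a[0]) * (p[0] - a[0]) + (b[1] - a[1]) * (p[1] - a[1])


def _hull_side(a, b, pts):
    """Hull vertices among pts (all left of a -> b), ordered from a to b."""
    if not pts:
        return []
    # Step 3: farthest point from line ab; ties go to the largest angle at a.
    p_max = max(pts, key=lambda p: (cross(a, b, p), -_along(a, b, p)))
    s11 = [p for p in pts if cross(a, p_max, p) > 0]   # left of a -> p_max
    s12 = [p for p in pts if cross(p_max, b, p) > 0]   # left of p_max -> b
    # Step 4: points inside triangle a, p_max, b are dropped by the filters.
    return _hull_side(a, p_max, s11) + [p_max] + _hull_side(p_max, b, s12)


def quickhull(points):
    """Return the hull vertices in clockwise order, starting at P1."""
    pts = sorted(set(points))                          # step 1
    if len(pts) < 3:
        return pts
    p1, pn = pts[0], pts[-1]                           # step 2
    s1 = [p for p in pts if cross(p1, pn, p) > 0]      # left of P1 -> Pn
    s2 = [p for p in pts if cross(pn, p1, p) > 0]      # right of P1 -> Pn
    return [p1] + _hull_side(p1, pn, s1) + [pn] + _hull_side(pn, p1, s2)
```

The lower hull reuses `_hull_side` by reversing the direction of the base line, so points to the right of P₁P_n become points to the left of P_nP₁. Points lying exactly on a hull edge are not reported as vertices.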