Decrease and Conquer Sorts and Graph Searches

Decrease-and-conquer algorithms make the problem smaller by reducing it at each step. They can reduce the problem by a constant amount (usually by one), by a constant factor (usually by half), or by a variable amount.

 

Decrease and conquer differs from divide and conquer in that not all of the parts need to be solved. Binary search is usually presented as divide and conquer, but since only one half is ever searched it is really a decrease-and-conquer algorithm (decrease by a constant factor).
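As a concrete illustration, here is a minimal Python sketch (mine, not part of the original notes) of binary search seen as decrease and conquer: each comparison discards half of the remaining array, and only one half is ever searched.

def binary_search(A, key):
    """Return an index of key in the sorted list A, or -1 if it is absent."""
    lo, hi = 0, len(A) - 1
    while lo <= hi:
        mid = (lo + hi) // 2      # probe the middle of the remaining range
        if A[mid] == key:
            return mid
        elif A[mid] < key:
            lo = mid + 1          # decrease: keep only the right half
        else:
            hi = mid - 1          # decrease: keep only the left half
    return -1                     # key is not in A

# Example: binary_search([1, 3, 5, 7, 9], 7) returns 3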

Insertion Sort

Insertion sort is a decrease-by-one algorithm. Because the problem shrinks by only one element at each step, there are n-1 steps, so we should not expect it to be more efficient than linear even in the best case.

 

To insert A[i], we scan the sorted part of the array from right to left until we find an element smaller than or equal to A[i], and we insert A[i] right after that element.

 

Algorithm InsertionSort(A[0...n-1])
    for i ← 1 to n-1 do // move the boundary of the unsorted portion
        v ← A[i] // value to sort
        j ← i-1 // end of the sorted part
        while j ≥ 0 and A[j] > v do // scan the sorted part for the insertion point
            A[j+1] ← A[j] // shift the sorted part to make room for the insertion
            j ← j-1
        A[j+1] ← v // insert
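The same algorithm as a runnable Python sketch (my translation of the pseudocode above, using 0-based indexing):

def insertion_sort(A):
    """Sort the list A in place; mirrors the pseudocode above."""
    for i in range(1, len(A)):        # grow the sorted part by one element
        v = A[i]                      # value to sort
        j = i - 1                     # end of the sorted part
        while j >= 0 and A[j] > v:    # scan the sorted part for the insertion point
            A[j + 1] = A[j]           # shift to make room
            j -= 1
        A[j + 1] = v                  # insert
    return A

# Example: insertion_sort([5, 2, 4, 6, 1, 3]) returns [1, 2, 3, 4, 5, 6]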

 

 

 

Count the number of comparisons. The worst case is an array sorted in decreasing order:

Cworst(n) = Σ_{i=1}^{n-1} Σ_{j=0}^{i-1} 1 = n(n-1)/2 ∈ Θ(n²)

 

On average the insertion point is found about halfway through the sorted part, around position (i-1)/2, so only about half of the comparisons are made; the average-case cost is

Cavg(n) ≈ n²/4 ∈ Θ(n²)

 

The best case is an already sorted array; then

Cbest(n) = Σ_{i=1}^{n-1} 1 = n-1 ∈ Θ(n)

 

The cost is about n + k comparisons, where k is the number of inversions in the original array; in the worst case k ∈ Θ(n²). The attraction of insertion sort is its good performance on nearly sorted arrays (small k), which makes it an attractive base-case sort for recursive sorts.
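To illustrate that last point, here is a hedged sketch (not from the notes) of a merge sort that hands small subarrays to insertion sort; the cutoff of 16 is an arbitrary value chosen only for illustration.

def insertion_sort_range(A, lo, hi):
    """Insertion-sort A[lo..hi] in place (inclusive bounds)."""
    for i in range(lo + 1, hi + 1):
        v, j = A[i], i - 1
        while j >= lo and A[j] > v:
            A[j + 1] = A[j]
            j -= 1
        A[j + 1] = v

def hybrid_merge_sort(A, lo=0, hi=None, cutoff=16):
    """Merge sort that uses insertion sort as the base case for small ranges."""
    if hi is None:
        hi = len(A) - 1
    if hi - lo + 1 <= cutoff:             # small piece: insertion sort is cheap here
        insertion_sort_range(A, lo, hi)
        return A
    mid = (lo + hi) // 2
    hybrid_merge_sort(A, lo, mid, cutoff)
    hybrid_merge_sort(A, mid + 1, hi, cutoff)
    merged, i, j = [], lo, mid + 1        # merge the two sorted halves
    while i <= mid and j <= hi:
        if A[i] <= A[j]:
            merged.append(A[i])
            i += 1
        else:
            merged.append(A[j])
            j += 1
    merged.extend(A[i:mid + 1])
    merged.extend(A[j:hi + 1])
    A[lo:hi + 1] = merged
    return A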

 

Depth-First Search

The algorithm recursively visits adjacent vertices, going as deep as possible until it cannot find a new vertex to visit, and then backs up.

 

Here is an algorithm that works for a graph that might not be connected and that numbers the vertices in the order they are visited.

 

Algorithm DFS(G)
    count ← 0
    mark each vertex in V with 0 // unvisited
    for each vertex v in V do
        if v is marked 0 then
            dfs(v)

Algorithm dfs(v)
    count ← count + 1
    mark v with count
    for each vertex w adjacent to v do
        if w is marked 0 then dfs(w) // not visited yet
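A minimal Python sketch of the same DFS, assuming the graph is given as a dictionary mapping each vertex to its list of adjacent vertices (the representation and names are mine, not from the notes):

def dfs_all(graph):
    """Number every vertex in DFS visit order; graph maps vertex -> list of neighbors."""
    mark = {v: 0 for v in graph}          # 0 means unvisited
    count = 0

    def dfs(v):
        nonlocal count
        count += 1
        mark[v] = count                   # mark v with its visit number
        for w in graph[v]:
            if mark[w] == 0:              # not visited yet
                dfs(w)

    for v in graph:                       # handles a graph that is not connected
        if mark[v] == 0:
            dfs(v)
    return mark

# Example: dfs_all({'a': ['b', 'c'], 'b': ['a'], 'c': ['a'], 'd': []})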

 

 

 

The efficiency is determined by the size of the graph's data structure: Θ(|V|²) for an adjacency matrix and Θ(|V| + |E|) for adjacency lists.

Why is it Θ(|V|²) for an adjacency matrix? Because finding the vertices adjacent to a given vertex means scanning an entire row of the matrix, which costs Θ(|V|), and this is done for each of the |V| vertices.

 

The edges can be marked as discovery edges and back edges.

The discovery edges form a forest (the DFS forest).

Why the name back edges? Because they lead back from a vertex to one of its ancestors in the DFS forest.

 

DFS can determine connectivity and the connected components, and whether the graph has a cycle (a back edge).

 

Breadth-First Search

Breadth-first search visits all of the adjacent vertices before going one level deeper.

 

Algorithm BFS(G)
    count ← 0
    mark each vertex in V with 0 // unvisited
    for each vertex v in V do
        if v is marked 0 then
            bfs(v)

Algorithm bfs(v)
    count ← count + 1
    mark v with count
    add v to queue
    while queue is not empty do
        for each vertex w adjacent to the front of the queue do
            if w is marked 0 then // not visited
                count ← count + 1
                mark w with count
                add w to queue
        remove front from queue
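A matching Python sketch of BFS with an explicit queue, using the same adjacency-list dictionary assumption as the DFS sketch:

from collections import deque

def bfs_all(graph):
    """Number every vertex in BFS visit order; graph maps vertex -> list of neighbors."""
    mark = {v: 0 for v in graph}          # 0 means unvisited
    count = 0

    def bfs(start):
        nonlocal count
        count += 1
        mark[start] = count
        queue = deque([start])
        while queue:
            u = queue.popleft()           # remove the front of the queue
            for w in graph[u]:
                if mark[w] == 0:          # not visited
                    count += 1
                    mark[w] = count
                    queue.append(w)

    for v in graph:                       # handles a graph that is not connected
        if mark[v] == 0:
            bfs(v)
    return mark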

 

 

 

The efficiency is again determined by the size of the graph's data structure: Θ(|V|²) for an adjacency matrix and Θ(|V| + |E|) for adjacency lists.

 

Edges can be marked as discovery (tree) edges or cross edges.

Check the properties of cross edges and why they are called cross edges.

 

BFS can determine connectivity and the connected components, and it finds paths with the fewest edges from the starting vertex.

 

DFS can determine the same connectivity information, but unlike BFS it does not find fewest-edge paths.

 

Although DFS was presented recursively, how can we write it iteratively?
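One common answer, sketched in Python under the same adjacency-list assumption: replace the recursion with an explicit stack.

def dfs_iterative(graph, start):
    """Visit every vertex reachable from start using an explicit stack instead of recursion."""
    mark = {v: 0 for v in graph}
    count = 0
    stack = [start]
    while stack:
        v = stack.pop()
        if mark[v] == 0:                  # a vertex may be pushed more than once; visit it only once
            count += 1
            mark[v] = count
            for w in reversed(graph[v]):  # reversed so neighbors are popped in list order
                if mark[w] == 0:
                    stack.append(w)
    return mark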

 

Directed Graphs and Topological Sorting

 

Directed Graph

Directed graphs (digraphs) have edges with directions, drawn as arrows.

 

In real life the directed edges can represent dependencies.

 

A DFS on a di-graph follows the direction of the edges.

 

The edges can be marked

discovery or tree

back

forward

 

What does a forward edge imply about dependency?

 

Cycles in a digraph are revealed by back edges, not forward edges.

What would a cycle imply if the graph represents dependencies? A deadlock.
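A Python sketch of cycle detection by looking for back edges during a DFS of the digraph (the three-state marking below is my own framing, not from the notes): reaching a vertex that is still on the recursion stack closes a cycle.

def has_cycle(digraph):
    """Return True if the digraph (vertex -> list of successors) contains a cycle."""
    WHITE, GRAY, BLACK = 0, 1, 2          # unvisited, on the recursion stack, finished
    color = {v: WHITE for v in digraph}

    def dfs(v):
        color[v] = GRAY
        for w in digraph[v]:
            if color[w] == GRAY:          # back edge: w is an ancestor still being explored
                return True
            if color[w] == WHITE and dfs(w):
                return True
        color[v] = BLACK
        return False

    return any(color[v] == WHITE and dfs(v) for v in digraph)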

 

Directed Acyclic Graph (DAG)

A digraph without cycles.

So a DFS of a DAG produces no back edges.

 

Topological Sort

A topological sort is an ordering of the vertices that does not contradict the dependencies: for every edge (u, w), u appears before w. In other words, if you were to perform one job at a time, a topological sort is an order in which you could perform the jobs.

 

How to generate the topological sort?

 

From the definition (the same method as in the DS textbook):

There must be a source vertex, a vertex with no dependencies, i.e. one whose in-degree is zero.

Why must there be a source? Otherwise the graph would have a cycle or be infinite.

 

The algorithm gives each vertex an incounter, initially equal to its in-degree. When a vertex's incounter reaches zero it is placed in the container of sources; when it is removed from the container it is appended to the topological sort list and the incounter of each vertex adjacent to it is reduced by one.

 

Basically, the algorithm reduces the size of the graph by one each time a source is removed.

 

Algorithm TopoSource(G(V, E))
    make an empty Container of vertices, C
    make an empty List of vertices, S
    for each vertex v in V do
        incounter(v) ← in-deg(v)
        if incounter(v) == 0 then C.add(v)
    while C is not empty do
        u ← C.remove()
        S.add(u)
        for each edge (u, w) do // edge out from u to w
            incounter(w) ← incounter(w) - 1
            if incounter(w) == 0 then C.add(w)
    if S.size() == V.size() then return S
    else return "G has a cycle"
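A Python sketch of the same source-removal idea (often called Kahn's algorithm), assuming the digraph is a dictionary of outgoing adjacency lists with every vertex as a key:

from collections import deque

def topo_source(digraph):
    """Return a topological order of digraph (vertex -> list of successors), or report a cycle."""
    incounter = {v: 0 for v in digraph}
    for v in digraph:
        for w in digraph[v]:
            incounter[w] += 1                 # incounter starts as the in-degree
    C = deque(v for v in digraph if incounter[v] == 0)   # container of current sources
    S = []                                    # the topological order
    while C:
        u = C.popleft()
        S.append(u)
        for w in digraph[u]:                  # edge out from u to w
            incounter[w] -= 1
            if incounter[w] == 0:
                C.append(w)
    if len(S) == len(digraph):
        return S
    return "G has a cycle"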

 

 

 

 

What is the cost? Proportional to the size of the data structure for G: Θ(|V| + |E|) with adjacency lists.

 

 

There is another technique that uses a DFS of the DAG.

When the vertices are popped off the recursive stack they are added to the list of the topological sort. Then the list is read in reverse order.

 

Is the algorithm correct? What is the last vertex to be popped off the recursion stack? A source.

What is the first vertex to be popped off the stack? A dead end, a vertex with no outgoing edges.

The next vertex to be popped off is a dead end of the reduced graph (the graph minus the vertices already popped off).

This is always the case for every vertex that is popped off, which is why reading the list in reverse order respects all of the dependencies.
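A Python sketch of this DFS-based technique: append each vertex as its recursive call finishes (is popped off the stack), then reverse the list. It assumes the input is a DAG; otherwise the back-edge check above would be needed first.

def topo_dfs(dag):
    """Topological order of a DAG (vertex -> list of successors) using DFS finish order."""
    visited = {v: False for v in dag}
    order = []

    def dfs(v):
        visited[v] = True
        for w in dag[v]:
            if not visited[w]:
                dfs(w)
        order.append(v)                   # v is popped off the recursion stack here

    for v in dag:
        if not visited[v]:
            dfs(v)
    order.reverse()                       # read the finish order in reverse
    return order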

 

What is the cost of the algorithm?