Algorithm Efficiency and Asymptotic Notation

We want to determine an algorithm's space and time efficiency.

We'll call it cost. Historically, it cost money to run programs on the mainframe.

How to measure?

Run on a machine and time it

Count instructions

Analyze the algorithm

 

Problems with timing code

Same code but different machines run in different times -> run on a standard machine or on various machines

Same code but different input data -> run a series of data sets of increasing size (see the timing sketch after this list)

But data of the same size can run in different times -> run on random data sets or specify the input exactly

Same algorithm but a different programmer runs in a different time -> ?

Same programmer but a different language -> ?
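A minimal Python timing sketch illustrating these issues (the code under test, here the built-in sort, is just a placeholder; the numbers will vary by machine, data set, and run):

import random
import time

def time_sort(n, trials=5):
    """Average wall-clock time to sort a random list of size n."""
    total = 0.0
    for _ in range(trials):
        data = [random.random() for _ in range(n)]
        start = time.perf_counter()
        sorted(data)
        total += time.perf_counter() - start
    return total / trials

for n in [1_000, 10_000, 100_000]:
    print(n, time_sort(n))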

 

We want the measure to be independent of

Machine

Programmer and programming language

Also, the measure should indicate the running time for large problems.

 

If we analyze the pseudocode, the measure will be independent of the programming language and the machine.

We still need a framework to indicate the time for larger problems.

 

Analysis Framework

Measuring Input Size

Problem size n is usually a natural measure of the input data.

For example, when sorting an array, the input size is the array length.

When squaring a number, the input size would be the number of digits.

 

Sometimes the input cannot be represented by a single number.

For example, a string searcher has both a text length and a target length;

multiplying two numbers is similar.

This is OK; we just use the two numbers,

or we use the bigger of the two.

Running time unit of measure

Input Size, n

From the pseudocode, we pick a basic operation (e.g., an addition or a comparison) and assign a cost to it, c_op.

Then we count how many times it executes in the pseudocode, C(n).

We estimate the cost by T(n) ≈ c_op·C(n), where C(n) is the basic operation count.

Example: if C(n) = an² + bn, we can approximate it as an² for large n.

What is the growth? T(2n)/T(n) ≈ c_op·a·(2n)² / (c_op·a·n²) = 4, so we do not even care about the coefficient a (or about c_op).
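A sketch that makes the ratio concrete, using a hypothetical double loop whose comparison count is C(n) = n(n−1)/2:

def count_ops(n):
    """Count the basic operation (one comparison per inner iteration)."""
    count = 0
    for i in range(n):
        for j in range(i + 1, n):
            count += 1              # basic operation executed once
    return count

for n in [100, 200, 400]:
    print(n, count_ops(2 * n) / count_ops(n))   # ratio approaches 4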

 

We only care about the functional family of C(n)

What functional families are there? Constant, logarithmic, linear, n log n, quadratic, cubic, exponential, factorial, ...

 

Show Table 2.1

 

Constant is great, linear is not bad, and logarithmic is even better than linear. Exponential is really bad, but factorial is worst.
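A short Python sketch to regenerate the kind of values a table like Table 2.1 shows (assuming the usual families):

import math

families = {
    "lg n": lambda n: math.log2(n),
    "n": lambda n: n,
    "n lg n": lambda n: n * math.log2(n),
    "n^2": lambda n: n ** 2,
    "2^n": lambda n: 2 ** n,
    "n!": lambda n: math.factorial(n),
}

for n in [10, 20, 30]:
    print(f"n={n}:", ", ".join(f"{name}={f(n):.3g}" for name, f in families.items()))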

 

 

What should we do about "data of the same size runs in different times"?

We call these different problem instances.

Worst case, best case, average case

We could calculate running time for

Average case

Worst case

even Best case

 

Example: Sequential Search

 

 

Algorithm SequentialSearch(A[0...n-1], K)
    i ← 0
    while i < n and A[i] ≠ K do
        i ← i + 1
    if i < n return i
    else return -1
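A runnable version of the same algorithm, as a minimal Python sketch:

def sequential_search(a, key):
    """Return the first index of key in list a, or -1 if key is absent."""
    i = 0
    while i < len(a) and a[i] != key:
        i += 1                          # one key comparison per iteration
    return i if i < len(a) else -1

print(sequential_search([3, 1, 4, 1, 5], 4))   # 2
print(sequential_search([3, 1, 4, 1, 5], 9))   # -1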

 

 

 

What should be the basic operation? The comparison.

Why? Because it is essential to the algorithm and occurs more often than any other operation.

 

Discuss the worst case (key absent or in the last position: n comparisons) and the best case (key in the first position: 1 comparison)

Work through the average case

Let p = the probability that the key is in the array; then what is (1 − p)? The probability that it is not in the array.

If it is in the array, assume a uniform distribution; then p/n is the probability that it is at any particular location or index.

C_avg(n) = (cost if the key is in the array) + (cost if it is not in the array)

= [1·p/n + 2·p/n + ... + n·p/n] + n·(1 − p)

= p(n + 1)/2 + n·(1 − p)
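A quick numeric check of the derivation, as a Python sketch (the direct expectation and the closed form should agree):

def c_avg_direct(n, p):
    """Expected comparisons: finding the key at index i costs i + 1 comparisons,
    each index with probability p/n; an unsuccessful search costs n, probability 1 - p."""
    return sum((i + 1) * (p / n) for i in range(n)) + n * (1 - p)

def c_avg_formula(n, p):
    return p * (n + 1) / 2 + n * (1 - p)

print(c_avg_direct(10, 0.5), c_avg_formula(10, 0.5))   # both print 7.75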

 

There is also amortized cost.

Amortized efficiency: the time for a whole sequence of operations, e.g. growing an array (see the sketch below).
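The classic example, as a Python sketch: a doubling dynamic array, where counting element copies shows the amortized cost per append stays constant even though an individual append occasionally costs O(n).

def count_copies(n_appends):
    """Append n_appends items to a doubling array, counting element copies."""
    capacity, size, copies = 1, 0, 0
    for _ in range(n_appends):
        if size == capacity:        # array full: double capacity, copy everything
            copies += size
            capacity *= 2
        size += 1                   # the append itself
    return copies

for n in [1_000, 10_000, 100_000]:
    print(n, count_copies(n) / n)   # copies per append stays below 2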

 

 

How do we analyze only large problems? And how large is large?

We could look at trends. What is the mathematical word for that? Asymptotic analysis.

Asymptotic Notation

Big-O

A function t(n) ∈ O(g(n)) iff there exist n₀ ∈ ℕ and c > 0 such that

t(n) ≤ c·g(n) for all n ≥ n₀

 

Note that t(n) ∈ O(g(n)) denotes set membership: O(g(n)) is a set of functions.

 

Example using the definition:

t(n) = 100n + 5. Notice that 100n + 5 ≤ 100n + n for n ≥ 5, so 100n + 5 ≤ 101n.

n₀ = 5 and c = 101
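A tiny numeric check of these constants (a sketch, not a proof):

def t(n): return 100 * n + 5
def g(n): return n

c, n0 = 101, 5
assert all(t(n) <= c * g(n) for n in range(n0, 10_000))
print("100n + 5 <= 101n holds for all tested n >= 5")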

 

Graphical illustration

Upper bounds

 

What is the purpose of n₀ and c?

n₀ says we are only interested in large problems.

c says we are not interested in the particular machine or the basic operation cost.

Big-Omega

A function t(n) ∈ Ω(g(n)) iff there exist n₀ ∈ ℕ and c > 0 such that

t(n) ≥ c·g(n) for all n ≥ n₀

 

Graphical illustration

Lower bounds

Big-Theta

A function t(n) ∈ Θ(g(n)) iff there exist n₀ ∈ ℕ and c₁, c₂ > 0 such that

c₂·g(n) ≤ t(n) ≤ c₁·g(n) for all n ≥ n₀

 

Graphical illustration

tight bounds

t(n) ∈ O(g(n)) and t(n) ∈ Ω(g(n)) => t(n) ∈ Θ(g(n))

 

Summing Properties

t₁(n) ∈ O(g₁(n)) and t₂(n) ∈ O(g₂(n)) => t₁(n) + t₂(n) ∈ O(max(g₁(n), g₂(n)))

 

Could you prove it?
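One possible proof sketch: suppose t₁(n) ≤ c₁·g₁(n) for n ≥ n₁ and t₂(n) ≤ c₂·g₂(n) for n ≥ n₂. Then for all n ≥ max(n₁, n₂),

t₁(n) + t₂(n) ≤ c₁·g₁(n) + c₂·g₂(n) ≤ (c₁ + c₂)·max(g₁(n), g₂(n)),

so c = c₁ + c₂ and n₀ = max(n₁, n₂) witness the membership.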

Limit Rules

lim(n→∞) t(n)/g(n) = 0 => t(n) ∈ O(g(n))

lim(n→∞) t(n)/g(n) = c > 0 => t(n) ∈ Θ(g(n))

lim(n→∞) t(n)/g(n) = ∞ => t(n) ∈ Ω(g(n))

 

We can also use

L'Hôpital's rule: lim(n→∞) t(n)/g(n) = lim(n→∞) t′(n)/g′(n)

Stirling's formula: n! ≈ √(2πn)·(n/e)ⁿ for large n

 

Examples

Compare n(n−1) to n² using the limit rule

Compare n to lg n using L'Hôpital's rule

(both worked below)
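Worked sketches of both:

lim(n→∞) n(n−1)/n² = lim(n→∞) (1 − 1/n) = 1 > 0, so n(n−1) ∈ Θ(n²).

lim(n→∞) n / lg n = (L'Hôpital) lim(n→∞) 1 / (1/(n·ln 2)) = lim(n→∞) n·ln 2 = ∞, so n ∈ Ω(lg n); n grows strictly faster than lg n.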