Are a form of maps common in compilers for symbol tables (tables storing variable's names and address) and operating systems for registry of environment variables (variables defining the shell options).

Bucket Arrays // where the items are stored.

Hash Functions (algorithm for storing items in the hash table) // How we locate the items.

An array A with size N

If keys are integers then

*insert*will put*entry e*in array rank*k*, A[*k*] =*e,*where*entry*= (*k*ey,*value*)*.*We could then recover the*value*given*k,*with get(*k*) by accessing A[*k*].Any bucket with array cell not assigned an element should contain a special NO_SUCH_KEY object. Naturally when the bucket array is first created with no keys assigned all cells contain NO_SUCH_KEY

The map methods get(k) and insert(k,e) should take O(1).

There are problems:

What if the keys are not unique?

What if the keys are not integers?

If there are many keys or if N is too small then a collision can
occur. We have to take care of this. If the keys are not
integers then we need a function form keys to integers: a *hash
function*.

A hash function maps general keys to a finite range of integers.

Hash function: general keys --> finite set of integers

The hash function has two parts:

Hash code, maps general keys to integers.

Hash code: general keys --> Whole range of integers (Not really.)

Compression map, maps the infinite set of integers to a finite range of integers.

Compression map: Whole range of integers --> finite range of integers

We like for the hash code to avoid *collisions*, but this is
not always possible. **Collisions** are when distinctly
different keys map to the same hash values. Why is this
possible? Collisions are unavoidable because:

finite set of integers even if we use int

algorithm may get too complicated otherwise

We require that if keys are equal then the hash code should return
the same integer value. If hash code function is *h*,
then:

ifk_{1}=k_{2 }thenh(k_{1}) = h(k_{2})

For efficiency of the hash table we would like for the hash code to spread out the mapping uniformly across the integers.

Casting to Integer:

This works great for int and char, but not very well for doubles or Strings. Why? Chopping would give the same value for the hashcode.

Summing components:

static int hashCode(long i) { return (int)((i >> 32) + (int) i):} // good for long

In general Sum(

*x*) for all_{k}*k*where*x*are the components._{k }Good for longs mixes significant an insignificant parts, collisions still occur. Not good for Strings. Why? Hint programmers like to use similar variable names.

Polynomial hash codes:

Assume the key is of the form of a Object with multiple components, (

*x*_{0},*x*_{1},*x*_{2}, ...,*x*_{k}_{-1}), for example a String, the individual objects characters. The polynomial:

*x*_{0}*a*+^{k-1}*x*_{1}*a*+ ... +^{k-2}*x*+^{k-2}a*x*_{k-1}where a is a constant of the hash code, mixes all the components. Horner's form is more efficient to use:

*x*+_{k-1}*a*(*x*+_{k-2}*a*(*x*+ ... +_{k-3}*a*(*x*_{2}+*a*(*x*_{1}+ ax_{0})) ... ))This is a very common hash code. The author has checked for the number of collisions for 50,000 English words as a function of the constants

*a*. They have found that*a*= 33, 37, and 41 have less then 7 collisions. Java uses*a*= 31, which I suspect also gives low collision rates.Cyclic shift hash codes:

A variant of the polynomial hash code but replaces multiplication of a with cyclic shifts. Again mixes the components.

We need to compress the hash value so we can find the proper
bucket in a **finite** array.

The Division Method:

*h*(*k*) = |*k*| mod NN, the size of the array, should be a prime number so that it spreads out the generated codes for typical sequence of integers. Consider N=10 and sequence 10, 20 , 30, ... or 5, 10, 15, 20, ...

We want to different integers two have probability 1/N for generating a bucket value.

The Mad Method:

Multiply-Add-Divide

*h*(k) = |*ak*+*b*| mod N