Assignment #6 - More Sort Comparison
Objectives:
The main objectives of this assignment are:
- Developing timing tests for other types of sorting algorithms
- Improve our understanding of sorting algorithms and big-O notation
- Better comprehension of True vs. Expected O(n lg n) time and "long" O(1) time.
The Assignment:
We will be creating and timing some specific operations and measuring their
performance on three new sorting implementations (MergeSort, QuickSort, and
RadixSort). We will then graph the performance as the data size (n)
increases.
Programming
You will complete three classes, a MergeSort class, a
QuickSort class, and a RadixSort class, each provided for
you in Assignment6.zip. The classes each have
a void sort(E[]) method which you are to complete. Upon completion
the array passed in should be sorted. You may create additional private
methods to break the processes into smaller parts and organize code.
A Main class is also provided which has headers for the test runs
(called testMergeSort(Integer[], int), testQuickSort(Integer[],
int), and testRadixSort(Integer[], int) respectively). You
are to write the sorting algorithms, complete the headers, and run the tests
to collect the data. You must submit all code written for this
assignment.
For Radix Sort you will need to "extract" a particular digit, d, from
an integer, n. The following code can be used to do this:
/**
* Extract a single digit from an integer.
*
* Ex: getDigit(1234, 0) is 4
* getDigit(1234, 1) is 3
*
* @param n - The number to extract a digit from
* @param d - The digit to extract. 0 is the "1's" position, 1 is the "10's" position, etc.
* @return the extracted digit
*/
public static int getDigit(int n, int d) {
if(n<0)
n=-n;
switch(d) {
case 0:
return n%10;
case 1:
return (n/10)%10;
case 2:
return (n/100)%10;
case 3:
return (n/1000)%10;
case 4:
return (n/10000)%10;
case 5:
return (n/100000)%10;
case 6:
return (n/1000000)%10;
case 7:
return (n/10000000)%10;
case 8:
return (n/100000000)%10;
case 9:
return (n/1000000000)%10;
default:
return 0;
}
}
Testing
As in Assignment 2 and 5, we will use System.nanoTime()
for time measurement. We'll also use the java.util.Random class to
get a random number generator.
Remember that timing can be tricky for a variety of reasons. For this
assignment we will design tests and collect samples until a minimum amount
of time has elapsed. Pay attention to the following guidelines:
- The following pseudo-code describes the basic technique
for constructing a test array:
Algorithm: makeRandomTest
Input: n - the number of elements in the random test array
Input: biggestInt - the maximum size of a single element in the array
Output: arr - an array of n random numbers in the range 0 to biggestInt
// Create the test array:
arr = new Integer[n]
java.util.Random random = new java.util.Random();
// Initialize the Array:
for i=0 to n-1
r[i] = random.nextInt(biggestInt);
For the sorted arrays, just create them with this method and sort them.
- A "timing" test should be performed repeatedly for each sample case
until a sufficient amount of time has elapsed. The following pseudo-code
describes a basic algorithm for the timing tests:
Algorithm: testMergeSort
Input: arr - an array of Comparables to use for empirical measurement of merge sort.
Input: minimum time - the minimum amount of time to run tests
Output: average time taken
System.gc() // Force java to Garbage Collect
samples=0 // No samples yet
start = current time // Begin stopwatch
do {
test = arr.clone() // Make a copy to sort
MergeSort.sort(test) // Sort the data
samples++
stop = current time
} while(stop-start < minimum time);
average time = (stop-start)/(1.0*samples)
return average time
For this project use a minimum time of 500,000 nanoseconds (1/2 s).
- Disable Java optimizations prior to running test cases if possible. If you're running Sun's JVM in Eclipse you can:
- Select Run → Run Configurations
- Select the Arguments tab
- Enter -Xint in the VM Arguments: field.
Tests & Questions
- Read through (but don't yet implement) the following experiments:
- A comparison of the three sorting techniques on Random data for
each sort (MergeSort, QuickSort, and RadixSort).
For each of the following sizes, create a random array of values between 0
and 100,000 and run timing tests on all three sorts using clones of the same
array. n=10,25,50,100,1000, 2000,4000,8000,16000, 32000, 64000, 128000,
256000.
This will give you a total of 39 "times", 13 for each sort.
You will need to plot 3 lines, one for each sort, so be sure that the data
is stored or displayed in a way that makes this easy. For example,
something like:
System.out.printf("%4d , %7.5f , %7.5f , %7.5f %n", n, mergeTime, quickTime, radixTime);
will allow you to import the columns of data into a spreadsheet
as a "comma separated value" (CSV) file.
- A comparison of the three sorting techniques on Ordered data for
each sort (MergeSort, QuickSort, and RadixSort).
For each of the following sizes, create an array of values between 0
and n and run timing tests on all three sorts using clones of the
sorted array. n=10,25,50,100,1000,2000,4000,8000,16000, 32000, 64000, 128000, 256000.
Again, you will give you a total of 39 "times", 13 for each
sort.
- A comparison of the three sorting techniques on Reverse ordered
data for each sort (MergeSort, QuickSort, and
RadixSort). For each of the following sizes, create a random array
of values between n and 0 and run timing tests on all three sorts using
clones of the reverse sorted array. n=10,25,50,100,1000,
2000,4000,8000,16000, 32000, 64000, 128000, 256000.
Again, you will give you a total of 39 "times", 13
for each sort.
- A comparison of the three sorting techniques on larger data
values for each sort (MergeSort, QuickSort, and
RadixSort). For each of the following sizes, create a random array
of values between 0 and 1,000,000,000. Then run timing tests on all three
sorts using using clones of the same array. n=10,25,50,100,1000,
2000,4000,8000,16000, 32000, 64000, 128000, 256000.
Again, you will give you a total of 39 "times", 13
for each sort.
- For each of the four tests run, create a line graphs
showing the performance of all three sorts:
- The horizontal (x) axis should be " Size (n)"
- The vertical axis (y) should be "Time (in ms)"
- The title should be "Sorting Times vs. Size for X Data",
where the X is either "Random", "Ordered" "Reverse Ordered", or "Larger
Numbers"
- Be sure it's an X-Y plot, with linear axes for both X and Y.
- Include a legend that clearly indicates which line represents which sort.
- Based on your big-O analysis of the operations draw a sketch of what
you expect each graph to look like. You don't need to include units, but
estimate the shape of lines in each graph. This will be included with your
homework. It is important that you do this before implementing your
tests and performing measurements.
- Implement the tests, perform the measurements, and produce the required graphs.
- Answer the following questions:
- Did your empirical tests match what you expected from analysis (sketches)?
- What, if anything, surprised you about the results?
- What are the advantages or disadvantages of each type of sort?
- What situations would you use each of the three sorting algorithms?
- What size array did you choose as your "base case" for the Divide and Conquer sorting algorithms?
- How did you sort the base cases?
- What does "Expected" time mean to you, based on this data?
- How do these alternate approaches compare to the other sorts you've done?
- Is the big-O time for Radix Sort misleading? Why or Why not?
- What did you learn that was interesting from this assignment?
Submission:
Grading Criteria:
- Code correctness and readability: 20 points
- Sketches of expected results: 20 points
- Graphs showing actual results: 40 points
- Answers to questions: 20 points