Assignment #2 - List Comparison

Objectives:

There are two general ways to evaluate the performance of an algorithm:
Empirical (experimental) techniques, where we set up tests and measure the time/space used.
Analytical techniques, where we count the number of simple operations and express this count using big-O notation. We can then compare the algorithms' asymptotic behavior.

The main objectives of this assignment are:

Compare empirical measurement to analytic measurement
Practice developing timing tests for algorithms
Improve our understanding of algorithms and big-O notation
Better comprehension of theory vs. practice

The Assignment:

We will use System.nanoTime() to measure time. System.nanoTime() returns the number of nanoseconds that have passed since some fixed but arbitrary origin time. We use this method to build a "stopwatch". To do this, we:

Prepare the testing data.
Call System.nanoTime() and save the result as the start time.
Run the test many times.
Call System.nanoTime() again and save the new reading as the stop time.
Calculate the average: divide the elapsed time (stop time minus start time) by the number of times the test was run.
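The five stopwatch steps above can be sketched in Java roughly as follows. This is a minimal illustration, not a required design. The class name StopwatchSketch, the helper averageNanos, the list size of 1000 and the index 500 are all hypothetical choices. The measured time also includes the small overhead of the loop and of each Runnable call, and this overhead is one reason very fast operations are hard to time.

```java
import java.util.ArrayList;
import java.util.List;

public class StopwatchSketch {
    // Runs `test` `samples` times between two nanoTime() readings and
    // returns the AVERAGE time per run in nanoseconds (hypothetical helper).
    static double averageNanos(Runnable test, int samples) {
        long start = System.nanoTime();            // step 2: start time
        for (int s = 0; s < samples; s++) {
            test.run();                            // step 3: run the test many times
        }
        long stop = System.nanoTime();             // step 4: stop time
        return (double) (stop - start) / samples;  // step 5: elapsed / samples
    }

    public static void main(String[] args) {
        // Step 1: prepare the testing data before the clock starts.
        List<Integer> list = new ArrayList<>();
        for (int k = 0; k < 1000; k++) {
            list.add(k);
        }
        double avg = averageNanos(() -> list.get(500), 5000);
        System.out.println("average ns per get(500): " + avg);
    }
}
```

Passing the test in as a Runnable keeps the timing logic in one place, so every experiment is measured the same way.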

Several problems often occur when trying to take timing measurements of algorithms:

The timer resolution problem: Modern computers are (thankfully) blindingly fast. Unfortunately, the timer mechanism and the operating system's scheduler often take more time than the operation being measured. As a result, many simple operations appear to take the same amount of time when in reality they take so little time that they are difficult to measure accurately.
The suspension problem: Modern computers multitask and use complex memory caching and virtual memory. Java also runs in a virtual machine, which sometimes optimizes programs in ways that interfere with the consistency of time measurements. All this means that our program may be "stopped" in the middle of an important measurement so that another program or task can run. Although it is great that computers give the illusion of several programs running at once, we would prefer that our measurements not include the time spent playing part of an MP3, or at least that this extra time be negligible compared to what we are trying to measure.
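You can observe the timer resolution problem on your own machine by estimating how far apart two consecutive, distinct System.nanoTime() readings are. The sketch below is hypothetical: TimerResolution and estimateGranularity are illustrative names only. The value it reports varies with the operating system, hardware and JVM, and the suspension problem can inflate any single reading.

```java
public class TimerResolution {
    // Busy-waits until System.nanoTime() reports a new value and keeps the
    // smallest step seen over `trials` attempts. This only ESTIMATES the
    // timer's granularity; it is not a guaranteed property of the platform.
    static long estimateGranularity(int trials) {
        long smallest = Long.MAX_VALUE;
        for (int t = 0; t < trials; t++) {
            long first = System.nanoTime();
            long next = System.nanoTime();
            while (next == first) {
                next = System.nanoTime();   // spin until the reading changes
            }
            smallest = Math.min(smallest, next - first);
        }
        return smallest;
    }

    public static void main(String[] args) {
        System.out.println("smallest observed nanoTime() step: "
                + estimateGranularity(1000) + " ns");
    }
}
```

Any operation that takes less time than this step cannot be timed meaningfully from a single run.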

For now, we will design tests and collect multiple samples: we will start the timer, perform one type of test many times (samples), and then stop the timer. This gives us the total time taken for all the samples. We then divide the total time by the number of samples to get the average time per sample. With a large enough sample size, this should be sufficient to overcome both problems, because the timer overhead is paid only once and occasional interruptions are spread across many samples.

Read through (but don't implement yet) the following experiments (note that there are THREE of them):
- Experiment #1: The time to get the i-th element of a list
  For each type of list (ArrayList and DoublyLinkedList):
  - First, prepare the testing data
  - Second, measure the time of getting the ith item in the list
  - Create a single line graph showing both sets of data (one line for ArrayList, one for DoublyLinkedList).
  - Note 1: Whether you can use 5000 lists depends on the memory available, the version of the JVM you use, and many other factors. If 5000 is too large for your test setup, decrease it to a suitable value. If your setup can handle more than 5000 lists, feel free to increase it.
  - Note 2: DON'T FORGET: YOU ARE TAKING THE AVERAGE! If you have 5,000 samples, you need to divide your total time by 5000!
- Experiment #2: The time to add a total of i elements (not one element, but i elements) to a list by calling ArrayList's add(0,e) or DoublyLinkedList's addFirst(e) i times, where i ranges from 1 to 200.
  For each type of list (ArrayList and DoublyLinkedList):
  - Measure the time
  - Create a single line graph showing both sets of data (one line for ArrayList, one for DoublyLinkedList).
  - As an example, the data point for i=100 indicates how long it takes to add 100 items, one at a time, to the beginning of an initially empty list.
  - Note 1: These tests should not exceed memory availability on any reasonable setup (do not use multiple lists). Each sample should create an empty list and add the appropriate number of items onto the list.
  - Note 2: This experiment measures the time to insert a total of i elements, NOT just one element. The average time is the total time divided by the number of samples (e.g., 5000).
- Experiment #3: Repeat Experiment #2, but add to the end of the list instead of the beginning by calling ArrayList's add(size(),e) and DoublyLinkedList's addLast(e) i times, where i ranges from 1 to 200.
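One possible harness for all three experiments is sketched below. It makes several assumptions that you should adapt to your own setup:
- java.util.LinkedList stands in for the course's DoublyLinkedList. LinkedList is also doubly linked, but its get(i) walks from whichever end is closer, so your own class may behave differently.
- Experiment #1 reuses a single prepared list of 200 elements instead of the 5000 lists mentioned in Note 1.
- The names ListExperiments, timeGet, averageBuild and the four builder helpers are hypothetical.

Each row is printed as CSV so it can be pasted into a spreadsheet or plotting tool.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;
import java.util.Locale;
import java.util.function.IntFunction;

public class ListExperiments {
    static final int SAMPLES = 5000; // lower this if your setup cannot handle it
    static final int MAX_I = 200;    // i runs from 1 to MAX_I
    static long sink;                // results accumulate here so the JIT cannot discard the work

    // Experiment #1: average time of one get(index) call on an already-prepared list.
    static double timeGet(List<Integer> list, int index) {
        long start = System.nanoTime();
        for (int s = 0; s < SAMPLES; s++) {
            sink += list.get(index);
        }
        long stop = System.nanoTime();
        return (double) (stop - start) / SAMPLES; // AVERAGE, not total
    }

    // Experiments #2 and #3: average time to build a fresh list holding i elements.
    // Creating the empty list is part of each sample, as Experiment #2's Note 1 describes.
    static double averageBuild(IntFunction<List<Integer>> build, int i) {
        long start = System.nanoTime();
        for (int s = 0; s < SAMPLES; s++) {
            sink += build.apply(i).size();
        }
        long stop = System.nanoTime();
        return (double) (stop - start) / SAMPLES;
    }

    static List<Integer> arrayFront(int i) {     // ArrayList's add(0, e), i times
        ArrayList<Integer> a = new ArrayList<>();
        for (int k = 0; k < i; k++) a.add(0, k);
        return a;
    }

    static List<Integer> linkedFront(int i) {    // stand-in for DoublyLinkedList's addFirst(e)
        LinkedList<Integer> d = new LinkedList<>();
        for (int k = 0; k < i; k++) d.addFirst(k);
        return d;
    }

    static List<Integer> arrayBack(int i) {      // ArrayList's add(size(), e), i times
        ArrayList<Integer> a = new ArrayList<>();
        for (int k = 0; k < i; k++) a.add(a.size(), k);
        return a;
    }

    static List<Integer> linkedBack(int i) {     // stand-in for DoublyLinkedList's addLast(e)
        LinkedList<Integer> d = new LinkedList<>();
        for (int k = 0; k < i; k++) d.addLast(k);
        return d;
    }

    public static void main(String[] args) {
        // Experiment #1 data is prepared once, outside every timed region.
        List<Integer> array = new ArrayList<>();
        List<Integer> linked = new LinkedList<>();
        for (int k = 0; k < MAX_I; k++) {
            array.add(k);
            linked.add(k);
        }

        System.out.println("i,get_array,get_linked,front_array,front_linked,back_array,back_linked");
        for (int i = 1; i <= MAX_I; i++) {
            System.out.printf(Locale.ROOT, "%d,%.1f,%.1f,%.1f,%.1f,%.1f,%.1f%n", i,
                    timeGet(array, i - 1), timeGet(linked, i - 1),
                    averageBuild(ListExperiments::arrayFront, i),
                    averageBuild(ListExperiments::linkedFront, i),
                    averageBuild(ListExperiments::arrayBack, i),
                    averageBuild(ListExperiments::linkedBack, i));
        }
    }
}
```

After compiling, something like `java ListExperiments > results.csv` collects every column in one file. With SAMPLES = 5000 the full run may take a while, so a smaller value can be convenient while debugging.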
Based on your big-O analysis of the operations, draw a sketch of what you expect each graph to look like. You don't need to include units, but estimate the shape of the lines in each graph. These sketches will be included with your homework. It is important that you draw them before implementing your tests and taking measurements.
Implement the tests, perform the measurements, and produce the required graphs.
Answer the following questions:
1. Did your empirical tests match what you expected from analysis (sketches)?
2. What, if anything, surprised you about the results?
3. What are the advantages or disadvantages of each type of list?
4. What are the advantages and disadvantages of each type of analysis (empirical vs. analytic)?
5. Identify the big-O notation cost for each line in every graph.

Submission:

A neatly written report is due in the CS2321 mailbox, which is in the CS office, before 5pm on the due date.

Source code (20 points)
Report (80 points)

Everything but the initial sketches should be done electronically (type your answers to the questions and use Excel/OpenOffice/Matlab/gnuplot/etc. for the graphs).
To help the grader return your work faster, please do not submit your answers to the questions as paragraphs. Just number your answers as follows (as if you were doing a math assignment):

Answer to Question 1.
Answer to Question 2.

Consider the following checklist:

Are the sketched versions of all THREE graphs included (two lines in each graph, one for ArrayList and one for DoublyLinkedList)?
Are the post-testing versions of all THREE graphs included (two lines in each graph, one for ArrayList and one for DoublyLinkedList)?
Just to double check, do you have SIX graphs total (three printed; three sketched)?
Are all the questions answered?
Do the graphs match what was requested?
Are you displaying AVERAGE times (elapsed/samples) rather than the TOTAL times (elapsed)?
Are the horizontal and vertical axes correct?
Are the horizontal and vertical axes labeled?
Do the graphs have titles?
Are lines clearly marked and easy to read?
Is all work neat and legible? (The grader will not grade work that is hard to read)

Grading Criteria:

Sketches of expected results: 20 points
Graphs showing actual results: 40 points
Answers to questions: 20 points
Source code: 20 points