Algorithm Analysis

"Beware of bugs in the above code; I have only proved it correct, not tried it." -- Donald Knuth

Beginning Analysis

The point of analyzing algorithms is to be able to say that one algorithm is better than another. Example 1: Assume that we have an array of random integers and that we wish to find out where x is in the array. Also assume that x is actually in the array. When the while loop below ends, n will be the index of x:
  // Assume a declaration like: int a[SIZE];
  int n = 0;
  while (x != a[n]) // scan forward until we find x (x is assumed to be present)
    n++;
  // When the loop ends, n is the index of x

Suppose the array was sorted. How would that change your search?


Example 2: Assume that we have a linked list of random integers and that we wish to find out where x is in the list. When the while loop below ends, n will be the (1-based) position of x in the list.

Assume this struct:
struct Node
{
  int data;   // the value stored in this node
  Node *next; // the next node, or NULL at the end of the list
};
// Assume the list has SIZE nodes
int n = 0;
Node *node = head; // head of the linked list
while (node)
{
  n++;                   // count this node (positions are 1-based)
  if (node->data == x)
    break;               // found it: n is the position of x
  node = node->next;
}
Suppose the list was sorted. How would that change your search? Let's compare the worst case of the two search algorithms: linear search through an unsorted array versus binary search through a sorted array:
 Number of     Linear    Binary
  elements     search    search
-------------------------------
       10          10       4
      100         100       7
    1,000       1,000      10
   10,000      10,000      14
  100,000     100,000      17
1,000,000   1,000,000      20
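
For comparison, here is what a binary search might look like. This is a sketch of my own, not code from the notes; it assumes the array a is sorted in ascending order, and (unlike the linear search above) it does not require x to be present:

// Binary search sketch: a must be sorted in ascending order.
// Returns the index of x, or -1 if x is not in the array.
int BinarySearch(const int a[], int size, int x)
{
  int low = 0;
  int high = size - 1;
  while (low <= high)
  {
    int mid = low + (high - low) / 2; // midpoint without overflowing low + high
    if (a[mid] == x)
      return mid;
    else if (a[mid] < x)
      low = mid + 1;   // x can only be in the upper half
    else
      high = mid - 1;  // x can only be in the lower half
  }
  return -1; // not found
}

Each iteration of the while loop cuts the remaining range in half, which is where the roughly lg N comparison counts in the table come from.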


New Example:

Suppose we have 2 algorithms whose running times are described as follows, where N is the number of elements in the data set:
  1. 5N^2 steps
  2. 50 N lg N steps
Which one is faster? (Assume each step takes 1 microsecond, which is 1 millionth of a second. Also, lg is shorthand for log base 2.)
     N           5N^2 microsecs       50 N lg N microsecs
-------------------------------------------------------------
       10     0.0005 sec               0.002 sec  
      100     0.05 sec                  0.03 sec   
    1,000     5 sec                      0.5 sec 
   10,000     500 sec       = 8 min      6.6 sec
  100,000     50,000 sec    = 14 hr       83 sec
1,000,000     5,000,000 sec = 58 days    997 sec = 17 min 

(Graph omitted: f(x) = 5x^2 plotted against g(x) = 50x lg x.)

The two running times are equal at about x = 58.77: setting 5x^2 = 50x lg x gives x = 10 lg x, and 10 lg 58.77 ≈ 58.77. BTW, the graph was generated by gnuplot, a free cross-platform tool for generating graphs of any type plus a lot more.


Suppose we find an algorithm whose runtime is described by: f(n) = n^2 + 2n + 100

Values of the dominant term:

    n    |        f(n)              n^2     n^2-term as % of total
---------+------------------------------------------------------------
     10  |            220             100        45.455
    100  |         10,300          10,000        97.087
  1,000  |      1,002,100       1,000,000        99.790
 10,000  |    100,020,100     100,000,000        99.980
100,000  | 10,000,200,100  10,000,000,000        99.999

Now, add a cubic term: f(n) = n^3 + n^2 + 2n + 100

Values of the dominant term:

    n    |        f(n)                           n^3       n^3-term as % of total
---------+-------------------------------------------------------------------------
     10  |                  1,220                   1,000      81.967213
    100  |              1,010,300               1,000,000      98.980501
  1,000  |          1,001,002,100           1,000,000,000      99.899890
 10,000  |      1,000,100,020,100       1,000,000,000,000      99.989999
100,000  |  1,000,010,000,200,100   1,000,000,000,000,000      99.999000

Now, add an exponential term: f(n) = 2^n + n^3 + n^2 + 2n + 100

Values of the dominant term:

    n    |        f(n)                  2^n         2^n-term as % of total
---------+-------------------------------------------------------------------------
    10   |             2,244               1,024        45.632799
    20   |         1,057,116           1,048,576        99.192142
    30   |     1,073,769,884       1,073,741,824        99.997387
    40   | 1,099,511,693,556   1,099,511,627,776        99.999994
It quickly becomes apparent why we are only concerned with the dominant term. The lesser terms are so insignificant that they can be completely ignored. This greatly simplifies the comparisons among different functions.
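
The tables above are easy to reproduce with a short program. Here is a minimal sketch of my own (not from the notes) for the first table, f(n) = n^2 + 2n + 100:

#include <cstdio>

int main(void)
{
  // Print f(n), the dominant n^2 term, and the percentage of
  // the total that the dominant term contributes.
  for (long long n = 10; n <= 100000; n *= 10)
  {
    long long dominant = n * n;
    long long total = dominant + 2 * n + 100;
    printf("%7lld  %15lld  %15lld  %8.3f\n",
           n, total, dominant, 100.0 * dominant / total);
  }
  return 0;
}

Swapping in the cubic or exponential versions of f(n) reproduces the other two tables.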


Common growth rates
O(k)           Constant (not affected by the size of the data; sometimes written as O(1) or O(c))
O(lg N)        Logarithmic (usually base 2)
O(N)           Linear (directly proportional to N)
O(N lg N)      No formal name, but just spoken as "N log N"
O(N^2)         Quadratic (proportional to the square of N)
O(N^3)         Cubic (proportional to the cube of N)
O(N^k)         Polynomial (proportional to N to the power k)
O(2^N)         Exponential (proportional to 2 to the power of N)
(Graph omitted: the growth rates of some common formulas as a function of the problem size N, on a logarithmic scale.)

lg N  (lg N)^2  N^(1/2)          N      N lg N    N(lg N)^2      N^(3/2)           N^2
---------------------------------------------------------------------------------------
   3         9        3         10          30           90           30           100
   6        36       10        100         600        3,600        1,000        10,000
   9        81       31      1,000       9,000       81,000       31,000     1,000,000
  13       169      100     10,000     130,000    1,690,000    1,000,000   100,000,000
  16       256      316    100,000   1,600,000   25,600,000   31,600,000    10 billion
  19       361    1,000  1,000,000  19,000,000  361,000,000    1 billion    1 trillion

f(n) = (dominant term) + (lesser terms)
O(f(n)) = O(dominant term + lesser terms) = O(dominant term)

Relationship of growth rates:

O(1) < O(lg N) < O(N) < O(N lg N) < O(N^2) < O(N^3) < O(2^N)
Sometimes you'll see it written this way:
O(1) = O(lg N) = O(N) = O(N lg N) = O(N^2) = O(N^3) = O(2^N)
The equals sign "=" does not mean equality, but rather should be read as "is contained in". I prefer to use the "greater than" and "less than" symbols to avoid confusion.


Given these rules:

O(a) = O(b)                    (a and b are both constants)
O(N + c) = O(N) + O(c) = O(N)  (c is a constant)
O(N) + c = O(N)                (c is a constant)
O(cN) = cO(N) = O(N)           (c is a constant)

Self-check Replace the "?" with <, >, or =

1. =  O(6) ? O(12)  
2. =  O(N + 5) ? O(N + 100)
3. <  O(lg N) ? O(N)
4. =  O(N/2) ? O(2N)
5. <  O(N^2 + N) ? O(N^3)
6. <  O(N lg N + N^3) ? O(2^N)
7. =  O(N) + O(lg 2N) ? O(N)
8. =  O(lg N) ? O(log10 N)
9.    O(N/4) ? O(1/N)  What is different about this one?

Note on #8: Changing the base of the logarithm is a constant factor, so it is discarded in big-O notation.
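
As a one-line worked example: by the change-of-base formula, lg N = log10 N / log10 2 ≈ 3.32 · log10 N, so O(lg N) and O(log10 N) differ only by the constant factor 3.32, which big-O discards.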

Some concrete values, assuming 1 microsecond (1/1,000,000 sec) per operation:

                   Number of elements in the data set
     Time          ----------------------------------
  complexity       N=10           N=20           N=30
------------------------------------------------------------
      N           0.00001 sec    0.00002 sec    0.00003 sec
      N^2         0.0001 sec     0.0004 sec     0.0009 sec
      N^3         0.001 sec      0.008 sec      0.027 sec
      N^5         0.1 sec        3.2 sec       24.3 sec
      2^N         0.001 sec      1.0 sec       17.9 min
      3^N         0.059 sec     58 min          6.5 years
Time to solve very large problems (from Sedgewick page 40)
operations    problem size 1 million        problem size 1 billion
   per       -------------------------------------------------------
 second         N      N lg N     N^2        N       N lg N    N^2
--------------------------------------------------------------------
  10^6       seconds  seconds   weeks      hours     hours    never
  10^9       instant  instant   hours      seconds   seconds  decades
  10^12      instant  instant  seconds     instant   instant  weeks
--------------------------------------------------------------------

Self-check Sort the complexities from "fastest" to "slowest" as N gets very large.

O(N^2)
O(N lg N)
O(N)
O(N^3)
O(1)
O(2^N)
O(lg N)

A (slightly) real world situation

Look at the 3 functions below, each containing a nested loop (a is a square 2D array). All three fill a with the identity matrix, 1s on the diagonal and 0s everywhere else, but they go about it differently.

void Test1(int n)
{
  for (int i = 1; i <= n; i++)
  {
    for (int j = 1; j <= n; j++)
    {
      // Integer division: (i / j) * (j / i) is 1 only when i == j,
      // so this costs 2 divisions and a multiplication per element.
      a[i - 1][j - 1] = (i / j) * (j / i);
    }
  }
}

void Test2(int n)
{
  for (int i = 0; i < n; i++)
  {
    for (int j = 0; j < n; j++)
    {
      // A branch per element: 1 on the diagonal, 0 everywhere else.
      if (i == j)
        a[i][j] = 1;
      else
        a[i][j] = 0;
    }
  }
}

void Test3(int n)
{
  for (int i = 0; i < n; i++)
  {
    // Zero the entire row, then fix up the one diagonal element.
    // No branch and no division in the inner loop.
    for (int j = 0; j < n; j++)
    {
      a[i][j] = 0;
    }
    a[i][i] = 1;
  }
}


Profiling the three versions will show the differences in more detail.
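
If you want to compare them roughly without a profiler, here is a minimal timing sketch of my own (not from the notes). It assumes the three Test functions above are compiled into the same program and that a is declared globally as shown:

#include <chrono>
#include <cstdio>

const int SIZE = 1000;
int a[SIZE][SIZE];    // the square 2D array used by the Test functions

void Test1(int n);    // the three functions shown above
void Test2(int n);
void Test3(int n);

// Returns the elapsed time of one call to fn(n) in microseconds.
template <typename Fn>
long long TimeIt(Fn fn, int n)
{
  auto start = std::chrono::steady_clock::now();
  fn(n);
  auto stop = std::chrono::steady_clock::now();
  return std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count();
}

int main(void)
{
  printf("Test1: %lld microseconds\n", TimeIt(Test1, SIZE));
  printf("Test2: %lld microseconds\n", TimeIt(Test2, SIZE));
  printf("Test3: %lld microseconds\n", TimeIt(Test3, SIZE));
  return 0;
}

All three functions are O(N^2), so this only measures the constant factors: the divisions in Test1 and the branch in Test2 versus the plain stores in Test3.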

Instruction timing examples: look at pages 10-12 and compare the MOV, ADD, SUB, MUL, and DIV instructions, and see the Math section on page 15.


Estimating the growth rate of an algorithm is affected by several factors.

Analyze these code fragments to get a feel for the analysis.

Consider these loops and look carefully at the counters. What is the time complexity of each?

for (int i = 0; i < N; i++)
{
  for (int j = 0; j < i; j++)
  {
    // do something that takes O(1) time
  }
}

for (int i = 0; i < N; i++)
{
  for (int j = i; j < N; j++)
  {
    // do something that takes O(1) time
  }
}

In general, a programming construct with k nested loops has performance O(N^k) if the loop counters depend on the size of the problem. This is a polynomial growth rate.

The sum of the integers from 1 to N can be written as:

1 + 2 + 3 + ... + N = N(N + 1)/2    or similarly    0 + 1 + 2 + 3 + ... + (N - 1) = N(N - 1)/2
This leads to a time complexity of O(N^2). You should become familiar with these equations as we will see them pop up in several places.
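
As a quick check of those counts, here is a small sketch of my own that counts the inner-loop iterations of both fragments above and compares them to the formulas:

#include <cstdio>

int main(void)
{
  const int N = 100;

  // First fragment: inner body runs 0 + 1 + 2 + ... + (N - 1) times.
  int count1 = 0;
  for (int i = 0; i < N; i++)
    for (int j = 0; j < i; j++)
      count1++;

  // Second fragment: inner body runs N + (N - 1) + ... + 1 times.
  int count2 = 0;
  for (int i = 0; i < N; i++)
    for (int j = i; j < N; j++)
      count2++;

  printf("count1 = %d, N(N - 1)/2 = %d\n", count1, N * (N - 1) / 2); // both 4950
  printf("count2 = %d, N(N + 1)/2 = %d\n", count2, N * (N + 1) / 2); // both 5050
  return 0;
}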

Self-check To test your analysis of arrays and linked lists, fill in the table below.

Compare the complexity of performing the following operations on Arrays and Linked lists. What search method would you use? What's the best/worst case? What is the average number of accesses? Assume each value has equal probability of being in the array/list. Also, after the operation, the container must still be a valid array/list type.

Operation                   | Unsorted array | Sorted array | Unsorted linked list | Sorted linked list
----------------------------+----------------+--------------+----------------------+-------------------
Finding a particular item   |                |              |                      |
(by value)                  |                |              |                      |
Finding the ith element     |                |              |                      |
Finding the maximum item    |                |              |                      |
Adding an item              |                |              |                      |
Deleting a particular item  |                |              |                      |
(by value)                  |                |              |                      |

Sorting

Self-check To test your analysis of the sort algorithms, fill in the table below.

Sort           | Least Comparisons | Least Exchanges | Most Comparisons | Most Exchanges
---------------+-------------------+-----------------+------------------+---------------
Bubble sort    |                   |                 |                  |
Insertion sort |                   |                 |                  |
Selection sort |                   |                 |                  |

Self-check Suppose you are using a sort algorithm that requires O(N^2) compares and O(N^2) swaps/exchanges and you are given an unsorted array of 1,000 integers. What is the minimum number of linear searches you would need to perform before deciding it was cheaper to sort the array first, and then perform the same number of binary searches? Assume that compares/swaps/exchanges all take the same time.

Let X be the number of searches to perform. We have one equation and one unknown.

For the unsorted array, the number of comparisons is 1000 · X.

For the sorted array, we have 10 · X + (the cost of the sort). The cost of the sort is N^2 compares + N^2 exchanges/shifts/swaps, which is 2,000,000. (Reminder: lg 1000 is about 10.)

This gives us this equation:

N · X = lg N · X + 2N^2
Plugging in 1000 for N:
1000 · X = 10 · X + 2,000,000
Solving for X gives us 2,020.20. This means that if we are going to search at least 2,021 times, we should sort the array first. With fewer searches, it is faster to skip the sort.

Plugging in a couple of values for X we get:

    X   |  Unsorted  |   Sorted
--------+------------+------------
  2,020 | 2,020,000* |  2,020,200
  2,021 | 2,021,000  |  2,020,210*

The lower number in each row (marked with *) means faster searches. The values represent how much work is required, and you can see that more than 2,020 searches will be faster if you first sort the array.

  1. What if there are 256 items in the array?
  2. What if there are 64 items in the array and the sort algorithm requires O(lg(N^2)) compares and O(N^2) swaps/exchanges?
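
To experiment with these follow-up questions, here is a small sketch of my own that computes the break-even point for a given N and sort cost. It assumes, as above, worst-case linear searches of N compares each and binary searches of lg N compares each:

#include <cmath>
#include <cstdio>

// Solves N*X = lg(N)*X + sortCost for X, the number of searches at
// which sorting first becomes the cheaper strategy.
double BreakEven(double n, double sortCost)
{
  return sortCost / (n - std::log2(n));
}

int main(void)
{
  // The worked example: N = 1000, sort cost = N^2 compares + N^2 swaps.
  printf("N = 1000: %.2f\n", BreakEven(1000.0, 2.0 * 1000 * 1000));
  // Follow-up question 1: N = 256.
  printf("N = 256:  %.2f\n", BreakEven(256.0, 2.0 * 256 * 256));
  return 0;
}

Note that the worked example above rounds lg 1000 up to 10, so the program's answer for N = 1000 differs slightly (it uses lg 1000 ≈ 9.97).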

Real World™ Considerations

Although we can study algorithms in a vacuum, independent of outside influences, the ultimate goal of algorithm analysis is to apply the results to real situations on real computers. Given that, there are times when knowing the worst-case asymptotic behavior doesn't really help.

Putting it together

Complexity graphs - These graphs compare several functions and show how they are related. Graphs with the same complexity grow at the same rate. The graphs of different complexity grow at different rates. You can easily see this by looking at the input values, x, and the output values, f(x) and g(x).

Given what you now know about algorithms and their associated complexities, what conclusions can you draw about the performance of the two algorithms in the graph below?

Formal definition of O-Notation:

f(n) is O(g(n)) if there exist two positive constants, K and n0, such that f(n) <= K·g(n) for all n >= n0.

Stated another way:
An algorithm's complexity is O(g(n)) if there exists a constant factor K and some input size n0 such that the time required by the algorithm on inputs larger than n0 is always less than K·g(n).

Also look at Figure 2.5 on page 48 of the Sedgewick book.


Visually stated:

Given the function f(x) = 2x^2 + 8x lg x + 100, we have some candidate functions for g(x). If g(x) = x^2, then:

  1. K = 10,  K·g(x) = 10x^2,  n0 = 3.805
  2. K = 4,   K·g(x) = 4x^2,   n0 = 9.264
  3. K = 3,   K·g(x) = 3x^2,   n0 = 15.904
  4. K = 2,   K·g(x) = 2x^2    (no possible value for n0)
  5. K = 1,   K·g(x) = x^2     (no possible value for n0)
With K = 2.1, K·g(x) = 2.1x^2, n0 = 188 (approximate)

The take-away here is that all of the functions above grow at the same rate and are in the same category (N^2 algorithms).
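
If you want to play the K and n0 game numerically, here is a sketch of my own. The n0 values above appear to have been read off a graph, and the exact crossover depends on which logarithm base is used for the middle term; this version takes lg to be log base 2, as defined earlier, so its answers may differ slightly:

#include <cmath>
#include <cstdio>

double f(double x) { return 2 * x * x + 8 * x * std::log2(x) + 100; }
double g(double x) { return x * x; }

int main(void)
{
  const double K = 10.0;
  // Scan for the last integer n where f(n) > K*g(n); for every
  // n beyond that point, f(n) <= K*g(n) holds.
  int last = 0;
  for (int n = 1; n <= 1000000; n++)
    if (f(n) > K * g(n))
      last = n;
  printf("K = %g: f(n) <= K*g(n) for all n > %d\n", K, last);
  return 0;
}

Try the other values of K from the list; for K = 2 and K = 1 the inequality never starts to hold, so last simply stays at the scan cutoff.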

Notes


Self-check: During the execution of this function, how many statements are evaluated at runtime (f(n), where n is the size of the array)? What is the worst-case time complexity of this function (O(g(n)))?


     void SelectionSort(int array[], int size)
     {
1.     for (int i = 0; i < size; i++)
       {
2.       int pos = i;
3.       for (int j = i + 1; j < size; j++)
4.         if (array[j] < array[pos])
5.           pos = j;
6.       int temp = array[pos];
7.       array[pos] = array[i];
8.       array[i] = temp;
       }
     }
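
One way to check your statement count empirically is to instrument a copy of the function. This harness is my own addition, not part of the exercise; the count array indices match the statement numbers above:

#include <cstdio>

static long long count[9]; // count[k] = times statement k executed

void SelectionSortCounted(int array[], int size)
{
  for (int i = 0; i < size; i++)
  {
    count[1]++;                         // statement 1 (one per outer iteration)
    int pos = i;            count[2]++; // statement 2
    for (int j = i + 1; j < size; j++)
    {
      count[3]++;                       // statement 3 (one per inner iteration)
      count[4]++;                       // statement 4 (the comparison)
      if (array[j] < array[pos])
      {
        pos = j;            count[5]++; // statement 5 (only when a smaller value is found)
      }
    }
    int temp = array[pos];  count[6]++; // statement 6
    array[pos] = array[i];  count[7]++; // statement 7
    array[i] = temp;        count[8]++; // statement 8
  }
}

int main(void)
{
  int data[] = { 5, 3, 8, 1, 9, 2, 7, 4, 6, 0 };
  SelectionSortCounted(data, 10);
  long long total = 0;
  for (int k = 1; k <= 8; k++)
    total += count[k];
  printf("total statements: %lld\n", total);
  return 0;
}

Statements 3 and 4 dominate: they each run (N - 1) + (N - 2) + ... + 1 = N(N - 1)/2 times, which is why the function is O(N^2) regardless of the input.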