Introduction to Graphs

Graphs

Introduction

One of the most useful data structures. (A very large topic.)
Related to trees in that a tree is a special kind of graph (Trees are much simpler).
Graphs are more general and have a wider range of use. (Generality is traded for simplicity.)
Represent problems involving interconnected (dependent) objects.
Graph algorithms are more complex. Need to account for cycles; trees have only one path between nodes.
Graphs can have several (connected) components; a tree is always a single entity.

Topological ordering - every node that X depends on appears in list L before X. (Prerequisites in a school.)
Shortest path - weighted graphs to represent travel destinations and times
Task networks - time it takes to complete a task before starting the next one.
Games - forcing moves by looking ahead.

Terminology

A graph is essentially a collection of points connected by line segments.
The points are referred to as nodes or vertices; the segments are called edges.
If the edges have a direction (arrowheads in a diagram), the graph is a directed graph or digraph.
A graph that has values (weights or costs) assigned to the edges is called a weighted graph (or weighted digraph).


Figures: Graph, Directed graph (digraph), Weighted graph

Some Notation

A graph, G, consists of a set of vertices, V, and edges, E where the edges are constructed from pairs of distinct vertices.
```
G = (V, E)
```
In an undirected graph, each edge is an unordered pair (note the curly braces for set notation; there is no order):
```
e = {v₁, v₂}
```
In a directed graph, each edge is an ordered pair (note the parentheses for list notation; order is implied):
```
e = (v₁, v₂)
```
and v₁ is the origin (source) and v₂ is the terminus (destination).
Two vertices, x and y, are said to be adjacent if there is an edge connecting them.
We use the notation sGd to mean that s is adjacent to d. With a digraph, sGd implies direction. (xGy is not the same as yGx).
The set of nodes adjacent to s is called the adjacency set of s. This set is fundamental to many graph algorithms.
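
As a small illustration of the notation, the sketch below stores the edges of a graph as pairs and collects the adjacency set of a vertex. The names Edge and AdjacencySet, and the use of char labels for vertices, are assumptions made for this example only.

```cpp
#include <set>
#include <utility>
#include <vector>

// An edge (v1, v2): v1 is the origin (source), v2 is the terminus (destination).
typedef std::pair<char, char> Edge;

// Returns the adjacency set of s: every d such that sGd.
// For an undirected graph, pass directed = false so that the
// unordered pair {v1, v2} is matched in both directions.
std::set<char> AdjacencySet(const std::vector<Edge> &edges, char s, bool directed)
{
  std::set<char> result;
  for (const Edge &e : edges)
  {
    if (e.first == s)
      result.insert(e.second);       // s --> e.second
    else if (!directed && e.second == s)
      result.insert(e.first);        // {e.first, s} read the other way
  }
  return result;
}
```

With the edges (x, y), (x, z) and (z, x), the adjacency set of x is {y, z}. The adjacency set of y is empty when the edges are treated as directed, and {x} when they are treated as undirected.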

Paths and Connectivity

A (contiguous) sequence of edges is a path.
If there is a path from x to y, y is reachable from x.
The length of a path is the number of edges on the path.
Two vertices are connected if there is a path from one to the other.
A connected component is a subset, S, of vertices that are all connected.
A single directed graph with two components:

Digraphs can be strongly connected or weakly connected. (Definitions vary.)

Strongly connected - There is a path from each node to every other node.
Weakly connected - The nodes are all connected if edge directions are ignored, but there is not a directed path from each node to every other node. (See node 3 below)


Strongly connected   Weakly connected

Cycles

A cycle is a path whose source and destination node are the same.
A cycle is simple if all nodes on the path are distinct (with the exception of the first and last). A simple cycle must include at least 3 vertices. (Definitions vary.)
Another way of describing a simple cycle: When you travel around the loop in a simple cycle, you must visit at least three different vertices and you must visit each vertex only once.
Think of a "Figure 8" as being a non-simple cycle. (The middle vertex is visited twice.)
If a graph has no cycles, it is acyclic. A directed acyclic graph is called a DAG.

Degree

For an undirected graph, the degree of a node is the number of edges connecting to it.
For a directed graph:
- in-degree is the number of incoming edges into a node (node is a destination)
- out-degree is the number of outgoing edges from a node (node is a source)
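
The two counts can be tallied in one pass over the edges. This is a minimal sketch that assumes the digraph is given as a list of (source, destination) pairs; the names Edge, Degree and CountDegrees are made up for the example.

```cpp
#include <map>
#include <utility>
#include <vector>

typedef std::pair<char, char> Edge;  // (source, destination)

struct Degree
{
  unsigned in;   // number of incoming edges
  unsigned out;  // number of outgoing edges
  Degree() : in(0), out(0) {}
};

// Tallies the in-degree and out-degree of every vertex that
// appears in a directed edge list.
std::map<char, Degree> CountDegrees(const std::vector<Edge> &edges)
{
  std::map<char, Degree> degrees;
  for (const Edge &e : edges)
  {
    degrees[e.first].out++;  // the edge leaves its source
    degrees[e.second].in++;  // the edge enters its destination
  }
  return degrees;
}
```

For an undirected graph stored with each edge listed once, the degree of a node is the sum of its two counts.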

Self-check:

How do directed and undirected graphs differ?
What is a path in a graph? What is a cycle in a graph?
What is a simple cycle?
What is the in-degree and out-degree of a vertex in a directed graph?

Representing Graphs

Trees vs. Graphs

A tree is a collection of nodes. Each node can be accessed from the root.
A graph has no "root" node so there is no logical "beginning".
Each node in a graph can be used as a starting point for traversals.
With a tree, we are guaranteed to reach every node by starting from the root.
With a graph, there is no guarantee that we will reach any other nodes from any particular node.
Because of these differences, the data structures representing the structures are quite different.

Adjacency Matrix

A graph G with M nodes represented by an M x M boolean array (matrix).

For each x and y, G(x,y) = TRUE if xGy, otherwise FALSE.


Figures: Graph A (undirected), Graph B (directed), Graph C (weighted, directed). Their adjacency matrices, in the same order:

	A	B	C	D	E	F
A	0	1	1	1	0	0
B	1	0	0	0	1	0
C	1	0	0	0	1	0
D	1	0	0	0	0	0
E	0	1	1	0	0	1
F	0	0	0	0	1	0

	A	B	C	D	E	F
A	0	1	1	1	0	0
B	1	0	0	0	1	0
C	0	0	0	0	1	0
D	0	0	0	0	0	0
E	0	0	1	0	0	1
F	0	0	0	0	0	0

	A	B	C	D	E	F
A	0	5	6	4	0	0
B	3	0	0	0	4	0
C	0	0	0	0	9	0
D	0	0	0	0	0	0
E	0	0	7	0	0	9
F	0	0	0	0	0	0

In the matrices above, the source node is along the left and the destination node is across the top.

Note that for the weighted directed graph above we are using an integer matrix. (Could use other types depending on the weights.)

Space required is O(M²), as is the running time of algorithms that must access all elements.
A sparse graph has few edges, and a dense graph has many edges.
- Sparse graphs will have many matrix entries of 0.
- Dense graphs will have many matrix entries of 1.
Determining if two nodes are adjacent is O(1)
The size of the matrix is independent of the number of edges.
An adjacency matrix may be a more desirable representation for dense graphs. (Less space is "wasted" on 0 entries.)

Adjacency Lists

A graph G with M nodes represented by an array of M linked lists.
For each x and y, if xGy is TRUE, y is on x's list.

Unweighted digraph Adjacency list
Space required is O(M + E), where E is the number of edges (O(M²) in the worst case, for a dense graph). The same bound applies to algorithms that must access all elements.
Density affects the lists:
- Sparse graphs will have shorter lists.
- Dense graphs will have longer lists.
The order of the nodes in a list may be arbitrary.

A weighted graph may order them by weight.

Determining if two nodes are adjacent is O(M) in the worst case.

Could be much less if there are few edges.

The number of nodes in the lists is dependent on the number of edges.
An adjacency list may be a more desirable representation for sparse graphs.

There's no node if there's no edge.
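
One possible shape for this representation is sketched below. It assumes nodes are numbered 0 to M - 1 and uses std::list for the linked lists; the names AdjacencyList, AddEdge and CountListNodes are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <vector>

// One linked list per node; list x holds every y such that xGy.
typedef std::vector<std::list<unsigned> > AdjacencyList;

// Adds the directed edge x --> y. Nodes are numbered 0 .. M - 1.
void AddEdge(AdjacencyList &graph, unsigned x, unsigned y)
{
  assert(x < graph.size() && y < graph.size());
  graph[x].push_back(y);
}

// Total number of list nodes, which equals the number of edges added.
std::size_t CountListNodes(const AdjacencyList &graph)
{
  std::size_t count = 0;
  for (std::size_t x = 0; x < graph.size(); x++)
    count += graph[x].size();
  return count;
}
```

Creating the graph with AdjacencyList graph(M) allocates M empty lists. A node with no outgoing edges simply keeps an empty list, so the storage grows with the number of edges and not with M².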

Self-check

Draw the adjacency matrix and adjacency list for the following digraph:
Write C/C++ code to represent the adjacency matrix and list.

Implement functions that:

give the in-degree of a given node (InDegree)

unsigned InDegree(Graph G, unsigned size, unsigned node);

determine if two given nodes are adjacent (IsAdjacent)

bool IsAdjacent(Graph G, unsigned node1, unsigned node2);

The number of nodes in an adjacency list is ___________________ on the number of edges in the graph.

Determining if two nodes are adjacent using a matrix is ___________________ in the worst case.

With an adjacency list, sparse graphs have ___________________ lists.

Graph Traversals

Traversing a graph is a form of searching.

Unlike tree traversals, there is no "starting" (i.e. root) node in a graph.
Choosing an arbitrary starting node will not guarantee that all nodes are visited.
Unlike trees, it's possible to visit a node more than once.
The search must systematically traverse all of the edges in order to discover all of the vertices.
Although it sounds like a lot of redundant work, it can be accomplished in time linear in the size of the graph: O(V + E) when adjacency lists are used.

An algorithm for traversing a graph (assumes that vertices have a boolean visited field):

GraphSearch (G is the graph to search, v is the starting vertex)
  Put v into container C.
  While container C is not empty
    Remove a vertex, x, from container C
    If x has not been visited
      Visit x
      Set x.visited to TRUE
      For each vertex, w, adjacent to x
        If w has not been visited
          Put w into container C
        End If
      End For
    End If
  End While
End GraphSearch

Given this graph, determine the sequence of nodes that are visited from different starting nodes.







Adjacency matrix:

   A  B  C  D  E  F  G  H
A  0  1  1  1  0  0  0  0
B  0  0  0  0  1  1  0  0
C  0  0  0  0  0  0  0  0
D  0  0  0  0  0  0  1  1
E  0  0  0  0  0  0  0  0
F  0  0  0  0  1  0  0  0
G  1  0  1  0  0  1  0  0
H  0  0  0  0  0  1  0  0

Example 1: Starting at A

If C is a Stack, one order of traversal is: A, D, H, F, E, G, C, B
- another traversal is: A, B, E, F, C, D, G, H
If C is a Queue, one order of traversal is: A, B, C, D, E, F, G, H
- another traversal is: A, D, C, B, H, G, F, E

Example 2: Starting at G

If C is a Stack, one order of traversal is: G, F, E, C, A, D, H, B
- another traversal is: G, A, D, H, F, E, C, B
If C is a Queue, one order of traversal is: G, A, C, F, B, D, E, H
- another traversal is: G, F, C, A, E, D, B, H

A weighted graph:







Adjacency matrix:

   A  B  C  D  E  F  G  H
A  0  3  9  7  0  0  0  0
B  0  0  0  0  6  5  0  0
C  0  0  0  0  0  0  0  0
D  0  0  0  0  0  0  4  2
E  0  0  0  0  0  0  0  0
F  0  0  0  0  8  0  0  0
G  5  0  1  0  0  4  0  0
H  0  0  0  0  0  8  0  0

Example 3: Starting at A and sorting the adjacency set (maybe with a priority queue):

Performing a breadth-first traversal (high to low), the order is: A, C, D, B, G, H, E, F
Performing a depth-first traversal (high to low), the order is: A, C, D, G, F, E, H, B

Make sure you understand how the sequences above were arrived at.

Self-check

What is the visited order starting at G using a queue? Using a stack?
What is the visited order starting at H using a queue? Using a stack?

Notes:

Depth-first: descendants are visited before siblings.

To traverse depth-first, use a stack.

Breadth-first: siblings are visited before descendants.

To traverse breadth-first, use a queue.

For all vertices to be visited from any node, the graph must be strongly connected.

For weakly connected graphs, you'd need to exhaustively traverse from every vertex:

Unoptimized:

For each vertex, v, in graph, G
  GraphSearch (G, v)
End For

Optimized:

For each vertex, v, in graph, G
  GraphSearch (G, v)
  If all nodes have been visited
    Break out of loop
  End If
End For

Self-check
   1. Given the graph below, what is the in-degree and out-degree of each node?
   2. Starting at node A and using a depth-first search, in what order will the nodes be visited? If a node has more than one neighbor, choose the edge with the larger value.
   3. What is the order if you performed a breadth-first traversal instead?
   4. What is the order if you choose the smaller edge for both depth-first and breadth-first traversals?
   5. Is the graph strongly connected?

A Simple Implementation

The graph representation:

const int SIZE = 8;
typedef bool Graph[SIZE][SIZE];

Graph G = { // Adjacency matrix        Adjacency list
            {0, 1, 1, 1, 0, 0, 0, 0},  // A-->B-->C-->D
            {0, 0, 0, 0, 1, 1, 0, 0},  // B-->E-->F
            {0, 0, 0, 0, 0, 0, 0, 0},  // C
            {0, 0, 0, 0, 0, 0, 1, 1},  // D-->G-->H
            {0, 0, 0, 0, 0, 0, 0, 0},  // E
            {0, 0, 0, 0, 1, 0, 0, 0},  // F-->E
            {1, 0, 1, 0, 0, 1, 0, 0},  // G-->A-->C-->F
            {0, 0, 0, 0, 0, 1, 0, 0}   // H-->F
          };

struct Vertex
{
  char label;       // For displaying
  bool visited;     // Visited flag
  bool *neighbors;  // Adjacency "list"
};

Vertex Vertices[SIZE] = {
                          {'A', false, G[0]},
                          {'B', false, G[1]},
                          {'C', false, G[2]},
                          {'D', false, G[3]},
                          {'E', false, G[4]},
                          {'F', false, G[5]},
                          {'G', false, G[6]},
                          {'H', false, G[7]}
                        };

With actual bool values:

Graph G = { // Adjacency matrix                                       Adjacency list   
            {false,  true,  true,  true, false, false, false, false},  // A-->B-->C-->D
            {false, false, false, false,  true,  true, false, false},  // B-->E-->F
            {false, false, false, false, false, false, false, false},  // C
            {false, false, false, false, false, false,  true,  true},  // D-->G-->H
            {false, false, false, false, false, false, false, false},  // E 
            {false, false, false, false,  true, false, false, false},  // F-->E
            { true, false,  true, false, false,  true, false, false},  // G-->A-->C-->F
            {false, false, false, false, false,  true, false, false}   // H-->F
          };

Visit operation and search algorithm:

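// Headers needed by the code below
#include <iostream>
#include <queue>
#include <stack>

using namespace std;

void Visit(Vertex &v)
{
  cout << v.label << " ";
}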

void GraphSearchStack1(Vertex *v, Vertex Vertices[])
{
  stack<Vertex *> C;

  C.push(v);                          // Put v into container C.  
  while (!C.empty())                  // While (container C is not empty)
  {
    Vertex *x = C.top();              // Remove a vertex, x, from container C
    C.pop();
    if (!x->visited)                  // If (x has not been visited)
    {
      Visit(*x);                      // Visit x
      x->visited = true;              // Set x.visited to TRUE
      for (int i = 0; i < SIZE; i++)  // For each vertex, w, 
      {
        if ((x->neighbors[i]) &&      //  (adjacent to x) and
            (!Vertices[i].visited))   //  (has not been visited)
          C.push(&Vertices[i]);       // Put w into container C
      }
    }
  }
}

int main()
{
  GraphSearchStack1(&Vertices[0], Vertices);
  return 0;
}

Changing the for loop causes the alternative orderings:

for (int i = SIZE - 1; i >= 0; i--)

Using a queue instead of a stack: (slightly modified)

void GraphSearchQueue1(Graph G, Vertex *v, Vertex *vertices)
{
  int size = sizeof(G[0]) / sizeof(G[0][0]);  // number of columns (nodes)
  queue<Vertex *> C;        // uses a queue
  C.push(v);
  while (!C.empty())
  {
    Vertex *x = C.front();  // front() instead of top()
    C.pop();
    if (!x->visited)
    {
      Visit(*x);
      x->visited = true;
      for (int i = 0; i < size; i++)
      {
        if ((x->neighbors[i]) &&
            (!vertices[i].visited))
          C.push(&vertices[i]);
      }
    }
  }
}

Self-check

Implement code to traverse the graph exhaustively.

Spanning Trees

Suppose we have an undirected graph with many edges connecting the vertices. In other words, there are many paths from a given vertex to any other vertex. Also suppose that we just want the set of edges that connects all of the vertices in the cheapest way.

Given a connected, undirected graph G = (V, E), a tree that uses the edges, E, from G, and contains all of the vertices, V, is called a spanning tree for G.

Named because the tree "spans" the graph.
Since we are dealing with a tree, the set of vertices and edges must be acyclic (i.e. no cycles).
If there are N nodes in the graph, there will be exactly N - 1 edges in the tree. (This is one of the definitions of a tree.) The graph itself will likely have more than N edges.
If the graph is weighted, then there is a cost associated with the spanning tree.
If the cost is minimized, the tree is a minimal spanning tree. More accurately, it might be called a minimum-weighted spanning tree. (We are decreasing the cost of the edges in the tree, not the number of edges, which will always be N - 1.)
The trees are also unrooted and unordered, unlike other trees we've been working with.
Used in many situations, especially networking and communications: (Spanning Tree Protocol)
- May have many routes between computers, but you just want one set that connects everyone in the cheapest way.
- Cheap could mean actual monetary cost or could mean "fastest" (in which case you may want to maximize the cost.)

There are two well-known algorithms for finding minimum spanning trees from a graph: Prim's algorithm and Kruskal's algorithm.

Both can be made to run in O(E lg V) time, which is the same as O(N lg N) when the graph is sparse (the number of edges is about the same as the number of vertices).
Prim's algorithm can be made to run in time O(E + V lg V), which is an improvement if |V| (the number of vertices) is much smaller than |E| (the number of edges). (The graph is very dense.)
The efficiency of the algorithms depends on the implementation of the "auxiliary" data structures as well as the density of the graphs.
If all edge weights in the graph are distinct, the resulting tree will be unique. (Otherwise, there could be multiple min/max spanning trees.)
Both algorithms are greedy algorithms.
- Successful: Making change (US coins). (From CS120: coins practice)
- Unsuccessful: Largest-sum from here.

Examples:

Original graph Embedded tree Tree

Prim's algorithm using a tree:

Choose any vertex in the graph
Add it to an empty tree
Until all nodes are in the tree
- Choose the edge of least cost that emanates from a node in the tree thus far
- Add that edge and vertex to the tree

Examples:

Starting at node A, the nodes will be added to the tree in this order:

A B G H F D C E

Starting at node H, the nodes will be added to the tree in this order:

H F D C E B A G

The implementation is left as an exercise for the student.

Kruskal's algorithm using a forest:

Construct a forest from the N nodes in the graph
Put the (sorted) edges in a queue
Until there are N - 1 edges in the forest (a single tree)

Extract the "cheapest" edge from the queue
If it will form a cycle, discard it
Otherwise, add to the forest (always joins two trees)

Priority queue

Edge  Weight
----  ------
C-D      3
H-F      4
H-D      5
D-E      6
A-B     10
A-G     12
B-H     14
G-F     15
F-E     17
A-F     18
B-C     22

The edges will be added in this order:

C-D  H-F  H-D  D-E  A-B  A-G  B-H

Changing a few weights: A-B(10 to 13), G-F(15 to 7) , D-E(6 to 20)

Figures: New graph, Graph with cycle, Embedded tree

Priority queue

Edge  Weight
----  ------
C-D      3
H-F      4
H-D      5
G-F      7
A-G     12
A-B     13
B-H     14
F-E     17
A-F     18
D-E     20
B-C     22

The embedded tree:

Tree alone Another view Another view

Elephant in the room: There is something that could make the implementation (as described above) of Kruskal's algorithm inefficient. (Union-Find algorithms).
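
The "will it form a cycle" test is where a Union-Find (disjoint set) structure helps: two nodes are already in the same tree of the forest exactly when they share a representative. The class below is a minimal sketch, assuming nodes are numbered 0 to N - 1. The name UnionFind and its interface are assumptions for this example, and refinements such as union by rank are left out for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Disjoint sets over nodes 0 .. n - 1. Each set is one tree of the forest.
class UnionFind
{
  public:
    explicit UnionFind(std::size_t n) : parent_(n)
    {
      for (std::size_t i = 0; i < n; i++)
        parent_[i] = i;  // every node starts as its own tree
    }

    // Returns the representative of the set containing x,
    // shortening the path to it along the way (path halving).
    std::size_t Find(std::size_t x)
    {
      assert(x < parent_.size());
      while (parent_[x] != x)
      {
        parent_[x] = parent_[parent_[x]];
        x = parent_[x];
      }
      return x;
    }

    // Joins the sets containing a and b. Returns false if they were
    // already in the same set, meaning the edge a-b would form a cycle.
    bool Union(std::size_t a, std::size_t b)
    {
      std::size_t ra = Find(a);
      std::size_t rb = Find(b);
      if (ra == rb)
        return false;
      parent_[ra] = rb;
      return true;
    }

  private:
    std::vector<std::size_t> parent_;
};
```

With A through H numbered 0 through 7, calling Union on the edges of the first priority queue in sorted order accepts C-D, H-F, H-D, D-E, A-B, A-G and B-H, and then returns false for G-F, which is the edge that would close a cycle.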

The implementation is left as an exercise for the student.


A Shortest Path Algorithm

Example graph and adjacency matrix:

Adjacency matrix:

    1   2   3   4   5   6
1   0   3   ∞   ∞   ∞   5
2   ∞   0   7   ∞   ∞  10
3   ∞   ∞   0   5   1   ∞
4   ∞   ∞   ∞   0   6   ∞
5   ∞   ∞   ∞   ∞   0   7
6   ∞   ∞   8   2   ∞   0

Paths from 1 to 5:

Nodes         Cost
------------------
1 2 3 4 5       21
1 2 3 5         11
1 2 6 3 5       22
1 2 6 3 4 5     32
1 2 6 4 5       21
1 6 3 5         14
1 6 3 4 5       24
1 6 4 5         13

There are many paths from 1 to 5. How do we find the cheapest path?

Dijkstra's Algorithm

Given a source node, we can find the shortest distance to every other node in a graph.

Undirected weighted graph: paths and costs from A

To  Path        Cost
------------------------
A   A              0
B   A-B           13
C   A-B-D-C       30
D   A-B-D         25
E   A-B-D-C-E     34
F   A-F           11

Pseudocode for Dijkstra's algorithm:

Choose a node to be the source or starting point.
Initialize all nodes to infinite cost from the source.
Initialize source to 0 cost and mark as evaluated.

For each node, y, adjacent to source
  1. Relax the node. That is, set y's cost to the cost of the edge from source to y.
  2. Place y into a priority queue based on its total cost. (Lower is better)
  3. Add source node as predecessor of y.
End For

While there are nodes in the graph that haven't been evaluated and the PQ is not empty
  Remove a node, x, from the PQ (lowest total cost)
  If the node has already been evaluated
    Discard the node
    Go to top of while 
  Else
    Mark x as evaluated.
    For each neighbor, y, of x
      Relax y
      If new cost to reach y is less
        Update list of nodes (path) to y from source.
        Place y in the PQ.
      End If
    End For
  End If
End While
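
The pseudocode above can be sketched in C++ roughly as follows. This is one possible rendering under stated assumptions, not a definitive implementation: the graph is an adjacency matrix in which INF means "no edge", all weights are non-negative, the source is seeded by pushing it into the priority queue (instead of relaxing its neighbors in a separate loop), and only the costs are returned (predecessor tracking is omitted). The names INF, Matrix and Dijkstra are chosen for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <limits>
#include <queue>
#include <utility>
#include <vector>

// "Infinite" cost: marks a missing edge or an unreachable node.
const unsigned INF = std::numeric_limits<unsigned>::max();

// graph[x][y] is the cost of the edge x --> y, or INF if there is none.
typedef std::vector<std::vector<unsigned> > Matrix;

// Returns the lowest total cost from source to every node
// (INF if a node is unreachable). Assumes non-negative weights.
std::vector<unsigned> Dijkstra(const Matrix &graph, std::size_t source)
{
  assert(source < graph.size());
  typedef std::pair<unsigned, std::size_t> Entry;  // (total cost, node)
  std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry> > pq;

  std::vector<unsigned> cost(graph.size(), INF);
  std::vector<bool> evaluated(graph.size(), false);
  cost[source] = 0;
  pq.push(Entry(0, source));

  while (!pq.empty())
  {
    std::size_t x = pq.top().second;  // node with the lowest total cost
    pq.pop();
    if (evaluated[x])
      continue;                       // already evaluated: discard
    evaluated[x] = true;

    for (std::size_t y = 0; y < graph.size(); y++)
    {
      if (y == x || graph[x][y] == INF)
        continue;                     // no edge x --> y
      unsigned candidate = cost[x] + graph[x][y];  // relax y through x
      if (candidate < cost[y])
      {
        cost[y] = candidate;
        pq.push(Entry(candidate, y));
      }
    }
  }
  return cost;
}
```

Run on the 6-node matrix from the shortest-path example (with nodes 1 to 6 stored at indices 0 to 5), the cost from node 1 to node 5 comes out as 11, which matches the cheapest listed path, 1 2 3 5.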

Notes:

Dijkstra's algorithm only works for non-negative weights.
Runtime complexity is O(m log n) where m is the number of edges and n is the number of nodes.
The algorithm is also a dynamic algorithm, meaning we keep the previous results for use with future calculations. (Think about the Fibonacci algorithm and the naive recursive implementation which didn't use dynamic programming.)
When the graph has the maximum number of edges (m approaches n²), a simple array-based implementation runs in O(n²), while the priority-queue version costs O(n² log n).
To find the shortest path for all nodes, simply run the algorithm once for each node.
Other related algorithms are Floyd's algorithm (all-pairs shortest-paths using an adjacency matrix), which also works with negative weights as long as there are no negative cycles, and Warshall's algorithm (simply determines if a path exists using an adjacency matrix).
Dijkstra's algorithm is a special case of the A* (A-star) algorithm.
