Introduction to Linked Lists

Array Behavior

Arrays are simple, popular, and built into the C language. They have certain characteristics, both good and bad.

The Good:

Built into the language.
Easy to create at compile time or runtime.
Accessing any element in the array is trivial (just use an index).

This is also known as constant-time access. (Arrays allow random access.)
The time it takes to access an element is independent of the number of elements in the array. Time complexity: O(k)

Easy to "clean up". (Static arrays, just forget about it. Dynamic arrays, use free.)

The Bad:

You need to know the size ahead of time (both static and dynamic arrays).
You must allocate a fixed amount of space (can't resize an array).

For static arrays, the size must be known at compile-time.

Inserting and deleting anywhere but at the end requires a lot of work.

This is linear-time complexity: O(N)
The time it takes to insert/delete is proportional to the number of elements in the array.
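Example: Insert 2 into a sorted array. To make room for it, every element greater than 2 must be copied one position toward the end of the array.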

It's called a linear-time algorithm because as the number of elements in the array increases, the time to insert an item increases (linearly).

The Ugly:

No bounds checking, allowing you to overwrite memory. (Most students are painfully aware of this by now.)
The unusual relationship between pointers and arrays. (See the Critique section.)

Example of the limitation of arrays: Reading integers from a file into an array.

This is our algorithm:

Allocate an array to hold the numbers
Open a file for reading
While there are more numbers in the file
1. Read in a number
2. Add it to the end of the array
Close the file

#include <stdio.h>

/* Prints each value in the integer array */
void print_array(int array[], int size)
{
  int i; /* Loop variable */

  for (i = 0; i < size; i++)
    printf("%i   ", array[i]);
  printf("\n");
}

int main(void)
{
  int numbers[30]; /* Array to hold the numbers (arbitrary size) */
  int count = 0;   /* Keep track of how many numbers read in     */
  int number;      /* Value from the file                         */

    /* Open a file for reading */
  FILE *fp = fopen("numbers.txt", "r");

    /* Check that the file opened */
  if (fp == NULL)
    return 1;

    /* While there are more numbers in the file, read in a number. */
    /* fscanf returns 1 only when it actually read a number (it    */
    /* returns EOF at the end of the file), which is more reliable */
    /* than testing feof before the read.                          */
  while (fscanf(fp, "%i", &number) == 1)
  {
      /* Add the number to the end of the array (no bounds check!) */
    numbers[count++] = number;
  }

    /* Close the file */
  fclose(fp);

    /* Print the array */
  print_array(numbers, count);

  return 0;
}

Of course, if there are more than 30 numbers, we are going to overwrite the end of the array.

Possible "fixes":

Don't read in more than 30 numbers.
Set the size of the array to more than 30 (and hope it's big enough, and/or waste a lot of space).
Allocate the array at runtime. (Still need a size.) Choices:
1. Put the size in the file. (Maybe the first line contains the number of integers.) Allocate the array when the size is known.
2. Read the file twice: once to count how many numbers are in the file (then allocate the array dynamically), and a second time to read them into the array.
3. Query the filesystem for the actual size of the file.
4. When the array is full, grow the array: allocate another, bigger array, and copy old values into it.

Linked Lists

We'd like to overcome the limitations of arrays. One way is to use a linked list. So, what is a Linked List?

A dynamic collection of elements called nodes.
A node is a struct in C.
A node has two main portions
- data portion -- same type (size) of information in all nodes
- pointer portion -- a pointer to the next node in the list (These are self-referencing structures.)
The data portion can be as simple as an int or very complex. (It's user-defined.)
The linked list is accessed through an external pointer (often called the head) which points to the first node in the list.
The next pointer of the last node in the list is NULL, marking the end of the list. (Think of it as a "NULL-terminated" list.)

An example of a node structure (for a singly linked list) that contains an integer as its data:

struct NODE
{
  int number;        /* data portion    */
  struct NODE *next; /* pointer portion */
};

[Diagram: a list of integer nodes; each node is 8 bytes, assuming 4-byte ints and 32-bit pointers]

[Diagram: the same list, showing arbitrary addresses]
If the list is empty, head points to NULL.
If each node in the list has only one pointer (called a "next" pointer), the list is called a singly linked list.
- You can only move one direction (forward) in the list.
Some lists have nodes with two pointers (called "next" and "previous"). This type of linked list is known as a doubly linked list.

You can move in both directions (forward and backward).

Notice that the structure above is sort of recursive. In other words, we're defining the structure by including a reference to itself in the definition. (Actually, there's only a pointer to a structure of the same type.)

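When the compiler encounters a structure member, it must know the size of the member. A structure cannot contain a member of its own type (it would have to contain itself, infinitely), but it can contain a pointer to its own type: the size of a pointer to a structure is known at compile time, so the code above is completely sane and legal. (When the compiler reaches the next member, the tag NODE has already been declared, even though the structure itself is not yet complete.)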

This example code:

  /* #1 Declare 3 structs */
struct NODE A, B, C;

  /* #2 Set the 'data' portions of the nodes */
A.number = 10;
B.number = 20;
C.number = 30;

  /* #3 Connect (link) the nodes together */
A.next = &B;   /* A's next points to B */
B.next = &C;   /* B's next points to C */
C.next = NULL; /* Nothing follows C    */

can be visualized as this:

[Diagrams: the nodes after #1, after #2, after #3, and the linked nodes with arbitrary addresses]

The "problem" with this approach is that we are declaring (and naming) all of the nodes at compile time. If we wanted to read a list of 30 integers from a file, we'd need to declare 30 NODE structs. We're worse off than with arrays.

Notice from the diagram that naming struct B and C is redundant. Also remember that we don't "name" our individual elements of an array. We refer to them by supplying a subscript on the array name:

int numbers[30]; /* 30 "anonymous" elements                    */
numbers[5] = 0;  /* We don't have a "name" for the 6th element */

This principle of "anonymous" elements will apply to linked lists as well:

To access an element of an array, we simply use the name of the array (essentially a pointer to the first element) and an index (offset from the beginning).

Constant-time complexity, O(k)

To access an element of a linked list, we use a pointer to the first node and then walk the list to find a particular node.

Linear-time complexity, O(N)

For example, with named nodes (as in the example above) we can print out the data of each node very simply:

printf("%i\n", A.number); /* 10 */
printf("%i\n", B.number); /* 20 */
printf("%i\n", C.number); /* 30 */

With unnamed nodes (i.e. access to the first node only):

printf("%i\n", A.number);             /* 10 */
printf("%i\n", A.next->number);       /* 20 */
printf("%i\n", A.next->next->number); /* 30 */

Because the next field of the NODE structure is a pointer to a structure, you must use the arrow operator, ->, to access the members of the structure it points to.

Just like arrays go hand-in-hand with looping, so do linked lists. This is the common method of using a loop to "walk" the list:

struct NODE *pNode = &A; /* Point to first node */
while (pNode) 
{
  printf("%i\n", pNode->number); /* Print data                */
  pNode = pNode->next;           /* "Follow" the next pointer */
}

Visually:

1. We're already pointing at the first node (A), so just print the value.
2. Move to the next node (B) and print the value.
3. Move to the next node (C) and print the value.
4. There is no next node (pNode is NULL), so we're done.

Note: Just like with pointers to dynamically allocated memory, if you "move" the head pointer itself, you lose track of the first node and there is no way to get back to it. When you are doing things like "walking the list", use a separate pointer (like pNode above) so that the pointer to the first node is preserved; otherwise all of the nodes in the list could be lost.

Problem Revisited

Let's revisit the original problem of reading an unknown number of integers from a file:

Old way: Array

Allocate an array to hold all of the numbers
Open a file for reading
While there are more numbers in the file
1. Read in a number
2. Add it to the end of the array
Close the file

New way: Linked list

Open a file for reading
While there are more numbers in the file
1. Read in a number
2. Allocate a node to hold the number
3. Add the node to the end of the list
Close the file

#include <stdio.h>
#include <stdlib.h>

struct NODE
{
  int number;        /* data portion    */
  struct NODE *next; /* pointer portion */
};

void print_list(const struct NODE *list); /* Defined below */

int main(void)
{
  struct NODE *pList = NULL; /* empty list           */
  int number;                /* the number just read */

    /* 1. Open a file for reading */
  FILE *fp = fopen("numbers.txt", "r");

    /* Check that the file opened */
  if (fp == NULL)
    return 1;

    /* 2. While there are more numbers in the file:       */
    /*    A. Read next integer from the file. fscanf      */
    /*       returns 1 only when it actually read a number */
  while (fscanf(fp, "%i", &number) == 1)
  {
    struct NODE *pNode; /* for the current number */

      /* B. Allocate a new node struct (same for all nodes) */
    pNode = malloc(sizeof(struct NODE));
    pNode->number = number; /* Set the number         */
    pNode->next = NULL;     /* Set next (no next yet) */

      /* C. Add the new node to the end of the list     */
      /* If the list is NULL (empty), this is the first */
      /* node we are adding to the list.                */
    if (pList == NULL)
      pList = pNode;
    else
    {
        /* Find the end of the list (don't move pList!) */
      struct NODE *temp = pList;
      while (temp->next)
        temp = temp->next;

      temp->next = pNode; /* Put new node at the end */
    }
  }

    /* 3. Close the file */
  fclose(fp);
  print_list(pList);  /* Display the list */

  return 0;
}

Make sure you can follow what each line of code is doing. You should definitely draw diagrams until you are very comfortable with linked lists. (Which won't be until after you graduate, and even then you should still draw diagrams!)

Note these two sections especially:

Creating a new node for each element of data (number in the file):

  /* Allocate a new node struct (same for all nodes) */
pNode = malloc(sizeof(struct NODE));
pNode->number = number; /* Set the number         */
pNode->next = NULL;     /* Set next (no next yet) */

Adding the new node to the end of the list:

  /* If the list is NULL (empty), this is the first */
  /* node we are adding to the list.                */ 
if (pList == NULL)
  pList = pNode;
else
{
    /* Find the end of the list (don't move pList!) */
  struct NODE *temp = pList;
  while (temp->next)
    temp = temp->next;

  temp->next = pNode; /* Put new node at the end */
}

Also note the print_list function used above:

void print_list(const struct NODE *list)
{
  while (list)
  {
    printf("%i   ", list->number);
    list = list->next;
  }
  printf("\n");
}

A few points to make so far:

The code is certainly more complex than arrays, although not hard to understand.
The number of nodes in a linked list is limited only by the amount of memory available. (This code can handle small lists or large lists.)
Unlike arrays, there is no random access. We cannot just "jump" to the node we want. Instead, we must "walk" through the list one node-at-a-time.
We are only allocating what we need. (Arrays can waste space.)
There is a 4-byte or 8-byte (size of a pointer on 32/64-bit computers) overhead for each node.

With a previous pointer (doubly linked), there is twice as much overhead.

The time it takes to add a node to the end of the linked list takes longer as the list grows.
We must also remember to deallocate (free) each node in the list when we are finished. (We haven't done that in the example yet.)

Note: This very simple example does very little error handling; in particular, it does not handle the case where malloc fails. In Real World™ code, you would need to check the result of malloc and deal with a failure accordingly.

Adding Nodes

Let's address the last two points now. First, this one: "The time it takes to add a node to the end of the linked list takes longer as the list grows."

This is simply because we are adding to the end and we don't have any immediate (random) access to the end. We only have immediate access to the first node; all of the other nodes must be accessed from the first one. If the list is long, this can take a while.

Solution #1: Maintain a pointer to the last node (tail).

We add a pointer variable to track the tail:

struct NODE *pList = NULL; /* empty list  */
struct NODE *pTail = NULL; /* no tail yet */

Now adding to the end of the list can be done in constant time instead of linear time.

Solution #2: Insert at the head of the list instead of the tail. This is simpler yet. This has the "feature" that the items in the list will be reversed.

How does this compare to adding elements to the front of an array?

When the order of the items in the list is not important to preserve, use Solution #2. This is the canonical way of dealing with a singly linked list. It's simple, efficient, and trivial to code and understand.

Freeing Nodes

Up until now, we haven't freed any of the nodes. Since we called malloc for these nodes, we have to call free when we're through. This is straightforward using another while loop:

while (pList)
{
  struct NODE *temp = pList->next;
  free(pList);
  pList = temp;
}

[Diagrams: deleting the nodes one at a time]

Notes thus far:

When we traverse (walk) a linked list, we always start from the beginning (head).
We cannot "jump" to a particular node because we don't have random access (like arrays).
Note that we cannot walk backwards through a singly linked list because we have no way to get from a node to the previous node.
If you only need to move in one direction, a singly linked list might be enough.
If you require bi-directional traversals, you will want to use a doubly linked list, which allows traversals in both directions.

Creating Functions

The code that was shown thus far manipulates all of the nodes in the list directly. What this means is that the lists were not "passed" to another function to do the work. In real code, you would create functions to do things such as AddToEnd, AddToFront, DeleteFront, FindItem, Insert, etc. The reason to do this is simple: we want to be able to re-use the functionality so that we can perform these operations on any list.

However, there is a slight caveat, one we've discussed many times before but that still confuses many new programmers: passing a pointer to a function. Remember these points about passing parameters to functions:

If we want a function to change our data, we must pass a pointer to our data.
Passing the value instead of the address will cause the function to modify a copy of our data.
For example, if I want a function to change an integer, I would pass a pointer to the integer (e.g. the swap function).
But, what if I want the function to change a pointer? If we just pass the pointer by value (a copy), then the function will change the copy, meaning the copy will point to something else, but the original pointer will be unchanged.
If we want a function to change our pointer, we have to do the same thing: pass a pointer to the pointer.

With linked lists, we may want a function to add a new node to the front of a list. The way we do this is to point the new node's next pointer at the first node in the list, and then have the head pointer point at the new node. This is trivial, and we saw it earlier on this page.

This means that, instead of passing the head pointer to the function, we need to pass a pointer to the head pointer. This allows the function to change what the head pointer points at.

So, a function to add a node to the head of a list would be prototyped something like this (assuming a linked list of integers):

  /* Adds a node to the front of the list */
void AddToFront(struct NODE **ppList, struct NODE *pNode);

ppList - a pointer to the head pointer (i.e. the address of the head pointer)
pNode - a pointer to the new node that will be inserted at the front of the list

So, a possible implementation of the function may look like this:

void AddToFront(struct NODE **ppList, struct NODE *pNode)
{
    /* The new node's next pointer will point at the first node */
  pNode->next = *ppList;   
  
    /* 
     * Now, the head pointer points at the new node, which is now at the front.          
     * Notice that we are dereferencing the pointer to modify the original head pointer. 
     * The client passed in the address of the head pointer so we could change it. 
     */ 
  *ppList = pNode;         
}

As a rule of thumb, any function that might change what the head pointer is pointing at must take a pointer to a pointer to a node. If not, then the function will only change a copy of the head pointer, which, as everyone knows, will have no effect on the original head pointer. You would need to use a similar technique for a function that adds to the end of a list, because adding a node to an empty list changes the head pointer. To help understand this concept, draw a diagram and see how passing pointers to pointers produces the desired results.

There is nothing fancy or weird or exotic about passing pointers to pointers. The only way a function can change the original data is if it receives a pointer to the data. If the data happens to be a pointer itself, then a pointer to the pointer must be passed in order to change it. This is why some of the linked list functions must pass pointers to pointers to nodes.

An Ordered List

The previous examples have added the data (integers) to the list in the order they arrived from the file. (Inserting at the front of the list caused the data to be reversed.) This is no different than the way you would add elements to an array.

Suppose we have some data (integers) in a file. This is the data that is in the file:

12 34 21 56 38 94 23 22 67 56 88 19 59 10 17

After reading it in, we would get this (each new number is added at the end of the list):

pList → NULL
pList → 12
pList → 12 → 34
pList → 12 → 34 → 21
pList → 12 → 34 → 21 → 56
pList → 12 → 34 → 21 → 56 → 38
pList → 12 → 34 → 21 → 56 → 38 → 94
pList → 12 → 34 → 21 → 56 → 38 → 94 → 23
pList → 12 → 34 → 21 → 56 → 38 → 94 → 23 → 22
pList → 12 → 34 → 21 → 56 → 38 → 94 → 23 → 22 → 67
pList → 12 → 34 → 21 → 56 → 38 → 94 → 23 → 22 → 67 → 56
pList → 12 → 34 → 21 → 56 → 38 → 94 → 23 → 22 → 67 → 56 → 88
pList → 12 → 34 → 21 → 56 → 38 → 94 → 23 → 22 → 67 → 56 → 88 → 19
pList → 12 → 34 → 21 → 56 → 38 → 94 → 23 → 22 → 67 → 56 → 88 → 19 → 59
pList → 12 → 34 → 21 → 56 → 38 → 94 → 23 → 22 → 67 → 56 → 88 → 19 → 59 → 10
pList → 12 → 34 → 21 → 56 → 38 → 94 → 23 → 22 → 67 → 56 → 88 → 19 → 59 → 10 → 17

Now, suppose instead that we want to keep the linked list sorted, from smallest to largest, as we build it.

We just need to modify our while loop from the code above that adds a node to the list. Instead of walking to find the end (or using a tail pointer), you would walk the list until you encountered a value that was greater than (or equal to) the value you are inserting.

The final sorted list would look like this:

10   12   17   19   21   22   23   34   38   56   56   59   67   88   94

If we insert a call to print_list after every insertion into the list, we can see the list evolve as each number is placed in its sorted position:

pList → NULL
pList → 12
pList → 12 → 34
pList → 12 → 21 → 34
pList → 12 → 21 → 34 → 56
pList → 12 → 21 → 34 → 38 → 56
pList → 12 → 21 → 34 → 38 → 56 → 94
pList → 12 → 21 → 23 → 34 → 38 → 56 → 94
pList → 12 → 21 → 22 → 23 → 34 → 38 → 56 → 94
pList → 12 → 21 → 22 → 23 → 34 → 38 → 56 → 67 → 94
pList → 12 → 21 → 22 → 23 → 34 → 38 → 56 → 56 → 67 → 94
pList → 12 → 21 → 22 → 23 → 34 → 38 → 56 → 56 → 67 → 88 → 94
pList → 12 → 19 → 21 → 22 → 23 → 34 → 38 → 56 → 56 → 67 → 88 → 94
pList → 12 → 19 → 21 → 22 → 23 → 34 → 38 → 56 → 56 → 59 → 67 → 88 → 94
pList → 10 → 12 → 19 → 21 → 22 → 23 → 34 → 38 → 56 → 56 → 59 → 67 → 88 → 94
pList → 10 → 12 → 17 → 19 → 21 → 22 → 23 → 34 → 38 → 56 → 56 → 59 → 67 → 88 → 94

Try doing that with an array! (By the way, how would you do this with an array?)

Modifying the loop to keep the list sorted is an exercise for the student.

Doubly Linked Lists

A doubly linked list is a list whose nodes each have two pointers: in addition to the next pointer, each node has a previous pointer. This allows you to traverse the list in both directions.

An example node structure for a doubly linked list:

struct NODE
{
  int number;        /* data portion         */
  struct NODE *next; /* node after this one  */
  struct NODE *prev; /* node before this one */
};

Compared with singly linked lists, doubly linked lists:

require an extra pointer for each node.
require slightly more work at runtime to "hook-up" the extra pointer.
are more flexible by allowing traversals in both directions.
can be simpler to implement because nodes can access their next and previous node.

The implementations for functions to manipulate doubly linked lists would be similar to the singly linked lists functions, with the additional code for dealing with the previous pointer.

Linked List Summary

To summarize, linked lists:

are an alternate data structure for storing related data.
are more complex in structure and more complex in accessing than arrays.
require the programmer to deal with the complexity. (They are not built into the language.)
require more overhead than arrays.
can grow and shrink in size at runtime.
allow very efficient use of memory since you only allocate what you need (and free when you're done.)
are very efficient for inserting and deleting items at any point in the list, once you have a pointer to that point. (Finding the point still requires walking the list.)

You don't have to move existing elements.

are used heavily in programs written in C and C++.

Why not do away with arrays and use linked lists for all lists?