B-Trees

Internal vs. External Searching

It is true that the bottleneck in modern computers is main memory (RAM). But that's because we assume that all of the data is already stored there. If you have to go to the disk, then even slow RAM seems extremely fast compared to the disk.

Considerations:

Introduction to B-Trees

Suppose we have 20 data elements (keyed from 1 to 20) and we want to store them in a tree so we can minimize the search time (read: minimize the height).

Storing the data in a balanced binary tree: (Height: 4)



Storing the data in a 2-3 tree: (Height: 2)



Storing the data in a 2-3-4-5 tree: (Height: 1)



Storing the data in a tree that can have more than 20 children per node: (Height: 0)



Observations:

For a balanced binary tree, the height, h was based on the number of nodes/elements: (1 element per node)
h = floor(log2(N))
So, a full and complete balanced binary tree with 31 nodes/elements will have a height of 4:

In a balanced n-ary tree, the height is also based on the number of elements: (more than 1 element per node)

h = floor(logbf(N)) where bf is the branching factor. (Note this is the minimum height)
So, a balanced 2-3-4 tree with 15 elements will have a height of 1. The same tree with 16 elements will have a height of 2 (the root node splits causing the tree to add one more level).


Same data in a balanced binary tree (height is 3):Showing the "4-nodes" (if this was a red-black tree):

B-Tree Details

Discussion of other Trees

More Details about B-Trees:

Below is a diagram of a B-tree of height 2 containing over one billion keys. Each internal node and leaf contains 1000 keys. There are 1001 nodes at depth 1 and over one million nodes at depth 2. There are over 1 billion keys at level 2. Shown inside each node is the number of keys in the node.


© 2001 Cormen et al.
Suppose the entire tree fit into memory. What can you say about the algorithm used to find a particular value? What if we had a balanced BST instead? What about the performance?

Self-check: What is a tree's branching factor? What is the branching factor of a BST? A 2-3 tree? A 2-3-4 tree? At most, how many children does a B-tree of order M have?

Implementing B-Trees

When storing the data in the node, a node may contain: Number of Keys and Children
struct Data1
{
  int Key;  // 4 bytes
  int Foo;  // 4 bytes
  int Bar;  // 4 bytes
};

struct BTreeNode
{
  int NumKeys;                        // num keys in this node
  Data1 Keys[MAX_KEYS];               // key is in a struct
  BTreeNode *Children[MAX_CHILDREN];  // "pointers" to children
                                      // MAX_CHILDREN == MAX_KEYS + 1
};
Given a PAGE SIZE of 4096 bytes, this would yield: Diagrams

Suppose this was our data: (52 bytes)

struct Data2
{
  int Key;       //  4 bytes
  int Foo;       //  4 bytes
  int Bar;       //  4 bytes
  char Name[20]; // 20 bytes
  double This;   //  8 bytes
  double That;   //  8 bytes
  float Other;   //  4 bytes
};
Given the same PAGE SIZE of 4096 bytes, this would yield: What about this data:
struct Data3
{
  int Key;  // 4 bytes
  int Data; // 4 bytes
};
Given the same PAGE SIZE of 4096 bytes, this would yield:

Self-check: What is a B-tree's minimum degree? What is the minimum degree of a 2-3-4 tree?

Operations with B-Trees

Inserting a key/data value Deleting a key/data value

B+ Trees

The size of DATA below is 1344. With 4096 byte pages, we would only get 3 structs per page.

struct DATA
{
  int ID; // the key
  char Name[32];    
  struct 
  {
    double x;
    double y;
    double z;
  } Position;
  double NFA[128];
  int DFA[64];
};
A good candidate for a B+ tree implementation.

File systems (huge amounts of data) such as NTFS (Windows), APFS (macOS), JFS, XFS, ReiserFS, and Btrfs are implemented using B+Trees. Also, Microsoft's SQL Server and Oracle's database systems support B+Trees.

Filesystem notes

More information about B-Trees at wikipedia.