Mead's Guide to Lambda Expressions in C++

(a.k.a. Anonymous Functions)

Last update: November 03, 2020 at 08:26:33

Background

This topic is directly related to functions in C++. Functions are the basis for all programming in C and C++. But, for all of the power of functions, they have some glaring shortcomings:

You can't pass a function to a function. (You can pass a pointer to a function, though.)
You can't return a function from a function. (You can return a pointer to a function.)
You can't have an array of functions. (Again, you can have an array of function pointers.)
In C, all functions are at the global (or file) scope.

In both C and C++ you can't have nested (or local) functions (i.e. a function defined inside of another function.)

In C++, we can put functions into namespaces or classes to reduce the chance that one function's name will clash with another function of the same name.
Using function pointers can prevent the compiler from performing certain optimizations (such as inlining).
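To make the pointer workarounds in the list above concrete, here is a minimal sketch. The names add, mul, BinaryOp, apply, choose and ops are made up for this illustration; they are not part of any library.

```cpp
// Two ordinary functions with the same signature.
int add(int a, int b) { return a + b; }
int mul(int a, int b) { return a * b; }

// A pointer-to-function type (BinaryOp is a hypothetical alias for this sketch).
typedef int (*BinaryOp)(int, int);

// "Passing a function": the parameter is really a pointer to a function.
int apply(BinaryOp op, int a, int b) { return op(a, b); }

// "Returning a function": what comes back is a pointer to a function.
BinaryOp choose(bool want_add) { return want_add ? add : mul; }

// "An array of functions": actually an array of function pointers.
BinaryOp ops[] = { add, mul };
```

Calling apply(add, 2, 3) goes through the pointer at run time, which is part of the reason the compiler may have a harder time inlining the call.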

We can get the benefit of inlining from function objects, which are simply objects instantiated from a class that overloads the function call operator.
Function objects can be passed to functions, returned from functions, and stored in an array.
However, creating a class and overloading the function call operator for a simple function seems like overkill.

If we need a simple, small function, we still need to give it a name that doesn't conflict with any other symbol in scope. This function may be separated by some distance from the code that is actually going to use it (again, no local functions).
If we have lots of small, only-used-in-one-place kind of functions, we litter the code with all of these seemingly-random functions all over the place.

However, even in spite of these supposed limitations, they don't really seem to prevent programmers from writing massive and complex programs in C++. So, what exactly is the problem? As usual, I will demonstrate with several code examples.

A lot of the code used on this web page requires a C++14-compliant compiler. This includes the GNU g++ compiler (version 5 or later) and the Clang compiler (version 3.4 or later). You will need to provide this option on the command line: -std=c++14. (This replaces -ansi.) Depending on the exact version of the compiler, you may have to use -std=c++1y instead. I tested with g++ version 5.1 and Clang version 3.4 and both accepted -std=c++1y.

Microsoft's Visual Studio 2015 is supposed to be released soon with more support for C++11/14. Until then, you'll need to use one of the compliant compilers.
Update: Microsoft has released Visual Studio 2015 and is said to have full C++14 compliance.
Update 2: Microsoft's Visual Studio 2017 includes a compiler that is fully C++17 compliant.

First Example

As it turns out, one of the most popular uses for lambda expressions is as parameters to the STL generic algorithms (functions). So, as a first example, let's look at the popular sort algorithm in the STL using a vector of strings:

Code:

vector<string> animals {"dog", "stringray", "alligator", "hippopotamus",
                        "mouse", "chihuahua", "yak", "zebra", "uguisu",
                        "rabbit", "cheetah"
                       };

cout << "Original order:" << endl;
printc(animals);

Output:

dog  stringray  alligator  hippopotamus  mouse  chihuahua  yak  zebra  uguisu  rabbit  cheetah

There's nothing really notable here, but there are a couple of things to point out:

The vector is initialized with the Uniform Initializer (a.k.a. brace-enclosed initializer) syntax new in C++11. In the "olden days", we would have had to write it something like this:

  // Create the container
vector<string> animals;

  // Add items to the container
animals.push_back("dog");
animals.push_back("stringray");
animals.push_back("alligator");
animals.push_back("hippopotamus");
animals.push_back("mouse");
animals.push_back("chihuahua");
animals.push_back("yak");
animals.push_back("zebra");
animals.push_back("uguisu");
animals.push_back("rabbit");
animals.push_back("cheetah");

There's a function template I wrote called printc that simply prints out the contents of most any container:

template <typename T>
void printc(const T& v)
{
  typename T::const_iterator iter;
  for (iter = v.begin(); iter != v.end(); ++iter)
    cout << *iter << "  ";
  cout << endl;
}

Now, let's say we want to sort the strings alphabetically in increasing order:

sort(animals.begin(), animals.end());
cout << "Order by less:" << endl;
printc(animals);

Output:

Order by less:
alligator  cheetah  chihuahua  dog  hippopotamus  mouse  rabbit  stringray  uguisu  yak  zebra

The default behavior of sort is to use the standard function object less. The above call to sort is equivalent to this:

sort(animals.begin(), animals.end(), less<string>());

So, if we want to sort the list in decreasing order, we would use greater, which is another pre-defined function object in the STL:

sort(animals.begin(), animals.end(), greater<string>());

Output:

Order by greater:
zebra  yak  uguisu  stringray  rabbit  mouse  hippopotamus  dog  chihuahua  cheetah  alligator

A sample implementation of greater:

template <typename T>
class greater
{
  public:
    bool operator()(const T& a, const T &b) const
    {
      return a > b;
    }
};

You can see that the greater class is just a glorified wrapper around the > operator. Most function objects are simplistic like this.

Here are some other standard function objects that are available in the STL:

Binary function objects    Meaning

less<type>()               lhs < rhs
greater<type>()            lhs > rhs
equal_to<type>()           lhs == rhs
not_equal_to<type>()       lhs != rhs
less_equal<type>()         lhs <= rhs
greater_equal<type>()      lhs >= rhs
logical_and<type>()        lhs && rhs
logical_or<type>()         lhs || rhs
plus<type>()               lhs + rhs
minus<type>()              lhs - rhs
multiplies<type>()         lhs * rhs
divides<type>()            lhs / rhs
modulus<type>()            lhs % rhs
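As a quick illustration of how these standard function objects plug into an algorithm, here is a small sketch that uses std::accumulate from the numeric header. The helper names sum and product are made up for this example.

```cpp
#include <functional>
#include <numeric>
#include <vector>

// Adds up the elements. std::plus<int>() is called as plus(lhs, rhs)
// for each element, starting from the initial value 0.
int sum(const std::vector<int>& v)
{
  return std::accumulate(v.begin(), v.end(), 0, std::plus<int>());
}

// Multiplies the elements together, starting from the initial value 1.
int product(const std::vector<int>& v)
{
  return std::accumulate(v.begin(), v.end(), 1, std::multiplies<int>());
}
```

The function objects can also be called directly, so std::greater<int>()(3, 2) is just a long-winded way of writing 3 > 2.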

But what if we need a custom sort method, say, sorting by string length? We have several options.

Create a boolean function, shorter, that compares two strings to see which is shorter:

bool shorter(const string& left, const string& right)
{
  return left.size() < right.size();
}

This function will return true if the length of left string is less than the length of the right string and false otherwise. We would call it like this:

sort(animals.begin(), animals.end(), shorter);

Output:

yak  dog  zebra  mouse  uguisu  rabbit  cheetah  stringray  chihuahua  alligator  hippopotamus

Create a class (called Shorter) and overload the function call operator to create a functor:

class Shorter
{
  public:
    bool operator()(const string& left, const string& right) const
    {
      return left.size() < right.size();      
    }
};

We would call it like this:

sort(animals.begin(), animals.end(), Shorter());

with the output being the same. At least with function objects, you can define the class locally within a function (and, starting with C++11, pass it to a template such as sort), giving you kind of the behavior of local or nested functions. You can't do that with a regular function, to wit:

void f1()
{
  vector<string> animals {"dog", "stringray", "alligator", "hippopotamus",
                          "mouse", "chihuahua", "yak", "zebra", "uguisu",
                          "rabbit", "cheetah"
                         };

    // A local class like this can be passed to sort only in C++11 or later.
    // C++98 did not allow local classes to be used as template arguments.
  class Shorter
  {
    public:
      bool operator()(const string& left, const string& right) const
      {
        return left.size() < right.size();
      }
  };

  cout << "Original order:" << endl;
  printc(animals);
  cout << endl;

  sort(animals.begin(), animals.end(), Shorter());
  cout << "Order by length (functor, shortest first)" << endl;
  printc(animals);
  cout << endl;
}

Output:

Original order:
dog  stringray  alligator  hippopotamus  mouse  chihuahua  yak  zebra  uguisu  rabbit  cheetah  

Order by length (functor, shortest first)
yak  dog  zebra  mouse  uguisu  rabbit  cheetah  stringray  chihuahua  alligator  hippopotamus

Still, that's a lot of overhead for something so trivial.

You can imagine that we might also like to sort from longest to shortest, sort by number of vowels, sort by the sum of the ASCII characters, etc. We could easily have a dozen or more functions and/or classes to deal with. Lots of code with very local and specific use.

This leads to a third alternative:

Use a lambda expression. This is what we would really like to do (but it's not legal syntax):
```
sort(animals.begin(), 
     animals.end(), 
     bool shorter(const string& l, const string& r){return l.size() < r.size();}
    );
```
See how I'm passing the entire function to sort? This, in a nutshell, is what it means to pass a "function" to another function. Well, that's not the exact syntax. Basically, you remove the return type and replace the function name with a pair of brackets:
```
sort(animals.begin(), 
     animals.end(), 
     [](const string& l, const string& r){return l.size() < r.size();}
    );
```
Notes:
- It may appear that we are passing the entire "function body" to the sort algorithm inline! This is why lambda expressions are also known as anonymous functions; the function has no name.
- Its use is exactly where it is needed, no more, no less.
- It's also impossible to conflict with any other symbol in the program because there is no name.
- Also, this is just the basic syntax for a lambda expression. We will see shortly that we can put stuff inside the pair of brackets, as well as other annotations.
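Here is the same idea as a small, self-contained sketch. The function name sort_by_length is made up for this example, and I've swapped in std::stable_sort (instead of the sort used above) so that strings of equal length keep their original relative order.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Returns a copy of the input ordered from shortest to longest string.
std::vector<std::string> sort_by_length(std::vector<std::string> words)
{
  // The comparison is written right where it is used, as a lambda.
  std::stable_sort(words.begin(), words.end(),
                   [](const std::string& l, const std::string& r)
                   {
                     return l.size() < r.size();
                   });
  return words;
}
```

Given zebra, yak, mouse, dog, this returns yak, dog, zebra, mouse.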

So, it appears that for smaller, inlined functions, lambda expressions are exactly what we've been waiting for.

Incidentally, other languages have had this technique for years. The name lambda comes from Lambda Calculus where this technique originated decades ago.

Function Pointers as Parameters

The following is an example that I have used in the past when discussing STL algorithms and function objects. I'm going to repeat it here so that we can compare and contrast with the new version that uses lambda expressions instead.

We will use these simple functions, which should be self-explanatory:

int NextInt() { static int i = 0; return ++i; }

int RandomInt() { return rand(); }

void TripleByRef(int& value) { value *= 3; }

int TripleByVal(int value) { return value * 3; }

bool DivBy6(int value) { return !(value % 6); }

bool IsEven(int value) { return !(value % 2); }

Note 1: NextInt and RandomInt are called generators.

A generator is a function with no inputs that returns a value (in some sequence, which may be random) each time it is called.

Note 2: DivBy6 and IsEven are called predicates.

A predicate is simply a function that returns true or false.

This code uses the functions (pointers, actually) as parameters to the algorithms. Notice the lack of any looping or iteration. The algorithms do the looping for you. And as the wise man, Bjarne Stroustrup, says: "Prefer algorithms to loops".

void f5()
{
    // Make it easy to switch containers
  typedef std::list<int> ContainerType;

    // Create a container all set to 0: 0  0  0  0  0  0  0  0  0  0
  ContainerType cont1(10);
  std::cout << "Container all set to 0\n";
  printc(cont1);

    // Fill list with the value 5: 5  5  5  5  5  5  5  5  5  5
  std::fill(cont1.begin(), cont1.end(), 5);
  std::cout << "\nContainer all set to 5\n";
  printc(cont1);

    // Fill list with values (1..10): 1  2  3  4  5  6  7  8  9  10
  std::generate(cont1.begin(), cont1.end(), NextInt);
  std::cout << "\nContainer set sequentially from 1 to 10\n";
  printc(cont1);

    // Multiply each element by 3 (incorrect): 1  2  3  4  5  6  7  8  9  10
  std::for_each(cont1.begin(), cont1.end(), TripleByVal);
  std::cout << "\nEach element multiplied by 3 (incorrect)\n";
  printc(cont1);

    // Multiply each element by 3: 3  6  9  12  15  18  21  24  27  30
  std::for_each(cont1.begin(), cont1.end(), TripleByRef);
  std::cout << "\nEach element multiplied by 3\n";
  printc(cont1);

    // Multiply each element by 3: 9  18  27  36  45  54  63  72  81  90
  std::transform(cont1.begin(), cont1.end(), cont1.begin(), TripleByVal);
  std::cout << "\nEach element multiplied by 3 (again)\n";
  printc(cont1);

    // Create another list (same size as first list)
  ContainerType cont2(cont1.size());

    // Count number of even elements: 5
  int count = std::count_if(cont1.begin(), cont1.end(), IsEven);
  std::cout << "\nNumber of even elements: " << count << std::endl;

    // Copy values from list1 to list2 where element not divisible by 6
    // 9  27  45  63  81  0  0  0  0  0
  std::remove_copy_if(cont1.begin(), cont1.end(), cont2.begin(), DivBy6);
  std::cout << "\nCopy values from list1 to list2 where element not divisible by 6\n";
  printc(cont2);

    // Copy values from list1 to list2 where element not divisible by 6
    // and trim the list
    // List1: 9  18  27  36  45  54  63  72  81  90
    // List2: 9  27  45  63  81
  std::cout << "\nCopy values from list1 to list2 where element not divisible by 6 ";
  std::cout << "and trim the list\n";
  cont2.erase(std::remove_copy_if(cont1.begin(), cont1.end(), cont2.begin(), DivBy6), 
              cont2.end());
  std::cout << "List1: ";
  printc(cont1);
  std::cout << "List2: ";
  printc(cont2);
}

Full output:

Container all set to 0
0  0  0  0  0  0  0  0  0  0

Container all set to 5
5  5  5  5  5  5  5  5  5  5

Container set sequentially from 1 to 10
1  2  3  4  5  6  7  8  9  10

Each element multiplied by 3 (incorrect)
1  2  3  4  5  6  7  8  9  10

Each element multiplied by 3
3  6  9  12  15  18  21  24  27  30

Each element multiplied by 3 (again)
9  18  27  36  45  54  63  72  81  90

Number of even elements: 5

Copy values from list1 to list2 where element not divisible by 6
9  27  45  63  81  0  0  0  0  0

Copy values from list1 to list2 where element not divisible by 6 and trim the list
List1: 9  18  27  36  45  54  63  72  81  90
List2: 9  27  45  63  81

In case you need a refresher, here are the details regarding the algorithms used in the example:

Declarations
Implementations
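As a rough idea of what some of these algorithms do internally, here are simplified stand-ins for three of them. The my_ names are hypothetical, and real library implementations differ in the details (for example, std::count_if returns an iterator difference type rather than int).

```cpp
// Simplified stand-in for std::generate: store whatever the generator returns.
template <typename ForwardIt, typename Generator>
void my_generate(ForwardIt first, ForwardIt last, Generator gen)
{
  for (; first != last; ++first)
    *first = gen();
}

// Simplified stand-in for std::count_if: count elements the predicate accepts.
template <typename InputIt, typename Predicate>
int my_count_if(InputIt first, InputIt last, Predicate pred)
{
  int count = 0;
  for (; first != last; ++first)
    if (pred(*first))
      ++count;
  return count;
}

// Simplified stand-in for std::for_each: call f on each element.
// Any value f returns is ignored, which is why passing a by-value
// function like TripleByVal leaves the container unchanged.
template <typename InputIt, typename Function>
Function my_for_each(InputIt first, InputIt last, Function f)
{
  for (; first != last; ++first)
    f(*first);
  return f;
}
```

Notice that the third parameter is just "something that can be called", so a function pointer, a function object, or a lambda all work.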

The same code using lambda expressions instead of the functions:

void f3()
{
    // Make it easy to switch containers
  typedef std::list<int> ContainerType;

    // Create a container all set to 0: 0  0  0  0  0  0  0  0  0  0
  ContainerType cont1(10);
  std::cout << "Container all set to 0\n";
  printc(cont1);

    // Fill list with the value 5: 5  5  5  5  5  5  5  5  5  5
  std::fill(cont1.begin(), cont1.end(), 5);
  std::cout << "\nContainer all set to 5\n";
  printc(cont1);

    // Fill list with values (1..10): 1  2  3  4  5  6  7  8  9  10
  std::generate(cont1.begin(), cont1.end(), []{static int i = 0; return ++i;});
  std::cout << "\nContainer set sequentially from 1 to 10\n";
  printc(cont1);

    // Multiply each element by 3 (incorrect): 1  2  3  4  5  6  7  8  9  10
  auto TBV = [](int v){return v * 3;};
  std::for_each(cont1.begin(), cont1.end(), TBV);
  std::cout << "\nEach element multiplied by 3 (incorrect)\n";
  printc(cont1);

    // Multiply each element by 3: 3  6  9  12  15  18  21  24  27  30
  std::for_each(cont1.begin(), cont1.end(), [](int& v){v *= 3;});
  std::cout << "\nEach element multiplied by 3\n";
  printc(cont1);

    // Multiply each element by 3: 9  18  27  36  45  54  63  72  81  90
  std::transform(cont1.begin(), cont1.end(), cont1.begin(), TBV);
  std::cout << "\nEach element multiplied by 3 (again)\n";
  printc(cont1);

    // Create another list (same size as first list)
  ContainerType cont2(cont1.size());

    // Count number of even elements: 5
  int count = std::count_if(cont1.begin(), cont1.end(), [](int v){return !(v %2);});
  std::cout << "\nNumber of even elements: " << count << std::endl;

    // Copy values from list1 to list2 where element not divisible by 6
    // 9  27  45  63  81  0  0  0  0  0
  auto DB6 = [](int v){return !(v % 6);};
  std::remove_copy_if(cont1.begin(), cont1.end(), cont2.begin(), DB6);
  std::cout << "\nCopy values from list1 to list2 where element not divisible by 6\n";
  printc(cont2);

    // Copy values from list1 to list2 where element not divisible by 6
    // and trim the list
    // List1: 9  18  27  36  45  54  63  72  81  90
    // List2: 9  27  45  63  81
  std::cout << "\nCopy values from list1 to list2 where element not divisible by 6 ";
  std::cout << "and trim the list\n";
  cont2.erase(std::remove_copy_if(cont1.begin(), cont1.end(), cont2.begin(), DB6), 
              cont2.end());
  std::cout << "List1: ";
  printc(cont1);
  std::cout << "List2: ";
  printc(cont2);
}

The only things that need explaining at this point are the auto variables TBV (shorthand for TripleByVal) and DB6 (shorthand for DivBy6).

Yes, you can name your lambda expressions if you want, and this is one of the ways. I named them because I wanted to re-use them instead of typing the entire lambda expressions again. That's all it is.

Revisiting the first example we can see how we might name our lambda expressions.

The original, anonymous way:

sort(animals.begin(), 
     animals.end(), 
     [](const string& l, const string& r){return l.size() < r.size();}
    );

Storing the lambda in a std::function variable with a matching signature (harder):

function<bool(const string&, const string&)> comp1 = [](const string& l, const string& r){return l.size() < r.size();};
sort(animals.begin(), animals.end(), comp1);

Declaring a variable of the correct type using auto (easier):

auto comp2 = [](const string& l, const string& r){return l.size() < r.size();};
sort(animals.begin(), animals.end(), comp2);

This is where the auto keyword really shines. It's not so much about saving keystrokes here. Every lambda has its own unnamed type that only the compiler knows, so you can't spell it out yourself, and with complicated functions even the std::function signature can be challenging to get exactly right. Compilers are not challenged and can figure this out in their sleep.
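A small sketch may help show why auto is the natural fit. The variable names by_len1, by_len2 and wrapped are made up for this example. Two lambdas with identical text still have two different types, while std::function is a separate wrapper type that can hold either one.

```cpp
#include <functional>
#include <string>
#include <type_traits>

// Two lambdas with exactly the same text...
auto by_len1 = [](const std::string& l, const std::string& r) { return l.size() < r.size(); };
auto by_len2 = [](const std::string& l, const std::string& r) { return l.size() < r.size(); };

// ...still get two distinct, compiler-generated types.
static_assert(!std::is_same<decltype(by_len1), decltype(by_len2)>::value,
              "each lambda expression has its own type");

// std::function is a wrapper; it can store either lambda because the
// signature matches, but it is not the type of either one.
std::function<bool(const std::string&, const std::string&)> wrapped = by_len1;
```

Assigning by_len2 to wrapped later on is fine, whereas assigning by_len2 to by_len1 would not compile.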

Lambda Expressions (Details)

Up to this point, all of our lambda expressions (from here on simply called lambdas) have been self-contained, meaning, the only data that was accessed were the parameters that were passed in. Let's take a step back and look at the canonical C++ program Lambda-style:

int main()
{
  []{cout << "Hello, World!" << endl;}();
}

Some points to make about this trivial lambda:

The left bracket is what introduces the lambda. As we'll see, you can put stuff inside the brackets.
There are no inputs, so the parentheses are optional and omitted here.
The left brace starts the body of the lambda (just like a function).
The only statement is the output statement, terminated by a semi-colon.
The right brace ends the body of the lambda (just like a function).
The empty parentheses at the end are the function call operator, which invokes the lambda immediately.
The entire expression is terminated with a semi-colon.

This lambda is about as trivial as they come, no inputs, no outputs (returns), no access to anything outside but the global cout/endl symbols. It should come as no surprise that lambdas can access global symbols just like any ordinary function/functor.
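Defining a lambda and calling it on the spot is not limited to printing. As a minimal sketch (answer and square_of_five are names made up for this example), the result of the call can be used like any other value:

```cpp
// Define a lambda with no inputs and call it immediately.
// The value it returns initializes the const variable.
int answer()
{
  const int value = []{ return 6 * 7; }();
  return value;
}

// The same thing with an input: the argument goes in the
// trailing parentheses, just like an ordinary function call.
int square_of_five()
{
  return [](int x){ return x * x; }(5);
}
```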

But, can lambdas access any of the symbols that are local to the function (the local environment) that contains the lambda? The answer, of course, is yes.

Suppose we have a string that is local to the function and we try to access it:

void f4()
{
  string s("Hello");

    // Define the lambda expression
  auto lambda1 = []{cout << s << endl;};

    // Call it
  lambda1();
}

We are immediately met with this error message (the name of the source file is first.cpp):

first.cpp: In lambda function:
first.cpp:361:28: error: 's' is not captured
  auto lambda1 = []{cout << s << endl;};
                            ^

Believe it or not, the message tells you exactly what the problem is: The variable s is not captured. Duh.

If you want to access a non-static local symbol, you need to capture it:

void f5()
{
  string s("Hello");

    // Define the lambda expression
  auto lambda1 = [s]{cout << s << endl;};

    // Call it
  lambda1();
}

Notice the [s] in the brackets. This is how you capture (gain access to) non-static local symbols. Now, the lambda will output the string as expected.
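One detail worth seeing in code: with [s], the lambda gets its own copy of the string at the point where the lambda is created, not where it is called. A minimal sketch (captured_copy is a made-up name):

```cpp
#include <string>

std::string captured_copy()
{
  std::string s("Hello");

    // s is copied into the lambda right here
  auto get = [s]{ return s; };

    // Changing the original afterwards does not affect the copy
  s = "Goodbye";

  return get();
}
```

The function returns "Hello", because that is what s contained when it was captured.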

Notice the phrase non-static local in the sentences above.

You only need to capture non-static local symbols.
Local symbols that are static are available automatically.
In fact, any symbol that has static storage duration is available without being captured. (Strictly speaking, such symbols are not captured at all; the lambda refers to them directly, just like an ordinary function would.) So, in addition to static local symbols, this includes objects that are at the:

global scope
file scope
namespace scope

Attempting to capture an object that has static storage duration is an error.

We'll see that C++14 relaxes this a bit with generalized captures.

This short snippet demonstrates the automatic access:

int a = 1;                 // global scope
static int b = 2;          // file scope
namespace foo {int c = 3;} // namespace scope

void f9()
{
  static int d = 4; // local static
  auto lambda1 = []{cout << a + b + foo::c + d << endl;};
  lambda1();
}

Output:

10

Ok, so now we know how to access (non-static) local variables. Going back to the previous example, suppose that the lambda wants to modify the string like this:

void f6()
{
  string s("Hello");

  auto lambda1 = [s]{s[0] = 'C'; cout << s << endl;};
  lambda1();
}

We expect that the string will be changed from "Hello" to "Cello" but we are met with this error message:

first.cpp: In lambda function:
first.cpp:382:26: error: assignment of read-only location 's.std::basic_string<_CharT, _Traits, _Alloc>::operator[], std::allocator >(0ul)'
  auto lambda1 = [s]{s[0] = 'C'; cout << s << endl;};
                          ^

Long story short, the symbol was captured by value. This is why the error message said you were trying to change a read-only location. By default, symbols are captured by value and they can't be changed inside the lambda. The solution? Capture the symbol by reference:

void f7()
{
  string s("Hello");

  auto lambda1 = [&s]{s[0] = 'C'; cout << s << endl;};
  lambda1();
}

Notice the [&s] in the brackets. This is how you capture local symbols by reference. Now, the lambda can modify the string as expected:

Output:

Cello

To see it in context:

void f8()
{
  string s("Hello");

  cout << s << endl;
  auto lambda1 = [&s]{s[0] = 'C';};
  lambda1();
  cout << s << endl;
}

Output:

Hello
Cello

Behind-the-scenes

We've seen that, before lambdas came along, we were using functions and function objects to provide the same kinds of behavior that the lambdas provide. How the compiler deals with lambdas isn't really that mysterious. Essentially, the lambda expression gives the compiler enough information for it to create a class with an overloaded function call operator.

Also, realize that the Standard does not specify how a compiler implements lambdas. This is just a high-level view of one possible way a compiler might implement the behavior.

Suppose we have a vector of numbers and we want to find out how many are between the numbers 3 and 7:

void f17()
{
  std::vector<int> v {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
  int low = 3, high = 7;

    // prints the value 5
  int count = std::count_if(v.begin(), v.end(), [low, high](int x){return (x >= low && x <= high);});
  cout << "Count is " << count << endl;
}

The compiler will create a closure class (with some internal compiler-generated name), and instantiate it where the lambda was:

void f17()
{
  std::vector<int> v {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
  int low = 3, high = 7;

  class _xyz_INT_001
  {
    public:
      _xyz_INT_001(int a, int b) : a_(a), b_(b) {};
      bool operator()(int x) const {return (x >= a_ && x <= b_);}

    private:
      int a_;
      int b_;
  };

    // prints the value 5
  int count = std::count_if(v.begin(), v.end(), _xyz_INT_001(low, high));
  cout << "Count is " << count << endl;
}

The class name _xyz_INT_001 above was something I made up. The compiler will generate a unique name for each closure class it creates. Had I captured the locals by reference:

  count = std::count_if(v.begin(), v.end(), [&low, &high](int x){return (x >= low && x <= high);});

The compiler would create a closure class something like this:

class _xyz_INT_002
{
  public:
    _xyz_INT_002(int& a, int& b) : a_(a), b_(b) {};
    bool operator()(int x) const {return (x >= a_ && x <= b_);}

  private:
    int& a_;
    int& b_;
};

Because the members are references, the objects they refer to can be modified within the const member function, if we needed that ability.
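To see that last point in action, here is a sketch that sums a vector two ways: with a hand-written class that holds a reference, and with the equivalent lambda. The names SumInto, sum_with_functor and sum_with_lambda are made up for this example, and the class is only my guess at what a compiler might generate.

```cpp
#include <algorithm>
#include <vector>

// Hand-written stand-in for a closure class that captures by reference.
class SumInto
{
  public:
    explicit SumInto(int& total) : total_(total) {}

      // The operator is const, yet the int it refers to can still change.
    void operator()(int x) const { total_ += x; }

  private:
    int& total_;
};

int sum_with_functor(const std::vector<int>& v)
{
  int total = 0;
  std::for_each(v.begin(), v.end(), SumInto(total));
  return total;
}

int sum_with_lambda(const std::vector<int>& v)
{
  int total = 0;
  std::for_each(v.begin(), v.end(), [&total](int x){ total += x; });
  return total;
}
```

Both functions return the same result; the lambda just saves us from writing the class.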

What is sizeof a lambda?

We often use the sizeof operator to give us insight into a data structure. We've seen above how a compiler might implement the lambda. Using the sizeof operator shows that this is pretty close. As usual, code is worth 1,000 words:

void f34()
{
  int a = 5, b = 4, c = 3, d = 2;

  auto lambda1 = [a]{cout << a << endl;};
  auto lambda2 = [&a]{cout << a << endl;};
  auto lambda3 = [a, b]{cout << a << endl;};
  auto lambda4 = [=]{cout << a << endl;};
  auto lambda5 = [=]{cout << (a + b + c + d) << endl;};
  auto lambda6 = [&a, &b, &c]{cout << a << endl;};
  auto lambda7 = [&]{cout << a << endl;};
  auto lambda8 = [&]{cout << (a + b + c + d) << endl;};
  auto lambda9 = []{cout << "Hello!" << endl;};

    // Output when compiled as a 64-bit program
  cout << sizeof(lambda1) << endl; //  4
  cout << sizeof(lambda2) << endl; //  8
  cout << sizeof(lambda3) << endl; //  8
  cout << sizeof(lambda4) << endl; //  4
  cout << sizeof(lambda5) << endl; // 16
  cout << sizeof(lambda6) << endl; // 24
  cout << sizeof(lambda7) << endl; //  8
  cout << sizeof(lambda8) << endl; // 32
  cout << sizeof(lambda9) << endl; //  1 (compiler dependent)
}

The reason that the "empty" lambda is 1 byte is to be compliant with the C++ Standard. Essentially, two distinct objects must have distinct addresses. With a size of at least 1 byte, two objects will have unique addresses. This is also why using new to allocate memory and providing a size of 0 usually allocates 1 byte. Compilers can specify any size but 0, and they generally specify 1 byte. Empty base classes (i.e. interfaces) can optimize away the 1 byte. Want to know more? Google is your buddy.
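The distinct-address rule is easy to check. This is a small sketch with made-up names (Empty, nothing, distinct_addresses); it only asserts what the Standard guarantees, not the exact size a particular compiler picks.

```cpp
// An empty class and a lambda with no captures: neither holds any data.
struct Empty {};
auto nothing = []{};

// Every complete object occupies at least 1 byte.
static_assert(sizeof(Empty) >= 1, "an object can't have size 0");
static_assert(sizeof(nothing) >= 1, "the same goes for an empty closure");

// Two distinct objects of the same empty type still have different addresses.
bool distinct_addresses()
{
  Empty a, b;
  return &a != &b;
}
```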

Caveats when capturing by reference

Recall this previous example:

auto comp2 = [](const string& l, const string& r){return l.size() < r.size();};
sort(animals.begin(), animals.end(), comp2);

We have a lambda expression (right side) being assigned (stored) in a variable (left side) named comp2. There is a special name that is given to variables of this kind: We call it a closure. A closure includes not only the code, but the captured environment. In the example above, nothing is captured, so this closure is safe to use anywhere at any time.

The fundamental problem when capturing by reference is that the captured environment may no longer exist when the closure is invoked. And, since we can store a closure for use at a later time and place, it's quite possible that the captured objects are no longer valid.

Example 1, capture by value:

std::function<int(void)> func0()
{
  int a = 1, b = 2, c = 3;
  auto lambda1 = [a, c]{return a + c;};
  return lambda1;
}

void f13()
{
  auto x = func0();
  int i = x();
  cout << i << endl; // prints 4
}

As you can see, there is no problem when capturing by value. However:

Example 2, capture by reference:

std::function<int(void)> func1()
{
  int a = 1, b = 2, c = 3;
  auto lambda1 = [&a, &c]{return a + c;};
  return lambda1;
}

void f13()
{
  auto x = func1();
  int i = x();
  cout << i << endl; // prints random values (undefined)
}

Of course, the result is the same that would occur if you returned a pointer or reference to a local variable in a regular function. The local variables, a, b, and c, are no longer valid after func1 returns. But, we stored the closure (and references to the locals) in a variable and used it afterwards.

In the examples above, I use several temporary variables. I often do this so as not to make the code so terse and cryptic for first-time learners. The above example can be shortened to this:

std::function<int(void)> func4()
{
  int a = 1, b = 2, c = 3;
  return [&a, &c]{return a + c;};
}

void f14()
{
  cout << func4()() << endl; // prints random values (undefined)
}

A note about this syntax:

std::function<int(void)> func()

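std::function was introduced in C++11. It is not the exact type of the lambda (that type is unnamed and known only to the compiler); it is a general-purpose wrapper that can hold any callable with the given signature, and the lambda is converted to it when it is returned. There is also this syntax (called the trailing return type) which is also available in C++11: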

auto func() -> std::function<int(void)>

And C++14 fixes things so we can just use the auto keyword alone:

auto func()

and the compiler will deduce the return type based on what is actually being returned. If nothing is returned, the type is void. So, the example above becomes this:

auto func5()
{
  int a = 1, b = 2, c = 3;
  return [&a, &c]{return a + c;};
}

void f15()
{
  cout << func5()() << endl; // prints random values (undefined)
}

And that is why auto is so nice to have. Let the compiler figure things out, if it can. This is not your parents' C++ language!
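For contrast with the dangling-reference examples, here is a sketch of a function that safely returns a closure using the C++14 auto return type. The name make_adder is made up for this example. Because amount is captured by value, the closure carries its own copy and nothing refers back to the function's locals.

```cpp
// Requires C++14 for the deduced (auto) return type.
auto make_adder(int amount)
{
    // amount is copied into the closure, so it is still
    // valid after make_adder has returned
  return [amount](int x) { return x + amount; };
}
```

Usage looks like this: auto add5 = make_adder(5); and then add5(10) gives 15.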

Deducing a Lambda's Return Type

Up until now, we haven't explicitly specified the return types from the lambdas. We just let the compiler deduce the correct type. Unfortunately, C++11 is very limited in this regard. C++11 can only deduce the type if the body is a single return statement:

[](const string& l, const string& r){return l.size() < r.size();}

All of the examples thus far have had lambdas with trivial bodies. This code is problematic in C++11:

auto lambda1 = [](bool x){if (x) return 1; else return 2;};

The above will fail to compile in C++11, so you need to help the compiler by explicitly specifying the return type using the trailing return type syntax:

auto lambda1 = [](bool x) -> int {if (x) return 1; else return 2;};
                          ^^^^^^

However, C++14 relaxes that restriction and will handle it just fine. And while I'm on the subject, C++14 also supports return type deduction for all functions (regardless of their complexity), not just lambdas. So in C++14, lambdas can be as complex as you want them to be and the compiler will deduce the correct type.
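Here is a sketch of a lambda whose body is more than a single return statement, relying on C++14 to deduce the return type. The function name count_vowels is made up for this example; it counts lowercase vowels only.

```cpp
#include <string>

int count_vowels(const std::string& text)
{
    // Several statements in the body and no explicit return type.
    // In C++14 the type is deduced as int from the return statement.
  auto counter = [](const std::string& s)
  {
    const std::string vowels("aeiou");
    int n = 0;
    for (char c : s)
      if (vowels.find(c) != std::string::npos)
        ++n;
    return n;
  };

  return counter(text);
}
```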

Of course, you can't do things like this (returning two different types):

auto lambda1 = [x]{if (x) return 1; else return 2.0;};

Each return specifies different, incompatible, types. You'll see something like this error:

error: inconsistent types 'int' and 'double' deduced for lambda return type

But, you can't do that with any function (without casting), so it's really not a limitation.
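If you really do want both branches, one way out is to state the return type yourself, so that each returned value is converted to it. A minimal sketch (one_or_two is a made-up name):

```cpp
double one_or_two(bool x)
{
    // The trailing return type settles the matter: the int 1 is
    // converted to double, so both returns agree.
  auto pick = [](bool b) -> double { if (b) return 1; else return 2.0; };
  return pick(x);
}
```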

Lambdas in Member Functions

As is expected, you can create and use lambdas within member functions of a class. But, there are a few things to note about that.

Suppose we have a lambda in a member function and we want to capture a member of that class:

class Foo
{
  public:
    Foo(int a) : a_(a) {}
    void bar() const
    {
      auto lam = [a_]{cout << a_ << endl;};
      lam();
    }

  private:
    int a_;
};

The above leads to these errors:

first.cpp: In member function 'void Foo::bar() const':
first.cpp:587:16: error: capture of non-variable 'Foo::a_' 
    auto lam = [a_]{cout << a_ << endl;};
                ^
first.cpp:592:7: note: 'int Foo::a_' declared here
   int a_;
       ^
first.cpp: In lambda function:
first.cpp:587:28: error: 'this' was not captured for this lambda function
    auto lam = [a_]{cout << a_ << endl;};
                            ^
first.cpp:587:28: error: invalid use of non-static data member 'Foo::a_'
first.cpp:592:7: note: declared here
   int a_;
       ^

If you look closely at the errors, you can see what the problem is:

error: 'this' was not captured for this lambda function

You can't capture an individual member by naming it in the capture list; you must capture this. Change this code:

auto lam = [a_]{cout << a_ << endl;};

to this code:

auto lam = [this]{cout << a_ << endl;};

and the compiler will accept it. Note that the default capture modes also capture this, so both of these work:

auto lam = [=]{cout << a_ << endl;};

auto lam = [&]{cout << a_ << endl;};

But, you cannot capture this by reference directly:

auto lam = [&this]{cout << a_ << endl;};

g++ error message:

first.cpp: In member function 'void Foo::bar() const':
first.cpp:587:17: error: expected ',' before 'this'
    auto lam = [&this]{cout << a_ << endl;};
                 ^

Clang++ error message (clearer):

first.cpp:587:17: error: 'this' cannot be captured by reference
                        auto lam = [&this]{cout << a_ << endl;};
                                     ^
1 error generated.

The this pointer is treated specially, so all of these do the same thing (capture this by value):

[this]    [&]    [=]    [&, this]

These constructs are ill-formed in C++11/14:

[&this]    [=, this]

Now, it's important to realize that what is being captured by value is the this pointer. Only the pointer is captured by value, meaning that a copy of the pointer is what we have. This is different from capturing the entire object by value:

[*this]

With C++11/14, this is illegal. However, with C++17, it is now legal to capture the object itself by value (copy). One situation where this might be necessary is when a lambda outlives the object that it is referencing. Capturing by value (copy) means that the object will be guaranteed to still be valid during the execution of the lambda. An example would be a lambda that's passed to another thread and is executed long after the object it references has gone out of scope and been destroyed.
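Here is a minimal sketch of the outliving situation. The Counter class and the function names are invented for this example. The closure returned by snapshot() is used after the object is gone, which is only safe because [*this] stored a copy. The second member function shows a C++14 alternative when only one member is needed: copy it with an init capture.

```cpp
#include <cassert>
#include <functional>

// Hypothetical class, for illustration only
class Counter
{
  public:
    explicit Counter(int start) : value_(start) {}

    // C++17: the closure holds a copy of the whole object
    std::function<int()> snapshot() const
    {
      return [*this]{ return value_; };
    }

    // C++14 alternative: copy just one member with an init capture
    std::function<int()> value_only() const
    {
      return [v = value_]{ return v; };
    }

  private:
    int value_;
};

std::function<int()> make_snapshot()
{
  Counter c(42);
  return c.snapshot();   // c is destroyed when this function returns
}

void star_this_demo()
{
  auto f = make_snapshot();
  assert(f() == 42);     // safe: the closure owns its own copy of c
}
```

Had snapshot() used [this] instead, the closure would hold a dangling pointer after make_snapshot() returns, and calling it would be undefined behavior.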

C++20 accepts this syntax, which was ill-formed in earlier versions:

[=, this]

Generic (Polymorphic) Lambda Expressions

The ability to have generic lambdas (templates) was added in C++14. The syntax is a little different, though. Instead of using the keyword template, you use the auto keyword:

void f19()
{
  std::vector<int> iv {1, 2, 3, 4, 5};
  std::vector<double> dv {1.1, 2.2, 3.3, 4.4, 5.5};
  
    // This is a generic lambda
  auto lambda1 = [](auto &value){return value * value;};

  std::transform(iv.begin(), iv.end(), iv.begin(), lambda1);
  printc(iv);

  std::transform(dv.begin(), dv.end(), dv.begin(), lambda1);
  printc(dv);
}

Output:

1  4  9  16  25  
1.21  4.84  10.89  19.36  30.25

The compiler-generated closure class may look something like this:

class _xyz_INT_003
{
  public:
    template<typename T>
    auto operator()(T& value) const {return value * value;}
};
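To make the connection concrete, here is a sketch that compares a generic lambda with a hand-written function object that has a templated function call operator. The name Square is made up (the real closure type has no name you can use), and this is only an approximation of what a compiler generates:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hand-written stand-in for the compiler-generated closure class
struct Square
{
  template <typename T>
  auto operator()(T& value) const { return value * value; }
};

void generic_demo()
{
  std::vector<int> iv {1, 2, 3};
  std::vector<double> dv {1.5, 2.5};

  auto lambda1 = [](auto& value){ return value * value; };

  std::vector<int> a(iv.size()), b(iv.size());
  std::transform(iv.begin(), iv.end(), a.begin(), lambda1);
  std::transform(iv.begin(), iv.end(), b.begin(), Square{});
  assert(a == b);   // both hold 1, 4, 9

  // The same function object works for a different element type
  std::vector<double> c(dv.size());
  std::transform(dv.begin(), dv.end(), c.begin(), Square{});
  assert(c[0] == 2.25 && c[1] == 6.25);
}
```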

Generalized (Init) Captures

C++14 has given more power to the capture lists. They're called generalized captures or init captures (because you must initialize them). Here are some examples.

// "Normal" capture by value  
void f20()
{
  int level = 3;

  auto lambda1 = [level]{cout << level << endl;};
  lambda1();
}

// "Normal" capture by value  
void f21()
{
  int level = getLevel();

  auto lambda1 = [level]{cout << level << endl;};
  lambda1();
}

The two examples above use the "normal" capture syntax introduced in C++11. The next two examples show how you use the init capture.

// Init capture  
void f22()
{
  auto lambda1 = [level = getLevel()]{cout << level << endl;};
  lambda1();
}

The symbol level above is not a local symbol in the function. Its scope is only within the lambda. You don't have to declare its type like a regular symbol because the compiler will automatically deduce the type based on the initializer.

// Init capture gives new name to symbol  
void f23()
{
  int level = getLevel();
  auto lambda1 = [lev = level]{cout << lev << endl;};
  lambda1();
}

The example above shows how you can "rename" a captured local. The symbol on the left side of the = is local to the lambda. In contrast, the symbol on the right side is local to the function f23. When renaming the captured local, you can also capture it by reference:

int level = getLevel();
auto lambda1 = [&lev = level]{cout << lev << endl;};
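A short sketch of the difference between the two forms (the function and variable names here are hypothetical): the by-value init capture takes its copy when the lambda is created, while the by-reference form keeps tracking the original.

```cpp
#include <cassert>

void init_ref_capture_demo()
{
  int level = 3;

  auto by_value = [lev = level]{ return lev; };   // copy made here
  auto by_ref   = [&lev = level]{ return lev; };  // alias for level

  level = 7;

  assert(by_value() == 3);   // still sees the old value
  assert(by_ref() == 7);     // sees the change
}
```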

One of the main reasons for this new functionality is that some objects can't be copied. It's not very common, but there are some. Also, capturing by reference can be problematic when storing the closure for later use, as we've seen earlier. The solution is to not make a copy of the object, but to move the object into the capture.

Capturing by value with a non-copyable type produces an error:

void f24()
{
  std::unique_ptr<int> p = std::make_unique<int>(8);
  auto lambda1 = [p]{return *p;};
  cout << lambda1() << endl;
}

Error from g++:

first.cpp: In function 'void f24()':
first.cpp:661:17: error: use of deleted function 'std::unique_ptr<_Tp, _Dp>::unique_ptr(const std::unique_ptr<_Tp, _Dp>&) [with _Tp = int; _Dp = std::default_delete<int>]'
  auto lambda1 = [p]{return *p;};
                 ^
In file included from /usr/include/c++/5.1/memory:81:0,
                 from first.cpp:8:
/usr/include/c++/5.1/bits/unique_ptr.h:356:7: note: declared here
       unique_ptr(const unique_ptr&) = delete;
       ^

Error from Clang++:

first.cpp:661:18: error: call to deleted constructor of 'std::unique_ptr<int>'
        auto lambda1 = [p]{return *p;};
                        ^
/usr/include/c++/5.1/bits/unique_ptr.h:356:7: note: function has been explicitly marked deleted here
      unique_ptr(const unique_ptr&) = delete;
      ^
1 error generated.

So, when capturing non-copyable but movable types, this is a common use:

void f25()
{
  std::unique_ptr<int> p = std::make_unique<int>(8);
  auto lambda1 = [myp = std::move(p)]{return *myp;};
  cout << lambda1() << endl;
}

The details regarding std::unique_ptr, std::make_unique, and std::move are well beyond the scope of this introduction to lambda expressions.
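Without going into those details, this sketch shows the two visible effects of moving into a capture (the function name is hypothetical): the original pointer is left empty, and the closure itself becomes a move-only object.

```cpp
#include <cassert>
#include <memory>
#include <utility>

void move_capture_demo()
{
  std::unique_ptr<int> p = std::make_unique<int>(8);

  // Ownership moves from p into the closure member myp
  auto lambda1 = [myp = std::move(p)]{ return *myp; };

  assert(p == nullptr);      // p no longer owns anything
  assert(lambda1() == 8);

  // The closure can be moved, but not copied
  auto lambda2 = std::move(lambda1);
  assert(lambda2() == 8);
}
```

Because the closure is move-only, it can't be stored in a std::function, which requires a copyable callable.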

Finally, with generalized captures, we can capture objects that were illegal to capture in C++11. Specifically, we can capture globals. Note these various ways of capturing a global:

int a = 1; // global scope

void f9a()
{
    // a is not captured; the lambda refers to the global directly
  auto lambda1 = []{cout << a << endl;};
  a = 5; // change the global a
  lambda1();
}

Output:

5

However, if we use generalized capture, we can do this and capture by value:

int a = 1; // global scope

void f9b()
{
    // a is captured by value
  auto lambda1 = [a=a]{cout << a << endl;};
  a = 5; // change the global a, doesn't affect the capture
  lambda1();
}

Output:

1

Do we really need this? I'm not 100% sure; I'm just showing how it works. I believe the motivation for this was

to allow the programmer to capture by value.
to allow the programmer to rename the global during the capture:

int a = 1; // global scope

void f9c()
{
    // a is captured by value as g
  auto lambda1 = [g=a]{cout << g << endl;};
  a = 5; // change the global a, doesn't affect the capture
  lambda1();
}

Output:

1

Capture by reference:

int a = 1; // global scope

void f9d()
{
    // a is captured by reference as g
  auto lambda1 = [&g=a]{cout << g << endl;};
  a = 5; // change the global a, now affects the capture
  lambda1();
}

Output:

5

Note that capturing with either of these

[=]  [&]

does not affect the way globals are captured.
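A sketch of that last point, using a made-up global named g_level: even with [=], the lambda reads the global itself, so a later change is visible. Only the init capture takes a snapshot.

```cpp
#include <cassert>

int g_level = 1; // global scope (hypothetical name)

void global_capture_demo()
{
  g_level = 1;

  auto uses_global = [=]{ return g_level; };             // nothing is copied
  auto snapshot    = [saved = g_level]{ return saved; }; // copy made here

  g_level = 5;

  assert(uses_global() == 5);   // reads the global directly
  assert(snapshot() == 1);      // kept the old value
}
```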

Function Pointers and Miscellany

If you have a lambda expression that does not capture anything, you can assign it directly to a function pointer as such:

void f27()
{
  double (*pf)(double) = [](double d){return sqrt(d);};

    // Prints 1.41421
  cout << pf(2.0) << endl;

  double (*pfuns[])(double) = {
                                [](double d){return sin(d);},
                                [](double d){return cos(d);},
                                [](double d){return tan(d);}
                              };

    // Prints 0.382673  0.923884  0.4142
  for (auto f : pfuns)
    cout << f(3.1415 / 8) << "  ";
  cout << endl;
}

This can be convenient, especially when interfacing with C code that expects a function pointer as a callback.
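As one example of such a callback, the C library function qsort takes a comparison function pointer. The sketch below passes a captureless lambda in its place; sort_ints is a hypothetical helper written just for this illustration.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical helper: sorts an int array in ascending order with qsort
void sort_ints(int *values, std::size_t count)
{
  // The captureless lambda converts to the function pointer qsort expects
  std::qsort(values, count, sizeof(int),
             [](const void *lhs, const void *rhs) -> int {
               const int a = *static_cast<const int *>(lhs);
               const int b = *static_cast<const int *>(rhs);
               return (a > b) - (a < b);   // negative, zero, or positive
             });
}

void qsort_demo()
{
  int values[] = {5, 2, 9, 1, 7};
  sort_ints(values, 5);
  assert(values[0] == 1 && values[2] == 5 && values[4] == 9);
}
```

If the lambda captured anything, the conversion to a function pointer would not exist and the call would fail to compile.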

Mutable lambdas

We know that when capturing by value, a copy is made and that copy cannot be modified. What if you need to modify it? The copy is a member of the closure object, and because the overloaded function call operator is marked const, that member is read-only inside the lambda body. We do essentially what we would do with any member variable that must change inside a const member function: mark it as mutable.

We don't exactly mark the variable as mutable; we mark the lambda as mutable. Notice the empty parentheses after the lambda introducer. Before C++23, these are necessary when providing the mutable keyword, even though no parameters are expected.

void f31()
{
  int a = 5;

  cout << a << endl;
  auto lambda1 = [a]() mutable {a++; cout << a << endl;};
  lambda1();
}

Output:

5
6

Without the mutable keyword, we would see an error along these lines:

first.cpp:746:26: error: cannot assign to a variable captured by copy in a non-mutable lambda
        auto lambda1 = [a]()  {a++; cout << a << endl;};
                               ~^
1 error generated.
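Two consequences of mutable are worth seeing in code. In this sketch (the names are hypothetical), the modified copy lives inside the closure, so the change persists from one call to the next, and the original variable is never touched:

```cpp
#include <cassert>

void mutable_demo()
{
  int a = 5;

  // Each call modifies the closure's own copy of a
  auto next = [a]() mutable { return ++a; };

  assert(next() == 6);
  assert(next() == 7);   // the change persists between calls
  assert(a == 5);        // the original is untouched

  auto copy = next;      // copying the closure copies its state
  assert(copy() == 8);
  assert(next() == 8);   // the two closures now count independently
}
```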

Nested lambda expressions

It is possible to define a lambda within a lambda:

void f26()
{
  int i = 1, j = 2;
  auto lambda1 = [i1 = i, j1 = j]{
    int i2 = 3, j2 = 4;
    auto lambda2 = [=]{
      cout << i1 << ", " << j1 << endl;
      cout << i2 << ", " << j2 << endl;
    };
    lambda2(); 
  };
  lambda1();
}

Output:

1, 2
3, 4
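A related pattern, sketched here with made-up names, is a lambda that returns another lambda. The inner lambda captures the outer lambda's parameter by value, so each returned closure carries its own setting:

```cpp
#include <cassert>

void nested_return_demo()
{
  // The outer lambda builds and returns the inner lambda
  auto make_adder = [](int amount) {
    return [amount](int value) { return value + amount; };
  };

  auto add5 = make_adder(5);
  auto add10 = make_adder(10);

  assert(add5(1) == 6);
  assert(add10(1) == 11);
}
```

Capturing amount by reference here would be a bug, because the parameter is gone by the time the inner lambda is called.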

Recursive lambdas

Here's a simple, classic recursive function that calculates Fibonacci numbers. (Yes, there are many ways to write this function. I'm just doing a simple, straight-forward implementation.)

int fib(int value)
{
  if (value == 0 || value == 1)
    return value;
  else
    return fib(value - 1) + fib(value - 2);
}

Call it like this:

  // Prints 55
cout << fib(10) << endl;

As a lambda expression:

void f29()
{
  std::function<int(int)> fib = [&fib](int value) {
    if (value == 0 || value == 1)
      return value;
    else
      return fib(value - 1) + fib(value - 2);
  };

    // Prints 55
  cout << fib(10) << endl;
}

Notes:

You can't use auto with the recursive version because you need to know the type within the lambda expression.
Since we're referring to the name, you must capture the name of the closure that holds the lambda. It must be captured by reference.
Some of these things would be better as functions, not lambda expressions.
Is this the only way to create recursive lambdas? No, it's just an example.
Like many things in C++, "Just because you can do something, doesn't mean that you should do it!" Examples abound.
I'm sure that "clever" programmers will find many ways to use (and abuse) lambda expressions that were never dreamed of by the designers.
The last two examples above are for the "Can you do X with lambdas?"-people and the "What if you do Y with lambdas?"-people. There are probably many other odd tricks you can do and if you're really interested in finding out, just start writing code and find out for yourself!
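As one of those other ways, here is a sketch that uses a C++14 generic lambda and passes the lambda to itself as an argument. This avoids both std::function and the by-reference capture. The name fib_generic is hypothetical, and the return type is stated explicitly so that the recursive calls don't depend on deduction order.

```cpp
#include <cassert>

int fib_generic(int n)
{
  // The lambda receives itself as its first argument
  auto fib = [](auto &self, int value) -> int {
    if (value == 0 || value == 1)
      return value;
    else
      return self(self, value - 1) + self(self, value - 2);
  };

  return fib(fib, n);
}

void recursive_demo()
{
  assert(fib_generic(10) == 55);
}
```

Whether this is clearer than a plain function is debatable; it mostly shows that the std::function version isn't the only option.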

Lambdas and constexpr

As of C++17, lambdas are implicitly constexpr whenever their bodies meet the requirements of a constexpr function. I've talked a little about that here. The full details can be found in this document: Generalized Constant Expressions.

Before C++17, this was an error:

void f35()
{
  auto lam = [](int value) { return value * value;};

  constexpr int result = lam(10); // Error: the lambda isn't constexpr
  cout << result << endl;
}

Error message:

first.cpp:844:17: error: constexpr variable 'result' must be initialized by a constant expression
  constexpr int result = lam(10);
                ^        ~~~~~~~

That's because this is how (not exactly, of course) the C++11/14 compiler generated the function:

int operator()(int value) const 
{
  return value * value;
}

The function is not constexpr, so it can't be used to initialize result. The C++17 compiler will now generate this function:

constexpr int operator()(int value) const 
{
  return value * value;
}

The compiler will only mark the generated function with constexpr if it can do so. If you use constructs that can't be used with constexpr, then the function won't be marked as such. The code below is illegal because the use of cin means the lambda is no longer constexpr:

void f36()
{
  auto lam = [](auto value)
  { 
    int a = 0;
    cin >> a;
    return value * value * a;
  };

  constexpr int result = lam(10); // Error: the lambda isn't constexpr
  cout << result << endl;
}

Error message:

first.cpp:857:17: error: constexpr variable 'result' must be initialized by a constant expression
  constexpr int result = lam(10);
                ^        ~~~~~~~
first.cpp:857:26: note: non-constexpr function 'operator()' cannot be used in a constant expression
  constexpr int result = lam(10);
                         ^

With C++17, you can now explicitly mark lambdas as constexpr and have the compiler enforce the use of it:

auto lam = [](int value) constexpr
{ 
  int a = 0;
  cin >> a;
  return value * value * a;
};

Now, we'll get these compiler errors when the lambda is defined, not when it is called:

first.cpp:850:14: error: constexpr function never produces a constant expression [-Winvalid-constexpr]
  auto lam = [](int value) constexpr
             ^
first.cpp:853:9: note: non-constexpr function 'operator>>' cannot be used in a constant expression
    cin >> a;
        ^

This helps the programmer detect lambdas that can't be used as a constexpr. It's always better to have the compiler detect errors sooner (lambda definition) than later (calling the lambda).
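For contrast, here is a sketch of an explicitly constexpr lambda that the compiler does accept. It needs C++17, and the names are hypothetical. The same closure can be evaluated during compilation or called at run time with a non-constant argument:

```cpp
#include <cassert>

void constexpr_demo()
{
  // Explicitly constexpr; the body contains nothing that forbids it
  constexpr auto square = [](int value) constexpr { return value * value; };

  constexpr int result = square(10);   // evaluated at compile time
  static_assert(result == 100, "computed during compilation");

  int n = 4;                           // not a constant
  assert(square(n) == 16);             // still callable at run time
}
```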

Summary

This has been a whirlwind introduction to lambda expressions in C++, so let's recap the interesting points:

With the advent of C++14, there are very few things that functions can do that lambdas can't.
Lambdas can use static, global, and namespace-scope symbols without capturing them.
Local non-static symbols must be specifically captured.
You can capture by value (copy) or by reference.
Default capture modes allow you to capture all local symbols easily, either by value or by reference.
Closures can be assigned and stored for later use. (Use auto or std::function)

And, finally, lambdas are really meant to be short and used within a limited scope. Don't try to replace your "normal" functions with lambdas.

References

The examples above covered the most-used features of lambda expressions. For all of the gory details about them and more, refer to the following links:

C++11/14 support:

Clang: C++11/14/1z Support in Clang
GCC: C++0x/C++11 Support in GCC (C++1y/C++14 support)