These articles are written by Codalogic empowerees as a way of sharing knowledge with the programming community. They do not necessarily reflect the opinions of Codalogic.

Proposal for Tagged References in C++

By: Pete, March 2019

I'm a big fan of C++ move semantics. They are a significant development that not only improves C++, but also moves computer programming language concepts forward. But a question raised recently on the ACCU General mailing list makes me wonder if they are still a "work in progress".

The question asked was, when is it best to use l-value refs, r-value refs or pass-by-value to pass non-trivial parameters to functions? Even amongst the experts of ACCU there didn't seem to be much consensus. And there seemed to be no resolution on how to pass potentially moveable types through a derived class constructor on to a base class constructor.

To me, if there is no clear consensus among ACCU members on the best way to do this, then it seems too complicated for the average C++ programmer like myself. This is a bad situation for C++. This led me to idly muse on whether something better could be achieved.

A key observation is that passing parameters as l-value refs, r-value refs or values is essentially an optimisation problem, and conventional wisdom is to let the compiler do low level optimisation. Therefore, would it be possible for the compiler to handle this situation too?

The core of the problem is that a function (or class) wants to end up with its own copy of a value, such as a string or more complex data structure, passed in as a parameter. If the input parameter is an l-value ref, then the function has no choice but to do a copy operation. But if the parameter is an r-value ref, then the most efficient operation is typically a move.

Outside of templates, a programmer would have to implement two functions to cater for both situations, and the two are likely to contain almost identical code. This is not desirable.
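To make the duplication concrete, here is a minimal sketch of the two-overload approach as it might be written today. The Foo and Holder types, and all of their members, are hypothetical names invented for this illustration; Foo simply counts how it gets assigned.

```cpp
#include <string>
#include <utility>

// Hypothetical payload type that counts how values are assigned into it.
struct Foo {
    std::string text;
    int copies_received = 0;
    int moves_received = 0;

    Foo() = default;
    explicit Foo( std::string t ) : text( std::move( t ) ) {}
    Foo( Foo const & ) = default;
    Foo( Foo && ) = default;
    Foo & operator=( Foo const & o ) { text = o.text; ++copies_received; return *this; }
    Foo & operator=( Foo && o ) { text = std::move( o.text ); ++moves_received; return *this; }
};

// Hypothetical class that wants to end up with its own copy of a Foo.
class Holder {
public:
    void set( Foo const & f ) { stored = f; }             // l-value argument: copy
    void set( Foo && f ) { stored = std::move( f ); }     // r-value argument: move
    Foo const & get() const { return stored; }

private:
    Foo stored;
};
```

The two set() bodies differ only in the std::move call, which is the near-identical code referred to above.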

I could wish for a syntax that allowed a function to declare that "this parameter can be either an l-value ref or an r-value ref", with the compiler being responsible for generating functions for both the l-value ref and the r-value ref variants. So, using a triple ampersand (&&&) to denote such a function parameter, given a function definition of:

    int func( Foo const &&& f ) {...}

the compiler would generate code for:

    int func( Foo const & f ) {...}
    int func( Foo && f ) {...}

where the ... would be identical C++ code in both cases.

This is certainly an option, but it doesn't readily cater for the more general case where a function needs to take a number of such parameters, such as:

    int func( Foo const &&& f, Bar const &&& b, Dee const &&& d ) {...}

A definition like this would require 8 functions (2 x 2 x 2, one for each combination of l-value and r-value forms) to be auto-generated. Feasible, but not ideal.
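For comparison, the template route mentioned earlier can already cover all eight combinations with one definition, at the price of turning the function into a template. The following is only a sketch under that assumption; Store and make_store are hypothetical names, and Foo, Bar and Dee are stand-ins for the types in the signature above.

```cpp
#include <string>
#include <utility>

// Hypothetical stand-ins for the Foo, Bar and Dee types used in the text.
struct Foo { std::string s; };
struct Bar { std::string s; };
struct Dee { std::string s; };

// Hypothetical aggregate that wants its own copy of each value.
struct Store { Foo f; Bar b; Dee d; };

// One template definition; each parameter is independently copied or moved
// according to whether the caller passed an l-value or an r-value.
template< class F, class B, class D >
Store make_store( F && f, B && b, D && d )
{
    return Store{ std::forward<F>( f ), std::forward<B>( b ), std::forward<D>( d ) };
}
```

A call such as make_store( f, std::move( b ), d ) copies f and d and moves b. The compiler instantiates only the combinations actually used, but the function can no longer live in an ordinary .cpp file behind a plain declaration, which is part of what makes a non-template alternative appealing.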

An alternative in such a case is to move away from having a compile-time solution and adopt a run-time one instead. This would require the generated reference to include whether it is an l-value ref or an r-value ref. In C++ terms it might look something like:

    template< class T >
    struct tagged_ref {
        T & ref;                        // the referenced object
        enum { lvalue, rvalue } form;   // how the reference was bound
    };

But rather than be an STL type, it would be built into the compiler, and the compiler would be able to optimise its format and usage as necessary. For example, if found to be more efficient, the reference part could be passed to a function in a register and the form part on the stack. (Or even, for simple functions that don't have multiple parameters, the compiler could use the approach of auto-generating both l-value and r-value reference forms of the function, as previously mentioned.)

I'll call the type a "tagged reference".

Converting a tagged reference to a const l-value reference would be trivial. This could happen when using the tagged reference in an expression or passing it to a function parameter that has an l-value reference signature.

Assigning a tagged reference to another variable becomes more interesting. The generated code would have to look at the tagged reference form and decide whether a copy or move should be done. While the program code might look like:

    void func( Foo const &&& f ) {
        a = f;
    }

The code generated by the compiler might look more like:

    void func( Foo const &&& f ) {
        if( f.form == tagged_ref<Foo>::lvalue )
            a = f;
        else
            a = std::move( f );
    }

Note that once the tagged reference has been assigned to another variable, its value may have been destroyed by a move operation. To avoid problems, only the last assignment of a tagged reference should allow the option of a move. It needs to effectively go out of scope:

    void func( Foo const &&& f ) {
        a = f; // Enforce a copy
        b = f; // Last assignment of f – allow copy or move
    }

The same applies to passing it to another function as a parameter that is a tagged reference or r-value ref. (For this reason, inside a loop when assigning a tagged reference to another variable or passing it to a tagged reference function parameter, it must be treated as if it were an l-value form. Another little thing for a compiler to look out for!)
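The behaviour described above can be roughly approximated in today's C++ as a library type, which may help in judging what the compiler would have to do. This is only a sketch: the proposal is for a built-in feature, and the member names (form, get, copy_to, assign_to) and the counting Foo type are assumptions made for illustration. In particular, the programmer here has to mark the last use by hand, which is exactly the job the proposal hands to the compiler.

```cpp
#include <string>
#include <utility>

// Library-level approximation of the proposed built-in tagged reference.
// All member names are illustrative assumptions, not part of the proposal.
template< class T >
class tagged_ref {
public:
    enum form_type { lvalue, rvalue };

    tagged_ref( T const & t ) : target( &t ), movable( nullptr ) {}  // l-value form
    tagged_ref( T && t ) : target( &t ), movable( &t ) {}            // r-value form

    form_type form() const { return movable ? rvalue : lvalue; }

    // Reading through the reference is the same in either form.
    T const & get() const { return *target; }

    // Not the last use: always copy, so the referenced value survives.
    void copy_to( T & dest ) const { dest = *target; }

    // Last use: copy or move, depending on the tag.
    void assign_to( T & dest ) const
    {
        if( form() == lvalue )
            dest = *target;
        else
            dest = std::move( *movable );
    }

private:
    T const * target;  // always points at the referenced object
    T * movable;       // non-null only in the r-value form
};

// Hypothetical payload type that counts how values are assigned into it.
struct Foo {
    std::string text;
    int copies_received = 0;
    int moves_received = 0;

    Foo() = default;
    explicit Foo( std::string t ) : text( std::move( t ) ) {}
    Foo( Foo const & ) = default;
    Foo( Foo && ) = default;
    Foo & operator=( Foo const & o ) { text = o.text; ++copies_received; return *this; }
    Foo & operator=( Foo && o ) { text = std::move( o.text ); ++moves_received; return *this; }
};

// Written once, standing in for: void func( Foo const &&& f ) { a = f; b = f; }
void func( Foo & a, Foo & b, tagged_ref<Foo> f )
{
    f.copy_to( a );    // not the last use of f: enforce a copy
    f.assign_to( b );  // last use of f: copy or move, depending on the tag
}
```

Calling func( a, b, some_foo ) copies into both a and b, while func( a, b, Foo( "temp" ) ) copies into a and moves into b. Inside a loop, only copy_to would be safe, mirroring the rule that a tagged reference must be treated as the l-value form there.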

Other than that, a tagged reference type looks like a reasonably simple feature for a compiler to support. Programmers would still be able to use the separate forms if they needed the utmost efficiency.

In conclusion, tagged reference types could significantly ease the burden on programmers of making the most of move semantics in a consistent, correct and efficient way. The compiler would automagically be able to "do the right thing". Source code would be smaller through de-duplication, easier to understand, and would gain all the other benefits of DRY code. All of this is in line with many of the other improvements made to C++ in recent years. Move semantics could then become "magic move semantics."

(In my next article: The "const?" keyword, wherein the compiler generates two versions of a function, one with the "const?" keyword replaced with "const" and the other with the empty string.)
