February 14, 2019.

Hypothetical C++: extensible tagging

Data in a C++ program flows from a source, gets modified multiple times and ends up as something entirely different. During processing, the data may change type, reflecting its changing nature. As programmers, we design a type system that reflects the semantic meaning of the data, its structure and its current state. Ideally, the C++ type system should give us all the vocabulary we need.

C++ offers a rich set of tools for designing types and creating this vocabulary. Its type system is open-ended and allows programmers to define new types as needed. Yet there is one aspect in which it is closed: type qualifiers.

C++ provides two type qualifiers: const and volatile(1). They allow us to give additional meaning to an existing type without having to create a separate type manually. The const qualifier is the most used, of course. If it didn’t exist, having to reproduce its behaviour for every type we create would be tedious. Without it, C++ would be a lot less expressive and harder to reason about. And yet, this expressive power is limited to these two qualifiers. There is a disconnect between their usefulness and our inability to create new ones.

User-Defined Qualifiers

What I’d like to see in C++ is the opening of qualifiers to user-defined ones. I’ll give you a few examples of what can be achieved with user-defined qualifiers, but let’s first describe how they would work. Quite simply, they would work just like const and volatile. That is:

• They can qualify a type or a member function.
• They can optionally declare an automatic one-way equivalence, in the way that a non-const pointer can be passed to a function taking a const pointer.
• They can be forcibly added or removed with a const_cast(2).
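For reference, here is how const already covers these three points in today's C++. This is only a sketch to make the model concrete; the names read_value, write_value, Counter and demo are invented for the illustration.

```cpp
#include <type_traits>

// Takes a pointer to const: callers may pass a non-const pointer and
// the qualifier is added silently (the one-way equivalence).
inline int read_value(const int* p) { return *p; }

// Takes a pointer to non-const: a const pointer is rejected unless
// the caller forces the conversion with const_cast.
inline void write_value(int* p, int v) { *p = v; }

struct Counter {
    int n = 0;
    int get() const { return n; }  // qualified member function
    void bump() { ++n; }           // unqualified: not callable on a const Counter
};

// The implicit conversion exists in one direction only.
static_assert(std::is_convertible<int*, const int*>::value,
              "the qualifier is added automatically");
static_assert(!std::is_convertible<const int*, int*>::value,
              "the qualifier is never removed automatically");

inline int demo() {
    int x = 1;
    const int* cp = &x;                    // automatic qualification
    write_value(const_cast<int*>(cp), 5);  // forced conversion; fine because x itself is not const
    return read_value(&x);
}
```

A user-defined qualifier would follow the same three rules, with the automatic direction chosen by its declaration.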

I do not wish to focus on the details of a hypothetical syntax. I think this detail is unimportant and everyone could come up with something. I’ll just show one possible choice using a new typequal keyword:

typequal NewQualifier;
typequal NewQualifier auto qualified;
typequal NewQualifier auto non-qualified;
typequal NewQualifier invalid;
typequal NewQualifier invalid auto qualified;

Each of these examples would create a qualifier named NewQualifier. The “auto qualified” variant means that the qualifier can be added silently to a type on assignment to a variable, in the same way that the const qualifier works. The “auto non-qualified” variant allows the automatic conversion in the opposite direction. What about the “invalid” variant? It declares that data carrying the qualifier cannot be accessed until the qualifier is cast away. As you will see, this is a useful feature.

Payoffs

Now, I want to convince you of the usefulness of this feature. Let’s try to solve multiple problems that cause real bugs in actual programs.

1. Null Pointers

Let’s start with null pointers. Dereferencing null pointers is a major source of bugs. Having nullable pointers is often decried as a major design blunder in the language. But the real problem is not the null pointer. It’s the fact that the language does not prevent us from using a null pointer. Let’s fix that:

typequal maybe invalid;

That’s it. Now, every function that produces a pointer should produce a pointer with the underlying type adorned with the maybe qualifier. Given that it is declared as an invalid-marking qualifier, the language will not allow us to use the data. Once you have tested for null, you can const_cast it to remove the maybe qualifier. Of course, it would be even more practical if the language supported such a qualifier natively, so that built-in functions would take and provide pre-qualified pointers.
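Without the feature, the closest approximation I can think of in current C++ is a wrapper type. The sketch below is not the proposed qualifier, only an emulation of its effect; MaybePtr, if_valid and find_positive are hypothetical names. The wrapper has no operator* or operator->, so the pointer cannot be used before the null test.

```cpp
#include <cstddef>

// Hypothetical stand-in for "maybe T*": holds a pointer but offers
// no way to dereference it directly.
template <typename T>
class MaybePtr {
public:
    explicit MaybePtr(T* p) : ptr_(p) {}

    // The null test and the access are a single step: f is called
    // with the pointee only when the pointer is non-null.
    // Returns true if f was called.
    template <typename F>
    bool if_valid(F f) const {
        if (ptr_ == nullptr) return false;
        f(*ptr_);
        return true;
    }

private:
    T* ptr_;
};

// Hypothetical producer: points at the first positive element, or at nothing.
inline MaybePtr<const int> find_positive(const int* data, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        if (data[i] > 0) return MaybePtr<const int>(data + i);
    }
    return MaybePtr<const int>(nullptr);
}
```

The cost of the emulation is visible right away: every pointer type needs wrapping by hand, which is exactly the tedium a qualifier would remove.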

2. Invalid Data

Similar qualifiers can be used to describe different states of data. Two examples that often come up in code would be:

typequal invalidated invalid auto qualified;
typequal tainted invalid auto qualified;

The first could be used to mark data that has not yet been validated against a desired constraint. Often, a given group of functions will impose such constraints on its input. By having the entry-point function take invalidated data and internal functions take unqualified data, we can ensure that internal functions cannot be called without the data being validated.

The second one is an idea borrowed from Perl: that tainted data cannot be trusted. It is similar to invalidated, but instead of merely not conforming to some constraint, the data is to be treated with suspicion entirely. In Perl, such tainted data comes from web data, email data and other untrusted sources. Additional precautions should be taken when validating the data.
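The entry-point pattern described above can be roughly emulated today with a wrapper, at the price of writing one. In this sketch, Unvalidated, validate, scale and scale_checked are all hypothetical names, and the error value -1 is an arbitrary choice made for the example. The implicit constructor plays the role of “auto qualified”, and the absence of any accessor plays the role of “invalid”.

```cpp
#include <utility>

// Hypothetical stand-in for "invalidated T" or "tainted T": the value
// is stored, but there is no accessor to read it by accident.
template <typename T>
class Unvalidated {
public:
    // Implicit on purpose: any T becomes Unvalidated<T> silently.
    Unvalidated(T value) : value_(std::move(value)) {}

    // The only way to get the value out is to pass a check.
    // Copies the value into out and returns true when ok(value) holds.
    template <typename Pred>
    bool validate(Pred ok, T& out) const {
        if (!ok(value_)) return false;
        out = value_;
        return true;
    }

private:
    T value_;
};

// Internal function: expects data that has already been validated.
inline int scale(int percent, int amount) { return amount * percent / 100; }

// Entry point: receives unvalidated data and has to validate it
// before it can hand anything to scale().
inline int scale_checked(Unvalidated<int> percent, int amount) {
    int p = 0;
    if (!percent.validate([](int v) { return v >= 0 && v <= 100; }, p)) {
        return -1;  // arbitrary error value for this sketch
    }
    return scale(p, amount);
}
```

Note the weakness of the emulation: nothing stops other code from calling scale() directly with a raw int, whereas a qualifier applied at the data's source would follow it everywhere.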

3. State of Data

Of course, such a system of validation can be extended to support multiple states to reflect the progression of an algorithm. Or it can reflect different types of validation. Here are a few ideas:

// The data has been sorted and can be binary-searched.
typequal sorted;

// Sort a vector and return the same vector with the qualifier.
sorted vector<int>& sort(vector<int>& unsorted_vector);

// Do a binary-search in a vector, but only if it has already been sorted.
bool find_in_data(const sorted vector<int>& data, int value);

// The data is shared between threads.
// You would create an instance of Lock that would
// take the shared data and mutex as arguments and
// do the const_cast to remove the shared qualifier.
typequal shared invalid;

// Which coordinate system is used in a 3D algorithm.
// Avoids error of using a local point in an algorithm
// working in world coordinates.
typequal local;
typequal world;
typequal view;
typequal screen;

// Applying the corresponding matrix would return the
// vector with the qualifier correctly updated.
world vector& apply_world_matrix(
    local vector&, const local world matrix&);
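The coordinate-system case is the one that current C++ can already imitate fairly well, using tag types as template parameters. The following is a sketch under that assumption, not the qualifier syntax shown above; Local, World, Point, Translation, apply and height_above_ground are hypothetical names, and the transform is reduced to a translation to keep it short.

```cpp
#include <type_traits>

// Hypothetical tag types standing in for the local and world qualifiers.
struct Local {};
struct World {};

// A 3D point tagged with the coordinate space it lives in.
template <typename Space>
struct Point {
    double x, y, z;
};

// A transform tagged with its source and destination spaces.
template <typename From, typename To>
struct Translation {
    double dx, dy, dz;
};

// Applying a From-to-To transform to a From point yields a To point.
template <typename From, typename To>
Point<To> apply(const Point<From>& p, const Translation<From, To>& t) {
    return Point<To>{p.x + t.dx, p.y + t.dy, p.z + t.dz};
}

// An algorithm that only makes sense in world coordinates.
inline double height_above_ground(const Point<World>& p) { return p.z; }

// A local point is never silently accepted where a world point is expected.
static_assert(!std::is_convertible<Point<Local>, Point<World>>::value,
              "spaces must not mix implicitly");
```

This works because the point type was designed around the tag from the start. A qualifier would give the same guarantee to an existing vector type without rewriting it as a template.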

Conclusion

My goal was to show you the advantages of adding user-defined qualifiers to the C++ language. As demonstrated, user-defined qualifiers open a new world of possibilities with various benefits:

• Being more expressive with existing data types.
• Clearly representing the state of the data to the code reader.
• Following the progression of changes made to data.
• Allowing the compiler to enforce various constraints.
• Avoiding bugs that arise from passing incorrect data to functions.

(1) There are also restrict (a C qualifier that C++ compilers offer only as an extension) and the register and auto specifiers, but they do not behave exactly like const and volatile.

(2) Although it would be more elegant to rename it to qualifier_cast.