SL.con
SL.con: Containers
???
Container rule summary:
- SL.con.1: Prefer using STL
arrayorvectorinstead of a C array - SL.con.2: Prefer using STL
vectorby default unless you have a reason to use a different container - SL.con.3: Avoid bounds errors
- SL.con.4: don't use
memsetormemcpyfor arguments that are not trivially-copyable
SL.con.1: Prefer using STL array or vector instead of a C array
Reason
C arrays are less safe, and have no advantages over array and vector.
For a fixed-length array, use std::array, which does not degenerate to a pointer when passed to a function and does know its size.
Also, like a built-in array, a stack-allocated std::array keeps its elements on the stack.
For a variable-length array, use std::vector, which additionally can change its size and handles memory allocation.
Example
int v[SIZE]; // BAD
std::array<int, SIZE> w; // ok
Example
int* v = new int[initial_size]; // BAD, owning raw pointer
delete[] v; // BAD, manual delete
std::vector<int> w(initial_size); // ok
Note
Use gsl::span for non-owning references into a container.
Note
Comparing the performance of a fixed-sized array allocated on the stack against a vector with its elements on the free store is bogus.
You could just as well compare a std::array on the stack against the result of a malloc() accessed through a pointer.
For most code, even the difference between stack allocation and free-store allocation doesn't matter, but the convenience and safety of vector does.
People working with code for which that difference matters are quite capable of choosing between array and vector.
Enforcement
- Flag declaration of a C array inside a function or class that also declares an STL container (to avoid excessive noisy warnings on legacy non-STL code). To fix: At least change the C array to a
std::array.
SL.con.2: Prefer using STL vector by default unless you have a reason to use a different container
Reason
vector and array are the only standard containers that offer the following advantages:
- the fastest general-purpose access (random access, including being vectorization-friendly);
- the fastest default access pattern (begin-to-end or end-to-begin is prefetcher-friendly);
- the lowest space overhead (contiguous layout has zero per-element overhead, which is cache-friendly).
Usually you need to add and remove elements from the container, so use vector by default; if you don't need to modify the container's size, use array.
Even when other containers seem more suited, such as map for O(log N) lookup performance or a list for efficient insertion in the middle, a vector will usually still perform better for containers up to a few KB in size.
Note
string should not be used as a container of individual characters. A string is a textual string; if you want a container of characters, use vector<!--*char_type*/--> or array<!--*char_type*/--> instead.
Exceptions
If you have a good reason to use another container, use that instead. For example:
-
If
vectorsuits your needs but you don't need the container to be variable size, usearrayinstead. -
If you want a dictionary-style lookup container that guarantees O(K) or O(log N) lookups, the container will be larger (more than a few KB) and you perform frequent inserts so that the overhead of maintaining a sorted
vectoris infeasible, go ahead and use anunordered_mapormapinstead.
Note
To initialize a vector with a number of elements, use ()-initialization.
To initialize a vector with a list of elements, use {}-initialization.
vector<int> v1(20); // v1 has 20 elements with the value 0 (vector<int>{})
vector<int> v2 {20}; // v2 has 1 element with the value 20
Prefer the {}-initializer syntax.
Enforcement
- Flag a
vectorwhose size never changes after construction (such as because it'sconstor because no non-constfunctions are called on it). To fix: Use anarrayinstead.
SL.con.3: Avoid bounds errors
Reason
Read or write beyond an allocated range of elements typically leads to bad errors, wrong results, crashes, and security violations.
Note
The standard-library functions that apply to ranges of elements all have (or could have) bounds-safe overloads that take span.
Standard types such as vector can be modified to perform bounds-checks under the bounds profile (in a compatible way, such as by adding contracts), or used with at().
Ideally, the in-bounds guarantee should be statically enforced. For example:
- a range-
forcannot loop beyond the range of the container to which it is applied - a
v.begin(),v.end()is easily determined to be bounds safe
Such loops are as fast as any unchecked/unsafe equivalent.
Often a simple pre-check can eliminate the need for checking of individual indices. For example
- for
v.begin(),v.begin()+itheican easily be checked againstv.size()
Such loops can be much faster than individually checked element accesses.
Example, bad
void f()
{
array<int, 10> a, b;
memset(a.data(), 0, 10); // BAD, and contains a length error (length = 10 * sizeof(int))
memcmp(a.data(), b.data(), 10); // BAD, and contains a length error (length = 10 * sizeof(int))
}
Also, std::array<>::fill() or std::fill() or even an empty initializer are better candidates than memset().
Example, good
void f()
{
array<int, 10> a, b, c{}; // c is initialized to zero
a.fill(0);
fill(b.begin(), b.end(), 0); // std::fill()
fill(b, 0); // std::ranges::fill()
if ( a == b ) {
// ...
}
}
Example
If code is using an unmodified standard library, then there are still workarounds that enable use of std::array and std::vector in a bounds-safe manner. Code can call the .at() member function on each class, which will result in an std::out_of_range exception being thrown. Alternatively, code can call the at() free function, which will result in fail-fast (or a customized action) on a bounds violation.
void f(std::vector<int>& v, std::array<int, 12> a, int i)
{
v[0] = a[0]; // BAD
v.at(0) = a[0]; // OK (alternative 1)
at(v, 0) = a[0]; // OK (alternative 2)
v.at(0) = a[i]; // BAD
v.at(0) = a.at(i); // OK (alternative 1)
v.at(0) = at(a, i); // OK (alternative 2)
}
Enforcement
- Issue a diagnostic for any call to a standard-library function that is not bounds-checked.
Insert
This rule is part of the bounds profile.
SL.con.4: don't use memset or memcpy for arguments that are not trivially-copyable
Reason
Doing so messes the semantics of the objects (e.g., by overwriting a vptr).
Note
Similarly for (w)memset, (w)memcpy, (w)memmove, and (w)memcmp
Example
struct base {
virtual void update() = 0;
};
struct derived : public base {
void update() override {}
};
void f(derived& a, derived& b) // goodbye v-tables
{
memset(&a, 0, sizeof(derived));
memcpy(&a, &b, sizeof(derived));
memcmp(&a, &b, sizeof(derived));
}
Instead, define proper default initialization, copy, and comparison functions
void g(derived& a, derived& b)
{
a = {}; // default initialize
b = a; // copy
if (a == b) do_something(a, b);
}
Enforcement
- Flag the use of those functions for types that are not trivially copyable
TODO Notes:
- Impact on the standard library will require close coordination with WG21, if only to ensure compatibility even if never standardized.
- We are considering specifying bounds-safe overloads for stdlib (especially C stdlib) functions like
memcmpand shipping them in the GSL. - For existing stdlib functions and types like
vectorthat are not fully bounds-checked, the goal is for these features to be bounds-checked when called from code with the bounds profile on, and unchecked when called from legacy code, possibly using contracts (concurrently being proposed by several WG21 members).