SL.con
SL.con: Containers
???
Container rule summary:
- SL.con.1: Prefer using STL
array
orvector
instead of a C array - SL.con.2: Prefer using STL
vector
by default unless you have a reason to use a different container - SL.con.3: Avoid bounds errors
- SL.con.4: don't use
memset
ormemcpy
for arguments that are not trivially-copyable
SL.con.1: Prefer using STL array
or vector
instead of a C array
Reason
C arrays are less safe, and have no advantages over array
and vector
.
For a fixed-length array, use std::array
, which does not degenerate to a pointer when passed to a function and does know its size.
Also, like a built-in array, a stack-allocated std::array
keeps its elements on the stack.
For a variable-length array, use std::vector
, which additionally can change its size and handles memory allocation.
Example
int v[SIZE]; // BAD
std::array<int, SIZE> w; // ok
Example
int* v = new int[initial_size]; // BAD, owning raw pointer
delete[] v; // BAD, manual delete
std::vector<int> w(initial_size); // ok
Note
Use gsl::span
for non-owning references into a container.
Note
Comparing the performance of a fixed-sized array allocated on the stack against a vector
with its elements on the free store is bogus.
You could just as well compare a std::array
on the stack against the result of a malloc()
accessed through a pointer.
For most code, even the difference between stack allocation and free-store allocation doesn't matter, but the convenience and safety of vector
does.
People working with code for which that difference matters are quite capable of choosing between array
and vector
.
Enforcement
- Flag declaration of a C array inside a function or class that also declares an STL container (to avoid excessive noisy warnings on legacy non-STL code). To fix: At least change the C array to a
std::array
.
SL.con.2: Prefer using STL vector
by default unless you have a reason to use a different container
Reason
vector
and array
are the only standard containers that offer the following advantages:
- the fastest general-purpose access (random access, including being vectorization-friendly);
- the fastest default access pattern (begin-to-end or end-to-begin is prefetcher-friendly);
- the lowest space overhead (contiguous layout has zero per-element overhead, which is cache-friendly).
Usually you need to add and remove elements from the container, so use vector
by default; if you don't need to modify the container's size, use array
.
Even when other containers seem more suited, such as map
for O(log N) lookup performance or a list
for efficient insertion in the middle, a vector
will usually still perform better for containers up to a few KB in size.
Note
string
should not be used as a container of individual characters. A string
is a textual string; if you want a container of characters, use vector<!--*char_type*/-->
or array<!--*char_type*/-->
instead.
Exceptions
If you have a good reason to use another container, use that instead. For example:
-
If
vector
suits your needs but you don't need the container to be variable size, usearray
instead. -
If you want a dictionary-style lookup container that guarantees O(K) or O(log N) lookups, the container will be larger (more than a few KB) and you perform frequent inserts so that the overhead of maintaining a sorted
vector
is infeasible, go ahead and use anunordered_map
ormap
instead.
Note
To initialize a vector with a number of elements, use ()
-initialization.
To initialize a vector with a list of elements, use {}
-initialization.
vector<int> v1(20); // v1 has 20 elements with the value 0 (vector<int>{})
vector<int> v2 {20}; // v2 has 1 element with the value 20
Prefer the {}-initializer syntax.
Enforcement
- Flag a
vector
whose size never changes after construction (such as because it'sconst
or because no non-const
functions are called on it). To fix: Use anarray
instead.
SL.con.3: Avoid bounds errors
Reason
Read or write beyond an allocated range of elements typically leads to bad errors, wrong results, crashes, and security violations.
Note
The standard-library functions that apply to ranges of elements all have (or could have) bounds-safe overloads that take span
.
Standard types such as vector
can be modified to perform bounds-checks under the bounds profile (in a compatible way, such as by adding contracts), or used with at()
.
Ideally, the in-bounds guarantee should be statically enforced. For example:
- a range-
for
cannot loop beyond the range of the container to which it is applied - a
v.begin(),v.end()
is easily determined to be bounds safe
Such loops are as fast as any unchecked/unsafe equivalent.
Often a simple pre-check can eliminate the need for checking of individual indices. For example
- for
v.begin(),v.begin()+i
thei
can easily be checked againstv.size()
Such loops can be much faster than individually checked element accesses.
Example, bad
void f()
{
array<int, 10> a, b;
memset(a.data(), 0, 10); // BAD, and contains a length error (length = 10 * sizeof(int))
memcmp(a.data(), b.data(), 10); // BAD, and contains a length error (length = 10 * sizeof(int))
}
Also, std::array<>::fill()
or std::fill()
or even an empty initializer are better candidates than memset()
.
Example, good
void f()
{
array<int, 10> a, b, c{}; // c is initialized to zero
a.fill(0);
fill(b.begin(), b.end(), 0); // std::fill()
fill(b, 0); // std::ranges::fill()
if ( a == b ) {
// ...
}
}
Example
If code is using an unmodified standard library, then there are still workarounds that enable use of std::array
and std::vector
in a bounds-safe manner. Code can call the .at()
member function on each class, which will result in an std::out_of_range
exception being thrown. Alternatively, code can call the at()
free function, which will result in fail-fast (or a customized action) on a bounds violation.
void f(std::vector<int>& v, std::array<int, 12> a, int i)
{
v[0] = a[0]; // BAD
v.at(0) = a[0]; // OK (alternative 1)
at(v, 0) = a[0]; // OK (alternative 2)
v.at(0) = a[i]; // BAD
v.at(0) = a.at(i); // OK (alternative 1)
v.at(0) = at(a, i); // OK (alternative 2)
}
Enforcement
- Issue a diagnostic for any call to a standard-library function that is not bounds-checked.
Insert
This rule is part of the bounds profile.
SL.con.4: don't use memset
or memcpy
for arguments that are not trivially-copyable
Reason
Doing so messes the semantics of the objects (e.g., by overwriting a vptr
).
Note
Similarly for (w)memset, (w)memcpy, (w)memmove, and (w)memcmp
Example
struct base {
virtual void update() = 0;
};
struct derived : public base {
void update() override {}
};
void f(derived& a, derived& b) // goodbye v-tables
{
memset(&a, 0, sizeof(derived));
memcpy(&a, &b, sizeof(derived));
memcmp(&a, &b, sizeof(derived));
}
Instead, define proper default initialization, copy, and comparison functions
void g(derived& a, derived& b)
{
a = {}; // default initialize
b = a; // copy
if (a == b) do_something(a, b);
}
Enforcement
- Flag the use of those functions for types that are not trivially copyable
TODO Notes:
- Impact on the standard library will require close coordination with WG21, if only to ensure compatibility even if never standardized.
- We are considering specifying bounds-safe overloads for stdlib (especially C stdlib) functions like
memcmp
and shipping them in the GSL. - For existing stdlib functions and types like
vector
that are not fully bounds-checked, the goal is for these features to be bounds-checked when called from code with the bounds profile on, and unchecked when called from legacy code, possibly using contracts (concurrently being proposed by several WG21 members).