CP.con
CP.con: Concurrency
This section focuses on relatively ad-hoc uses of multiple threads communicating through shared data.
- For parallel algorithms, see parallelism
- For inter-task communication without explicit sharing, see messaging
- For vector parallel code, see vectorization
- For lock-free programming, see lock free
Concurrency rule summary:
- CP.20: Use RAII, never plain
lock()
/unlock()
- CP.21: Use
std::lock()
orstd::scoped_lock
to acquire multiplemutex
es - CP.22: Never call unknown code while holding a lock (e.g., a callback)
- CP.23: Think of a joining
thread
as a scoped container - CP.24: Think of a
thread
as a global container - CP.25: Prefer
gsl::joining_thread
overstd::thread
- CP.26: Don't
detach()
a thread - CP.31: Pass small amounts of data between threads by value, rather than by reference or pointer
- CP.32: To share ownership between unrelated
thread
s useshared_ptr
- CP.40: Minimize context switching
- CP.41: Minimize thread creation and destruction
- CP.42: Don't
wait
without a condition - CP.43: Minimize time spent in a critical section
- CP.44: Remember to name your
lock_guard
s andunique_lock
s - CP.50: Define a
mutex
together with the data it guards. Usesynchronized_value<T>
where possible -
When
- ??? when to use
try_lock()
- ??? when to prefer
lock_guard
overunique_lock
-
Time
- ??? when/how to use
new thread
CP.20: Use RAII, never plain lock()
/unlock()
Reason
Avoids nasty errors from unreleased locks.
Example, bad
mutex mtx;
void do_stuff()
{
mtx.lock();
// ... do stuff ...
mtx.unlock();
}
Sooner or later, someone will forget the mtx.unlock()
, place a return
in the ... do stuff ...
, throw an exception, or something.
mutex mtx;
void do_stuff()
{
unique_lock<mutex> lck {mtx};
// ... do stuff ...
}
Enforcement
Flag calls of member lock()
and unlock()
. ???
CP.21: Use std::lock()
or std::scoped_lock
to acquire multiple mutex
es
Reason
To avoid deadlocks on multiple mutex
es.
Example
This is asking for deadlock:
// thread 1
lock_guard<mutex> lck1(m1);
lock_guard<mutex> lck2(m2);
// thread 2
lock_guard<mutex> lck2(m2);
lock_guard<mutex> lck1(m1);
Instead, use lock()
:
// thread 1
lock(m1, m2);
lock_guard<mutex> lck1(m1, adopt_lock);
lock_guard<mutex> lck2(m2, adopt_lock);
// thread 2
lock(m2, m1);
lock_guard<mutex> lck2(m2, adopt_lock);
lock_guard<mutex> lck1(m1, adopt_lock);
or (better, but C++17 only):
// thread 1
scoped_lock<mutex, mutex> lck1(m1, m2);
// thread 2
scoped_lock<mutex, mutex> lck2(m2, m1);
Here, the writers of thread1
and thread2
are still not agreeing on the order of the mutex
es, but order no longer matters.
Note
In real code, mutex
es are rarely named to conveniently remind the programmer of an intended relation and intended order of acquisition.
In real code, mutex
es are not always conveniently acquired on consecutive lines.
Note
In C++17 it's possible to write plain
lock_guard lck1(m1, adopt_lock);
and have the mutex
type deduced.
Enforcement
Detect the acquisition of multiple mutex
es.
This is undecidable in general, but catching common simple examples (like the one above) is easy.
CP.22: Never call unknown code while holding a lock (e.g., a callback)
Reason
If you don't know what a piece of code does, you are risking deadlock.
Example
void do_this(Foo* p)
{
lock_guard<mutex> lck {my_mutex};
// ... do something ...
p->act(my_data);
// ...
}
If you don't know what Foo::act
does (maybe it is a virtual function invoking a derived class member of a class not yet written),
it might call do_this
(recursively) and cause a deadlock on my_mutex
.
Maybe it will lock on a different mutex and not return in a reasonable time, causing delays to any code calling do_this
.
Example
A common example of the "calling unknown code" problem is a call to a function that tries to gain locked access to the same object.
Such problem can often be solved by using a recursive_mutex
. For example:
recursive_mutex my_mutex;
template<typename Action>
void do_something(Action f)
{
unique_lock<recursive_mutex> lck {my_mutex};
// ... do something ...
f(this); // f will do something to *this
// ...
}
If, as it is likely, f()
invokes operations on *this
, we must make sure that the object's invariant holds before the call.
Enforcement
- Flag calling a virtual function with a non-recursive
mutex
held - Flag calling a callback with a non-recursive
mutex
held
CP.23: Think of a joining thread
as a scoped container
Reason
To maintain pointer safety and avoid leaks, we need to consider what pointers are used by a thread
.
If a thread
joins, we can safely pass pointers to objects in the scope of the thread
and its enclosing scopes.
Example
void f(int* p)
{
// ...
*p = 99;
// ...
}
int glob = 33;
void some_fct(int* p)
{
int x = 77;
joining_thread t0(f, &x); // OK
joining_thread t1(f, p); // OK
joining_thread t2(f, &glob); // OK
auto q = make_unique<int>(99);
joining_thread t3(f, q.get()); // OK
// ...
}
A gsl::joining_thread
is a std::thread
with a destructor that joins and that cannot be detached()
.
By "OK" we mean that the object will be in scope ("live") for as long as a thread
can use the pointer to it.
The fact that thread
s run concurrently doesn't affect the lifetime or ownership issues here;
these thread
s can be seen as just a function object called from some_fct
.
Enforcement
Ensure that joining_thread
s don't detach()
.
After that, the usual lifetime and ownership (for local objects) enforcement applies.
CP.24: Think of a thread
as a global container
Reason
To maintain pointer safety and avoid leaks, we need to consider what pointers are used by a thread
.
If a thread
is detached, we can safely pass pointers to static and free store objects (only).
Example
void f(int* p)
{
// ...
*p = 99;
// ...
}
int glob = 33;
void some_fct(int* p)
{
int x = 77;
std::thread t0(f, &x); // bad
std::thread t1(f, p); // bad
std::thread t2(f, &glob); // OK
auto q = make_unique<int>(99);
std::thread t3(f, q.get()); // bad
// ...
t0.detach();
t1.detach();
t2.detach();
t3.detach();
// ...
}
By "OK" we mean that the object will be in scope ("live") for as long as a thread
can use the pointers to it.
By "bad" we mean that a thread
might use a pointer after the pointed-to object is destroyed.
The fact that thread
s run concurrently doesn't affect the lifetime or ownership issues here;
these thread
s can be seen as just a function object called from some_fct
.
Note
Even objects with static storage duration can be problematic if used from detached threads: if the thread continues until the end of the program, it might be running concurrently with the destruction of objects with static storage duration, and thus accesses to such objects might race.
Note
This rule is redundant if you don't detach()
and use gsl::joining_thread
.
However, converting code to follow those guidelines could be difficult and even impossible for third-party libraries.
In such cases, the rule becomes essential for lifetime safety and type safety.
In general, it is undecidable whether a detach()
is executed for a thread
, but simple common cases are easily detected.
If we cannot prove that a thread
does not detach()
, we must assume that it does and that it outlives the scope in which it was constructed;
After that, the usual lifetime and ownership (for global objects) enforcement applies.
Enforcement
Flag attempts to pass local variables to a thread that might detach()
.
CP.25: Prefer gsl::joining_thread
over std::thread
Reason
A joining_thread
is a thread that joins at the end of its scope.
Detached threads are hard to monitor.
It is harder to ensure absence of errors in detached threads (and potentially detached threads).
Example, bad
void f() { std::cout << "Hello "; }
struct F {
void operator()() const { std::cout << "parallel world "; }
};
int main()
{
std::thread t1{f}; // f() executes in separate thread
std::thread t2{F()}; // F()() executes in separate thread
} // spot the bugs
Example
void f() { std::cout << "Hello "; }
struct F {
void operator()() const { std::cout << "parallel world "; }
};
int main()
{
std::thread t1{f}; // f() executes in separate thread
std::thread t2{F()}; // F()() executes in separate thread
t1.join();
t2.join();
} // one bad bug left
Note
Make "immortal threads" globals, put them in an enclosing scope, or put them on the free store rather than detach()
.
Don't detach
.
Note
Because of old code and third party libraries using std::thread
, this rule can be hard to introduce.
Enforcement
Flag uses of std::thread
:
- Suggest use of
gsl::joining_thread
or C++20std::jthread
. - Suggest "exporting ownership" to an enclosing scope if it detaches.
- Warn if it is not obvious whether a thread joins or detaches.
CP.26: Don't detach()
a thread
Reason
Often, the need to outlive the scope of its creation is inherent in the thread
s task,
but implementing that idea by detach
makes it harder to monitor and communicate with the detached thread.
In particular, it is harder (though not impossible) to ensure that the thread completed as expected or lives for as long as expected.
Example
void heartbeat();
void use()
{
std::thread t(heartbeat); // don't join; heartbeat is meant to run forever
t.detach();
// ...
}
This is a reasonable use of a thread, for which detach()
is commonly used.
There are problems, though.
How do we monitor the detached thread to see if it is alive?
Something might go wrong with the heartbeat, and losing a heartbeat can be very serious in a system for which it is needed.
So, we need to communicate with the heartbeat thread
(e.g., through a stream of messages or notification events using a condition_variable
).
An alternative, and usually superior solution is to control its lifetime by placing it in a scope outside its point of creation (or activation). For example:
void heartbeat();
gsl::joining_thread t(heartbeat); // heartbeat is meant to run "forever"
This heartbeat will (barring error, hardware problems, etc.) run for as long as the program does.
Sometimes, we need to separate the point of creation from the point of ownership:
void heartbeat();
unique_ptr<gsl::joining_thread> tick_tock {nullptr};
void use()
{
// heartbeat is meant to run as long as tick_tock lives
tick_tock = make_unique<gsl::joining_thread>(heartbeat);
// ...
}
Enforcement
Flag detach()
.
CP.31: Pass small amounts of data between threads by value, rather than by reference or pointer
Reason
A small amount of data is cheaper to copy and access than to share it using some locking mechanism. Copying naturally gives unique ownership (simplifies code) and eliminates the possibility of data races.
Note
Defining "small amount" precisely is impossible.
Example
string modify1(string);
void modify2(string&);
void fct(string& s)
{
auto res = async(modify1, s);
async(modify2, s);
}
The call of modify1
involves copying two string
values; the call of modify2
does not.
On the other hand, the implementation of modify1
is exactly as we would have written it for single-threaded code,
whereas the implementation of modify2
will need some form of locking to avoid data races.
If the string is short (say 10 characters), the call of modify1
can be surprisingly fast;
essentially all the cost is in the thread
switch. If the string is long (say 1,000,000 characters), copying it twice
is probably not a good idea.
Note that this argument has nothing to do with async
as such. It applies equally to considerations about whether to use
message passing or shared memory.
Enforcement
???
CP.32: To share ownership between unrelated thread
s use shared_ptr
Reason
If threads are unrelated (that is, not known to be in the same scope or one within the lifetime of the other)
and they need to share free store memory that needs to be deleted, a shared_ptr
(or equivalent) is the only
safe way to ensure proper deletion.
Example
???
Note
- A static object (e.g. a global) can be shared because it is not owned in the sense that some thread is responsible for its deletion.
- An object on free store that is never to be deleted can be shared.
- An object owned by one thread can be safely shared with another as long as that second thread doesn't outlive the owner.
Enforcement
???
CP.40: Minimize context switching
Reason
Context switches are expensive.
Example
???
Enforcement
???
CP.41: Minimize thread creation and destruction
Reason
Thread creation is expensive.
Example
void worker(Message m)
{
// process
}
void dispatcher(istream& is)
{
for (Message m; is >> m; )
run_list.push_back(new thread(worker, m));
}
This spawns a thread
per message, and the run_list
is presumably managed to destroy those tasks once they are finished.
Instead, we could have a set of pre-created worker threads processing the messages
Sync_queue<Message> work;
void dispatcher(istream& is)
{
for (Message m; is >> m; )
work.put(m);
}
void worker()
{
for (Message m; m = work.get(); ) {
// process
}
}
void workers() // set up worker threads (specifically 4 worker threads)
{
joining_thread w1 {worker};
joining_thread w2 {worker};
joining_thread w3 {worker};
joining_thread w4 {worker};
}
Note
If your system has a good thread pool, use it. If your system has a good message queue, use it.
Enforcement
???
CP.42: Don't wait
without a condition
Reason
A wait
without a condition can miss a wakeup or wake up simply to find that there is no work to do.
Example, bad
std::condition_variable cv;
std::mutex mx;
void thread1()
{
while (true) {
// do some work ...
std::unique_lock<std::mutex> lock(mx);
cv.notify_one(); // wake other thread
}
}
void thread2()
{
while (true) {
std::unique_lock<std::mutex> lock(mx);
cv.wait(lock); // might block forever
// do work ...
}
}
Here, if some other thread
consumes thread1
's notification, thread2
can wait forever.
Example
template<typename T>
class Sync_queue {
public:
void put(const T& val);
void put(T&& val);
void get(T& val);
private:
mutex mtx;
condition_variable cond; // this controls access
list<T> q;
};
template<typename T>
void Sync_queue<T>::put(const T& val)
{
lock_guard<mutex> lck(mtx);
q.push_back(val);
cond.notify_one();
}
template<typename T>
void Sync_queue<T>::get(T& val)
{
unique_lock<mutex> lck(mtx);
cond.wait(lck, [this] { return !q.empty(); }); // prevent spurious wakeup
val = q.front();
q.pop_front();
}
Now if the queue is empty when a thread executing get()
wakes up (e.g., because another thread has gotten to get()
before it),
it will immediately go back to sleep, waiting.
Enforcement
Flag all wait
s without conditions.
CP.43: Minimize time spent in a critical section
Reason
The less time is spent with a mutex
taken, the less chance that another thread
has to wait,
and thread
suspension and resumption are expensive.
Example
void do_something() // bad
{
unique_lock<mutex> lck(my_lock);
do0(); // preparation: does not need lock
do1(); // transaction: needs locking
do2(); // cleanup: does not need locking
}
Here, we are holding the lock for longer than necessary: We should not have taken the lock before we needed it and should have released it again before starting the cleanup. We could rewrite this to
void do_something() // bad
{
do0(); // preparation: does not need lock
my_lock.lock();
do1(); // transaction: needs locking
my_lock.unlock();
do2(); // cleanup: does not need locking
}
But that compromises safety and violates the use RAII rule. Instead, add a block for the critical section:
void do_something() // OK
{
do0(); // preparation: does not need lock
{
unique_lock<mutex> lck(my_lock);
do1(); // transaction: needs locking
}
do2(); // cleanup: does not need locking
}
Enforcement
Impossible in general.
Flag "naked" lock()
and unlock()
.
CP.44: Remember to name your lock_guard
s and unique_lock
s
Reason
An unnamed local object is a temporary that immediately goes out of scope.
Example
// global mutexes
mutex m1;
mutex m2;
void f()
{
unique_lock<mutex>(m1); // (A)
lock_guard<mutex> {m2}; // (B)
// do work in critical section ...
}
This looks innocent enough, but it isn't. At (A), m1
is a default-constructed
local unique_lock
, which shadows the global ::m1
(and does not lock it).
At (B) an unnamed temporary lock_guard
is constructed and locks ::m2
,
but immediately goes out of scope and unlocks ::m2
again.
For the rest of the function f()
neither mutex is locked.
Enforcement
Flag all unnamed lock_guard
s and unique_lock
s.
CP.50: Define a mutex
together with the data it guards. Use synchronized_value<T>
where possible
Reason
It should be obvious to a reader that the data is to be guarded and how. This decreases the chance of the wrong mutex being locked, or the mutex not being locked.
Using a synchronized_value<T>
ensures that the data has a mutex, and the right mutex is locked when the data is accessed.
See the WG21 proposal to add synchronized_value
to a future TS or revision of the C++ standard.
Example
struct Record {
std::mutex m; // take this mutex before accessing other members
// ...
};
class MyClass {
struct DataRecord {
// ...
};
synchronized_value<DataRecord> data; // Protect the data with a mutex
};
Enforcement
??? Possible?