SL.str: String
Text manipulation is a huge topic.
std::string doesn't cover all of it.
This section primarily tries to clarify std::string's relation to char*, zstring, string_view, and gsl::span<char>.
The important issue of non-ASCII character sets and encodings (e.g., wchar_t, Unicode, and UTF-8) will be covered elsewhere.
See also: regular expressions
Here, we use "sequence of characters" or "string" to refer to a sequence of characters meant to be read as text (somehow, eventually). We don't consider ???
String summary:
- SL.str.1: Use std::string to own character sequences
- SL.str.2: Use std::string_view or gsl::span<char> to refer to character sequences
- SL.str.3: Use zstring or czstring to refer to a C-style, zero-terminated, sequence of characters
- SL.str.4: Use char* to refer to a single character
- SL.str.5: Use std::byte to refer to byte values that do not necessarily represent characters
- SL.str.10: Use std::string when you need to perform locale-sensitive string operations
- SL.str.11: Use gsl::span<char> rather than std::string_view when you need to mutate a string
- SL.str.12: Use the s suffix for string literals meant to be standard-library strings
See also:
SL.str.1: Use std::string to own character sequences
Reason
string correctly handles allocation, ownership, copying, gradual expansion, and offers a variety of useful operations.
Example
vector<string> read_until(const string& terminator)
{
vector<string> res;
for (string s; cin >> s && s != terminator; ) // read a word
res.push_back(s);
return res;
}
Note how >> and != are provided for string (as examples of useful operations) and there are no explicit
allocations, deallocations, or range checks (string takes care of those).
In C++17, we might use string_view as the argument, rather than const string&, to allow more flexibility to callers:
vector<string> read_until(string_view terminator) // C++17
{
vector<string> res;
for (string s; cin >> s && s != terminator; ) // read a word
res.push_back(s);
return res;
}
Example, bad
Don't use C-style strings for operations that require non-trivial memory management:
char* cat(const char* s1, const char* s2) // beware!
// return s1 + '.' + s2
{
int l1 = strlen(s1);
int l2 = strlen(s2);
char* p = (char*) malloc(l1 + l2 + 2);
memcpy(p, s1, l1);
p[l1] = '.';
memcpy(p + l1 + 1, s2, l2);
p[l1 + l2 + 1] = 0;
return p;
}
Did we get that right?
Will the caller remember to free() the returned pointer?
Will this code pass a security review?
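For comparison, here is a minimal sketch of the same operation written with std::string. The name cat_str is hypothetical, chosen only to keep it distinct from cat() above.

```cpp
#include <string>

// Hypothetical counterpart to cat() above: returns s1 + '.' + s2.
// std::string owns the result, so the caller has nothing to free().
std::string cat_str(const std::string& s1, const std::string& s2)
{
    return s1 + '.' + s2;   // sizing and allocation are handled by string
}
```

There is no length arithmetic, no explicit terminator, and no ownership question left for the caller.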
Note
Do not assume that string is slower than lower-level techniques without measurement and remember that not all code is performance critical.
Don't optimize prematurely.
Enforcement
???
SL.str.2: Use std::string_view or gsl::span<char> to refer to character sequences
Reason
std::string_view or gsl::span<char> provides simple and (potentially) safe access to character sequences independently of how
those sequences are allocated and stored.
Example
vector<string> read_until(string_view terminator);
void user(zstring p, const string& s, string_view ss)
{
auto v1 = read_until(p);
auto v2 = read_until(s);
auto v3 = read_until(ss);
// ...
}
Note
std::string_view (C++17) is read-only.
Enforcement
???
SL.str.3: Use zstring or czstring to refer to a C-style, zero-terminated, sequence of characters
Reason
Readability.
Statement of intent.
A plain char* can be a pointer to a single character, a pointer to an array of characters, a pointer to a C-style (zero-terminated) string, or even to a small integer.
Distinguishing these alternatives prevents misunderstandings and bugs.
Example
void f1(const char* s); // s is probably a string
All we know is that it is supposed to be the nullptr or point to at least one character.
void f1(zstring s); // s is a C-style string or the nullptr
void f1(czstring s); // s is a C-style string constant or the nullptr
void f1(std::byte* s); // s is a pointer to a byte (C++17)
Note
Don't convert a C-style string to string unless there is a reason to.
Note
Like any other "plain pointer", a zstring should not represent ownership.
Note
There are billions of lines of C++ "out there", and most of them use char* and const char* without documenting intent.
They are used in a wide variety of ways, including to represent ownership and as generic pointers to memory (instead of void*).
It is hard to separate these uses, so this guideline is hard to follow.
Undocumented use of char* is one of the major sources of bugs in C and C++ programs, so it is worthwhile to follow this guideline wherever feasible.
Enforcement
- Flag uses of [] on a char*
- Flag uses of delete on a char*
- Flag uses of free() on a char*
SL.str.4: Use char* to refer to a single character
Reason
The variety of uses of char* in current code is a major source of errors.
Example, bad
char arr[] = {'a', 'b', 'c'};
void print(const char* p)
{
cout << p << '\n';
}
void use()
{
print(arr); // run-time error; potentially very bad
}
The array arr is not a C-style string because it is not zero-terminated.
Alternative
See zstring, string, and string_view.
Enforcement
- Flag uses of [] on a char*
SL.str.5: Use std::byte to refer to byte values that do not necessarily represent characters
Reason
Use of char* to represent a pointer to something that is not necessarily a character causes confusion
and disables valuable optimizations.
Example
???
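Pending an official example, here is a minimal sketch of a function that treats its input as raw bytes rather than characters. The helper xor_checksum is hypothetical.

```cpp
#include <cstddef>   // std::byte, std::to_integer, std::size_t

// Hypothetical helper: XOR checksum over a raw buffer.
// Taking std::byte* instead of char* states that the data is not text.
std::byte xor_checksum(const std::byte* data, std::size_t n)
{
    std::byte sum{0};
    for (std::size_t i = 0; i < n; ++i)
        sum ^= data[i];      // bitwise operators are defined for std::byte
    return sum;
}
```

Because std::byte defines no arithmetic operators, treating one as a number requires an explicit std::to_integer, so accidental use of a byte as a character or an integer is a compile-time error.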
Note
C++17
Enforcement
???
SL.str.10: Use std::string when you need to perform locale-sensitive string operations
Reason
std::string supports standard-library locale facilities.
Example
???
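Pending an official example, the sketch below shows two standard-library locale facilities operating on std::string. The helper names to_upper and less_in are hypothetical. The calls they wrap, std::toupper from <locale> and std::locale::operator(), are standard.

```cpp
#include <locale>
#include <string>

// Hypothetical helper: upper-cases s according to the rules of loc.
// Works character by character, so it is only meaningful for
// single-byte encodings.
std::string to_upper(std::string s, const std::locale& loc)
{
    for (char& c : s)
        c = std::toupper(c, loc);   // the <locale> overload, not the <cctype> one
    return s;
}

// Hypothetical helper: true if a collates before b in loc.
// std::locale::operator() compares two strings using the locale's collate facet.
bool less_in(const std::string& a, const std::string& b, const std::locale& loc)
{
    return loc(a, b);
}
```

Since std::locale::operator() is a comparison function object for strings, a locale can also be passed directly as the comparator to std::sort over a vector<string>.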
Note
???
Enforcement
???
SL.str.11: Use gsl::span<char> rather than std::string_view when you need to mutate a string
Reason
std::string_view is read-only.
Example
???
Note
???
Enforcement
The compiler will flag attempts to write to a string_view.
SL.str.12: Use the s suffix for string literals meant to be standard-library strings
Reason
Direct expression of an idea minimizes mistakes.
Example
auto pp1 = make_pair("Tokyo", 9.00); // {C-style string,double} intended?
pair<string, double> pp2 = {"Tokyo", 9.00}; // a bit verbose
auto pp3 = make_pair("Tokyo"s, 9.00); // {std::string,double} // C++14
pair pp4 = {"Tokyo"s, 9.00}; // {std::string,double} // C++17
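The difference between the first and third lines can be checked at compile time. This sketch assumes C++17 and brings the suffix into scope with a using-directive, which the s suffix requires. The variable names are hypothetical.

```cpp
#include <string>
#include <type_traits>
#include <utility>

using namespace std::string_literals;   // makes the s suffix visible

auto p1 = std::make_pair("Tokyo", 9.00);    // the literal decays to const char*
auto p2 = std::make_pair("Tokyo"s, 9.00);   // the literal is a std::string

static_assert(std::is_same_v<decltype(p1.first), const char*>);
static_assert(std::is_same_v<decltype(p2.first), std::string>);
```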
Enforcement
???