SL.str
SL.str: String
Text manipulation is a huge topic.
std::string
doesn't cover all of it.
This section primarily tries to clarify std::string
's relation to char*
, zstring
, string_view
, and gsl::span<char>
.
The important issue of non-ASCII character sets and encodings (e.g., wchar_t
, Unicode, and UTF-8) will be covered elsewhere.
See also: regular expressions
Here, we use "sequence of characters" or "string" to refer to a sequence of characters meant to be read as text (somehow, eventually). We don't consider ???
String summary:
- SL.str.1: Use
std::string
to own character sequences - SL.str.2: Use
std::string_view
orgsl::span<char>
to refer to character sequences - SL.str.3: Use
zstring
orczstring
to refer to a C-style, zero-terminated, sequence of characters - SL.str.4: Use
char*
to refer to a single character -
SL.str.5: Use
std::byte
to refer to byte values that do not necessarily represent characters -
SL.str.10: Use
std::string
when you need to perform locale-sensitive string operations - SL.str.11: Use
gsl::span<char>
rather thanstd::string_view
when you need to mutate a string - SL.str.12: Use the
s
suffix for string literals meant to be standard-librarystring
s
See also:
SL.str.1: Use std::string
to own character sequences
Reason
string
correctly handles allocation, ownership, copying, gradual expansion, and offers a variety of useful operations.
Example
vector<string> read_until(const string& terminator)
{
vector<string> res;
for (string s; cin >> s && s != terminator; ) // read a word
res.push_back(s);
return res;
}
Note how >>
and !=
are provided for string
(as examples of useful operations) and there are no explicit
allocations, deallocations, or range checks (string
takes care of those).
In C++17, we might use string_view
as the argument, rather than const string&
to allow more flexibility to callers:
vector<string> read_until(string_view terminator) // C++17
{
vector<string> res;
for (string s; cin >> s && s != terminator; ) // read a word
res.push_back(s);
return res;
}
Example, bad
Don't use C-style strings for operations that require non-trivial memory management
char* cat(const char* s1, const char* s2) // beware!
// return s1 + '.' + s2
{
int l1 = strlen(s1);
int l2 = strlen(s2);
char* p = (char*) malloc(l1 + l2 + 2);
strcpy(p, s1, l1);
p[l1] = '.';
strcpy(p + l1 + 1, s2, l2);
p[l1 + l2 + 1] = 0;
return p;
}
Did we get that right?
Will the caller remember to free()
the returned pointer?
Will this code pass a security review?
Note
Do not assume that string
is slower than lower-level techniques without measurement and remember that not all code is performance critical.
Don't optimize prematurely
Enforcement
???
SL.str.2: Use std::string_view
or gsl::span<char>
to refer to character sequences
Reason
std::string_view
or gsl::span<char>
provides simple and (potentially) safe access to character sequences independently of how
those sequences are allocated and stored.
Example
vector<string> read_until(string_view terminator);
void user(zstring p, const string& s, string_view ss)
{
auto v1 = read_until(p);
auto v2 = read_until(s);
auto v3 = read_until(ss);
// ...
}
Note
std::string_view
(C++17) is read-only.
Enforcement
???
SL.str.3: Use zstring
or czstring
to refer to a C-style, zero-terminated, sequence of characters
Reason
Readability.
Statement of intent.
A plain char*
can be a pointer to a single character, a pointer to an array of characters, a pointer to a C-style (zero-terminated) string, or even to a small integer.
Distinguishing these alternatives prevents misunderstandings and bugs.
Example
void f1(const char* s); // s is probably a string
All we know is that it is supposed to be the nullptr or point to at least one character
void f1(zstring s); // s is a C-style string or the nullptr
void f1(czstring s); // s is a C-style string constant or the nullptr
void f1(std::byte* s); // s is a pointer to a byte (C++17)
Note
Don't convert a C-style string to string
unless there is a reason to.
Note
Like any other "plain pointer", a zstring
should not represent ownership.
Note
There are billions of lines of C++ "out there", most use char*
and const char*
without documenting intent.
They are used in a wide variety of ways, including to represent ownership and as generic pointers to memory (instead of void*
).
It is hard to separate these uses, so this guideline is hard to follow.
This is one of the major sources of bugs in C and C++ programs, so it is worthwhile to follow this guideline wherever feasible.
Enforcement
- Flag uses of
[]
on achar*
- Flag uses of
delete
on achar*
- Flag uses of
free()
on achar*
SL.str.4: Use char*
to refer to a single character
Reason
The variety of uses of char*
in current code is a major source of errors.
Example, bad
char arr[] = {'a', 'b', 'c'};
void print(const char* p)
{
cout << p << '\n';
}
void use()
{
print(arr); // run-time error; potentially very bad
}
The array arr
is not a C-style string because it is not zero-terminated.
Alternative
See zstring
, string
, and string_view
.
Enforcement
- Flag uses of
[]
on achar*
SL.str.5: Use std::byte
to refer to byte values that do not necessarily represent characters
Reason
Use of char*
to represent a pointer to something that is not necessarily a character causes confusion
and disables valuable optimizations.
Example
???
Note
C++17
Enforcement
???
SL.str.10: Use std::string
when you need to perform locale-sensitive string operations
Reason
std::string
supports standard-library locale
facilities
Example
???
Note
???
Enforcement
???
SL.str.11: Use gsl::span<char>
rather than std::string_view
when you need to mutate a string
Reason
std::string_view
is read-only.
Example
???
Note
???
Enforcement
The compiler will flag attempts to write to a string_view
.
SL.str.12: Use the s
suffix for string literals meant to be standard-library string
s
Reason
Direct expression of an idea minimizes mistakes.
Example
auto pp1 = make_pair("Tokyo", 9.00); // {C-style string,double} intended?
pair<string, double> pp2 = {"Tokyo", 9.00}; // a bit verbose
auto pp3 = make_pair("Tokyo"s, 9.00); // {std::string,double} // C++14
pair pp4 = {"Tokyo"s, 9.00}; // {std::string,double} // C++17
Enforcement
???