Special Interest Group on C++
This is Part 2 of a 3-part series on std::string_view. This part focuses on the safety std::string_view provides over character arrays, and on the safety considerations to be made when using std::string_view. Part 1 focuses on the efficiency of std::string_view over std::string. Part 3 provides guidelines for using std::string_view.
As stated in Part 1 of this series, std::string_view is simply a read-only wrapper around a character array. This statement is true regardless of the creation means used: from nothing, from a C-string, from a character array that is not a C-string, or from a std::string object.
Listing A shows the creation of four string_view objects, each using a different creation means. The object created from nothing (variable sv1) behaves much as if it were created from the empty C-string "".
std::string_view sv1; // from nothing: approximates std::string_view sv1("");
std::string_view sv2{"hello"}; // from C-string
char a[]{'h','e','l','l','o'};
std::string_view sv3{a, 5}; // from character array that is not a C-string
std::string s{"hello"};
std::string_view sv4{s}; // from string: approx std::string_view sv4{s.c_str()};
A string_view created from a string object is the same as a string_view created using the C-string obtained via the c_str function on the string object, provided the string contains no embedded null characters. That is, the code associated with variable sv4 could be rewritten as: std::string_view sv4{s.c_str()};
Note: As discussed in Part 1, creating a string_view from a string object actually invokes a conversion operator function in std::string.
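As a rough illustration of the two creation routes, the sketch below builds a view from a string directly and through c_str, and reports the size each view ends up with. The helper names are hypothetical, and the embedded-null case mentioned afterwards is an assumption added here to show where the two routes can differ; it is not taken from Part 1.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper: size of a view created directly from the string.
std::size_t size_via_string(const std::string& s) {
    const std::string_view sv{s};          // size is taken from s.size()
    return sv.size();
}

// Hypothetical helper: size of a view created from the string's C-string.
std::size_t size_via_c_str(const std::string& s) {
    const std::string_view sv{s.c_str()};  // size is found by scanning for '\0'
    return sv.size();
}
```

For a string such as "hello", both helpers report 5. For a string that holds the five characters a, b, '\0', c, d, the first helper reports 5 and the second reports 2, because the scan stops at the first null character.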
It is much cleaner and safer to use the string_view wrapper instead of directly using a character array because string_view has functions and operators that abstract over low-level C-style functions. For example, we can simply test if sv1 > sv2 instead of testing if std::strcmp(z1, z2) > 0, where z1 and z2 represent C-strings, and sv1 and sv2 are the corresponding string_view objects. Likewise, we can use the function member find instead of the low-level functions std::strchr and std::strstr. (The string_view function find can find both single-character and multi-character texts.)
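The comparison above can be sketched side by side. The four helper names below are hypothetical and exist only to pair each C-style call with its string_view counterpart.

```cpp
#include <cstring>
#include <string_view>

// C-style ordering test: both arguments must be null-terminated.
bool greater_c(const char* a, const char* b) {
    return std::strcmp(a, b) > 0;
}

// The same test through string_view: an ordinary relational operator.
bool greater_sv(std::string_view a, std::string_view b) {
    return a > b;
}

// C-style substring test: both arguments must be null-terminated.
bool contains_c(const char* text, const char* part) {
    return std::strstr(text, part) != nullptr;
}

// The same test through string_view: find accepts a single character
// or a longer text and returns npos when nothing is found.
bool contains_sv(std::string_view text, std::string_view part) {
    return text.find(part) != std::string_view::npos;
}
```

A call such as greater_sv("pear", "apple") or contains_sv("hello", "ell") works with string literals because a C-string converts implicitly to a string_view.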
The string_view approach also has the advantage that it works with any character array, not just with C-strings. As a result, there is no need to resort to using functions such as std::memcmp to compare arrays and std::memchr to locate a character. Plus, with string_view, there is no need to explicitly carry around the size of every character array.
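A minimal sketch of that difference, using two arrays that lack a null terminator. The helper names are hypothetical; the C-style helper shows one plausible way to pair std::memcmp with explicit sizes, not the only way.

```cpp
#include <cstddef>
#include <cstring>
#include <string_view>

// C-style: each array travels with its size, and the sizes must be
// compared before std::memcmp is called.
bool equal_c(const char* a, std::size_t na, const char* b, std::size_t nb) {
    return na == nb && std::memcmp(a, b, na) == 0;
}

// string_view: the size is carried inside each object.
bool equal_sv(std::string_view a, std::string_view b) {
    return a == b;
}

// Hypothetical check on two arrays that are not C-strings.
bool arrays_match() {
    const char a[]{'h', 'e', 'l', 'l', 'o'};
    const char b[]{'h', 'e', 'l', 'l', 'o'};
    return equal_c(a, sizeof a, b, sizeof b)
        && equal_sv(std::string_view{a, sizeof a}, std::string_view{b, sizeof b});
}
```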
Also, because a string_view is immutable by design, it eliminates the many const
qualifications that would be necessary to guarantee immutability of character arrays.
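The read-only nature of the wrapper can be checked at compile time. The sketch below assumes a C++17 compiler; the function name is hypothetical. It also shows that the owner of the array can still modify the characters directly, and that the view then reflects the change.

```cpp
#include <string_view>
#include <type_traits>
#include <utility>

// Element access yields const char&, even through a non-const string_view.
static_assert(std::is_same_v<
    decltype(std::declval<std::string_view&>()[0]), const char&>);

// The data function member yields a pointer to const characters.
static_assert(std::is_same_v<
    decltype(std::declval<std::string_view&>().data()), const char*>);

// Hypothetical demo: the view cannot modify the array, but the owner can.
char first_after_owner_edit() {
    char a[]{'h', 'e', 'l', 'l', 'o'};
    const std::string_view sv{a, sizeof a};
    a[0] = 'j';      // the owner writes through the array itself
    return sv[0];    // the view reads the updated character
}
```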
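Lastly, the string_view approach is safer because its operations are bounded by the stored size instead of by a search for a null character. The programmer therefore does not have to worry about buffer overflow and other issues associated with low-level functions such as std::strcmp and std::strchr.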
As stated in Part 1, just two internal data members facilitate the entire string_view functionality:
data_: a pointer to the first character of the array wrapped
size_: the number of characters of interest in the array
Using just these two data members, a string_view is able to become a wrapper to a character array regardless of the creation means used: In all four creation means shown in Listing A, the internal member data_ points to the first character in the array wrapped, and the size_ member contains the number of characters of interest.
The string_view function members data and size provide access to the internal data members data_ and size_, respectively.
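A small sketch of what data and size report for three of the creation means. The function names are hypothetical; data_ and size_ are the article's names for the internal members and are not accessed here.

```cpp
#include <string>
#include <string_view>

// From a character array: data points at the array itself, so no copy is made.
bool wraps_array_in_place() {
    char a[]{'h', 'e', 'l', 'l', 'o'};
    const std::string_view sv{a, 5};
    return sv.data() == a && sv.size() == 5;
}

// From a string: data points into the string's own buffer.
bool wraps_string_in_place() {
    std::string s{"hello"};
    const std::string_view sv{s};
    return sv.data() == s.data() && sv.size() == s.size();
}

// From nothing: there are no characters of interest.
bool empty_by_default() {
    const std::string_view sv;
    return sv.size() == 0 && sv.empty();
}
```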
There are three major safety-related issues when using string_views:
Assuming that the data function member always returns a C-string.
String_view does not make a copy of the character array it wraps; nor does it own the array. Instead, the string_view object’s creator continues to own the character array and is responsible for the array’s management. Specifically, after a string_view object has served its purpose, the array’s owner should deallocate the array if the array was dynamically allocated.
The bottom line is that a string_view object should never outlive the character array it wraps. For example, a function should not return a string_view object that wraps a local array. Listing B shows examples of acceptable and unacceptable uses of string_view. The comments in the code are self-explanatory.
Note: Carefully study the differences between functions bad_idea and also_acceptable in Listing B.
#include <iostream>
#include <string>
#include <string_view>

std::string_view bad_idea() {
    char z[]{"hello"};                 // z is deallocated when this function ends
    return std::string_view{z};        // bad: returned object wraps deallocated array
}

std::string_view another_bad_idea() {
    std::string s{"hello"};            // s is destroyed when this function ends
    return std::string_view{s};        // bad: returned obj wraps data in destroyed string
}

std::string_view also_bad_idea() {
    char* p = new char[6]{};
    std::string_view sv{p};            // sv wraps dynamically allocated array
    delete[] p;                        // sv still wraps deallocated array
    return sv;                         // bad: returned object wraps deallocated array
}

void possibly_acceptable(std::string_view& sv) {
    std::cout << sv;                   // safe only if sv wraps a legit array
}

void acceptable() {
    char z[]{"hello"};                 // z is deallocated when this function ends
    std::string_view sv{z};
    possibly_acceptable(sv);           // OK: z lives until end of this function
}

std::string_view also_acceptable() {
    return std::string_view{"hello"};  // OK: "hello" has static storage
}
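One possible way to repair a function like bad_idea is to return an owning std::string and create views only where the owner is known to be alive. The sketch below is an assumption about how such a repair might look, not a recommendation from this series; the names make_greeting, length_of, and greeting_length are hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Returns an owning string, so the characters travel with the result.
std::string make_greeting() {
    char z[]{"hello"};          // local array, gone when the function ends
    return std::string{z};      // the characters are copied out before that
}

// Uses the view only for the duration of the call.
std::size_t length_of(std::string_view sv) {
    return sv.size();
}

std::size_t greeting_length() {
    const std::string owner{make_greeting()};  // owner outlives the view below
    const std::string_view sv{owner};
    return length_of(sv);
    // By contrast, std::string_view bad{make_greeting()}; would wrap a
    // temporary string that is destroyed at the end of that statement.
}
```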
Another issue to be aware of when using string_view is that the function member data is not guaranteed to return a C-string. Specifically, that function simply returns a pointer to the first character of the wrapped array, whether or not a null character follows the characters of interest. (It could return a pointer to a later character in the array if the function remove_prefix was called earlier.)
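The sketch below illustrates how the pointer returned by data can disagree with the view itself. It narrows a view with remove_prefix and with remove_suffix (a member not discussed above, used here to shorten the view from the end), then reads the text both ways. The wrapped array is a string literal, so reading it as a C-string is well defined in this particular case; the function names are hypothetical.

```cpp
#include <string>
#include <string_view>

// Builds a view of "wo" inside the literal "hello world".
std::string_view narrowed_view() {
    std::string_view sv{"hello world"};  // the literal is null-terminated
    sv.remove_prefix(6);                 // view is "world"; data() now points at 'w'
    sv.remove_suffix(3);                 // view is "wo"; data() is unchanged
    return sv;                           // OK: the literal has static storage
}

// Reads exactly size() characters.
std::string seen_through_view() {
    return std::string(narrowed_view());
}

// Reads from data() up to the literal's null character, ignoring size().
std::string seen_through_data() {
    return std::string(narrowed_view().data());
}
```

Here seen_through_view yields "wo", whereas seen_through_data yields "world": the C-string reader runs past the end of the view. With an array that has no null character at all, the same read would run past the end of the array.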
Listing C illustrates safe and unsafe uses of the data function member. Part 3 discusses this issue in detail, but briefly, it is better to avoid calling the data function altogether. For example, insert a string_view directly into an output stream (as is done with sv6 in the last line of Listing C) instead of inserting the value returned from the data function.
Listing C: safe and unsafe uses of the data function member
char z[]{"hello"}; // z is a C-string
std::string_view sv5{z};
std::cout << sv5.data(); // OK: sv5 wraps a C-string
char a[]{'h','e'}; // a is not a C-string
std::string_view sv6{a,2};
std::cout << sv6.data(); // unsafe: sv6 does not wrap a C-string
std::cout << sv6; // OK: insertion operator is safely overloaded
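The safe insertion in the last line of Listing C can be checked with a string stream in place of std::cout. The sketch below also shows one possible fallback, assumed here for cases where a null-terminated string is genuinely required: make an owning copy. Both helper names are hypothetical.

```cpp
#include <sstream>
#include <string>
#include <string_view>

// The text an output stream receives when a string_view is inserted.
std::string streamed(std::string_view sv) {
    std::ostringstream out;
    out << sv;                 // writes exactly sv.size() characters
    return out.str();
}

// An owning copy whose c_str() is guaranteed to be null-terminated.
std::string null_terminated_copy(std::string_view sv) {
    return std::string(sv);
}
```

For an array such as char a[]{'h','e'}, streamed(std::string_view{a, 2}) yields "he" without reading past the array.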
Overall, std::string_view provides a cleaner and safer means to process immutable data than character arrays do. However, there are some safety concerns in using string_view, especially concerns related to object lifetime.
Listing D shows two versions of a function to count vowels in some text. The first version represents text as a C-string; the second represents text as a string_view. The listing aptly demonstrates that the string_view version is both simpler and safer:
No pointers
No need for const qualification: the C-string version needs const qualification; the string_view version does not need it, but that qualification is made as good practice. (In this case, there is a benefit to const-qualifying the string_view objects. What is that benefit?)
Simpler code: the for-loop header and the test for vowel are both easier to comprehend (and thus to maintain) in the string_view version.
No undefined behavior: the C-string version has undefined behavior if the null character is missing. (This issue exists in two locations in the C-string version. What are those locations?)
Note: Be sure to read Part 3 of this series for guidelines on using string_view.
#include <cstddef>
#include <cstring>
#include <string_view>

// using C-string
std::size_t vowel_count(const char* z) {
    const char vowels[]{"aeiouAEIOU"};
    std::size_t count{0};
    for (std::size_t i = 0; z[i] != '\0'; ++i)
        if (std::strchr(vowels, z[i]) != nullptr)
            ++count;
    return count;
}

// using std::string_view
std::size_t vowel_count(const std::string_view& sv) {
    const std::string_view vowels{"aeiouAEIOU"};
    std::size_t count{0};
    for (auto c : sv)
        if (vowels.find(c) != std::string_view::npos)
            ++count;
    return count;
}
Answer the questions embedded in the bulleted list in the summary section.
Which version of vowel_count in Listing D is faster? Which version is likely to use more run-time memory? Why?
Which of the two versions of vowel_count is better for counting vowels in a character array that is not null-terminated? Why?
Using only the string_view version of vowel_count, write a main function to count vowels in a C-string and a character array that is not null-terminated. That is, make two calls to the string_view version of vowel_count, each time with a different argument. (For this part of the exercise, it might help to remove the C-string version of vowel_count from the program.)
Assuming the string_view abstraction (or something comparable) does not exist or cannot be used, write a function or functions to count vowels in a character array that may or may not be null-terminated.
It is OK to have two versions of the function if that approach seems better. (Listing D already shows the C-string version.) However, strive to reuse code as much as possible, but also strive to make code maintainable and “efficient”. Include comments that clearly explain the rationale for your choices.
Rewrite the string_view version of vowel_count using member function remove_prefix. There are three different approaches to this rewrite. Try all three approaches and outline the pros and cons of each approach. State which approach you prefer and include a rationale.
Rewrite the string_view version of vowel_count using member function find_first_of.
Which version is “better”: the one in Listing D, or the rewritten one? Why?
Write a C-string version of the code in Listing B of Part 1.
Write a program to extract words from text, where words may be separated by space, comma, semi-colon, or period. Write both a C-string version and a string_view version.
Do not use regular expressions, stream extraction, or any other such approach that simplifies the task, but feel free to use any other standard-library facility.
Break down the code into appropriate functions.
const-qualify all variables/parameters that represent immutable text. Meeting this requirement is quite important for this exercise.
Hard-code the following immutable text in the program and use it in testing. Just for this exercise, do not read the text to process as user input at run time:
The quality mantra: improve the process; the process improves you.
Depending on the approach taken in the C-string version, hard-coding the text to process as a const-qualified variable/parameter could pose a challenge. Yet, use a const-qualified variable/parameter to represent the text to process exactly as required in the preceding bullet.
Ask questions, give feedback, and discuss this post on Twitter. The Twitter link is specific to this post. We greatly appreciate all discussion on the post being only at the post-specific tweet.
Submit solutions by DM on Twitter (only by DM, please) so as to avoid spoilers. Please provide Compiler Explorer links to code. We prefer textual answers in the form of GitHub gists, files in a repo, or other form where we can just follow a link and open the content in a browser.