Special Interest Group on C++
Return-value optimization is a compiler technique that avoids copying an object that a function returns as its value, including avoiding the creation of a temporary object. This optimization lets a function return large objects efficiently while also simplifying the function’s interface and eliminating scope for issues such as resource leaks. However, there are situations where a compiler may be unable to perform this optimization, where a calling function does not capitalize on the optimization, and where it may be acceptable or even better to forgo the optimization.
Return-value optimization is part of a category of optimizations enabled by “copy elision” (meaning “omitting copying”). C++17 requires copy elision when a function returns a temporary object (unnamed object), but does not require it when a function returns a named object. Also, whether copy elision is helpful depends on how the function’s return value is consumed. Thus, it is important to understand the code organization of both the called and calling functions, and verify if the optimization is performed or is helpful in a given situation.
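The terms RVO and NRVO are frequently used in relation to copy elision, but the C++ standard does not define those terms. Also, the term RVO is sometimes used to mean optimization with respect to unnamed objects only, and sometimes to mean optimization in relation to both named and unnamed objects. Thus, for clarity, this post uses the following terms and acronyms:
RVO (return-value optimization): optimization in relation to either named or unnamed objects.
URVO (unnamed RVO): optimization in relation to unnamed (temporary) objects only.
NRVO (named RVO): optimization in relation to named objects only.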
Listing A shows a simple struct rigged to show which constructor is called as well as to show when the assignment operator and the destructor are called. The static variable counter is used to assign unique identifiers to instances of the struct. This post uses this rig to illustrate RVO and to check the circumstances in which RVO is performed or is helpful.
// Listing A: struct rigged to trace construction, assignment, and destruction
#include <iostream>

static int counter; // counter to identify instances of S

struct S {
    int i{ 0 };
    int id;

    S() : id{ ++counter } {
        std::cout << "default ctor " << id << "\n";
    }

    S(const S& s) : i{ s.i }, id{ ++counter } {
        std::cout << "copy ctor " << id << "\n";
    }

    S& operator=(const S& s) {
        i = s.i;
        std::cout << "assign " << s.id << " to " << id << "\n";
        return *this;
    }

    ~S() {
        std::cout << "dtor " << id << "\n";
    }
};
Unnamed RVO (URVO) relates to optimizing the return of “unnamed objects” or temporary objects, which are objects created on a return statement.
URVO is a relatively old technique and has been permitted since C++98 (Section 12.2 of that standard), but it is required only since C++17. C++ compilers have likely supported URVO at least as far back as 2001. MSVC has supported it at least since Visual C++ 2005, but in the GCC world, due to my limited access to tools, I am able to trace it back only to GCC 4.1.2 (which was released in 2007).
Listing B analyzes the same code in two scenarios. In both scenarios, the object value to return is created on the return statement. In the scenario without URVO, the code would create two objects: a temporary object in function get_B using the default constructor, and the object named s in main using the copy constructor. However, with URVO, the same code would create just one object. The comments in code call out the location and sequence of object creation and destruction.
Note: Both GCC and MSVC perform URVO by default, and it is not possible to disable it because C++17 guarantees copy elision when a temporary object is returned. (URVO can be disabled in C++14. See Exercise 1.)
// Listing B: without URVO
S get_B() {
    return S();    // 1. default ctor 1
}                  // 2. copy ctor 2; 3. dtor 1

int main() {
    S s = get_B();
}                  // 4. dtor 2

// Listing B: with URVO
S get_B() {
    return S();    // 1. default ctor 1
}

int main() {
    S s = get_B();
}                  // 2. dtor 1
Named RVO (NRVO) is concerned with the optimization performed for “named objects”, which are objects returned but not created on a return statement. Listing C illustrates this optimization. As the comments point out, without NRVO, the code creates two instances of S, but with NRVO, it creates only one object.
// Listing C: without NRVO
S get_C() {
    S s;           // 1. default ctor 1
    s.i = 8;
    return s;
}                  // 2. copy ctor 2; 3. dtor 1

int main() {
    S s = get_C();
}                  // 4. dtor 2

// Listing C: with NRVO
S get_C() {
    S s;           // 1. default ctor 1
    s.i = 8;
    return s;
}

int main() {
    S s = get_C();
}                  // 2. dtor 1
Presently, compilers are likely to perform NRVO only if the same named object is returned from all paths of a function, not if different paths return different named objects. Listing D sets up this comparison: Function get_D1 has two different paths and both paths return the same named object. Function get_D2 also has two return paths, but each path creates and returns a different named object.
Note: GCC performs NRVO in get_D1, but not in get_D2 (not even with the -O4 compiler option, which GCC treats the same as -O3, its highest optimization level). MSVC does not perform NRVO even in function get_D1 (even with the /O2 option enabled). That is, at least for now, MSVC does not perform NRVO when branching is involved, even if the function returns the same named object in all paths.
S get_D1(int x) {
    S s;               // 1. default ctor 1
    if (x % 2 == 0) {
        s.i = 8;
        return s;
    } else {
        s.i = 22;
        return s;
    }
}

S get_D2(int x) {
    if (x % 2 == 0) {
        S s1;          // 2. default ctor 2 (or default ctor for s2 below)
        s1.i = 8;
        return s1;
    } else {
        S s2;          // 2. default ctor 2 (or default ctor for s1 above)
        s2.i = 22;
        return s2;
    }
}                      // 3. copy ctor 3 (either s1 or s2 above); 4. dtor 2

int main() {
    std::cout << "get_D1:\n";
    S s1 = get_D1(3);
    std::cout << "\nget_D2:\n";
    S s2 = get_D2(3);
}                      // 5. dtor 3, dtor 1
Listing E shows a subtle logic error that causes loss of NRVO benefit: In main, an instance of S is created using the default constructor in the first line and it is then assigned the return value from function get_E. This situation requires the compiler to create two objects and invoke the assignment operator to set variable s to the function’s return value.
Note: The loss of optimization benefit in Listing E applies even if function get_E returns an unnamed object.
To repeat, the situation in Listing E is a logic error, not a case of the compiler failing to perform RVO. If the compiler does any sort of RVO (and C++17 guarantees URVO), it does so without regard for the calling context. To be precise, the compiler generates code such that object copying is avoided if the return value is used as the initializer for a receiving variable (and in a few other cases; see Exercise 6).
Because the compiler performs RVO independent of the calling context, RVO benefits are available whether the function is reused in source form or binary form, and it is always up to the calling function to lose or gain the benefit.
S get_E() {
    S s;           // 2. default ctor 2
    s.i = 8;
    return s;
}

int main() {
    S s;           // 1. default ctor 1
    s = get_E();   // 3. assign 2 to 1; 4. dtor 2 (end of this statement)
}                  // 5. dtor 1
In a situation where the compiler does not perform RVO (either unnamed or named) or if RVO cannot be exploited for some reason, a simple solution is for the calling function to pass a reference to a pre-built instance and have the called function modify the object it receives by reference. This approach assumes the object supports all necessary modifier functions. It also makes the function interface a bit more complex.
An alternative is for the called function to dynamically allocate an object and return an object pointer, but this approach is error prone (all the issues related to pointer manipulation) and causes memory leaks if the object is not freed.
Listing F illustrates the use of the two alternatives just outlined: Function get_F1 receives a pre-created object by reference and alters the received object. get_F2 dynamically allocates an object, sets up the object’s data, and returns a pointer to the dynamic object. main is responsible for freeing the dynamically-allocated object.
void get_F1(S& s) {
    s.i = 8;
}

S* get_F2() {
    S* ps = new S;     // 2. default ctor 2
    ps->i = 8;
    return ps;         // should be freed later
}

int main() {
    S s;               // 1. default ctor 1
    get_F1(s);
    std::cout << s.i << '\n';

    S* ps{ get_F2() };
    std::cout << ps->i << '\n';
    delete ps;         // 3. dtor 2
}                      // 4. dtor 1
In some situations it can be acceptable, or even better, to forgo RVO. For example, if a function returns a small object (such as an instance of the example struct S), it may be acceptable to lose the benefit of the optimization. However, if a function returns a large object such as a vector of 100 string objects, it might be important to take advantage of RVO.
It is not possible to take advantage of RVO if the receiving variable in the calling function is required after the block in which the variable receives the function value. The loss of RVO benefit is due to the need to declare the receiving variable before the block in which the variable receives its object value. In this situation, alternatives such as those outlined in Section 6 would need to be used if it is necessary to avoid copying objects.
Listing G shows two possible code organizations to meet the needs of a real-life application. Functions get and use in the listing are some functions that return and accept an instance of S, respectively. The code with the “Lose RVO” organization misses out on the RVO benefit (why?), but it is simple and readable, mainly because the exception handlers are sequential. In contrast, the code with the “Gain RVO” organization benefits from RVO, but it is less readable, largely due to the nested exception handling.
Note: Listing G is meant only to illustrate common trade-offs involving RVO. It is not meant to promote any particular program organization. Other (better) organizations can exist, and the organizations possible as well as the best organization depend on the application.
// Listing G: "Lose RVO" organization
// (functions get and use are assumed to be declared elsewhere)
int main() {
    S s;                        // default ctor
    try {
        s = get();              // lost RVO benefit
        use(s);
    } catch (...) {
        std::cout << "error get/use";
        return 1;
    }

    try {
        // stuff, unrelated to s
    } catch (...) {
        std::cout << "error";
        return 2;
    }

    if (s.i == 3)               // use s again
        std::cout << "It was 3?";
}

// Listing G: "Gain RVO" organization
int main() {
    try {
        S s = get();            // RVO benefit
        use(s);

        try {
            // stuff, unrelated to s
        } catch (...) {
            std::cout << "error";
            return 2;
        }

        if (s.i == 3)           // use s again
            std::cout << "It was 3?";
    } catch (...) {
        std::cout << "error get/use";
        return 1;
    }
}
RVO is a compiler technique to avoid copying an object when the object is returned as a function’s value. This optimization helps a function to efficiently return large objects while also simplifying the function’s interface and eliminating scope for errors.
C++17 requires RVO only for temporary (unnamed) objects, not for named objects. Also, support for RVO varies by situation and across compilers. Thus, it is necessary to verify if the compiler performs RVO in a given situation and rewrite code to benefit from RVO, forgo RVO, or work around the loss or lack of RVO.
The struct S in Listing A is a good instrument to test RVO in a given situation. The code in Listings C and D helps determine if a compiler performs NRVO in a given situation.
Lastly, beware of the confusing use of the term RVO to mean optimization in relation to only unnamed objects, or optimization in relation to either named or unnamed objects.
1. Complete the following tasks with the code in Listing B. Do not make any change to that code, and run the code only in GCC 10.1:
   a. Run the code with copy elision enabled (which is the default), but in C++14, and confirm that the results are the same as for C++17. Use the compiler option -std=c++14 to set the language to C++14.
   b. Disable copy elision (compiler option -fno-elide-constructors) and run the code in C++14. How is the result different from the result anticipated in Section 2 for the “Without URVO” scenario?
   c. With copy elision disabled, set the compiler to use C++17 and compare the result with the result from C++14. What confirmation does the comparison provide?
2. Answer the following questions in relation to this program prepared to verify copy elision in C++98 using GCC 4.6.4:
   a. As is, which kind of optimization does the code perform: URVO or NRVO? What is the location and sequence of object construction and destruction?
   b. Change function get such that it performs a different kind of optimization than what is originally done: change the code to perform URVO if it already performs NRVO, and vice versa.
   c. With the program as given, disable copy elision and compare the result with the result from when copy elision is enabled. Explain the reason for the differences between the results.
   d. With the program as given, disable copy elision and compare the result with the result for the part of Listing C where copy elision is disabled. What differences are apparent and what are the likely reasons for the differences?
   e. In what ways does the C++98 code provided differ from the corresponding C++17 code in Listing C?
      Note: This question is unrelated to RVO, but it is opportunistically included to highlight some syntactic differences between C++98 and C++17.
3. Disable copy elision for the code in Listings D, E, and G. For each listing, explain the differences between the results with and without copy elision. Use only GCC 10.1 in all cases.
4. Run all the code examples in this post in MSVC. Run the code with optimization disabled (/Od, which is the default) and again with optimization-for-speed enabled (/O2). Analyze the result from each run and compare the results across runs. For ease of use, set the active configuration to “Release” in all runs.
   Note: Visual Studio Community is free, just in case you do not already have MSVC installed.
5. Consider the declaration S f(); for a third-party library function f distributed in binary form. Assume we do not have access to the source code of f, but we know the library is compiled with GCC 10.1. State all assumptions you make.
   a. What can we say about the RVO that might be performed in function f, without regard for how and where f is used? Include a rationale for your position.
   b. How and why would your position change for this statement: S s = f();
   c. How and why would your position change for this statement sequence: S s; s = f();
6. Revise the code in Listing E as follows in C++17 using GCC 10.1. Then, based on the revised program’s output, answer this question: Does the revised main get the benefit of RVO? If yes, make a general statement on the circumstances in which a calling function gets the benefit of RVO. If you say main does not get the benefit of RVO, explain why and point to the part of the program’s output that supports your position. In either case, submit a Compiler Explorer link to the revised program.
   a. Add a function named use_E which receives a const reference to an instance of S. In the function, merely print to screen the value of S’s data member i.
   b. Change the entire body of main to contain just this one statement: use_E(get_E());
7. What revision to struct S and corresponding revision to function get_D2 of Listing D permit get_D2 to return different objects in each path while also enabling RVO? What does this RVO-enabling change in S and get_D2 inform us about writing classes/structs and returning object values?
   [I apologize if this question seems underspecified, but being more specific gives away too much of the solution. Please DM if you would like more information.]
Ask questions, give feedback, and discuss this post on Twitter. The Twitter link is specific to this post. We greatly appreciate all discussion on the post being only at the post-specific tweet.
Submit solutions by DM on Twitter (only by DM, please) so as to avoid spoilers. Please provide Compiler Explorer links to code. We prefer textual answers in the form of GitHub gists, files in a repo, or other form where we can just follow a link and open the content in a browser.