These articles are written by Codalogic empowerees as a way of sharing knowledge with the programming community. They do not necessarily reflect the opinions of Codalogic.
Traditionally if we want to pass a string to a function that we only
want to view but not modify or store we would use a
const std::string &
parameter on the called function.
Now in C++17 we have std::string_view
. std::string_view
can
store a view of a std::string
or a C-string. Its internals are
very simple, consisting of a const char *
pointer (*) to the start of the string
and a member storing the length of the string.
(*) std::string_view
is actually an alias for std::basic_string_view<char>
, an instantiation of the class template std::basic_string_view<T>
. To make life simpler in this post I'm describing things as if T
has already been instantiated with type char
. Later on I've also replaced a lot of the template parameter noise with ...
because those details don't add anything to the discussion.
The GCC implementation of std::string_view
's data members is as follows:
size_t _M_len;
const char* _M_str;
Its constructor from a pointer and a length is:
constexpr string_view(const char* __str, size_type __len) noexcept
: _M_len{__len}, _M_str{__str}
{ }
And the constructor from a regular C-string is as follows (where traits_type::length(__str)
is effectively strlen(__str)
):
constexpr string_view(const char* __str) noexcept
: _M_len{traits_type::length(__str)},
_M_str{__str}
{ }
As I said, very simple.
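As a quick check of those two constructors, here is a minimal sketch that exercises them at compile time (both are constexpr in C++17). The helper names prefix_view and whole_view are my own inventions for illustration and are not part of the standard library.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper: uses the pointer-plus-length constructor, so the
// view covers only the first len characters of str.
constexpr std::string_view prefix_view( const char * str, std::size_t len )
{
    return std::string_view( str, len );
}

// Hypothetical helper: uses the C-string constructor, which works out the
// length itself by scanning for the terminating null.
constexpr std::string_view whole_view( const char * str )
{
    return std::string_view( str );
}

// Neither view copies any characters; each just records a pointer and a length.
static_assert( prefix_view( "My string", 2 ) == "My" );
static_assert( whole_view( "My string" ).size() == 9 );
```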
To compare the efficiency of using a const std::string &
parameter versus
a const std::string_view
parameter in a called function I created two functions:
void string_sink( const std::string & s )
{
std::cout << __FUNCTION__ << ": " << s << "\n";
}
void string_view_sink( const std::string_view sv )
{
std::cout << __FUNCTION__ << ": " << sv << "\n";
}
Then in my main()
function I created a std::string
variable, thus:
std::string s = "My string";
The assembly generated by g++ with -O1
optimisation enabled is:
mov esi, OFFSET FLAT:.LC5
lea rdi, [rsp+32]
call std::basic_string<...>::basic_string<...>(char const*, std::allocator<char> const&)
Note that I'm using a small test string but potentially the
std::basic_string<...>::basic_string<...>
constructor could allocate memory on the heap.
Next I pass this std::string
to the function that accepts
a const std::string &
parameter:
string_sink( s );
Passing a reference to an already existing string to a function is very efficient. The assembly code is:
lea rdi, [rsp+32]
call string_sink(std::basic_string<...> const&)
Passing the std::string
to a function that accepts a
const std::string_view
parameter is done using:
string_view_sink( s );
And the generated assembly code is:
mov rdi, QWORD PTR [rsp+40]
mov rsi, QWORD PTR [rsp+32]
call string_view_sink(std::basic_string_view<...>)
You can see that this is marginally more involved, but not by much. This is optimised code,
but what is happening is that std::string
's conversion operator to std::string_view
is being called. The relevant, simplified fragment of std::string
is:
class string {
public:
...
operator string_view() const noexcept
{ return string_view(data(), size()); }
...
The mov rdi...
and mov rsi...
instructions are directly pulling out the pointer to
the base of the string stored in s
and its length in such a way that the
rdi/rsi
register pair constitutes the std::string_view
object being passed to
string_view_sink()
.
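To convince myself that this conversion really is just a pointer and a length being copied, the following sketch checks that the resulting view points at the string's own buffer. The function name view_shares_buffer is hypothetical, made up for this illustration.

```cpp
#include <string>
#include <string_view>

// Hypothetical helper: returns true if converting s to a std::string_view
// yields a view of s's own character buffer rather than a copy of it.
bool view_shares_buffer( const std::string & s )
{
    const std::string_view sv = s;  // implicit use of operator string_view()
    return sv.data() == s.data() && sv.size() == s.size();
}
```

The flip side is that the view is only valid while s is alive and unmodified, which is fine for a parameter that is only viewed during the call.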
The conclusion is that passing a std::string
to either of the sink functions is very efficient.
Now let's look at passing a C-string to each of the sink functions.
Calling the function that wants a const std::string &
parameter,
i.e. string_sink( "My other string" );
, involves the following assembly code:
lea rdx, [rsp+79]
mov esi, OFFSET FLAT:.LC6
mov rdi, rsp
call std::basic_string<...>::basic_string<...>(char const*, std::allocator<char> const&)
mov rdi, rsp
call string_sink(std::basic_string<...> const&)
mov rdi, rsp
call std::basic_string<...>::_M_dispose()
This is much larger because a new std::string
has to be constructed from the C-string. As before,
this could involve dynamically allocating memory on the heap and be very expensive.
(The call to basic_string<...>::_M_dispose()
emphasises this potential for heap allocation.)
By comparison, calling the function that wants a const std::string_view
, specifically
string_view_sink( "My third string" );
, requires the following assembly:
mov edi, 15
mov esi, OFFSET FLAT:.LC7
call string_view_sink(std::basic_string_view<...>)
This is very similar to the earlier call to string_view_sink()
. The compiler has
optimised the operations so that it can directly put the length of the string (15)
into the edi
register that ends up being part of the std::string_view
object passed to the
function via CPU registers.
In summary, std::string_view
allows us to avoid creating temporary std::string
objects
when we want to call string handling functions. This can potentially be a big increase in efficiency.
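As a rough sketch of what this looks like in practice, a single function taking std::string_view can serve callers holding a std::string, a C-string or a literal without a temporary std::string being built for any of them. The functions count_spaces and spaces_in are made-up examples, not something from the code analysed above.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical example function: it only views its argument, so a
// std::string_view parameter is sufficient.
constexpr std::size_t count_spaces( std::string_view sv )
{
    std::size_t n = 0;
    for( const char c : sv )
        if( c == ' ' )
            ++n;
    return n;
}

// A caller holding a std::string goes through the cheap conversion operator.
std::size_t spaces_in( const std::string & s )
{
    return count_spaces( s );
}

// A caller with a literal needs no std::string at all.
static_assert( count_spaces( "My third string" ) == 2 );
```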
The operations that can be performed on a std::string_view
object are pretty much the same as those
that can be performed on a std::string
object. One method that std::string_view
doesn't have is
c_str()
. This is because a string view might cover only part of a larger string and hence not be null terminated.
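The following sketch shows why that matters. It takes a view of the first two characters of a C-string and then hands it to strlen(), once via data() and once via an explicit std::string copy. The helper names first_two, length_seen_via_data and length_seen_via_copy are hypothetical.

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <string_view>

// Hypothetical helper: a view of (at most) the first two characters of sv.
constexpr std::string_view first_two( std::string_view sv )
{
    return sv.substr( 0, 2 );
}

// Hypothetical helper: what a C API sees if handed sv.data() directly.
// Only call this when the underlying buffer is known to be null terminated;
// even then the C API ignores the view's length and reads to the terminator.
std::size_t length_seen_via_data( std::string_view sv )
{
    return std::strlen( sv.data() );
}

// Hypothetical helper: copying into a std::string first gives the C API
// a null-terminated buffer holding exactly the viewed characters.
std::size_t length_seen_via_copy( std::string_view sv )
{
    const std::string copy( sv );
    return std::strlen( copy.c_str() );
}
```

For first_two( "My string" ) the view's size() is 2, but length_seen_via_data() reports 9 because strlen() runs on to the end of the original literal, whereas length_seen_via_copy() reports 2.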
Another hack you can do with std::string_view
is to more naturally compare two C-strings.
Instead of doing strcmp(str1, str2)
, you can do:
if( std::string_view( str1 ) == str2 )
std::cout << "Strings are equal\n";
else
std::cout << "Strings are not equal\n";
This might save you forgetting that the test for equality using strcmp()
requires comparing to 0
, as in:
if( strcmp( str1, str2 ) == 0 )
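If you use this comparison a lot, it could be wrapped in a small helper. This is only a sketch: c_str_equal is a name I've made up, and it assumes both pointers are non-null because constructing a std::string_view from a null pointer is undefined behaviour.

```cpp
#include <string_view>

// Hypothetical helper: compares two C-strings by content.
// Assumes a and b are both non-null, null-terminated strings.
constexpr bool c_str_equal( const char * a, const char * b )
{
    return std::string_view( a ) == std::string_view( b );
}

static_assert( c_str_equal( "A string", "A string" ) );
static_assert( ! c_str_equal( "A string", "A strinG" ) );
```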
The code for this analysis is below, and available at: https://godbolt.org/z/f13cffsnx
#include <iostream>
#include <string>
#include <string_view>
void string_sink( const std::string & s )
{
std::cout << __FUNCTION__ << ": " << s << "\n";
}
void string_view_sink( const std::string_view sv )
{
std::cout << __FUNCTION__ << ": " << sv << "\n";
}
int main()
{
// It takes quite a lot to construct a std::string
std::string s = "My string";
/* mov esi, OFFSET FLAT:.LC5
lea rdi, [rsp+32]
call std::basic_string<...>::basic_string<...>(char const*, std::allocator<char> const&) */
// But when you have one, it's easy to pass it to a function wanting
// const std::string &
string_sink( s );
/* lea rdi, [rsp+32]
call string_sink(std::basic_string<...> const&) */
// Passing a std::string to one wanting a std::string_view does require
// some work. Here we are calling std::string's conversion operator to
// std::string_view
string_view_sink( s );
/* mov rdi, QWORD PTR [rsp+40]
mov rsi, QWORD PTR [rsp+32]
call string_view_sink(std::basic_string_view<...>) */
// However, passing a C-string to a function wanting a const std::string & requires
// creating a new std::string
string_sink( "My other string" );
/* lea rdx, [rsp+79]
mov esi, OFFSET FLAT:.LC6
mov rdi, rsp
call std::basic_string<...>::basic_string<...>(char const*, std::allocator<char> const&)
mov rdi, rsp
call string_sink(std::basic_string<...> const&)
mov rdi, rsp
call std::basic_string<...>::_M_dispose() */
// Passing a C-string to a function wanting a std::string_view is a lot leaner
string_view_sink( "My third string" );
/* mov edi, 15
mov esi, OFFSET FLAT:.LC7
call string_view_sink(std::basic_string_view<...>) */
// Here's a hack for comparing C-strings :)
if( std::string_view( "A string" ) == "A string" )
std::cout << "Strings are equal\n";
else
std::cout << "Strings are not equal\n";
}
The output is:
string_sink: My string
string_view_sink: My string
string_sink: My other string
string_view_sink: My third string
Strings are equal
From now on, where possible std::string_view
should be your default parameter type choice where
you would have previously used const std::string &
.