Video Thumbnail 57:07
A Long Journey of Changing std::sort Implementation at Scale - Danila Kutenin - CppCon 2023
12.3K
266
2023-12-05
https://cppcon.org/ --- A Long Journey of Changing std::sort Implementation at Scale - Danila Kutenin - CppCon 2023 https://github.com/CppCon/CppCon2023 Sorting algorithms have been improving for almost 50 years now with claims of better efficiency and various properties but the question of adopting a new one at scale is often left behind. At Google we replaced std::sort in the LLVM libc++ library to test it for hundreds of thousand calls. We faced many challenges which we had been fixing for ...
Subtitles

it's my first time at cppcon I've thought about going for a lot of years

and now kind of uh regretting that I didn't go

[Music] sooner hello everyone let's get started

um so my name is danil uh today I'm going to talk about a long journey of

changing standard sword implementation at scale so who am I I a senior software

engineer at Google I work on software efficiency for all our data centers at

Google so um if you want to talk about performance like come to me we'll have a

lovely chat so what's the agenda for today

today we're going to talk a little bit about the history of sorting like how it

was developed in the standard Library what's out there like and the research

behind it so why have we decided to change anything at Google uh a lot of

bugs so you'll love them uh implementation which implementation we

decided to choose why with prons and cons efficiencies and efficiencies so

and last but not the least what can you do to improve your sortings to improve

or maybe just to update the library um all right so couple reminders

so uh I believe if you wrote C++ for about a week maybe a month you wanted to

sort some elements and the Sorting is pretty much just the ordering of

elements today I'm going to talk about the calls which about like the calls St

sword stable sword range sword and um like similar partial sword

nth element so all this stuff um yeah so you can see that um

they might accept two or three arguments two arguments like for begin and end and

the third argument a user defined comparator that's a quite

important detail which we're going to talk later so this is where most

interest most interest interesting stuff comes from all right um many of you

should know what quick sort is but just um short reminder so quick sort is a

type of sorting General General sorting so pretty much you take any element all

those who are less than this element go to the left those who are more to the

right and and then you recurse on both parts and you can have different

strategies for taking this element maybe a median of three maybe just the middle

one maybe I don't know something out of some set doesn't really matter quick

sort is like the general um description of any of such

algorithm um and if we going to look at first STL version by Stan and Lee we're

going to see these like four or five lines so third line is like quick sort

Loop and then final insertion sort so how does quick sort Loop look like uh it

look like it looks like um like the strategy for partitioning it takes um

median of three elements and then then it does partitioning like as again less

to the left more to the right and um uh it does some thresholding so what what

does it mean it means that if you have in the range more than elements just do

as usual quick sort if you don't have more than these elements don't do

anything and the range will be unsorted STL threshold in the first

implementation was set to 16 uh because you don't want to go a lot of deep and

like um deep you want to you want to have you don't want to have deep

recursion like and when you have 16 elements you're going to go like four

times so other sorts might be useful in this case uh and uh yeah final insertion

insertion sort pretty much just Smooths out these inaccuracies for 16 elements

it just does insertion sort for 16 elements insertion sort is a type of

sorting where you just go element by element and put it into the place where

it should be sorted uh to the prefix um but it works quadratic

time um so this is pretty much how looks like the first sorting from uh like

first STL you can see that you have more and more sorted range and you see these

little little inaccuracies uh that uh for every 16 ele

that occur for every 16 elements and in the end uh it does this um final

insertion sort and just Smooths everything out and now you have a

complete sorted array so this is kind of the first implementation of sorting that

has been available in somewhat standard library of

C++ okay but standard Library Bol from STL so STL is kind of notion of standard

template library now we call it like just standard library of C++ and uh we

have the following um the following definitions for them so the Sorting can

take two arguments can can take three arguments and the only restriction that

it should it has that it should it should have like an Logan comparisons

this actually hasn't been always the case so for some time it was analog and

comparisons on average but in C++ 11 it was changed to have worst case scenario

Anan and quick sort uh as you might know but it's easy to understand worst case

scenario arrow is squared because you can just pick the worst element and only

eliminate in Rec cursion only one element which is not great but this is

possible so the um so quick initial quick sort no longer works as a standard

compliant algorithm so what has been done and GCC leaps did C++ as well as L

leaps C++ but that's in a couple slides introduced a notion called intort what

does intort mean so this is the intort Lo and you can see final insertion sord

is still there so it hasn't gone anywhere but like in this int like and

you yeah the the difference is only in these calls and interport loop has on

line eight this argument so it has argument of two times logarithm it's

base two of plus minus first so what does it mean so what does it actually do

um so this algorithm is is called intort have no idea when it was named like that

but uh whatever so it's almost like a quick swort as you can see it has the

same um almost the same arguments and if we suspect a lot of recursion calls so

for example I don't know we don't want to go quadratic otherwise like you will

um exhaust your stack so in this case just use Heap sword and this argument

actually is what just limits the number of layers you go deep into the recursion

uh Microsoft standard Library does pretty much the same um but with the

only difference that when it when it comes to 16 elements in the range it

just calls insertion sord whereas GCC leap C++ calls insertion sord at the end

uh like the in the GCC version might be a little bit faster because if you do uh

like linear accesses you might get better cach locality um which might

result in better efficiency of performance of standard sword all right

but if we think a little bit like about the worst case scenario like it's just

not great I mean um still does the job but it does it like slower and uh you

can see that uh performance of sorting can vary from input to input that's

usual in in this um area This research so and you can see in the end it also

tries to smooth everything out all

right um today I'm going to talk a lot about LM leap C++ because at Google we

use it primarily for pretty much everything that been built all products

all that are written in C++ are built with standard library from llm and but

they didn't do a lot for like they didn't do anything for a long time so in

2014 uh the issue was reported that standard sword has worst case quadratic

worst case but standard mandates and Logan however like it took only seven

years years kind of to fix it so what's wrong um like first of all why was that

um the answer is just simple actually anybody of you could do and just

Implement that the thing was nobody really cared a lot and um like that

that's the only factor and after that we implemented that we found that only 0.1%

of calls got into this Heap Sort fallback in production so um yeah this

is I mean Google might be Google production might be biased of course but

that's what we what we've seen so far um so if we talk a little bit about

lvm history and sorting lvm history it doesn't like it doesn't do only quick

sort it doesn't doesn't it didn't want to do only quick sort it only wants to

be a little bit faster than leaped C++ back then um it this is the slide from

lolm conference from 2010 I guess uh C++ 11 hasn't been adopted and uh what it

wanted to do it just wanted to adapt to S sort like to certain patterns for

example equal patterns ascending descending even odds some pipes like

maybe almost almost sorted and random of course random

random sorts are also important and as you can see Lam leap C++ really worked

well on um these patterns and the truth is such patterns are actually quite um

they are not rare they do exist a lot especially in source that are not big

for example I know 100 elements like a lot of cases we've seen where it was

sorting almost already sorted um array and like here comes the question a

very good question that we want to uh kind of like answer before we you know

do anything with the implementation what what what what what what does constitute

a good sort what is a good sword so first of all it should like to have

analog and comparisons that's B like that's written in the standard uh

probably it wants to recognize sorted patterns we don't want to sort a lot of

random we don't want to waste a lot of time on already sorted um Ray um it

should be kind of fast for Modern Hardware and fewer comparisons for heavy

sorting for heavy comparison sorting so assume you have sorting comparison which

I don't know reads from uh reads from chunks of data for like SQL database and

the fewer comparisons you do the fewer reads you do uh this like the hardest

part of all that is actually the third bullet because Hardware changes and we

need to change the implementation as well but how we going to change it so is

it okay just to change a CD sort I mean on one hand yes it's good but let's see

first of all why why do we why do we need other other patterns as well so lip

C++ on a sending already finished and you can see lips C++ on a sending does

some weird stuff uh which I know hasn't finished yet but we'll just wait and see

how it sorts the El all right now it finished so patterns are important

Hardware is important and Lan comparisons is

important so what can we do about it

um so uh patterns first of all uh I want to talk a little bit about how how lip

C++ has achieved that and what it does it when it tries to recurse on subarrays

into quick sort it what it does it does a little bit of insertion sort but

insertion sort up to the end will be quite bad it will be quadratic in the

worst case but so what it does it just sets the limit so it tries to insert at

most eight elements so it just goes sees okay this is the sorted sorted prefix oh

this is the element that not in this place let me move it and only move it

only only do such operations eight times so find eight elements at most which are

not sorted and put them uh to the right place once you put them into the right

place you can recurse on the part that is not sorted and everything will be

really nice and sorted at the end and it in the wave for example it recognizes as

sending p uh patterns like patterns where like almost everything is sorted

and it's good yeah for sorting and Hardware the

story is somewhat interesting because every now and then and actually research

has start like research for sorting has started I don't know since

1950s and continues even now and you can see a lot of stuff has been done for

example I don't know x86 seemd sword VQ sword vectorized portable quick sword

somebody wrote a faster sorted algorithm somebody said okay part and defeating

like sort is better boost library has four different sorts like what should we

do like which which sort should we choose and and why the answer is we

probably as C++ programmers should choose the default like call to STD sort

and the question then okay can we change it to some something faster because I

mean 15 like 10 years has pass have passed can we change to something faster

yes and like our main motivation was to save Cycles uh one more motivation was

that we've seen some production failures but that's like a little bit

later um so we tried uh I love working in Google because we have hundreds of

millions lines of code and I just can take some I just can change something

and just see how it fails and it failed quite a lot so um let's look at this

picture a little bit longer than for other slides um this is pretty simple

don't read too much uh in the middle you have um like a simple SQL script which

inserts some like idea one and like which inserts id1 and id2 topples into

the table and then what it does it just selects everything from this table order

by only First Column you can see on the left

POS pogress um just outputed like has the output which is different from my

SQL and why is that uh the truth is we just ordered by the first element and

second element doesn't promise any order when you have ties and we call this

problem ties and we have we had a lot of these problems let's look at some C++

code say you have a vector pair of vectors and you want to sort say I don't

know by first element and if first elements are equal then the order of the

second element is not defined because this sword like standard sword is

unstable sort um it doesn't promise any ordering of equal elements and a lot of

people wrote tests which pretty much said okay do something it does some sort

in the middle of the function and then expect some I don't know PR above Json

some golden test and this actually caused a lot of pain when we try to

change change the implementation yes um and we decided

okay um in order to change STD sort to something better and at the same time

we're going to improve testing because people who depend on Golden tests

probably don't do real good job in what they really test anytime we're going to

change anything they going to be broken and we wanted to find such cases we

wanted to fix them we wanted to make them more

robust so how can you do that the answer is pretty simple just uh do the

randomization so this is the randomization which we added before

sorting so just randomize elements and then do your usual sort uh we decided

not to include it in like optimized builds that's on the macro however like

we included this macro and find a lot of cases a lot of flacky tests a lot of

problematic golden tests um and uh um how how can we do that actually in the

standard library because uh finally finally enough in the standard Library

when you want to do something just I know some Shuffle you need to depend on

the header random and header random probably depends on header algorithm

because it needs STD Min STD Max element and something like that um we use

technique so-called address space layout optimization so um you have I don't know

a global variable which has its address if it's loaded randomly into memory it

can be a good seeding it can be a good seed for for for um linear I know random

generator and it's uh it's by default on Mac OS it can be turn like it's by

default on shared libraries on Linux so it was pretty good for

us all right uh we thought a little bit about randomization so you can randomize

STD sort before before doing the actual sort however there are other calls that

like deserve randomization as well for example SD nth element and a CD partial

sword a CDN element is an algorithm which um it has three arguments um first

and third arguments are begin and the end of the range and the second argument

takes a iterator to Something in the middle of this range and it promises to

to order the elements in the way that this middle element is put into the

array as if it was sorted for partial sort um first uh first elements up to

this point up to this iterator will be sorted and you can see uh fun fact that

yeah an element of begin plus 9 um like has the element sorted only for partial

sort which has begin plus 10 so this plus minus one is actually very tricky

and we found a lot of bugs when we decided to randomize the ranges which

where the order is undefined and other or for example for nth element the order

of both parts are undefined so anything can be there nobody just should rely on

that if we want to I know Implement a new algorithm for nth element like that

uh that should not be relied on a partial sort pretty much the same the

right part uh has has a property that the order is

undefined uh so how could you do that pretty simple just randomize range for

nth element randomize range before randomize range that comes up to the nth

element and randomized range range that comes after nth element strictly after

for partial sword these are two randomizations just before and the right

part so box uh probably the most interesting part and honestly uh I know

maybe I did a couple bugs as well first of all determinism golden tests and for

example this is a this is an example from click house which is written a lot

of in C++ uh and how how was the test fixed pretty much they extended the

ordering in some tests and moved the reference files here and there we've

seen a lot of failures in my SQL in progress so in my SQL we haven't seen

failures in progress because it's mostly c um but yeah a lot of other places as

well inside Google so um harder cases involved more

time for example assume you have you sort some data or a search and you cach

it somewhere then you have ties and they handled randomly and now we have lower

ke lower lower heat rate cash heit rate because ties can change something that's

on the border where you want to I don't know limit limit the result so you might

have different data depending on different sorts and your cash might

invalidate we couldn't afford that even if we change it only once because a

human change change only once and then throughout the roll out you will you

will pretty much dunk your cash which is not great for services and we tried to

find that we were pretty successful in it uh but that's these are harder cases

for goldens um sometimes yeah in order to

fix such goldens um you need to use probably stable sword stable sword

promises that the elements in which they were in the original array equal

elements it promises that equal elements in the original array will be in the

same order in the resulting array so it's um

deterministic and actually I wondered why in my computer science course uh we

talked a lot about like stable source and the answer is this you don't you you

want like stable sorts if you want determinism all right other bugs let's

try to write a median function um I mean it's pretty good so

we just do an elements to the median um just take the result of this median well

if there are even number of elements we should take um median from like minus

one and like take the average of these two values and the result will be just

um element in the vector from medium and medium minus one over two and we just

return the result um anybody does anybody see the bug in

this in this snippet all right quite a few

hands okay um yeah the bug in this snippet is

that you call nth element twice and the second nth element U might re rewrite

the element for example on the position Med like V of medium might not be like

the one that was here because an element doesn't doesn't specify any order on

both parts um but you can you can get lucky because an element actually puts

elements puts elements around and close to Total ordering

and um let me show you examples from I don't know from GitHub repositories so

this is exactly what happened in some repository with three 3,000

Stars U this happened in repository for m m plot I believe it's a famous library

for I guess python but I might be wrong might M plot lip so you can see they try

to um to use to compute inter quartile range but they called an element first

then an element second one and then they access the

access the the elements in the array and probably go probably they have incorrect

results um another element uh actually it's a quite another another bug

actually it's quite funny because there is a snippet that this was taken from

Geek forces and probably you know the site and yeah actually you can go now to

geek forces and there will be exactly this

bug um another bug another bug

it's Ableton um another bug medical image analysis that's

scary [Music]

uh uh another bug and I

can continue like infinitely so what you can do if you use lvm liap C++ please

use this macro macro in your builds if you want to randomize um randomize

sortings an elements partial sortings it proved to be really useful in finding

bugs fighting golden tests uh and like it improved the test how people think

about testing so it's probably the most important

part all right now let's talk a little bit more more complicated stuff in order

to sort or do anything complicated with ordering in C++ your elements and your

comparator should satisfy so-called strict weak ordering

what does it mean it contains of four properties so if you compare element

with itself the result is always false like

it's it's very logical a symmetry if one element is less than another then the

other way around cannot be true so trans transitivity if x is less than y y is

less than Z then X is less than Z and fourth property is a little bit more

complicated but it's still still needed uh what it means that the incomparable

elements like they considered equal among each other so for example like if

x is less than y is false and Y is less than x is false as well then we consider

these elements equal and if two elements are equal and another two elements are

equal then first and first and third should be also

equal um like interesting interesting fact and

I I've heard a lot um even from the standard Library um imp from the from

the people who Implement standard library that such checks are very

expensive so in order to check everything here for example for the

third for the third and fourth condition you need kind of to Traverse over

triples of all elements in the array and in order to validate that your input

like satisfies strict wig ordering we need like cubic time that's completely

inappropriate for analog algorithm as as sort so but let's think a little bit but

what are the risks what if we don't do that um on paper it's undefined Behavior

but I'm more interested as an engineer what happens in practice and in practice

um I mean best case scenario best case scenario you can sort and then SD is

sorted uh returns false well not great but at least

doesn't fail probably yeah you but probably it's not great um

like let's look at this example probably you see something uh on the left um it's

a very simple very simple code I just have a vector of integers which is

filled with minus one and then I sort them the way that to the left I want all

negative integers and to the right I want Total ordering of positive

integers and uh at first it seems an okay comparator but if uh but um num

alums if I put num Elms to 30 this just works

fine it just sorts and doesn't do anything so that's okay but if we put

301 elements it actually fails with segmentation fault uh and this is

actually very tricky because like I can prove that no like with this comparator

there there is no single case that when it fails with less or equal than 30

elements and this is actually quite interesting because okay if you don't

have in your range 30 elements then all good if you have more them something can

fail but I haven't written a wellth thought test for 30 plus elements just

to test sorting and we found a lot of attack vectors in this area so if you

have incorrect comparator then I can fail your sword with 31 elements like

with out of Bounce read pretty much so and this is one with this is

also one one one reason why we decided to change anything because at Google

like once in several months we found some problems like literally production

failures out of bounds reads segmentation faults scary stuff attack

vectors security like doesn't work and all that stuff we were scared like weeks

of debugging all that stuff and yeah um why does why does it actually can have

like why is it ever why is it even happening the truth the truth is when

you um do the partitioning in your quick sort you kind of wait for an element to

like to be false because for example if you start with a pivot and then you go

further you some at some point will find that there is a pivot element um and you

compare pivot with pivot and this should result in false and this while loop

should end but if it but if it returns a lot of truths

then it doesn't end and then then reads um out of bounds one solution of course

is just to have a bound check here um which we'll discuss a little like later

however well like if you know that you like your comparator is correct you

probably don't even think oh yeah I'll definitely find an element which um

which is less than pivot which is not less than pivot yeah

so but still uh the risks are actually quite huge so we have failing production

we have incorrect sortings um like but still the check actually is uh cubic

what what what can we do um first thing we use some maths to reduce the

complexity of such check um I'm not going to dive into why it works I'm just

going to tell you the algorithm how it works so first of all we need to sort

the range with some algorithm that doesn't fail um this is a little bit

harder than you might think but just yeah imagine any algorithm that just

finishes and correctly sorts when the comparative like satisfy satisfies uh

strict we ordering just sort the range if it's not sorted return false if it's

not sorted after you sort it then your comparison does not satisfy strict week

ordering um then find the prefix of all equal elements so just compare your

first element with the next one and until you find something that's bigger

uh check that these elements are equal among each other uh check that they are

less than all that remain and just remove them um it's a little bit like

not very obvious like why it works you can read the proof uh by the link above

and so but it works so still uh we have a squared algorithm for check and stct

reordering um and what we decided to do in the end is okay we have a sort we

have a quadratic algorithm that kind of tells you like an oracle does your

comparator satisfy St reordering or maybe it's not maybe it does not and we

decided to try out okay let's check just first

100 elements surprisingly it worked insanely Well because most sorts of

small like majority of cases are sorts on 100 elements um having I don't know

10,000 comparisons extra comparisons in debug builds like hasn't time like

hasn't resulted in any timeout across all tests that we had at Google and uh

uh yeah we were very happy actually to discover that limiting this check works

and finds most issues for cubic algorithm we thought about 20 elements

because I mean we if you have like 100 elements for a cubic algorithm that's

going to result around million uh comparisons which is probably not which

is probably a little bit like harder to do so

square like quadratic algorithm allowed us to do it like 100 elements all right

but there are some caveats to this algorithm so first of all the Sorting

should not fail but we've seen that it like you can uh you can give invalid

comparator some input just reads out of bounds um uh it only tells that the

strict ordering is met or not it doesn't give you the triple there is no

algorithm to give you a triple uh which uh like no algorith quadratic algor

which gives you the triple that violates a strict we ordering and arbitrary

comparator can do anything so like sorting can fail for example um

comparator Can fail if you give it I don't know two elements like second and

998th element and probably in your sorting you never call that but we we

might um third bullet like was not very relevant we haven't seen many of such

cases um the hardest was the first one so uh what we decided to do we

decided to extend the debug do the debug builds so one thing to kind of fight uh

out of bound re is to do like asan builds uh addess sanitizer build debug

builds to find most cases or to have another I know oracle which can give you

can point you to some problems in your program and yeah we decided to go with

this request uh we decided to go with this proposal to Elam Community which

was accepted pretty fast um so first thing we did is just out of bounce check

in cases where we partition the elements in quick sort that just gives us um the

opportunity to find um if it fails in for example like online 5 510 then your

comparator is not strict Rec ordering uh then we enabled St week

ordering checks for stable partial STD sort and it was rolled out in lbm 177

which probably which which release was probably a week ago and if you use lvm

and lip C++ you can use these command line Flags to enable strict quiek

ordering check in your projects all right um at Google it took

us around seven six to seven months to fix everything I wouldn't say that we

worked like full-time on fixing everything it was just more like a thing

that okay we uh I know I had I don't know 20 minutes 20 spare minutes we have

a thing called um I know it's it's a collaborative project you can kind of

jump in fix a couple targets and move on with your project so some Community

contribution and yeah but it took us overall six to seven months to fix to

find issues uh we found fewer of them but they were much

scarer so I'm going to list like um not the actual bar bugs but the family the

families of bugs so first bug is when people tried okay I have STD less I'm

just returning I like left hand side less than right hand side oh if I want

to sort sort in the other order what I do is just I need to negate it no

unfortunately not you cannot do that because uh IR reflex IR reflexivity

would be uh would not be met um the more complicated bug in this case was looked

like that so you you were reading a flag uh do you want to sort descending or

ascending and bug was only happening when the

desending is false but when it was true everything worked

fine um another issue if you have one element in your vector no comparator is

called because why would you like one element is already a sorted Vector like

Vector of one element is already s sorted Vector but if you have um I know

comparator which I know the references some pointer and you have no like and

you have an element which doesn't have any value and the referencing is just

null um then like you're the referencing null pointer that just results in in a

failure um another thing was um like okay we like another failure was to

check if something doesn't have any value and just to partition it let's

just okay I do have some um I do have some elements which I

want to partition to the left but they don't have any value and those who do

have value they will be like um ordered by their oper operator less um this is

also not correct and because comparator of the same element can be true and this

is exactly the thing that I showed you in godbolt where I can give you 31

elements and your sorting would read out of balance

um the correct fix here is to swap first if with a second if then it's a correct

it's a correct comparison um another another bug um

another family of bugs was uh people wanted people treated ties differently

from how they compared it so in this case um like they computed something

from the element some difference between days and then they returned absolute

value like with this differences and um you can construct example which

fail such sort which is also not great um assume you're Google Drive and you

comput something should be uh what should be deleted first and you delete

in incorrect order which is pro that's probably not

great um another another family of bugs was that like I don't know they were

comparators with thousands lines of code and they were writing and writing and

writing this comparisons and they were good good and good and in the end they

just okay just return true no unfortunately that's not also correct

you need to return false when you have completely equal

elements um another family of bugs was people wanted to compare users wanted to

developers wanted to compare some elements and sometimes the comparison

just doesn't result in a correct I would say it cannot compare it because I mean

something has gone wrong I don't know the networking issue or something like

that and what they decided to go okay decided to do is oh if we cannot compare

anything with just R and false unfortunately that's also that's also

exploitable that doesn't work and you need other mechanisms to kind of

understand um what what can go wrong maybe throw an exception from here maybe

do something else um another bug which is um might be

less obvious to people uh that sometimes you want to okay uh I want to sort some

floats but I want to like some some floats should be equal I mean I don't

know I did some ranking and if the the abson absolon comparison results and

true these are probably like they have the same ranking so I would just return

by ad however this is not a correct sorting because in this almost equals

they do compare floating numbers by some Epsilon and a can be equal to a plus

Epsilon um a plus Epsilon can be equals can can can be equal to a plus 2 Epsilon

but a is not equal to a plus 2 Epsilon because 2 Epsilon might be more than

they they compare their floats and this results just in Incorrect sortings and

unfortunately transitivity of incomparability is lost here and um this

is actually probably the the bug that we Face the most so a lot of uh a lot of

developers thought that it's okay to I mean it's generally good rule to compare

floats by Epsilon so you don't want to say okay this less than this or just

looking at they like matisa and exponent however yeah in sorting it might result

in something bad um another another issue which which

probably I don't know happens from time to time on heren news or whatever um if

you sort the vector of floats it can be undefined Behavior why um floating

numbers can be none not a number U not a number uh numbers by I

e754 this is a very scary name for me but doesn't matter uh it always returns

false so X is less than none returns false none is less than x returns false

by uh incomparability it means that X is equal to none and it means that every

element is equal to none but not all elements are equal among each other so

it means that if you have nuns in your container which you want to sort

unfortunately it might result in incorrect

sorting um this is a very funny bug this is a very another funny bug so we found

that um so on the left uh you have a vector of floats and you want to sort um

from begin to end but with a greater and you may debug integers and on the right

I put w-w everything so give me all warnings

that you have and doesn't give you any warning so you want to sort Vector

floats with a greater comparator uh however you might get you might misspell

float you might forget this is a vector of float so actually you might want to

change the type of vectoral float and nothing will give you a warning so even

though C++ is a type safe language sometimes I wonder how such things even

happen um a lot of bugs in open source as well so this is some Android um yeah

this is the bug of the case where if all elements are equal to CFA regag then

this might result in in a failure uh BFB Trace same bug 5,000 stars um another

repository which just returns something less than equal for some reason I don't

know uh chromium web RTC also very interesting bug um you can exploit it

but it already has been fixed so um yeah so as you can see I can

like I could like give more examples here but um I don't have much time and

let me talk about interesting stuff which you might never consider what you

can do with your sortings um this is a snippet of code which might be hard to

read and but you don't need to to understand what gas means or candidate

or num solid whatever all you can see that here that these elements are

changed along the way of like calling anth element or STD sort so this these

elements this element is changed sometimes and then you kind of order

rate so you can construct I don't know comparator which self

modifies some array and return sorting by giving you I don't know it just um

let me put it this way the algorithm gives you some elements which it wants

to sort and you can trick this algorithm for example in the worst case scenario

and this is the snippet where you can trick I don't know SD sort or nth

element to to be the worst case scenario always um this is somewhat artificial I

would say um you don't you probably never write comparators which modify the

array like along the along the Sorting but it's valid it's it's okay it

satisfies strict quick ordering and however sdn element will always result

in quadratic worth case for that and uh this there is an issue in GitHub uh

which provided a little bit of like talks and comments like like is it

actually valid do we need to fix anything

and nth element requires linear complexity on average however we have a

comparator where every time you call the sord on any range like it results in

quadratic Behavior because you kind of can trick an element to be quadratic

that's also like a funny book it was discovered

um but like long long time ago pretty much in the case how you can you find um

problems and how can you find problems in your quick

sorts all right um some sort in trivia uh can we sort 1.02 1.01

1.0 probably yes yeah I I don't see any problems all right can we sort 3.0 none

and 4.0 I explained no because um none thinks that it's equal to 3.0 and 4.0

and 3.0 is not equal to 4.0 can we sort none 3.0 and

none surprisingly the answer is yes because all elements are equal and there

is no I don't know 4.0 which is equal to 3.0 so uh even though your vector might

have nuns you might get lucky and you can't sort this if you have like nuns

and some other element which um which is equal to all other elements so for

example if you add a couple 3.0 here that also going to be

fine all right um last last section what are we changing our sorting algorithm

with so we made cleanups that result all golden tests fixed all bugs most bugs on

strict week ordering so now we are ready to change something uh we decided to

choose a mix of block quicksort and PN def pattern defeating quicksort um let

me briefly talk about what what what they do um the necessity actually comes

from the partitioning algorithm and partitioning algorithm

has these two while Loops these two while Loops are data dependent and

Branch pred and Branch predictor like doesn't do a very good job on finding um

like on predicting these branches because I mean you have random data and

it cannot predict anything so we saw a lot of branch mispredictions and sorting

so what we decided to do we decided to use a technique called block quick sort

so what we do we just store the results in a bit set pretty pry much in an

integer of 64 bits uh from left and right uh then what we do we just find

with uh count trailing zeros uh call uh like the bits uh that um need to be

swapped we lower the bit lower um these bits from one to zero um these are just

one instructions on a x86 and just swap these elements this help to reduce

branches a lot and another change that we decided to do which is more relevant

to PN defeating quick sort is soal Takis nther so when you want to find this any

element which partitions quick sort it's called pivot um you take nine elements

find the median in every three and find the median out of these three medians

that's called n Takis nther and uh this proved to be really good for Ren data

and like reduces the number of comparisons

overall um yeah this is godbolt which gives us that yeah block quick sort is

really nice to vectorized uh when you compare a lot of

integers um so which results do we have on integers on random integers we have

up to 50% results uh on some other patterns like ascending descending we

regress but just by a bit uh but still like when you are like you have an

ascending array if you sorted like like NC per element or 1.03 NC per element

that's probably doesn't matter much um on more interesting benchmarks like

string strings because strings uh cannot be vectorized properly and you have

pointer in directions from array to Strings and so on we have around like

10% and um all right um some thoughts so we're heading towards the end just a

couple minutes more um C++ is a weird language in a way that our default

sorting is unstable other languages like python rest and Java have stable default

sorts so if you if you call do sort in python or rust is just a stable sort it

doesn't have any problems with golden tests and it has like specific like sort

and stable calls uh for C++ it's the other way around it's CD sort as a St

and you need to call stable SW kind of to stabilize

things um another interesting question why do

you even Implement some algorithms that can read out of bound why shouldn't we

just be a little bit safer U from the standard Library standpoint um good

question probably we should um um fun fact um I heard a lot

that Rangers sort and overall Rangers wanted to have one argument a lot uh

like just container argument um that because because if you have two

arguments which take the beginning and the end of the range they might be from

different containers and there is no warning against that so I we found

literally zero bugs where we had different um iterators in a standard

like two argument standard sort maybe we're just lucky maybe it's not but I

don't think that this argument for migrating from STD sword SD range of

Sword reduces a lot of bucks um C++ 20 allows us to be allows

us to make fewer bugs we return from parator spaceship the ordering that we

need um it works much better you don't need to write your own comparators which

might be buggy so what are what what what are the

results of all that so overall the whole effort we call a comparator sanitizer so

we find golden tests we find broken comparators and it prevents from out of

bound reads prevents prevents exploits I believe it it deserves a name of

sanitizer uh we saw in production around 78% Improvement not 50 as we've seen for

random sorts uh for random integer sorts well that's we expected a little bit

more okay I'll take 78% Improvement in sorting and reduced a lot of Branch

mispredictions we gave this capacity U Branch predictor capacity to something

else uh fixed literally thousands of bugs for the HMS law um like my take of

all that that even simplest things are hard to do right when I started this

project two year ago I thought all right I mean I I would finish that in six

months no it took around two and a half years definitely not fulltime I was like

on and off of this project this was fun to do okay what can you do um use these

command line Flags if you use LEP C+ Le leap C++ uh because um they enable you

um randomization checks and strict week

ordering checks

um um probably you can make randomization for other algorithms

probably you want to um make randomization for standard partition or

SD make hip um I was a little bit lazy to do that um like we probably can do

strict week order and checks for nth element I didn't implement it for nth

element because in the algorithm that I DES in the quadratic algorithm that I

described you need to sort the range probably you don't want sort the range

kind of to do anti element but that might be okay for the back

build what can you do else um remember y today talked about invalidation of

vector iterators well um I don't know there can be ideas which we which were

very successful for strict with ordering checks let's just in debug build

relocate Vector on first 100 pushbacks we'll find a lot of bugs with dangling

references dangling iterators uh probably it will help a a

lot um these are the links that you can read uh my blog where I described all

this effort lvm request for comments sort and change itself uh fabric review

in llvm um here are my details if you want

to get in touch please do um that's it I'm happy to take your

[Applause] questions

yes if you if you have questions please go to the

microphone uh thank you uh was very

interesting I believe that the reason you don't see an issue is greater and is

that leap C++ disables warnings in Stand Library hers yeah probably probably

you're right yeah yeah I know about that thank you oh I also have a suggestion

for that it's more of a comment but um instead of stood grer in use stood

greater uh that colon element type yeah yeah that also yeah that also

works but yeah finally enough second question

about this this weird bug okay thank you for the great talk uh you mentioned

finding a lot of bugs uh regarding in like improper use did you ever find any

attacks against Google Google code kind of exploiting this D or so or um okay

that's an interesting question even if I knew about some something I believe I

wouldn't be allowed to talk about them so but I don't think that I know about

anything thank have you looked at what happens if

a St map gets one of these bad compilat um I'll say just question

thanks um this is actually my next kind of plan for this adventure let let's try

not only to get to the algorithms but also like to um containers uh I believe

for standard map standard set you might get weird results because well these are

trees these are red black trees and if you mess up with red black trees you get

like you might get I don't know linear trees or something like that I might

suspect that if you if you supply some comparator which is not great it might

result in something in something very long so yeah I do think I do think you

can exploit that right now as well I guess there are no more questions

thank you so [Applause]

much