Internship report: Catching reference stability bugs with Infer
From September to December, I’ve interned at Meta (again) in London to work on Infer, a state-of-the-art static analyzer. My mission was to create a new issue type to detect C++ reference stability bugs.
What is reference stability?
The C++ standard maps, std::map
and std::unordered_map
, do provide reference stability: when the map is resized,
existing references to keys and values are not invalidated. However, these maps may not be the most efficient:
std::map
is a tree-based map and is a bad choice in most scenarios. Insertion, copy, interation, and destruction are all slow. Sadly, it was the only C++ standard map for a while and is the default in many projects.std::unordered_map
is much better, but can be cache-unfriendly.
At Meta, using the family of F14 maps in Folly by
default is recommended. These are more performant than std::unordered_map
in the general case, with one caveat: they
do not provide reference stability. For example, the following code is not safe, since any possible rehash (here:
emplace
insertion) can invalidate all existing references.
const auto& valueRef = map.at(13);
map.emplace(17, 71); // This can cause a rehash.
const auto valueCopy = valueRef;
Infer to the rescue
I added a PULSE_REFERENCE_STABILITY
issue
type to Pulse, a memory and lifetime analysis in Infer. This issue is raised when a memory location previously marked as
invalidated by a map rehash is accessed.
// The code.
map.emplace(17, 71);
// SIL, Infer's intermediate representation (simplified).
_fun_folly::f14::detail::F14BasicMap::emplace(
&map:folly::F14FastMap&,
&0$?%__sil_tmpSIL_materialize_temp__n$2:int&,
&0$?%__sil_tmpSIL_materialize_temp__n$3:int&);
// Underlying model state after the operation (simplified).
{ roots={ &valueRef=v60,
&map=v16 };
mem ={ v16 -> { __infer_map_pair -> v68, * -> v55 },
v58 -> { second -> v59, first -> v65 },
v60 -> { * -> v59 },
v68 -> { second -> v67, first -> v66 } };
attrs={ v58 -> { Invalid CppMap(was potentially invalidated by `folly::F14FastMap::emplace`) },
v59 -> { Invalid CppMap(was potentially invalidated by `folly::F14FastMap::emplace`) },
v65 -> { Invalid CppMap(was potentially invalidated by `folly::F14FastMap::emplace`) } };}
The model is simple: maps are represented by a single key-value pair. These can represent references to keys and values, and the pair itself can be used to represent an iterator. Whenever an operation that could resize the map occurs, the pair and both key and value variables are invalidated and reset. If the code then tries to access those values, Infer will raise an issue and report at which location the value was invalidated.
Fixing a few existing cases
After a few improvements to reduce the number of false positives, namely tracking double lookups and fixing the instruction sequencing of assignment operators, this issue was enabled company-wide. I was able to fix three existing issues thanks to the Infer continuous scan of parts of the codebase.
Infer ❤️
If any Infer folks are reading this, thank you so much for the great experience, I had a great time and hope to be working together again in the future ❤️.