From September to December, I’ve interned at Meta (again) in London to work on Infer, a state-of-the-art static analyzer. My mission was to create a new issue type to detect C++ reference stability bugs.

What is reference stability?

The C++ standard maps, std::map and std::unordered_map, do provide reference stability: when the map is resized, existing references to keys and values are not invalidated. However, these maps may not be the most efficient:

  • std::map is a tree-based map and is a bad choice in most scenarios. Insertion, copy, interation, and destruction are all slow. Sadly, it was the only C++ standard map for a while and is the default in many projects.
  • std::unordered_map is much better, but can be cache-unfriendly.

At Meta, using the family of F14 maps in Folly by default is recommended. These are more performant than std::unordered_map in the general case, with one caveat: they do not provide reference stability. For example, the following code is not safe, since any possible rehash (here: emplace insertion) can invalidate all existing references.

const auto& valueRef = map.at(13);
map.emplace(17, 71); // This can cause a rehash.
const auto valueCopy = valueRef;

Infer to the rescue

I added a PULSE_REFERENCE_STABILITY issue type to Pulse, a memory and lifetime analysis in Infer. This issue is raised when a memory location previously marked as invalidated by a map rehash is accessed.

// The code.
map.emplace(17, 71);

// SIL, Infer's intermediate representation (simplified).
_fun_folly::f14::detail::F14BasicMap::emplace(
    &map:folly::F14FastMap&,
    &0$?%__sil_tmpSIL_materialize_temp__n$2:int&,
    &0$?%__sil_tmpSIL_materialize_temp__n$3:int&);

// Underlying model state after the operation (simplified).
{ roots={ &valueRef=v60,
          &map=v16 };
       mem  ={ v16 -> { __infer_map_pair -> v68, * -> v55 },
               v58 -> { second -> v59, first -> v65 },
               v60 -> { * -> v59 },
               v68 -> { second -> v67, first -> v66 } };
       attrs={ v58 -> { Invalid CppMap(was potentially invalidated by `folly::F14FastMap::emplace`) },
               v59 -> { Invalid CppMap(was potentially invalidated by `folly::F14FastMap::emplace`) },
               v65 -> { Invalid CppMap(was potentially invalidated by `folly::F14FastMap::emplace`) } };}

The model is simple: maps are represented by a single key-value pair. These can represent references to keys and values, and the pair itself can be used to represent an iterator. Whenever an operation that could resize the map occurs, the pair and both key and value variables are invalidated and reset. If the code then tries to access those values, Infer will raise an issue and report at which location the value was invalidated.

screenshot of Infer catching a reference stability issue

Fixing a few existing cases

After a few improvements to reduce the number of false positives, namely tracking double lookups and fixing the instruction sequencing of assignment operators, this issue was enabled company-wide. I was able to fix three existing issues thanks to the Infer continuous scan of parts of the codebase.

Infer ❤️

If any Infer folks are reading this, thank you so much for the great experience, I had a great time and hope to be working together again in the future ❤️.