From 50da1f3eb9dea81a05b6e75a3a001802db0b20c7 Mon Sep 17 00:00:00 2001 From: David Malcolm Date: Wed, 13 May 2020 14:36:45 -0400 Subject: [PATCH 123/293] FIXME: add cover-letter.txt --- cover-letter.txt | 74 ++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 74 insertions(+) create mode 100644 cover-letter.txt diff --git a/cover-letter.txt b/cover-letter.txt new file mode 100644 index 00000000000..19c85309a90 --- /dev/null +++ b/cover-letter.txt @@ -0,0 +1,74 @@ +Subject: analyzer: rewrite of region and value-handling + +This patch reimplements how the analyzer tracks regions and values. + +The patch eliminates region_id and svalue_id in favor of simply +using pointers. I'd hoped that the ID classes would make it easier +to compare states, avoiding having to compare long hexadecimal addresses +in favor of small integers. Unfortunately it added lots of complexity, +with the need to remap IDs when comparing or purging states, and the +need to "canonicalize" when comparing states. + +Various "state explosion" bugs in the old implementation were due to +failures in canonicalization, where two states that ought to be equal +were non-equal due to differences in ID ordering. I spent a lot of +time trying to fix canonicalization bugs, and there always seemed to +be one more bug. By eliminating IDs in this new implementation, no +canonicalization or ID remapping should be needed, and a lot of tricky +code goes away. + +In the old implementation, each region_model had its own copies of +regions and svalues, so there was heap bloat and churn as lots of +little objects were cloned when copying program_state instances. In the +new implementation the regions and svalues are immutable and are shared +thoughout the analysis, rather than being per region_model. They are +owned by a manager class, and are effectively singletons. Regions can +now be compared by pointer rather than by comparing their fields. + +This is a huge simplification, and (I hope) will avoid lots +of heap churn as states are copied; all mutable state from regions and +svalues is now stored in the region_model. + +TODO: not yet sure if we need validation, but by eliminating region_id +and svalue_id, almost all of the old validation becomes redundant + +Region subclasses no longer represent internal structure, but instead +represent how the regions are reached. So e.g. a global "struct coord +c;" is now a decl_region, rather than a struct_region. + +Memory is now modeled via a mapping from keys to svalues, where the +keys are both concrete bit-offsets from the start of the "base region", +and "symbolic" keys (thus hopefully making unions, casts, etc easier to +deal with). So e.g. for assignments to the fields of a struct, it +records the mapping from bit-offsets of e.g. field to the values; if +that memory is cast to another type and written to, the appropriate +clobbering of the bound values can happen. + +The concept of what the "current stack" is moves from the regions to +being a field within the region_model2 ("m_current_frame"). + +TODO: + - sm-states + - symbolic bindings + - casts, unions, views + - default bindings + - constraints + - ensure no need for canonicalization + - uniq_manager improvements + - immutable map + - state purging + - leak detection + - state merging + - compound values + - known functions: + - calloc + - memcpy + - memset + - alias analysis + - generic data map? + - docs + - benchmarking/profiling + - should there still be a root_region? + - some selftests depend on bit sizes of types, hopefully via the TYPE_SIZE + macros (see e.g. INT_TYPE_SIZE in defaults.h). Remember to test on + some other targets. Similarly, big-endian vs little-endian -- 2.26.2