From 5bf498ac2f7ff6e4926a4c95ba2df873a511425d Mon Sep 17 00:00:00 2001
From: David Malcolm <dmalcolm@redhat.com>
Date: Tue, 22 May 2018 11:31:53 -0400
Subject: [PATCH 03/24] Add cover-letter.txt

---
 cover-letter.txt | 501 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 501 insertions(+)
 create mode 100644 cover-letter.txt

diff --git a/cover-letter.txt b/cover-letter.txt
new file mode 100644
index 0000000..bcc80ba
--- /dev/null
+++ b/cover-letter.txt
@@ -0,0 +1,501 @@
+Subject: RFC: Prototype of compiler-assisted performance analysis
+
+I want to provide more "actionable" information on how GCC optimizes
+code, both for us (GCC developers), and for advanced end-users.
+
+For us, I want to make it easier to get a sense of how an optimization
+can be improved (bug-fixing).
+For end-users, I want to make it easier to figure out which command-line
+flags to try, or how to rework their code.
+
+In particular, given a workload (or benchmark), how does the user figure
+out:
+  * "what did the optimizers do to the hottest code?"
+  * "what *didn't* they do, and why not?"
+  * "how can I make the code faster?" (either by editing the code, or the
+     gcc options)
+and as GCC developers:
+  * "how can we make GCC do a better job on this code?"
+
+
+The current situation
+=====================
+
+I believe we currently have four ways of emitting information on what the
+optimizers are doing:
+
+(a) -fopt-info and its variants (e.g. "-fopt-info-all").
+
+    Consider this "-fopt-info" code (from "vect_analyze_loop_form"):
+
+      if (dump_enabled_p ())
+        {
+           dump_printf_loc (MSG_NOTE, vect_location,
+                            "Symbolic number of iterations is ");
+           dump_generic_expr (MSG_NOTE, TDF_DETAILS, number_of_iterations);
+           dump_printf (MSG_NOTE, "\n");
+        }
+
+    This emits text to the destination(s) of "-fopt-info", e.g.:
+
+       foo.c:15:3: note: Symbolic number of iterations is (unsigned int) n_9(D)
+
+    With "-fopt-info-all" this would be emitted to stderr.
+    If "details" or "note" was added when enabling the dumpfile
+    (e.g. via "-fdump-tree-all-details"), it would also be added to the
+    dumpfile for the current pass.
+
+    Passes are tagged as being members of zero or more optimization groups
+    ("ipa", "loop", "inline", "omp", "vec"), and these can be used to filter
+    which messages are emitted.  For example, the "increase_alignment" IPA
+    pass is tagged with both "loop" and "vec".
+
+(b) other messages emitted to dumpfiles.  We have numerous other messages
+    that are written just to the dumpfiles, and don't go through -fopt-info.
+    Consider e.g. this from loop-unroll.c:unroll_loop_constant_iterations:
+
+      if (dump_file)
+        fprintf (dump_file,
+                 ";; Unrolled loop %d times, constant # of iterations %i insns\n",
+                 max_unroll, num_loop_insns (loop));
+
+(c) IR dumps emitted to dumpfiles.  execute_one_pass and
+    execute_one_ipa_transform_pass have this code:
+
+      if (dump_file)
+        do_per_function (execute_function_dump, pass);
+
+    which captures a textual dump of the current state of each function.
+
+(d) dot-format dumps emitted to the ".dot" dumpfiles (via the "-graph") suffix
+    e.g. "-fdump-tree-all-graph".
+
+
+Problems with the current situation
+===================================
+
+-fopt-info:
+
+* There's no integration with profile data: the report has a "wall of text"
+  feel, where every site in the code is treated as of having equal
+  importance.
+
+* -fopt-info emits lines of the form:
+
+       foo.c:15:3: note: Symbolic number of iterations is (unsigned int) n_9(D)
+
+  This is purely capturing the source location of the deepest point of the
+  inlining chain: if there are multiple places where a function has been
+  inlined, we'll get multiple reports, with no way to distinguish them.
+
+* If a program attempts to parse this data, it can get source location
+  plus a message, but there's no structure to the messages.  For example,
+  we can't extract the source location of the expressions that are referred
+  to in the messages.
+
+* There are various places where we emit MSG_NOTES that look like sections
+  and subsections, e.g.:
+
+    if (dump_enabled_p ())
+     dump_printf_loc (MSG_NOTE, vect_location,
+                      "=== vect_analyze_data_ref_dependences ===\n");
+
+  This appears to be intended to express the start of a hierarchical
+  section within the dump, but there's nothing to express the end of
+  the section, and they're current printed in a flat way, again
+  contributing to a "wall of text" feel.  In my prototype I've
+  experimented with capturing the nesting, and highlighting it using
+  indentation.
+
+dumpfiles:
+
+* These are per-pass-instance, capturing every function at each pass.
+  There's no easy way to see how a particular function "evolved" through
+  the passes.
+
+* Some messages are written to dumpfiles, but don't go through -fopt-info.
+  The division seems somewhat arbitrary.
+
+All of these options are marked as "GCC Developer Options".
+
+
+Aside: LLVM's experience with remarks
+=====================================
+
+LLVM added "remarks" as a kind of diagnostic for tracking optimization
+information.  It has some similarities to our -fopt-info.
+
+However, they ran into issues with them.  There's an excellent talk
+given by Adam Nemet at the November 2016 LLVM Developers' Meeting
+on "Compiler-assisted Performance Analysis":
+  https://www.youtube.com/watch?v=qq0q1hfzidg
+  http://llvm.org/devmtg/2016-11/Slides/Nemet-Compiler-assistedPerformanceAnalysis.pdf
+  (part of http://www.llvm.org/devmtg/2016-11/ )
+
+He ran into the "wall of text" issue: too much data, where you
+can't see what's important.
+
+His fix was to capture the data in a file format (as "optimization records")
+and to write a HTML tool for visualizing them, showing the optimizations
+prioritized by code hotness.
+
+This seems like a great idea, so I've been experimenting with it for GCC.
+
+
+Prototype of a GCC-based solution
+=================================
+
+I've spent the last three weeks experimenting with this for GCC.
+
+I'm assuming the following constraints:
+
+* Preserve existing "-fopt-info" overall (whilst allowing for minor changes
+  to formatting *within* individual messages).
+
+* Preserve existing dumpfile output, as it's what GCC contributors are
+  used to.
+
+* No extra time/memory cost for the default "disabled" case.  I assume it's
+  acceptable to spend a little more time/memory when the user opts-in by
+  enabling this instrumentation.
+
+I experimented with a few approaches, and the current prototype is
+something of a hybrid.  The prototype aims:
+
+* to capture "optimization records", emitting them in a format
+  that retains all pertinent information that can be easily parsed.
+  This separates the data from its presentation.
+
+* to associate the optimization records with code hotness,
+  and to provide a report that prioritizes the records, starting with
+  the hottest code.
+
+* other metadata to capture include(s):
+
+  * which pass is emitting this record (and data about the pass)
+
+  * the inlining chain (rather than just the "deepest" source location)
+
+  * as well as source location within the user's code, I'm capturing
+    source location within *GCC* for where the optimization happens:
+    for any message you can immediately go to the implementation
+
+I've implemented an HTML report generator, inspired by Adam Nemet's LLVM
+work, using the existing places where we're emitting "-fopt-info"
+information.
+
+Differences from Adam Nemet's work:
+
+* I've added hierarchical records, so that there can be a nesting
+  structure of optimization notes (otherwise there's still too much
+  to decipher).
+
+* capture of GCC source location
+
+* LLVM is using YAML for some reason; I used JSON.  Given that I'm
+  capturing some different things, I didn't attempt to use the same
+  file format as LLVM.
+
+Example 1
+*********
+
+Consider this "-fopt-info" code (from "vect_analyze_loop_form")::
+
+  if (dump_enabled_p ())
+    {
+       dump_printf_loc (MSG_NOTE, vect_location,
+                        "Symbolic number of iterations is ");
+       dump_generic_expr (MSG_NOTE, TDF_DETAILS, number_of_iterations);
+       dump_printf (MSG_NOTE, "\n");
+    }
+
+This purely emits text to any destination(s) of "-fopt-info", e.g.:
+
+  foo.c:15:3: note: Symbolic number of iterations is (unsigned int) n_9(D)
+
+e.g. to stderr, and/or the dumpfile for the current pass.
+
+This can be rewritten for optimization records as:
+
+  if (optinfo_enabled_p ())
+    {
+      OPTINFO_VECT_NOTE
+        << "symbolic number of iterations is "
+        << details (number_of_iterations);
+    }
+
+It can then be emitted in up to three ways:
+
+(a) as before via "-fopt-info"::
+
+    foo.c:15:3: note: symbolic number of iterations is (unsigned int) n_9(D)
+
+  (albeit with capitalization changes), to the "-fopt-info" destination(s).
+
+(b) It can be emitted as a "remark" through the diagnostics subsystem,
+    showing the extra metadata::
+
+      foo.c:15:3: remark: symbolic number of iterations is ‘(unsigned int)
+      n_9(D)’ [pass=vect] [count(guessed_local)=955630224]
+
+    getting the standard diagnostics benefits:
+
+    * announcing "In function 'blah':" etc when this changes.
+
+    * printing the corresponding source lines (unless
+      "-fno-diagnostics-show-caret"), and
+
+    * colorizing the various parts of the message if stderr is at a tty.
+
+(c) It is emitted in JSON form through "-fsave-optimization-record":
+
+      {
+          "kind": "note",
+          "count": {
+              "quality": "guessed_local",
+              "value": 9.5563e+08
+          },
+          "location": {
+              "line": 15,
+              "file": "foo.c",
+              "column": 3
+          },
+          "pass": "vect",
+          "message": [
+              "symbolic number of iterations is ",
+              {
+                  "expr": "(unsigned int) n_8(D)"
+              }
+          ],
+          "impl_location": {
+              "line": 1674,
+              "file": "../../src/gcc/tree-vect-loop.c",
+              "function": "vect_analyze_loop_form"
+          },
+          "function": "compute_sum"
+      },
+
+    Note how this captures metadata such as profile count, function name,
+    pass, etc.
+
+I've not been a fan of the "<<" C++ idiom, but it seems to
+work well here for capturing the data we need (but do we need to
+internationalize the messages?)
+
+Example 2
+*********
+
+Consider this "-fopt-info" code (from "vect_create_data_ref_ptr"):
+
+  if (dump_enabled_p ())
+     {
+       tree dr_base_type = TREE_TYPE (DR_BASE_OBJECT (dr));
+       dump_printf_loc (MSG_NOTE, vect_location,
+                        "create %s-pointer variable to type: ",
+                        get_tree_code_name (TREE_CODE (aggr_type)));
+       dump_generic_expr (MSG_NOTE, TDF_SLIM, aggr_type);
+       if (TREE_CODE (dr_base_type) == ARRAY_TYPE)
+         dump_printf (MSG_NOTE, "  vectorizing an array ref: ");
+       else if (TREE_CODE (dr_base_type) == VECTOR_TYPE)
+         dump_printf (MSG_NOTE, "  vectorizing a vector ref: ");
+       else if (TREE_CODE (dr_base_type) == RECORD_TYPE)
+         dump_printf (MSG_NOTE, "  vectorizing a record based array ref: ");
+       else
+         dump_printf (MSG_NOTE, "  vectorizing a pointer ref: ");
+       dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_BASE_OBJECT (dr));
+       dump_printf (MSG_NOTE, "\n");
+     }
+
+where the information is built up piecewise, with conditional logic.
+
+This purely emits text to any destination(s) of "-fopt-info", e.g.:
+
+    foo.c:12:5: note: create vector_type-pointer variable to type:
+    vector(4) int  vectorizing a pointer ref: *arr_9(D)
+
+e.g. to stderr, and/or the dumpfile for the current pass.
+
+This can be rewritten by exposing the "pending_optinfo":
+
+  if (optinfo_enabled_p ())
+    {
+      tree dr_base_type = TREE_TYPE (DR_BASE_OBJECT (dr));
+      pending_optinfo info = OPTINFO_VECT_NOTE
+       << optinfo_printf ("create %s-pointer variable to type: ",
+                          get_tree_code_name (TREE_CODE (aggr_type)))
+       << slim (aggr_type);
+      if (TREE_CODE (dr_base_type) == ARRAY_TYPE)
+       info << "  vectorizing an array ref: ";
+      else if (TREE_CODE (dr_base_type) == VECTOR_TYPE)
+       info << "  vectorizing a vector ref: ";
+      else if (TREE_CODE (dr_base_type) == RECORD_TYPE)
+       info << "  vectorizing a record based array ref: ";
+      else
+       info << "  vectorizing a pointer ref: ";
+      info << slim (DR_BASE_OBJECT (dr));
+    }
+
+The "pending_optinfo" is emitted when it goes out of scope.  As before, it
+can be emitted in up to three ways:
+
+(a) as before via "-fopt-info":
+
+    foo.c:12:5: note: create vector_type-pointer variable to type:
+    vector(4) int  vectorizing a pointer ref: *arr_9(D)
+
+  (albeit with capitalization changes), to the "-fopt-info" destination(s).
+
+(b) It can be emitted as a "remark" through the diagnostics subsystem:
+
+    foo.c:15:3: remark:   create vector_type-pointer variable to type:
+    ‘vector(4) int’  vectorizing a pointer ref: ‘*arr_9(D)’ [pass=vect]
+    [count(guessed_local)=860067200]
+
+(c) It is emitted in JSON form through "-fsave-optimization-record":
+
+      {
+          "kind": "note",
+          "count": {
+              "quality": "guessed_local",
+              "value": 8.60067e+08
+          },
+          "location": {
+              "line": 15,
+              "file": "foo.c",
+              "column": 3
+          },
+          "pass": "vect",
+          "message": [
+              "create vector_type-pointer variable to type: ",
+              {
+                  "expr": "vector(4) int"
+              },
+              "  vectorizing a pointer ref: ",
+              {
+                  "location": {
+                      "line": 16,
+                      "file": "foo.c",
+                      "column": 5
+                  },
+                  "expr": "*arr_9(D)"
+              }
+          ],
+          "impl_location": {
+              "line": 4702,
+              "file": "../../src/gcc/tree-vect-data-refs.c",
+              "function": "vect_create_data_ref_ptr"
+          },
+          "function": "compute_sum"
+      },
+
+    Note how the message isn't just text: here it has captured the
+    source location of the pointer ref.
+
+I've written an HTML generator (in Python 3)
+ https://github.com/davidmalcolm/gcc-opt-viewer
+which takes a directory full of these json files and emits a directory
+of HTML files, with an index.html showing the optimizations, prioritized
+from hottest code downwards.
+
+An example of HTML output can be seen here:
+
+* https://dmalcolm.fedorapeople.org/gcc/2018-05-18/pgo-demo-test/pgo-demo-test/
+
+based on this source:
+
+* https://github.com/davidmalcolm/pgo-demo
+
+The HTML shows the optimizations inline next to the source code it affects.
+It uses a little JavaScript to add expand/collapse toggles when the amount
+of text for an optimization gets too long.  You can see the hierarchical
+structure of the notes via indentation (e.g. for the vectorization example).
+An alternative presentation could add an explicit tree structure with
+more expand/collapse toggles - this is just a prototype.
+
+I've ported many messages from the existing API to the "optinfo" API, but
+I deliberately didn't make significant changes to messages.  But there's
+probably scope for improvements here.  For example, in the
+"presentation-not-inlined.c" example at the link above, the vect pass emits:
+
+   === analyzing loop ===
+    === analyze_loop_nest ===
+      === vect_analyze_loop_form ===
+        === get_loop_niters ===
+      symbolic number of iterations is (unsigned int) n_9(D)
+      not vectorized: loop contains function calls or data references that cannot be analyzed
+
+which is useful, but more useful would be to know *which* function calls
+or data references couldn't be analyzed.  So that would be something
+to tackle in followups (this post is more about discussing the overall
+direction).
+
+Caveats
+=======
+
+This is still just a rough prototype (I'm looking for feedback, especially
+from loop, vectorization, and IPA maintainers).
+
+Some known issues:
+
+* I haven't yet attempted to bootstrap this; no ChangeLog yet
+
+* there are lots of FIXMEs
+
+* the JSON format ought to be versioned
+
+* "remarks" are a bit half-baked in this patch kit, it's just on/off.
+  It probably should support the optimization group tags from
+  -fopt-info.  LLVM has a -f[no-]diagnostics-show-hotness; etc.
+
+* I'm using a little C++11 (std::shared_ptr), which would need to be
+  ported to C++98.
+
+Known unknowns
+==============
+
+* Should the viewer generate static HTML, or serve dynamic HTML?
+  When I tried this on non-trivial examples, the index.html
+  became unwieldy, so pagination would have helped a lot.  That said,
+  there could be multiple different viewers, a benefit of separating
+  the data out from its presentation.
+
+* Should there be a way to pre-filter the optimization records before
+  they're written out to JSON?  Or is it OK to dump everything, and
+  make filtering be the viewer's responsibility?  (currently the patch
+  kit dumps everything)
+
+* Should the JSON format interact with the other uses of dumpfiles
+  (both free-form messages, and capturing the IR)?  This would allow
+  a viewer e.g. to show each state of a given function as it goes
+  through the optimizers.  I've been experimenting with this, but it's
+  not in the patch kit as is.
+
+* Formatting convention mismatches:
+
+  * diagnostics have lowercase start, and no trailing period
+
+  * -fopt-info dumps have uppercase start, and often have a trailing period
+
+    * but some of them have multiple sentences
+
+* Is there a natural tension here between logs (which are timed-ordered
+  events) vs source markup (source-ordered)?  Consider e.g. a
+  collection of related messages from one subroutine.
+
+* Should optimization info be internationalized/localized? Currently
+  "-fopt-info" isn't.  I say let's *not* i18n it.
+
+* My prototype respects TDF_DETAILS for the existing "-fopt-info" output,
+  but given that it appears to have been applied haphazardly, and that we
+  can filter in other ways, my prototype ignores it for the other output
+  formats.  It could be added.
+
+* Which JSON library to use for output?  Note that we only need output, so
+  that for the final version we can directly write the JSON, rather than the
+  current approach which, inefficiently, requires first building a DOM-like
+  tree in memory.
+
+Thanks for reading this far.
+
+Thoughts?
+Dave
-- 
1.8.5.3