GCC Middle and Back End API Reference
Data Structures

  struct target_gcse

Variables

  struct target_gcse default_target_gcse
  struct target_gcse *this_target_gcse
struct target_gcse default_target_gcse
Partial redundancy elimination / Hoisting for RTL.
Copyright (C) 1997-2013 Free Software Foundation, Inc.
This file is part of GCC.
GCC is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 3, or (at your option) any later version.
GCC is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with GCC; see the file COPYING3. If not see http://www.gnu.org/licenses/.
TODO:
- Reordering of memory allocation and freeing to be more space efficient.
- Calc rough register pressure information and use the info to drive all kinds of code motion (including code hoisting) in a unified way.
References searched while implementing this:

- Compilers: Principles, Techniques and Tools. Aho, Sethi, Ullman. Addison-Wesley, 1988.
- Global Optimization by Suppression of Partial Redundancies. E. Morel, C. Renvoise. Communications of the ACM, Vol. 22, Num. 2, Feb. 1979.
- A Portable Machine-Independent Global Optimizer - Design and Measurements. Frederick Chow. Stanford Ph.D. thesis, Dec. 1983.
- A Fast Algorithm for Code Movement Optimization. D.M. Dhamdhere. SIGPLAN Notices, Vol. 23, Num. 10, Oct. 1988.
- A Solution to a Problem with Morel and Renvoise's "Global Optimization by Suppression of Partial Redundancies". K-H Drechsler, M.P. Stadel. ACM TOPLAS, Vol. 10, Num. 4, Oct. 1988.
- Practical Adaptation of the Global Optimization Algorithm of Morel and Renvoise. D.M. Dhamdhere. ACM TOPLAS, Vol. 13, Num. 2, Apr. 1991.
- Efficiently Computing Static Single Assignment Form and the Control Dependence Graph. R. Cytron, J. Ferrante, B.K. Rosen, M.N. Wegman, and F.K. Zadeck. ACM TOPLAS, Vol. 13, Num. 4, Oct. 1991.
- Lazy Code Motion. J. Knoop, O. Ruthing, B. Steffen. ACM SIGPLAN Notices, Vol. 27, Num. 7, Jul. 1992 ('92 Conference on PLDI).
- What's In a Region? Or Computing Control Dependence Regions in Near-Linear Time for Reducible Flow Control. Thomas Ball. ACM Letters on Programming Languages and Systems, Vol. 2, Num. 1-4, Mar-Dec 1993.
- An Efficient Representation for Sparse Sets. Preston Briggs, Linda Torczon. ACM Letters on Programming Languages and Systems, Vol. 2, Num. 1-4, Mar-Dec 1993.
- A Variation of Knoop, Ruthing, and Steffen's Lazy Code Motion. K-H Drechsler, M.P. Stadel. ACM SIGPLAN Notices, Vol. 28, Num. 5, May 1993.
- Partial Dead Code Elimination. J. Knoop, O. Ruthing, B. Steffen. ACM SIGPLAN Notices, Vol. 29, Num. 6, Jun. 1994.
- Effective Partial Redundancy Elimination. P. Briggs, K.D. Cooper. ACM SIGPLAN Notices, Vol. 29, Num. 6, Jun. 1994.
- The Program Structure Tree: Computing Control Regions in Linear Time. R. Johnson, D. Pearson, K. Pingali. ACM SIGPLAN Notices, Vol. 29, Num. 6, Jun. 1994.
- Optimal Code Motion: Theory and Practice. J. Knoop, O. Ruthing, B. Steffen. ACM TOPLAS, Vol. 16, Num. 4, Jul. 1994.
- The Power of Assignment Motion. J. Knoop, O. Ruthing, B. Steffen. ACM SIGPLAN Notices, Vol. 30, Num. 6, Jun. 1995 ('95 Conference on PLDI).
- Global Code Motion / Global Value Numbering. C. Click. ACM SIGPLAN Notices, Vol. 30, Num. 6, Jun. 1995 ('95 Conference on PLDI).
- Value Driven Redundancy Elimination. L.T. Simpson. Rice University Ph.D. thesis, Apr. 1996.
- Value Numbering. L.T. Simpson. Massively Scalar Compiler Project, Rice University, Sep. 1996.
- High Performance Compilers for Parallel Computing. Michael Wolfe. Addison-Wesley, 1996.
- Advanced Compiler Design and Implementation. Steven Muchnick. Morgan Kaufmann, 1997.
- Building an Optimizing Compiler. Robert Morgan. Digital Press, 1998.

People wishing to speed up the code here should read:

- Elimination Algorithms for Data Flow Analysis. B.G. Ryder, M.C. Paull. ACM Computing Surveys, Vol. 18, Num. 3, Sep. 1986.
- How to Analyze Large Programs Efficiently and Informatively. D.M. Dhamdhere, B.K. Rosen, F.K. Zadeck. ACM SIGPLAN Notices, Vol. 27, Num. 7, Jul. 1992 ('92 Conference on PLDI).

People wishing to do something different can find various possibilities in the above papers and elsewhere.
We support GCSE via Partial Redundancy Elimination. PRE optimizations are a superset of those done by classic GCSE.

Two passes of copy/constant propagation are done around PRE or hoisting because the first one enables more GCSE and the second one helps to clean up the copies that PRE and HOIST create. This is needed more for PRE than for HOIST because code hoisting will try to use an existing register containing the common subexpression rather than create a new one. This is harder to do for PRE because of the code motion (which HOIST doesn't do).

Expressions we are interested in GCSE-ing are of the form (set (pseudo-reg) (expression)); the function want_to_gcse_p says exactly which ones qualify. In addition, expressions in REG_EQUAL notes are candidates for GCSE-ing. This allows PRE to hoist expressions that are expressed in multiple insns, such as complex address calculations (e.g. for PIC code, or loads with a high part and a low part). PRE handles moving invariant expressions out of loops by treating them as partially redundant.

We used to support multiple passes, but there are diminishing returns in doing so. The first pass usually makes 90% of the changes that are doable; a second pass can make a few more changes enabled by the first, and experiments show that any further passes don't make enough changes to justify the expense. A study of spec92 using an unlimited number of passes gave: [1 pass] = 1208 substitutions, [2] = 577, [3] = 202, [4] = 192, [5] = 83, [6] = 34, [7] = 17, [8] = 9, [9] = 4, [10] = 4, [11] = 2, [12] = 2, [13] = 1, [15] = 1, [16] = 2, [41] = 1. Doing copy propagation between passes was found to enable further substitutions. That study was done before expressions in REG_EQUAL notes were added as candidate expressions for optimization, and before the GIMPLE optimizers were added, so multiple passes are probably even less worthwhile now than when the study was conducted. PRE is also quite expensive in complicated functions because the DFA can take a while to converge. Hence we only perform one pass.

The steps for PRE are:

1) Build the hash table of expressions we wish to GCSE (expr_hash_table).
2) Perform the data flow analysis for PRE.
3) Delete the redundant instructions.
4) Insert the required copies (if any) that make the partially redundant instructions fully redundant.
5) For other reaching expressions, insert an instruction to copy the value to a newly created pseudo that will reach the redundant instruction.

The deletion is done first so that when we do insertions we know which pseudo reg to use.

Various papers have argued that the PRE DFA is expensive (O(n^2)) and others argue it is not. The number of iterations for the algorithm to converge is typically 2-4, so I don't view it as that expensive (relatively speaking). PRE GCSE depends heavily on the second CPROP pass to clean up the copies we create: to make an expression reach the place where it is redundant, the result of the expression is copied to a new register, and the redundant expression is deleted by replacing it with this new register. Classic GCSE doesn't have this problem as much, because it computes the reaching defs of each register in each block and can therefore try to use an existing register.
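To make the insertion/deletion steps above concrete, here is a small source-level sketch of the effect of PRE. This is hand-written for illustration, not compiler output; the real transformation happens on RTL expressions of the (set (pseudo-reg) (expression)) form, and the temporary `t` standing in for the newly created pseudo is invented for this example.

```c
/* Before PRE: "a + b" is computed on the else path and again after the
   join, so the second computation is only partially redundant -- it is
   redundant only when the else path was taken.  */
int
before_pre (int a, int b, int p)
{
  int x, y;
  if (p)
    x = a;          /* "a + b" is not available on this path.  */
  else
    x = a + b;      /* "a + b" is available on this path.  */
  y = a + b;        /* Partially redundant computation.  */
  return x + y;
}

/* After PRE (conceptually): the expression is inserted on the path that
   lacked it, every evaluation feeds the new pseudo "t", and the formerly
   redundant computation becomes a copy from "t".  The leftover copies are
   exactly what the second CPROP pass cleans up afterwards.  */
int
after_pre (int a, int b, int p)
{
  int x, y, t;
  if (p)
    {
      t = a + b;    /* Insertion on the path where the value was missing,
                       making the later use fully redundant.  */
      x = a;
    }
  else
    {
      t = a + b;    /* The existing evaluation's result is copied into
                       the new pseudo so it reaches the redundant insn.  */
      x = t;
    }
  y = t;            /* The redundant computation is deleted by replacing
                       it with a copy from the new pseudo.  */
  return x + y;
}
```

The second CPROP pass would then propagate `t` into its uses and let dead-code elimination remove any copies that become unnecessary, which is why PRE depends on it more heavily than code hoisting does.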
GCSE global vars.
struct target_gcse *this_target_gcse
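default_target_gcse and this_target_gcse follow GCC's usual convention for per-target global state: the pass keeps its target-dependent cached data in a struct, one instance describes the default target, and the this_target_* pointer names the state for the target currently in effect, so it can be repointed when several targets are compiled in. The sketch below illustrates only the pattern; the member names are invented placeholders, not the actual fields of struct target_gcse.

```c
#include <stdbool.h>

/* Illustrative stand-in for struct target_gcse: a bag of target-dependent
   data that the pass computes lazily and caches.  The members here are
   placeholders for the sake of the example.  */
struct target_state_sketch
{
  char x_cached_flags[16];     /* Hypothetical per-mode cache.  */
  bool x_cached_flags_init_p;  /* Whether the cache has been filled.  */
};

/* One instance describing the default target ...  */
struct target_state_sketch default_target_state;

/* ... and a pointer naming "the target in effect".  The pass reads and
   writes the data only through the pointer (this_target_state->...), so
   switching targets amounts to repointing it.  */
struct target_state_sketch *this_target_state = &default_target_state;
```

In practice such pointers are usually hidden behind short accessor macros so the rest of the pass never spells out the indirection; treat that as a general observation about the pattern rather than a description of this particular struct.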