Hardware Models and Configurations¶

Each target machine type can have its own special options, starting with -m, to choose among various hardware models or configurations, for example, 68010 vs 68020, floating coprocessor or none. A single installed version of the compiler can compile for any model or configuration, according to the options specified.

Some configurations of the compiler also support additional special options, usually for compatibility with other compilers on the same platform.

.. _aarch64-options:

AArch64 Options¶

These options are defined for AArch64 implementations:

-mabi=name¶

Generate code for the specified data model. Permissible values are ilp32 for SysV-like data model where int, long int and pointer are 32-bit, and lp64 for SysV-like data model where int is 32-bit, but long int and pointer are 64-bit.

The default depends on the specific target configuration. Note that the LP64 and ILP32 ABIs are not link-compatible; you must compile your entire program with the same ABI, and link with a compatible set of libraries.
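A minimal sketch of how a program might report which of the two data models it was compiled for, judged only by the widths of int, long int and pointers. The function names data_model_name and check_data_model are illustrative and not part of GCC.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Name the data model this translation unit was compiled for, judged
   only by the widths of int, long and pointers. */
const char *data_model_name(void)
{
    size_t int_bits  = sizeof(int) * CHAR_BIT;
    size_t long_bits = sizeof(long) * CHAR_BIT;
    size_t ptr_bits  = sizeof(void *) * CHAR_BIT;

    if (int_bits == 32 && long_bits == 32 && ptr_bits == 32)
        return "ILP32";
    if (int_bits == 32 && long_bits == 64 && ptr_bits == 64)
        return "LP64";
    return "other";
}

/* Quick self-check: both models described above keep long and pointers
   the same width, so any other combination must be reported as "other". */
void check_data_model(void)
{
    const char *name = data_model_name();

    if (sizeof(long) != sizeof(void *)) {
        assert(name[0] == 'o');
    }
}
```

Because the two ABIs are not link-compatible, a check like this is mainly useful in diagnostics or build-time sanity tests, not as a way to mix objects built with different settings.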

-mbig-endian¶: Generate big-endian code. This is the default when GCC is configured for an aarch64_be-*-* target.

-mgeneral-regs-only¶: Generate code which uses only the general registers.

-mlittle-endian¶: Generate little-endian code. This is the default when GCC is configured for an aarch64-*-* but not an aarch64_be-*-* target.
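The effect of the endianness options can be observed from C. The sketch below probes the byte order at run time and, where GCC's predefined macros __BYTE_ORDER__ and __ORDER_LITTLE_ENDIAN__ are available, checks that both views agree. The function names are illustrative.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Return 1 if the least significant byte of a 32-bit value is stored
   first in memory (little-endian), 0 otherwise. */
int is_little_endian_at_runtime(void)
{
    uint32_t probe = 1;
    unsigned char first;

    memcpy(&first, &probe, 1);
    return first == 1;
}

/* Compare the run-time answer with GCC's predefined macros, when present. */
void check_byte_order(void)
{
#if defined(__BYTE_ORDER__) && defined(__ORDER_LITTLE_ENDIAN__)
    assert(is_little_endian_at_runtime() ==
           (__BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__));
#endif
}
```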

-mcmodel=tiny¶: Generate code for the tiny code model. The program and its statically defined symbols must be within 1MB of each other. Pointers are 64 bits. Programs can be statically or dynamically linked. This model is not fully implemented and mostly treated as small.

-mcmodel=small¶: Generate code for the small code model. The program and its statically defined symbols must be within 4GB of each other. Pointers are 64 bits. Programs can be statically or dynamically linked. This is the default code model.

-mcmodel=large¶: Generate code for the large code model. This makes no assumptions about addresses and sizes of sections. Pointers are 64 bits. Programs can be statically linked only.

-mstrict-align¶: Do not assume that unaligned memory references are handled by the system.
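Source code that must work whether or not the system handles unaligned references can avoid misaligned pointer dereferences altogether. A minimal sketch, using memcpy so that the access is well defined for any address; the helper names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Load a 32-bit value from an address with no alignment guarantee.
   memcpy has defined behaviour for any alignment; the compiler is free
   to turn it into a single load where that is known to be safe. */
uint32_t load_u32_unaligned(const void *p)
{
    uint32_t v;

    assert(p != NULL);
    memcpy(&v, p, sizeof v);
    return v;
}

/* Store counterpart of load_u32_unaligned. */
void store_u32_unaligned(void *p, uint32_t v)
{
    assert(p != NULL);
    memcpy(p, &v, sizeof v);
}
```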

-momit-leaf-frame-pointer, -mno-omit-leaf-frame-pointer¶: Omit or keep the frame pointer in leaf functions. The former behaviour is the default.

-mtls-dialect=desc¶: Use TLS descriptors as the thread-local storage mechanism for dynamic accesses of TLS variables. This is the default.

-mtls-dialect=traditional¶: Use traditional TLS as the thread-local storage mechanism for dynamic accesses of TLS variables.

-mfix-cortex-a53-835769, -mno-fix-cortex-a53-835769¶: Enable or disable the workaround for the ARM Cortex-A53 erratum number 835769. This involves inserting a NOP instruction between memory instructions and 64-bit integer multiply-accumulate instructions.

-mfix-cortex-a53-843419, -mno-fix-cortex-a53-843419¶: Enable or disable the workaround for the ARM Cortex-A53 erratum number 843419. This erratum workaround is made at link time and this will only pass the corresponding flag to the linker.

-march=name¶

Specify the name of the target architecture, optionally suffixed by one or more feature modifiers. This option has the form -march=arch{+[no]feature}*, where the only permissible value for arch is armv8-a. The permissible values for feature are documented in the sub-section below. Additionally, on native AArch64 GNU/Linux systems the value native is available. This option causes the compiler to pick the architecture of the host system. If the compiler is unable to recognize the architecture of the host system, this option has no effect.

Where conflicting feature modifiers are specified, the right-most feature is used.

GCC uses this name to determine what kind of instructions it can emit when generating assembly code.

Where -march is specified without either of -mtune or -mcpu also being specified, the code is tuned to perform well across a range of target processors implementing the target architecture.
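The feature-modifier syntax and the right-most-wins rule described above can be modelled in a few lines of C. This is only a sketch of the documented rule, not GCC's actual option parser: the function feature_state is hypothetical, and it deliberately ignores implications between features (such as one extension enabling another).

```c
#include <assert.h>
#include <string.h>

/* Return 1 if FEATURE ends up enabled, 0 if disabled, and -1 if it is
   never mentioned, scanning SPEC (e.g. "armv8-a+crc+nocrc") left to
   right so that the right-most mention wins. */
int feature_state(const char *spec, const char *feature)
{
    int state = -1;
    size_t flen;
    const char *p;

    assert(spec != NULL && feature != NULL);
    flen = strlen(feature);

    for (p = strchr(spec, '+'); p != NULL; p = strchr(p, '+')) {
        const char *mod = p + 1;            /* text after this '+' */
        size_t len = strcspn(mod, "+");     /* up to the next '+' or the end */

        if (len == flen && strncmp(mod, feature, flen) == 0)
            state = 1;                      /* "+feature" */
        else if (len == flen + 2 && strncmp(mod, "no", 2) == 0 &&
                 strncmp(mod + 2, feature, flen) == 0)
            state = 0;                      /* "+nofeature" */
        p = mod + len;                      /* resume at the next '+' or the end */
    }
    return state;
}
```

For example, feature_state("armv8-a+crc+nocrc", "crc") yields 0, while swapping the two modifiers yields 1.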

-mtune=name¶

Specify the name of the target processor for which GCC should tune the performance of the code. Permissible values for this option are: generic, cortex-a53, cortex-a57, cortex-a72, exynos-m1, thunderx, xgene1.

Additionally, this option can specify that GCC should tune the performance of the code for a big.LITTLE system. Permissible values for this option are: cortex-a57.cortex-a53, cortex-a72.cortex-a53.

Additionally on native AArch64 GNU/Linux systems the value native is available. This option causes the compiler to pick the architecture of and tune the performance of the code for the processor of the host system. If the compiler is unable to recognize the processor of the host system this option has no effect.

Where none of -mtune=, -mcpu= or -march= are specified, the code is tuned to perform well across a range of target processors.

This option cannot be suffixed by feature modifiers.

-mcpu=name¶

Specify the name of the target processor, optionally suffixed by one or more feature modifiers. This option has the form -mcpu=cpu{+[no]feature}*, where the permissible values for cpu are the same as those available for -mtune. Additionally, on native AArch64 GNU/Linux systems the value native is available. This option causes the compiler to tune the performance of the code for the processor of the host system. If the compiler is unable to recognize the processor of the host system, this option has no effect.

The permissible values for feature are documented in the sub-section below.

Where conflicting feature modifiers are specified, the right-most feature is used.

GCC uses this name to determine what kind of instructions it can emit when generating assembly code (as if by -march) and to determine the target processor for which to tune for performance (as if by -mtune). Where this option is used in conjunction with -march or -mtune, those options take precedence over the appropriate part of this option.

-march and -mcpu Feature Modifiers¶

Feature modifiers used with -march and -mcpu can be one of the following:

crc: Enable CRC extension.
crypto: Enable Crypto extension. This implies Advanced SIMD is enabled.
fp: Enable floating-point instructions.
simd: Enable Advanced SIMD instructions. This implies floating-point instructions are enabled. This is the default for all current possible values for options -march and -mcpu=.
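Code can adapt to the selected feature modifiers through the ACLE feature-test macros. The sketch below assumes that __ARM_FEATURE_CRC32 is defined when the crc modifier is in effect and that arm_acle.h then provides the __crc32b intrinsic; on any other configuration it falls back to a bitwise software loop that computes the same value. The function names crc32_step and crc32_buffer are illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_FEATURE_CRC32)
#include <arm_acle.h>   /* declares __crc32b when the CRC extension is enabled */
#endif

/* Fold one byte into a running CRC-32 (reflected polynomial 0xEDB88320). */
static uint32_t crc32_step(uint32_t crc, uint8_t byte)
{
#if defined(__ARM_FEATURE_CRC32)
    return __crc32b(crc, byte);             /* hardware CRC32B instruction */
#else
    int bit;

    crc ^= byte;
    for (bit = 0; bit < 8; bit++)
        crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    return crc;
#endif
}

/* Standard CRC-32 of a buffer: initial value and final XOR are all-ones. */
uint32_t crc32_buffer(const void *data, size_t len)
{
    const uint8_t *p = data;
    uint32_t crc = 0xFFFFFFFFu;
    size_t i;

    assert(data != NULL || len == 0);
    for (i = 0; i < len; i++)
        crc = crc32_step(crc, p[i]);
    return crc ^ 0xFFFFFFFFu;
}
```

Keeping the fallback in the same function means the source builds unchanged with or without +crc; only the generated instructions differ.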

.. _adapteva-epiphany-options:

Adapteva Epiphany Options¶

These -m options are defined for Adapteva Epiphany:

-mhalf-reg-file¶: Don't allocate any register in the range r32...r63. That allows code to run on hardware variants that lack these registers.

-mprefer-short-insn-regs¶: Preferentially allocate registers that allow short instruction generation. This can result in increased instruction count, so this may either reduce or increase overall code size.

-mbranch-cost=num¶: Set the cost of branches to roughly num simple instructions. This cost is only a heuristic and is not guaranteed to produce consistent results across releases.

-mcmove¶: Enable the generation of conditional moves.

-mnops=num¶: Emit num NOPs before every other generated instruction.

-mno-soft-cmpsf¶: For single-precision floating-point comparisons, emit an fsub instruction and test the flags. This is faster than a software comparison, but can get incorrect results in the presence of NaNs, or when two different small numbers are compared such that their difference is calculated as zero. The default is -msoft-cmpsf, which uses slower, but IEEE-compliant, software comparisons.

-mstack-offset=num¶: Set the offset between the top of the stack and the stack pointer. E.g., a value of 8 means that the eight bytes in the range sp+0...sp+7 can be used by leaf functions without stack allocation. Values other than 8 or 16 are untested and unlikely to work. Note also that this option changes the ABI; compiling a program with a different stack offset than the libraries have been compiled with generally does not work. This option can be useful if you want to evaluate if a different stack offset would give you better code, but to actually use a different stack offset to build working programs, it is recommended to configure the toolchain with the appropriate --with-stack-offset=num option.

-mno-round-nearest¶: Make the scheduler assume that the rounding mode has been set to truncating. The default is -mround-nearest.

-mlong-calls¶: If not otherwise specified by an attribute, assume all calls might be beyond the offset range of the b / bl instructions, and therefore load the function address into a register before performing an (otherwise direct) call. This is the default.

-mshort-calls¶: If not otherwise specified by an attribute, assume all direct calls are in the range of the b / bl instructions, so use these instructions for direct calls. The default is -mlong-calls.

-msmall16¶: Assume addresses can be loaded as 16-bit unsigned values. This does not apply to function addresses for which -mlong-calls semantics are in effect.

-mfp-mode=mode¶

Set the prevailing mode of the floating-point unit. This determines the floating-point mode that is provided and expected at function call and return time. Making this mode match the mode you predominantly need at function start can make your programs smaller and faster by avoiding unnecessary mode switches.

mode can be set to one of the following values:

caller: Any mode at function entry is valid, and retained or restored when the function returns, and when it calls other functions. This mode is useful for compiling libraries or other compilation units you might want to incorporate into different programs with different prevailing FPU modes, and the convenience of being able to use a single object file outweighs the size and speed overhead for any extra mode switching that might be needed, compared with what would be needed with a more specific choice of prevailing FPU mode.
truncate: This is the mode used for floating-point calculations with truncating (i.e. round towards zero) rounding mode. That includes conversion from floating point to integer.
round-nearest: This is the mode used for floating-point calculations with round-to-nearest-or-even rounding mode.
int: This is the mode used to perform integer calculations in the FPU, e.g. integer multiply, or integer multiply-and-accumulate.

The default is -mfp-mode=caller.

-mnosplit-lohi, -mno-postinc, -mno-postmodify¶: Code generation tweaks that disable, respectively, splitting of 32-bit loads, generation of post-increment addresses, and generation of post-modify addresses. The defaults are -msplit-lohi, -mpost-inc, and -mpost-modify.

-mnovect-double, -mno-vect-double¶: Change the preferred SIMD mode to SImode. The default is -mvect-double, which uses DImode as preferred SIMD mode.

-max-vect-align=num¶: The maximum alignment for SIMD vector mode types. num may be 4 or 8. The default is 8. Note that this is an ABI change, even though many library function interfaces are unaffected if they don't use SIMD vector modes in places that affect size and/or alignment of relevant types.

-msplit-vecmove-early¶: Split vector moves into single word moves before reload. In theory this can give better register allocation, but so far the reverse seems to be generally the case.

-m1reg-reg¶: Specify a register to hold the constant 1, which makes loading small negative constants and certain bitmasks faster. Allowable values for reg are r43 and r63, which specify use of that register as a fixed register, and none, which means that no register is used for this purpose. The default is -m1reg-none.

.. _arc-options:

ARC Options¶

The following options control the architecture variant for which code is being compiled:

-mbarrel-shifter¶: Generate instructions supported by barrel shifter. This is the default unless -mcpu=ARC601 is in effect.

-mcpu=cpu¶

Set architecture type, register usage, and instruction scheduling parameters for cpu. There are also shortcut alias options available for backward compatibility and convenience. Supported values for cpu are:

ARC600: Compile for ARC600. Aliases: -mA6, -mARC600.
ARC601: Compile for ARC601. Alias: -mARC601.
ARC700: Compile for ARC700. Aliases: -mA7, -mARC700. This is the default when configured with --with-cpu=arc700.

-mdpfp, -mdpfp-compact¶: FPX: Generate Double Precision FPX instructions, tuned for the compact implementation.

-mdpfp-fast¶: FPX: Generate Double Precision FPX instructions, tuned for the fast implementation.

-mno-dpfp-lrsr¶: Disable LR and SR instructions from using FPX extension aux registers.

-mea¶: Generate Extended arithmetic instructions. Currently only divaw, adds, subs, and sat16 are supported. This is always enabled for -mcpu=ARC700.

-mno-mpy¶: Do not generate mpy instructions for ARC700.

-mmul32x16¶: Generate 32x16 bit multiply and mac instructions.

-mmul64¶: Generate mul64 and mulu64 instructions. Only valid for -mcpu=ARC600.

-mnorm¶: Generate norm instruction. This is the default if -mcpu=ARC700 is in effect.

-mspfp, -mspfp-compact¶: FPX: Generate Single Precision FPX instructions, tuned for the compact implementation.

-mspfp-fast¶: FPX: Generate Single Precision FPX instructions, tuned for the fast implementation.

-msimd¶: Enable generation of ARC SIMD instructions via target-specific builtins. Only valid for -mcpu=ARC700.

-msoft-float¶: This option is ignored; it is provided for compatibility purposes only. Software floating-point code is emitted by default, and this default can be overridden by the FPX options: -mspfp, -mspfp-compact, or -mspfp-fast for single precision, and -mdpfp, -mdpfp-compact, or -mdpfp-fast for double precision.

-mswap¶: Generate swap instructions.

The following options are passed through to the assembler, and also define preprocessor macro symbols.

-mdsp-packa¶: Passed down to the assembler to enable the DSP Pack A extensions. Also sets the preprocessor symbol __Xdsp_packa.

-mdvbf¶: Passed down to the assembler to enable the dual viterbi butterfly extension. Also sets the preprocessor symbol __Xdvbf.

-mlock¶: Passed down to the assembler to enable the Locked Load/Store Conditional extension. Also sets the preprocessor symbol __Xlock.

-mmac-d16¶: Passed down to the assembler. Also sets the preprocessor symbol __Xxmac_d16.

-mmac-24¶: Passed down to the assembler. Also sets the preprocessor symbol __Xxmac_24.

-mrtsc¶: Passed down to the assembler to enable the 64-bit Time-Stamp Counter extension instruction. Also sets the preprocessor symbol __Xrtsc.

-mswape¶: Passed down to the assembler to enable the swap byte ordering extension instruction. Also sets the preprocessor symbol __Xswape.

-mtelephony¶: Passed down to the assembler to enable dual and single operand instructions for telephony. Also sets the preprocessor symbol __Xtelephony.

-mxy¶: Passed down to the assembler to enable the XY Memory extension. Also sets the preprocessor symbol __Xxy.
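Since each of these options also defines a preprocessor symbol, source code can test for the corresponding extension at compile time. A small sketch that collects the symbols listed above into a bitmask; the enum constants and function names are hypothetical, only the __X... macro names come from the option descriptions:

```c
#include <assert.h>

/* One bit per extension macro named in the option list above. */
enum {
    ARC_EXT_DSP_PACKA = 1 << 0,
    ARC_EXT_DVBF      = 1 << 1,
    ARC_EXT_LOCK      = 1 << 2,
    ARC_EXT_XMAC_D16  = 1 << 3,
    ARC_EXT_XMAC_24   = 1 << 4,
    ARC_EXT_RTSC      = 1 << 5,
    ARC_EXT_SWAPE     = 1 << 6,
    ARC_EXT_TELEPHONY = 1 << 7,
    ARC_EXT_XY        = 1 << 8
};

/* Return the set of extensions enabled for this translation unit. */
unsigned arc_extension_mask(void)
{
    unsigned mask = 0;
#ifdef __Xdsp_packa
    mask |= ARC_EXT_DSP_PACKA;
#endif
#ifdef __Xdvbf
    mask |= ARC_EXT_DVBF;
#endif
#ifdef __Xlock
    mask |= ARC_EXT_LOCK;
#endif
#ifdef __Xxmac_d16
    mask |= ARC_EXT_XMAC_D16;
#endif
#ifdef __Xxmac_24
    mask |= ARC_EXT_XMAC_24;
#endif
#ifdef __Xrtsc
    mask |= ARC_EXT_RTSC;
#endif
#ifdef __Xswape
    mask |= ARC_EXT_SWAPE;
#endif
#ifdef __Xtelephony
    mask |= ARC_EXT_TELEPHONY;
#endif
#ifdef __Xxy
    mask |= ARC_EXT_XY;
#endif
    return mask;
}

/* Test for a single extension; BIT must be exactly one ARC_EXT_* value. */
int arc_has_extension(unsigned bit)
{
    assert(bit != 0 && (bit & (bit - 1)) == 0);
    return (arc_extension_mask() & bit) != 0;
}
```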

The following options control how the assembly code is annotated:

-misize¶: Annotate assembler instructions with estimated addresses.

-mannotate-align¶: Explain what alignment considerations lead to the decision to make an instruction short or long.

The following options are passed through to the linker:

-marclinux¶: Passed through to the linker, to specify use of the arclinux emulation. This option is enabled by default in tool chains built for arc-linux-uclibc and arceb-linux-uclibc targets when profiling is not requested.

-marclinux_prof¶: Passed through to the linker, to specify use of the arclinux_prof emulation. This option is enabled by default in tool chains built for arc-linux-uclibc and arceb-linux-uclibc targets when profiling is requested.

The following options control the semantics of generated code:

-mepilogue-cfi¶: Enable generation of call frame information for epilogues.

-mno-epilogue-cfi¶: Disable generation of call frame information for epilogues.

-mlong-calls¶: Generate call insns as register indirect calls, thus providing access to the full 32-bit address range.

-mmedium-calls¶: Don't use less than 25-bit addressing range for calls, which is the offset available for an unconditional branch-and-link instruction. Conditional execution of function calls is suppressed, to allow use of the 25-bit range, rather than the 21-bit range with conditional branch-and-link. This is the default for tool chains built for arc-linux-uclibc and arceb-linux-uclibc targets.
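The difference between the two ranges can be illustrated with a small range check. The sketch assumes that both ranges are signed two's-complement displacements of the stated width; the exact encoding of the branch instructions is not described here, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Return 1 if OFFSET is representable as a two's-complement value of
   BITS bits, i.e. -2^(BITS-1) <= OFFSET < 2^(BITS-1). */
int fits_signed(int64_t offset, int bits)
{
    int64_t limit;

    assert(bits >= 1 && bits <= 63);
    limit = (int64_t)1 << (bits - 1);
    return offset >= -limit && offset < limit;
}

/* Widths taken from the option description above. */
enum { UNCOND_BL_BITS = 25, COND_BL_BITS = 21 };

/* Return 1 if a call at displacement OFFSET is out of reach for a
   conditional branch-and-link but within reach of an unconditional one. */
int call_needs_unconditional_bl(int64_t offset)
{
    return !fits_signed(offset, COND_BL_BITS) &&
            fits_signed(offset, UNCOND_BL_BITS);
}
```

Under this assumption, a displacement of 1 << 20 is just outside the conditional range but still within the unconditional one.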

-mno-sdata¶: Do not generate sdata references. This is the default for tool chains built for arc-linux-uclibc and arceb-linux-uclibc targets.

-mucb-mcount¶: Instrument with mcount calls as used in UCB code. I.e. do the counting in the callee, not the caller. By default ARC instrumentation counts in the caller.

-mvolatile-cache¶: Use ordinarily cached memory accesses for volatile references. This is the default.

-mno-volatile-cache¶: Enable cache bypass for volatile references.

The following options fine tune code generation:

-malign-call¶: Do alignment optimizations for call instructions.

-mauto-modify-reg¶: Enable the use of pre/post modify with register displacement.

-mbbit-peephole¶: Enable bbit peephole2.

-mno-brcc¶: This option disables a target-specific pass in arc_reorg to generate BRcc instructions. It has no effect on BRcc generation driven by the combiner pass.

-mcase-vector-pcrel¶: Use pc-relative switch case tables - this enables case table shortening. This is the default for -Os.

-mcompact-casesi¶: Enable compact casesi pattern. This is the default for -Os.

-mno-cond-exec¶: Disable ARCompact specific pass to generate conditional execution instructions. Due to delay slot scheduling and interactions between operand numbers, literal sizes, instruction lengths, and the support for conditional execution, the target-independent pass to generate conditional execution is often lacking, so the ARC port has kept a special pass around that tries to find more conditional execution generating opportunities after register allocation, branch shortening, and delay slot scheduling have been done. This pass generally, but not always, improves performance and code size, at the cost of extra compilation time, which is why there is an option to switch it off. If you have a problem with call instructions exceeding their allowable offset range because they are conditionalized, you should consider using -mmedium-calls instead.

-mearly-cbranchsi¶: Enable pre-reload use of the cbranchsi pattern.

-mexpand-adddi¶: Expand adddi3 and subdi3 at rtl generation time into add.f, adc etc.

-mindexed-loads¶: Enable the use of indexed loads. This can be problematic because some optimizers then assume that indexed stores exist, which is not the case.

-mlra¶: Enable Local Register Allocation. This is still experimental for ARC, so by default the compiler uses standard reload (i.e. -mno-lra).

-mlra-priority-none¶: Don't indicate any priority for target registers.

-mlra-priority-compact¶: Indicate target register priority for r0..r3 / r12..r15.

-mlra-priority-noncompact¶: Reduce target register priority for r0..r3 / r12..r15.

-mno-millicode¶: When optimizing for size (using -Os), prologues and epilogues that have to save or restore a large number of registers are often shortened by using a call to a special function in libgcc; this is referred to as a millicode call. As these calls can pose performance issues, and/or cause linking issues when linking in a nonstandard way, this option is provided to turn off millicode call generation.

-mmixed-code¶: Tweak register allocation to help 16-bit instruction generation. This generally has the effect of decreasing the average instruction size while increasing the instruction count.

-mq-class¶: Enable q instruction alternatives. This is the default for -Os.

-mRcq¶: Enable Rcq constraint handling - most short code generation depends on this. This is the default.

-mRcw¶: Enable Rcw constraint handling - ccfsm condexec mostly depends on this. This is the default.

-msize-level=level¶

Fine-tune size optimization with regard to instruction lengths and alignment. The recognized values for level are:

0: No size optimization. This level is deprecated and treated like 1.
1: Short instructions are used opportunistically.
2: In addition, alignment of loops and of code after barriers is dropped.
3: In addition, optional data alignment is dropped, and the option -Os is enabled.

This defaults to 3 when -Os is in effect. Otherwise, the behavior when this is not set is equivalent to level 1.

-mtune=cpu¶

Set instruction scheduling parameters for cpu, overriding any implied by -mcpu=.

Supported values for cpu are

ARC600: Tune for ARC600 cpu.
ARC601: Tune for ARC601 cpu.
ARC700: Tune for ARC700 cpu with standard multiplier block.
ARC700-xmac: Tune for ARC700 cpu with XMAC block.
ARC725D: Tune for ARC725D cpu.
ARC750D: Tune for ARC750D cpu.

-mmultcost=num¶: Cost to assume for a multiply instruction, with 4 being equal to a normal instruction.

-munalign-prob-threshold=probability¶: Set probability threshold for unaligning branches. When tuning for ARC700 and optimizing for speed, branches without filled delay slot are preferably emitted unaligned and long, unless profiling indicates that the probability for the branch to be taken is below probability. See Cross-profiling. The default is (REG_BR_PROB_BASE/2), i.e. 5000.

The following options are maintained for backward compatibility, but are now deprecated and will be removed in a future release:

-margonaut¶: Obsolete FPX.

-mbig-endian, -EB¶: Compile code for big endian targets. Use of these options is now deprecated. Users wanting big-endian code should use the arceb-elf32 and arceb-linux-uclibc targets when building the tool chain, for which big-endian is the default.

-mlittle-endian, -EL¶: Compile code for little endian targets. Use of these options is now deprecated. Users wanting little-endian code should use the arc-elf32 and arc-linux-uclibc targets when building the tool chain, for which little-endian is the default.

-mbarrel_shifter¶: Replaced by -mbarrel-shifter.

-mdpfp_compact¶: Replaced by -mdpfp-compact.

-mdpfp_fast¶: Replaced by -mdpfp-fast.

-mdsp_packa¶: Replaced by -mdsp-packa.

-mEA¶: Replaced by -mea.

-mmac_24¶: Replaced by -mmac-24.

-mmac_d16¶: Replaced by -mmac-d16.

-mspfp_compact¶: Replaced by -mspfp-compact.

-mspfp_fast¶: Replaced by -mspfp-fast.

-mtune=cpu¶: Values arc600, arc601, arc700 and arc700-xmac for cpu are replaced by ARC600, ARC601, ARC700 and ARC700-xmac respectively.

-multcost=num¶: Replaced by -mmultcost.

.. _arm-options:

ARM Options¶

These -m options are defined for the ARM port:

-mabi=name¶: Generate code for the specified ABI. Permissible values are: apcs-gnu, atpcs, aapcs, aapcs-linux and iwmmxt.

-mapcs-frame¶: Generate a stack frame that is compliant with the ARM Procedure Call Standard for all functions, even if this is not strictly necessary for correct execution of the code. Specifying -fomit-frame-pointer with this option causes the stack frames not to be generated for leaf functions. The default is -mno-apcs-frame. This option is deprecated.

-mapcs¶

This is a synonym for -mapcs-frame and is deprecated.

-mthumb-interwork¶: Generate code that supports calling between the ARM and Thumb instruction sets. Without this option, on pre-v5 architectures, the two instruction sets cannot be reliably used inside one program. The default is -mno-thumb-interwork, since slightly larger code is generated when -mthumb-interwork is specified. In AAPCS configurations this option is meaningless.

-mno-sched-prolog¶: Prevent the reordering of instructions in the function prologue, or the merging of those instructions with the instructions in the function's body. This means that all functions start with a recognizable set of instructions (or in fact one of a choice from a small set of different function prologues), and this information can be used to locate the start of functions inside an executable piece of code. The default is -msched-prolog.

-mfloat-abi=name¶

Specifies which floating-point ABI to use. Permissible values are: soft, softfp and hard.

Specifying soft causes GCC to generate output containing library calls for floating-point operations. softfp allows the generation of code using hardware floating-point instructions, but still uses the soft-float calling conventions. hard allows generation of floating-point instructions and uses FPU-specific calling conventions.

The default depends on the specific target configuration. Note that the hard-float and soft-float ABIs are not link-compatible; you must compile your entire program with the same ABI, and link with a compatible set of libraries.
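Because objects built for different floating-point ABIs cannot be linked together, it can be useful to make the ABI visible in the program itself. The sketch below assumes that GCC predefines __ARM_PCS_VFP for the hard ABI and __SOFTFP__ for soft, with softfp defining neither; the function names are illustrative.

```c
#include <assert.h>
#include <string.h>

/* Report which floating-point ABI this translation unit was built for,
   using macros GCC is assumed to predefine on ARM targets. */
const char *arm_float_abi(void)
{
#if !defined(__arm__)
    return "not an ARM target";
#elif defined(__ARM_PCS_VFP)
    return "hard";      /* FPU-specific calling conventions */
#elif defined(__SOFTFP__)
    return "soft";      /* library calls for floating-point operations */
#else
    return "softfp";    /* FPU instructions, soft-float calling conventions */
#endif
}

/* Guard for code that passes floating-point values across object files:
   fail early if the ABI is not the expected one. */
void require_float_abi(const char *expected)
{
    assert(expected != NULL);
    assert(strcmp(arm_float_abi(), expected) == 0);
}
```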

-mlittle-endian¶: Generate code for a processor running in little-endian mode. This is the default for all standard configurations.

-mbig-endian¶: Generate code for a processor running in big-endian mode; the default is to compile code for a little-endian processor.

-march=name¶

This specifies the name of the target ARM architecture. GCC uses this name to determine what kind of instructions it can emit when generating assembly code. This option can be used in conjunction with or instead of the -mcpu= option. Permissible names are: armv2, armv2a, armv3, armv3m, armv4, armv4t, armv5, armv5t, armv5e, armv5te, armv6, armv6j, armv6t2, armv6z, armv6zk, armv6-m, armv7, armv7-a, armv7-r, armv7-m, armv7e-m, armv7ve, armv8-a, armv8-a+crc, iwmmxt, iwmmxt2, ep9312.

-march=armv7ve is the armv7-a architecture with virtualization extensions.

-march=armv8-a+crc enables code generation for the ARMv8-A architecture together with the optional CRC32 extensions.

-march=native causes the compiler to auto-detect the architecture of the build computer. At present, this feature is only supported on GNU/Linux, and not all architectures are recognized. If the auto-detect is unsuccessful the option has no effect.

-mtune=name¶

This option specifies the name of the target ARM processor for which GCC should tune the performance of the code. For some ARM implementations better performance can be obtained by using this option. Permissible names are: arm2, arm250, arm3, arm6, arm60, arm600, arm610, arm620, arm7, arm7m, arm7d, arm7dm, arm7di, arm7dmi, arm70, arm700, arm700i, arm710, arm710c, arm7100, arm720, arm7500, arm7500fe, arm7tdmi, arm7tdmi-s, arm710t, arm720t, arm740t, strongarm, strongarm110, strongarm1100, strongarm1110, arm8, arm810, arm9, arm9e, arm920, arm920t, arm922t, arm946e-s, arm966e-s, arm968e-s, arm926ej-s, arm940t, arm9tdmi, arm10tdmi, arm1020t, arm1026ej-s, arm10e, arm1020e, arm1022e, arm1136j-s, arm1136jf-s, mpcore, mpcorenovfp, arm1156t2-s, arm1156t2f-s, arm1176jz-s, arm1176jzf-s, cortex-a5, cortex-a7, cortex-a8, cortex-a9, cortex-a12, cortex-a15, cortex-a53, cortex-a57, cortex-a72, cortex-r4, cortex-r4f, cortex-r5, cortex-r7, cortex-m7, cortex-m4, cortex-m3, cortex-m1, cortex-m0, cortex-m0plus, cortex-m1.small-multiply, cortex-m0.small-multiply, cortex-m0plus.small-multiply, exynos-m1, marvell-pj4, xscale, iwmmxt, iwmmxt2, ep9312, fa526, fa626, fa606te, fa626te, fmp626, fa726te, xgene1.

Additionally, this option can specify that GCC should tune the performance of the code for a big.LITTLE system. Permissible names are: cortex-a15.cortex-a7, cortex-a57.cortex-a53, cortex-a72.cortex-a53.

-mtune=generic-arch specifies that GCC should tune the performance for a blend of processors within architecture arch. The aim is to generate code that runs well on the current most popular processors, balancing between optimizations that benefit some CPUs in the range, and avoiding performance pitfalls of other CPUs. The effects of this option may change in future GCC versions as CPU models come and go.

-mtune=native causes the compiler to auto-detect the CPU of the build computer. At present, this feature is only supported on GNU/Linux, and not all architectures are recognized. If the auto-detect is unsuccessful the option has no effect.

-mcpu=name¶

This specifies the name of the target ARM processor. GCC uses this name to derive the name of the target ARM architecture (as if specified by -march) and the ARM processor type for which to tune for performance (as if specified by -mtune). Where this option is used in conjunction with -march or -mtune, those options take precedence over the appropriate part of this option.

Permissible names for this option are the same as those for -mtune.

-mcpu=generic-arch is also permissible, and is equivalent to -march=arch -mtune=generic-arch. See -mtune for more information.

-mcpu=native causes the compiler to auto-detect the CPU of the build computer. At present, this feature is only supported on GNU/Linux, and not all architectures are recognized. If the auto-detect is unsuccessful the option has no effect.

-mfpu=name¶

This specifies what floating-point hardware (or hardware emulation) is available on the target. Permissible names are: vfp, vfpv3, vfpv3-fp16, vfpv3-d16, vfpv3-d16-fp16, vfpv3xd, vfpv3xd-fp16, neon, neon-fp16, vfpv4, vfpv4-d16, fpv4-sp-d16, neon-vfpv4, fpv5-d16, fpv5-sp-d16, fp-armv8, neon-fp-armv8, and crypto-neon-fp-armv8.

If -msoft-float is specified this specifies the format of floating-point values.

If the selected floating-point hardware includes the NEON extension (e.g. -mfpu=neon), note that floating-point operations are not generated by GCC's auto-vectorization pass unless -funsafe-math-optimizations is also specified. This is because NEON hardware does not fully implement the IEEE 754 standard for floating-point arithmetic (in particular denormal values are treated as zero), so the use of NEON instructions may lead to a loss of precision.
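To judge whether a data set is sensitive to this difference, one can look for denormal (subnormal) values in it. A minimal sketch, with a hypothetical helper name, that counts the single-precision values that flush-to-zero hardware would treat as zero:

```c
#include <assert.h>
#include <float.h>
#include <stddef.h>

/* Count the subnormal (denormal) values in an array: non-zero values
   whose magnitude is below FLT_MIN.  Hardware that flushes denormals
   to zero would treat every one of them as 0. */
size_t count_subnormals(const float *v, size_t n)
{
    size_t i, count = 0;

    assert(v != NULL || n == 0);
    for (i = 0; i < n; i++) {
        float mag = v[i] < 0.0f ? -v[i] : v[i];

        if (mag != 0.0f && mag < FLT_MIN)
            count++;
    }
    return count;
}
```

If the count is zero for representative inputs, the loss of precision described above is less likely to matter for that data; this says nothing about intermediate results, which may still underflow.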

-mfp16-format=name¶: Specify the format of the __fp16 half-precision floating-point type. Permissible names are none, ieee, and alternative; the default is none, in which case the __fp16 type is not defined. See Half-Precision for more information.

-mstructure-size-boundary=n¶

The sizes of all structures and unions are rounded up to a multiple of the number of bits set by this option. Permissible values are 8, 32 and 64. The default value varies for different toolchains. For the COFF targeted toolchain the default value is 8. A value of 64 is only allowed if the underlying ABI supports it.

Specifying a larger number can produce faster, more efficient code, but can also increase the size of the program. Different values are potentially incompatible. Code compiled with one value cannot necessarily expect to work with code or libraries compiled with another value, if they exchange information using structures or unions.
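The rounding rule is simple arithmetic, sketched below with a hypothetical helper. It only models the final rounding step; the unrounded size of a real structure also depends on member alignment and padding.

```c
#include <assert.h>
#include <stddef.h>

/* Round SIZE_BYTES up to a multiple of BOUNDARY_BITS bits.
   BOUNDARY_BITS must be one of the values the option accepts. */
size_t round_to_boundary(size_t size_bytes, unsigned boundary_bits)
{
    size_t step;

    assert(boundary_bits == 8 || boundary_bits == 32 || boundary_bits == 64);
    step = boundary_bits / 8;               /* boundary expressed in bytes */
    return (size_bytes + step - 1) / step * step;
}
```

For instance, a structure with 5 bytes of content occupies 5 bytes with a boundary of 8, but 8 bytes with a boundary of 32 or 64, which is why code compiled with different values may disagree about structure layout.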

-mabort-on-noreturn¶: Generate a call to the function abort at the end of a noreturn function. It is executed if the function tries to return.

-mlong-calls, -mno-long-calls¶

Tells the compiler to perform function calls by first loading the address of the function into a register and then performing a subroutine call on this register. This switch is needed if the target function lies outside of the 64-megabyte addressing range of the offset-based version of subroutine call instruction.

Even if this switch is enabled, not all function calls are turned into long calls. The heuristic is that static functions, functions that have the short_call attribute, functions that are inside the scope of a #pragma no_long_calls directive, and functions whose definitions have already been compiled within the current compilation unit are not turned into long calls. The exceptions to this rule are that weak function definitions, functions with the long_call attribute or the section attribute, and functions that are within the scope of a #pragma long_calls directive are always turned into long calls.

This feature is not enabled by default. Specifying -mno-long-calls restores the default behavior, as does placing the function calls within the scope of a #pragma long_calls_off directive. Note these switches have no effect on how the compiler generates code to handle function calls via function pointers.
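As a sketch of how the attributes mentioned above interact with this option, the following C fragment marks one function as always long-called and one as never long-called. The macro and function names (FAR_CALL, NEAR_CALL, far_helper, near_helper, call_both) are hypothetical, and the attributes are hidden behind macros because they are only meaningful on ARM targets.

```c
/* The long_call and short_call attributes are ARM-specific, so they
   expand to nothing when this file is built for another target. */
#if defined(__arm__) && defined(__GNUC__)
#define FAR_CALL  __attribute__((long_call))
#define NEAR_CALL __attribute__((short_call))
#else
#define FAR_CALL
#define NEAR_CALL
#endif

/* Called through a register even without -mlong-calls. */
FAR_CALL int far_helper(int x);

/* Not turned into a long call even with -mlong-calls. */
NEAR_CALL int near_helper(int x);

int far_helper(int x)  { return x + 1; }
int near_helper(int x) { return x * 2; }

/* The call sites are ordinary C; only the generated call sequence
   differs between the two helpers on ARM. */
int call_both(int x)
{
    return far_helper(x) + near_helper(x);
}
```

Calls made through function pointers are not affected by either attribute, as noted above.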

-msingle-pic-base¶: Treat the register used for PIC addressing as read-only, rather than loading it in the prologue for each function. The runtime system is responsible for initializing this register with an appropriate value before execution begins.

-mpic-register=reg¶: Specify the register to be used for PIC addressing. In the standard PIC base case, the default is any suitable register determined by the compiler. In the single PIC base case, the default is R9 if the target is EABI based or stack-checking is enabled; otherwise the default is R10.

-mpic-data-is-text-relative¶: Assume that each data segment is relative to the text segment at load time. This permits addressing data using PC-relative operations. This option is on by default for targets other than VxWorks RTP.

-mpoke-function-name¶

Write the name of each function into the text section, directly preceding the function prologue. The generated code is similar to this:

t0
    .ascii "arm_poke_function_name", 0
    .align
t1
    .word 0xff000000 + (t1 - t0)
arm_poke_function_name
    mov     ip, sp
    stmfd   sp!, {fp, ip, lr, pc}
    sub     fp, ip, #4

When performing a stack backtrace, code can inspect the value of pc stored at fp + 0. If the trace function then looks at location pc - 12 and the top 8 bits are set, then we know that there is a function name embedded immediately preceding this location, and that its length is ((pc[-3]) & ~0xff000000), i.e. the low 24 bits of the marker word.
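Because the marker word is emitted as 0xff000000 + (t1 - t0), the length of the embedded name sits in its low 24 bits. A minimal sketch of the check a backtrace routine might perform follows; the helper name poke_name_length is hypothetical and not part of GCC or any runtime library.

```c
#include <stdint.h>

/* Hypothetical helper: given the word found just before a function
   (the ".word 0xff000000 + (t1 - t0)" in the listing above), return
   the length in bytes of the embedded, padded name. Returns 0 when the
   top 8 bits are not all set, meaning no name was emitted. */
uint32_t poke_name_length(uint32_t marker)
{
    if ((marker & 0xff000000u) != 0xff000000u)
        return 0;

    /* Strip the 0xff tag to leave (t1 - t0). */
    return marker & 0x00ffffffu;
}
```

For a marker of 0xff000018 the helper reports 24 bytes, so the name string would start 24 bytes before the marker word.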

-mthumb, -marm¶: Select between generating code that executes in ARM and Thumb states. The default for most configurations is to generate code that executes in ARM state, but the default can be changed by configuring GCC with the --with-mode=state configure option.

-mtpcs-frame¶: Generate a stack frame that is compliant with the Thumb Procedure Call Standard for all non-leaf functions. (A leaf function is one that does not call any other functions.) The default is -mno-tpcs-frame.

-mtpcs-leaf-frame¶: Generate a stack frame that is compliant with the Thumb Procedure Call Standard for all leaf functions. (A leaf function is one that does not call any other functions.) The default is -mno-tpcs-leaf-frame.

-mcallee-super-interworking¶: Gives all externally visible functions in the file being compiled an ARM instruction set header which switches to Thumb mode before executing the rest of the function. This allows these functions to be called from non-interworking code. This option is not valid in AAPCS configurations because interworking is enabled by default.

-mcaller-super-interworking¶: Allows calls via function pointers (including virtual functions) to execute correctly regardless of whether the target code has been compiled for interworking or not. There is a small overhead in the cost of executing a function pointer if this option is enabled. This option is not valid in AAPCS configurations because interworking is enabled by default.

-mtp=name¶: Specify the access model for the thread local storage pointer. The valid models are soft, which generates calls to __aeabi_read_tp, cp15, which fetches the thread pointer from cp15 directly (supported in the arm6k architecture), and auto, which uses the best available method for the selected processor. The default setting is auto.

-mtls-dialect=dialect¶: Specify the dialect to use for accessing thread local storage. Two dialects are supported: gnu and gnu2. The gnu dialect selects the original GNU scheme for supporting local and global dynamic TLS models. The gnu2 dialect selects the GNU descriptor scheme, which provides better performance for shared libraries. The GNU descriptor scheme is compatible with the original scheme, but does require new assembler, linker and library support. Initial and local exec TLS models are unaffected by this option and always use the original scheme.

-mword-relocations¶: Only generate absolute relocations on word-sized values (i.e. R_ARM_ABS32). This is enabled by default on targets (uClinux, SymbianOS) where the runtime loader imposes this restriction, and when -fpic or -fPIC is specified.

-mfix-cortex-m3-ldrd¶: Some Cortex-M3 cores can cause data corruption when ldrd instructions with overlapping destination and base registers are used. This option avoids generating these instructions. This option is enabled by default when -mcpu=cortex-m3 is specified.

-munaligned-access, -mno-unaligned-access¶

Enables (or disables) reading and writing of 16- and 32-bit values from addresses that are not 16- or 32-bit aligned. By default unaligned access is disabled for all pre-ARMv6 and all ARMv6-M architectures, and enabled for all other architectures. If unaligned access is not enabled, then words in packed data structures are accessed a byte at a time.

The ARM attribute Tag_CPU_unaligned_access is set in the generated object file to either true or false, depending upon the setting of this option. If unaligned access is enabled then the preprocessor symbol __ARM_FEATURE_UNALIGNED is also defined.
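User code can test __ARM_FEATURE_UNALIGNED to pick an access strategy. The sketch below is one possible way to do so; the helper name load_u32 and the struct name u32_box are hypothetical, and the fallback branch is what runs on any target that does not define the macro.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: load a 32-bit value, in native byte order,
   from an address that may not be 32-bit aligned. */
uint32_t load_u32(const void *p)
{
#if defined(__ARM_FEATURE_UNALIGNED)
    /* Unaligned access is enabled: reading through a packed struct
       lets the compiler use a single word load. */
    struct __attribute__((packed)) u32_box { uint32_t v; };
    return ((const struct u32_box *)p)->v;
#else
    /* No alignment assumption: copy the bytes into an aligned local. */
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
#endif
}
```

Both branches return the same value; the macro only influences which instruction sequence the compiler is free to choose.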

-mneon-for-64bits¶: Enables using Neon to handle scalar 64-bits operations. This is disabled by default since the cost of moving data from core registers to Neon is high.

-mslow-flash-data¶: Assume that loading data from flash is slower than fetching instructions. Literal loads are therefore minimized for better performance. This option is only supported when compiling for ARMv7 M-profile and is off by default.

-masm-syntax-unified¶: Assume inline assembler is using unified asm syntax. The default is currently off which implies divided syntax. Currently this option is available only for Thumb1 and has no effect on ARM state and Thumb2. However, this may change in future releases of GCC. Divided syntax should be considered deprecated.

-mrestrict-it¶: Restricts generation of IT blocks to conform to the rules of ARMv8. IT blocks can only contain a single 16-bit instruction from a select set of instructions. This option is on by default for ARMv8 Thumb mode.

-mprint-tune-info¶: Print CPU tuning information as comment in assembler file. This is an option used only for regression testing of the compiler and not intended for ordinary use in compiling code. This option is disabled by default.

:: _avr-options:

AVR Options¶

These options are defined for AVR implementations:

-mmcu=mcu¶

Specify Atmel AVR instruction set architectures (ISA) or MCU type.

The default for this option is avr2.

GCC supports the following AVR devices and ISAs:

-maccumulate-args¶

Accumulate outgoing function arguments and acquire/release the needed stack space for outgoing function arguments once in function prologue/epilogue. Without this option, outgoing arguments are pushed before calling a function and popped afterwards.

Popping the arguments after the function call can be expensive on AVR, so accumulating the stack space might lead to smaller executables because arguments need not be removed from the stack after such a function call.

This option can lead to reduced code size for functions that perform several calls to functions that get their arguments on the stack like calls to printf-like functions.

-mbranch-cost=cost¶: Set the branch costs for conditional branch instructions to cost. Reasonable values for cost are small, non-negative integers. The default branch cost is 0.

-mcall-prologues¶: Function prologues/epilogues are expanded as calls to appropriate subroutines. Code size is smaller.

-mint8¶: Assume int to be 8-bit integer. This affects the sizes of all types: a char is 1 byte, an int is 1 byte, a long is 2 bytes, and long long is 4 bytes. Please note that this option does not conform to the C standards, but it results in smaller code size.

-mn-flash=num¶: Assume that the flash memory has a size of num times 64KiB.

-mno-interrupts¶: Generated code is not compatible with hardware interrupts. Code size is smaller.

-mrelax¶

Try to replace a CALL or JMP instruction with the shorter RCALL or RJMP instruction, respectively, if applicable. Setting -mrelax just adds the --mlink-relax option to the assembler's command line and the --relax option to the linker's command line.

Jump relaxing is performed by the linker because jump offsets are not known before code is located. Therefore, the assembler code generated by the compiler is the same, but the instructions in the executable may differ from instructions in the assembler code.

Relaxing must be turned on if linker stubs are needed, see the section on EIND and linker stubs below.

-mrmw¶: Assume that the device supports the Read-Modify-Write instructions XCH, LAC, LAS and LAT.

-msp8¶

Treat the stack pointer register as an 8-bit register, i.e. assume the high byte of the stack pointer is zero. In general, you don't need to set this option by hand.

This option is used internally by the compiler to select and build multilibs for architectures avr2 and avr25. These architectures mix devices with and without SPH. For any setting other than -mmcu=avr2 or -mmcu=avr25 the compiler driver adds or removes this option from the compiler proper's command line, because the compiler then knows whether the device or architecture has an 8-bit stack pointer and thus no SPH register.

-mstrict-X¶

Use address register X in a way proposed by the hardware. This means that X is only used in indirect, post-increment or pre-decrement addressing.

Without this option, the X register may be used in the same way as Y or Z which then is emulated by additional instructions. For example, loading a value with X+const addressing with a small non-negative const < 64 to a register Rn is performed as

adiw r26, const   ; X += const
ld   Rn, X        ; Rn = *X
sbiw r26, const   ; X -= const

-mtiny-stack¶: Only change the lower 8 bits of the stack pointer.

-nodevicelib¶: Don't link against AVR-LibC's device-specific library libdev.a.

-Waddr-space-convert¶: Warn about conversions between address spaces in the case where the resulting address space is not contained in the incoming address space.

EIND and Devices with More Than 128 Ki Bytes of Flash¶

Pointers in the implementation are 16 bits wide. The address of a function or label is represented as a word address so that indirect jumps and calls can target any code address in the range of 64 Ki words.

In order to facilitate indirect jump on devices with more than 128Ki bytes of program memory space, there is a special function register called EIND that serves as most significant part of the target address when EICALL or EIJMP instructions are used.

Indirect jumps and calls on these devices are handled as follows by the compiler and are subject to some limitations:

- The compiler never sets EIND.
- The compiler uses EIND implicitly in EICALL/EIJMP instructions or might read EIND directly in order to emulate an indirect call/jump by means of a RET instruction.
- The compiler assumes that EIND never changes during the startup code or during the application. In particular, EIND is not saved/restored in function or interrupt service routine prologue/epilogue.
- For indirect calls to functions and computed goto, the linker generates stubs. Stubs are jump pads sometimes also called trampolines. Thus, the indirect call/jump jumps to such a stub. The stub contains a direct jump to the desired address.
- Linker relaxation must be turned on so that the linker generates the stubs correctly in all situations. See the compiler option -mrelax and the linker option --relax. There are corner cases where the linker is supposed to generate stubs but aborts without relaxation and without a helpful error message.
- The default linker script is arranged for code with EIND = 0. If code is supposed to work for a setup with EIND != 0, a custom linker script has to be used in order to place the sections whose names start with .trampolines into the segment that EIND points to.
- The startup code from libgcc never sets EIND. Notice that startup code is a blend of code from libgcc and AVR-LibC. For the impact of AVR-LibC on EIND, see the AVR-LibC user manual (http://nongnu.org/avr-libc/user-manual/).
- It is legitimate for user-specific startup code to set up EIND early, for example by means of initialization code located in section .init3. Such code runs prior to general startup code that initializes RAM and calls constructors, but after the bit of startup code from AVR-LibC that sets EIND to the segment where the vector table is located.
```
#include <avr/io.h>

static void
__attribute__((section(".init3"),naked,used,no_instrument_function))
init3_set_eind (void)
{
  __asm volatile ("ldi r24,pm_hh8(__trampolines_start)\n\t"
                  "out %i0,r24" :: "n" (&EIND) : "r24","memory");
}
```
The __trampolines_start symbol is defined in the linker script.
- Stubs are generated automatically by the linker if the following two conditions are met:
  - The address of a label is taken by means of the gs modifier (short for generate stubs) like so:
```
LDI r24, lo8(gs(func))
LDI r25, hi8(gs(func))
```
  - The final location of that label is in a code segment outside the segment where the stubs are located.
- The compiler emits such gs modifiers for code labels in the following situations:
  - Taking the address of a function or code label.
  - Computed goto.
  - If a prologue-save function is used, see the -mcall-prologues command-line option.
  - Switch/case dispatch tables. If you do not want such dispatch tables you can specify the -fno-jump-tables command-line option.
  - C and C++ constructors/destructors called during startup/shutdown.
  - If the tools hit a gs() modifier explained above.
- Jumping to non-symbolic addresses like so is not supported:
```
int main (void)
{
    /* Call function at word address 0x2 */
    return ((int(*)(void)) 0x2)();
}
```
Instead, a stub has to be set up, i.e. the function has to be called through a symbol (func_4 in the example):
```
int main (void)
{
    extern int func_4 (void);

    /* Call function at byte address 0x4 */
    return func_4();
}
```
and the application be linked with -Wl,--defsym,func_4=0x4. Alternatively, func_4 can be defined in the linker script.

Handling of the RAMPD, RAMPX, RAMPY and RAMPZ Special Function Registers¶

Some AVR devices support memories larger than the 64KiB range that can be accessed with 16-bit pointers. To access memory locations outside this 64KiB range, the content of a RAMP register is used as the high part of the address: the X, Y, Z address register is concatenated with the RAMPX, RAMPY, RAMPZ special function register, respectively, to get a wide address. Similarly, RAMPD is used together with direct addressing.

- The startup code initializes the RAMP special function registers with zero.
- If a named address space (see AVR Named Address Spaces) other than generic or __flash is used, then RAMPZ is set as needed before the operation.
- If the device supports RAM larger than 64KiB and the compiler needs to change RAMPZ to accomplish an operation, RAMPZ is reset to zero after the operation.
- If the device comes with a specific RAMP register, the ISR prologue/epilogue saves/restores that SFR and initializes it with zero in case the ISR code might (implicitly) use it.
- RAM larger than 64KiB is not supported by GCC for AVR targets. If you use inline assembler to read from locations outside the 16-bit address range and change one of the RAMP registers, you must reset it to zero after the access.

AVR Built-in Macros¶

GCC defines several built-in macros so that the user code can test for the presence or absence of features. Almost all of the following built-in macros are deduced from device capabilities and thus triggered by the -mmcu= command-line option.

For even more AVR-specific built-in macros see AVR Named Address Spaces and AVR Built-in Functions.

__AVR_ARCH__

Built-in macro that resolves to a decimal number that identifies the architecture and depends on the -mmcu=mcu option. Possible values are:

2, 25, 3, 31, 35, 4, 5, 51, 6

for mcu=avr2, avr25, avr3, avr31, avr35, avr4, avr5, avr51, avr6, respectively, and

100, 102, 104, 105, 106, 107

for mcu=avrtiny, avrxmega2, avrxmega4, avrxmega5, avrxmega6, avrxmega7, respectively. If mcu specifies a device, this built-in macro is set accordingly. For example, with -mmcu=atmega8 the macro is defined to 4.
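The value list above can be turned into a small classifier, which may be convenient when code needs to branch on the architecture family rather than on an individual value. This is only a sketch: the function names avr_arch_family and current_arch_family and the family labels are hypothetical, chosen for illustration.

```c
/* Hypothetical helper: map an __AVR_ARCH__ value to a family label,
   using exactly the values listed in the text above. */
const char *avr_arch_family(int arch)
{
    switch (arch) {
    case 2: case 25: case 3: case 31: case 35:
    case 4: case 5: case 51: case 6:
        return "avr";
    case 100:
        return "avrtiny";
    case 102: case 104: case 105: case 106: case 107:
        return "avrxmega";
    default:
        return "unknown";
    }
}

/* Classify the architecture this translation unit is compiled for. */
const char *current_arch_family(void)
{
#ifdef __AVR_ARCH__
    return avr_arch_family(__AVR_ARCH__);
#else
    return "unknown";   /* not compiling for an AVR target */
#endif
}
```

With -mmcu=atmega8 the macro is 4, so current_arch_family would return the "avr" label under these assumptions.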

__AVR_Device__

Setting -mmcu=device defines this built-in macro which reflects the device's name. For example, -mmcu=atmega8 defines the built-in macro __AVR_ATmega8__, -mmcu=attiny261a defines __AVR_ATtiny261A__, etc.

The built-in macros' names follow the scheme __AVR_Device__ where Device is the device name as from the AVR user manual. The difference between Device in the built-in macro and device in -mmcu=device is that the latter is always lowercase.

If device is not a device but only a core architecture like avr51, this macro is not defined.

__AVR_DEVICE_NAME__

Setting -mmcu=device defines this built-in macro to the device's name. For example, with -mmcu=atmega8 the macro is defined to atmega8.

If device is not a device but only a core architecture like avr51, this macro is not defined.

__AVR_XMEGA__

The device / architecture belongs to the XMEGA family of devices.

__AVR_HAVE_ELPM__

The device has the ELPM instruction.

__AVR_HAVE_ELPMX__

The device has the ELPM Rn,Z and ELPM Rn,Z+ instructions.

__AVR_HAVE_MOVW__

The device has the MOVW instruction to perform 16-bit register-register moves.

__AVR_HAVE_LPMX__

The device has the LPM Rn,Z and LPM Rn,Z+ instructions.

__AVR_HAVE_MUL__

The device has a hardware multiplier.

__AVR_HAVE_JMP_CALL__

The device has the JMP and CALL instructions. This is the case for devices with at least 16KiB of program memory.

__AVR_HAVE_EIJMP_EICALL__, __AVR_3_BYTE_PC__

The device has the EIJMP and EICALL instructions. This is the case for devices with more than 128KiB of program memory. This also means that the program counter (PC) is 3 bytes wide.

__AVR_2_BYTE_PC__

The program counter (PC) is 2 bytes wide. This is the case for devices with up to 128KiB of program memory.

__AVR_HAVE_8BIT_SP__, __AVR_HAVE_16BIT_SP__

The stack pointer (SP) register is treated as an 8-bit or a 16-bit register, respectively, by the compiler. The definition of these macros is affected by -mtiny-stack.

__AVR_HAVE_SPH__, __AVR_SP8__

The device has the SPH (high part of stack pointer) special function register or has an 8-bit stack pointer, respectively. The definition of these macros is affected by -mmcu= and in the cases of -mmcu=avr2 and -mmcu=avr25 also by -msp8.

__AVR_HAVE_RAMPD__, __AVR_HAVE_RAMPX__, __AVR_HAVE_RAMPY__, __AVR_HAVE_RAMPZ__

The device has the RAMPD, RAMPX, RAMPY, RAMPZ special function register, respectively.

__NO_INTERRUPTS__

This macro reflects the -mno-interrupts command-line option.

__AVR_ERRATA_SKIP__, __AVR_ERRATA_SKIP_JMP_CALL__

Some AVR devices (AT90S8515, ATmega103) must not skip 32-bit instructions because of a hardware erratum. Skip instructions are SBRS, SBRC, SBIS, SBIC and CPSE. The second macro is only defined if __AVR_HAVE_JMP_CALL__ is also set.

__AVR_ISA_RMW__

The device has Read-Modify-Write instructions (XCH, LAC, LAS and LAT).

__AVR_SFR_OFFSET__=offset

Instructions that can address I/O special function registers directly, like IN, OUT, SBI, etc., may use a different address for a register than an instruction that accesses it as RAM, like LD or STS. This offset depends on the device architecture and has to be subtracted from the RAM address in order to get the respective I/O address.
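The subtraction can be written out as a one-line helper. The function name sfr_io_address is hypothetical, and the offset is passed as a parameter so that the sketch does not depend on compiling for AVR; in real AVR code the second argument would be __AVR_SFR_OFFSET__.

```c
#include <stdint.h>

/* Hypothetical helper: convert the RAM address of an I/O special
   function register into the address used by IN, OUT, SBI, etc.,
   by subtracting the architecture's SFR offset. */
uint16_t sfr_io_address(uint16_t ram_address, uint16_t sfr_offset)
{
    return (uint16_t)(ram_address - sfr_offset);
}
```

As an illustration, and assuming an offset of 0x20, a register located at RAM address 0x5F would be addressed as 0x3F by IN and OUT; with an offset of 0 both addresses coincide.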

__WITH_AVRLIBC__

The compiler is configured to be used together with AVR-Libc. See the --with-avrlibc configure option.

:: _blackfin-options:

Blackfin Options¶

-mcpu=cpu[-sirevision]¶

Specifies the name of the target Blackfin processor. Currently, cpu can be one of bf512, bf514, bf516, bf518, bf522, bf523, bf524, bf525, bf526, bf527, bf531, bf532, bf533, bf534, bf536, bf537, bf538, bf539, bf542, bf544, bf547, bf548, bf549, bf542m, bf544m, bf547m, bf548m, bf549m, bf561, bf592.

The optional sirevision specifies the silicon revision of the target Blackfin processor. Any workarounds available for the targeted silicon revision are enabled. If sirevision is none, no workarounds are enabled. If sirevision is any, all workarounds for the targeted processor are enabled. The __SILICON_REVISION__ macro is defined to two hexadecimal digits representing the major and minor numbers in the silicon revision. If sirevision is none, the __SILICON_REVISION__ is not defined. If sirevision is any, the __SILICON_REVISION__ is defined to be 0xffff. If this optional sirevision is not used, GCC assumes the latest known silicon revision of the targeted Blackfin processor.
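Code that needs to know which of these cases applies can test the macro at compile time. The following is a minimal sketch based only on the behavior described above; the function name silicon_revision_mode and the returned labels are hypothetical.

```c
/* Hypothetical helper: report how the silicon revision was selected,
   judging only from the __SILICON_REVISION__ macro. */
const char *silicon_revision_mode(void)
{
#if !defined(__SILICON_REVISION__)
    /* sirevision was none (no workarounds), or this is not a
       Blackfin target at all. */
    return "none";
#elif __SILICON_REVISION__ == 0xffff
    /* sirevision was any: all workarounds are enabled. */
    return "any";
#else
    /* A concrete revision, either given explicitly or defaulted. */
    return "specific";
#endif
}
```

On a non-Blackfin host the macro is undefined, so the function returns the "none" label.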

GCC defines a preprocessor macro for the specified cpu. For the bfin-elf toolchain, this option causes the hardware BSP provided by libgloss to be linked in if -msim is not given.

Without this option, bf532 is used as the processor by default.

Note that support for bf561 is incomplete. For bf561, only the preprocessor macro is defined.

-msim¶: Specifies that the program will be run on the simulator. This causes the simulator BSP provided by libgloss to be linked in. This option has effect only for bfin-elf toolchain. Certain other options, such as -mid-shared-library and -mfdpic, imply -msim.

-momit-leaf-frame-pointer¶: Don't keep the frame pointer in a register for leaf functions. This avoids the instructions to save, set up and restore frame pointers and makes an extra register available in leaf functions. The option -fomit-frame-pointer removes the frame pointer for all functions, which might make debugging harder.

-mspecld-anomaly¶: When enabled, the compiler ensures that the generated code does not contain speculative loads after jump instructions. If this option is used, __WORKAROUND_SPECULATIVE_LOADS is defined.

-mno-specld-anomaly¶: Don't generate extra code to prevent speculative loads from occurring.

-mcsync-anomaly¶: When enabled, the compiler ensures that the generated code does not contain CSYNC or SSYNC instructions too soon after conditional branches. If this option is used, __WORKAROUND_SPECULATIVE_SYNCS is defined.

-mno-csync-anomaly¶: Don't generate extra code to prevent CSYNC or SSYNC instructions from occurring too soon after a conditional branch.

-mlow-64k¶: When enabled, the compiler is free to take advantage of the knowledge that the entire program fits into the low 64k of memory.

-mno-low-64k¶: Assume that the program is arbitrarily large. This is the default.

-mstack-check-l1¶: Do stack checking using information placed into L1 scratchpad memory by the uClinux kernel.

-mid-shared-library¶: Generate code that supports shared libraries via the library ID method. This allows for execute in place and shared libraries in an environment without virtual memory management. This option implies -fPIC. With a bfin-elf target, this option implies -msim.

-mno-id-shared-library¶: Generate code that doesn't assume ID-based shared libraries are being used. This is the default.

-mleaf-id-shared-library¶: Generate code that supports shared libraries via the library ID method, but assumes that this library or executable won't link against any other ID shared libraries. That allows the compiler to use faster code for jumps and calls.

-mno-leaf-id-shared-library¶: Do not assume that the code being compiled won't link against any ID shared libraries. Slower code is generated for jump and call insns.

-mshared-library-id=n¶: Specifies the identification number of the ID-based shared library being compiled. Specifying a value of 0 generates more compact code; specifying other values forces the allocation of that number to the current library but is no more space- or time-efficient than omitting this option.

-msep-data¶: Generate code that allows the data segment to be located in a different area of memory from the text segment. This allows for execute in place in an environment without virtual memory management by eliminating relocations against the text section.

-mno-sep-data¶: Generate code that assumes that the data segment follows the text segment. This is the default.

-mlong-calls, -mno-long-calls¶

Tells the compiler to perform function calls by first loading the address of the function into a register and then performing a subroutine call on this register. This switch is needed if the target function lies outside of the 24-bit addressing range of the offset-based version of subroutine call instruction.

This feature is not enabled by default. Specifying -mno-long-calls restores the default behavior. Note these switches have no effect on how the compiler generates code to handle function calls via function pointers.

-mfast-fp¶: Link with the fast floating-point library. This library relaxes some of the IEEE floating-point standard's rules for checking inputs against Not-a-Number (NAN), in the interest of performance.

-minline-plt¶: Enable inlining of PLT entries in function calls to functions that are not known to bind locally. It has no effect without -mfdpic.

-mmulticore¶

Build a standalone application for multicore Blackfin processors. This option causes proper start files and link scripts supporting multicore to be used, and defines the macro __BFIN_MULTICORE. It can only be used with -mcpu=bf561[-sirevision].

This option can be used with -mcorea or -mcoreb, which selects the one-application-per-core programming model. Without -mcorea or -mcoreb, the single-application/dual-core programming model is used. In this model, the main function of Core B should be named as coreb_main.

If this option is not used, the single-core application programming model is used.

-mcorea¶: Build a standalone application for Core A of BF561 when using the one-application-per-core programming model. Proper start files and link scripts are used to support Core A, and the macro __BFIN_COREA is defined. This option can only be used in conjunction with -mmulticore.

-mcoreb¶: Build a standalone application for Core B of BF561 when using the one-application-per-core programming model. Proper start files and link scripts are used to support Core B, and the macro __BFIN_COREB is defined. When this option is used, coreb_main should be used instead of main. This option can only be used in conjunction with -mmulticore.

-msdram¶: Build a standalone application for SDRAM. Proper start files and link scripts are used to put the application into SDRAM, and the macro __BFIN_SDRAM is defined. The loader should initialize SDRAM before loading the application.

-micplb¶: Assume that ICPLBs are enabled at run time. This has an effect on certain anomaly workarounds. For Linux targets, the default is to assume ICPLBs are enabled; for standalone applications the default is off.

:: _c6x-options:

C6X Options¶

-march=name¶: This specifies the name of the target architecture. GCC uses this name to determine what kind of instructions it can emit when generating assembly code. Permissible names are: c62x, c64x, c64x+, c67x, c67x+, c674x.

-mbig-endian¶: Generate code for a big-endian target.

-mlittle-endian¶: Generate code for a little-endian target. This is the default.

-msim¶: Choose startup files and linker script suitable for the simulator.

-msdata=default¶: Put small global and static data in the .neardata section, which is pointed to by register B14. Put small uninitialized global and static data in the .bss section, which is adjacent to the .neardata section. Put small read-only data into the .rodata section. The corresponding sections used for large pieces of data are .fardata, .far and .const.

-msdata=all¶: Put all data, not just small objects, into the sections reserved for small data, and use addressing relative to the B14 register to access them.

-msdata=none¶: Make no use of the sections reserved for small data, and use absolute addresses to access all data. Put all initialized global and static data in the .fardata section, and all uninitialized data in the .far section. Put all constant data into the .const section.

:: _cris-options:

CRIS Options¶

These options are defined specifically for the CRIS ports.

-march=architecture-type¶: Generate code for the specified architecture. The choices for architecture-type are v3, v8 and v10 for respectively ETRAX 4, ETRAX 100, and ETRAX 100 LX. Default is v0 except for cris-axis-linux-gnu, where the default is v10.

-mtune=architecture-type¶: Tune to architecture-type everything applicable about the generated code, except for the ABI and the set of available instructions. The choices for architecture-type are the same as for -march=architecture-type.

-mmax-stack-frame=n¶: Warn when the stack frame of a function exceeds n bytes.

-metrax4, -metrax100¶: The options -metrax4 and -metrax100 are synonyms for -march=v3 and -march=v8 respectively.

-mmul-bug-workaround, -mno-mul-bug-workaround¶: Work around a bug in the muls and mulu instructions for CPU models where it applies. This option is active by default.

-mpdebug¶: Enable CRIS-specific verbose debug-related information in the assembly code. This option also has the effect of turning off the #NO_APP formatted-code indicator to the assembler at the beginning of the assembly file.

-mcc-init¶: Do not use condition-code results from previous instruction; always emit compare and test instructions before use of condition codes.

-mno-side-effects¶: Do not emit instructions with side effects in addressing modes other than post-increment.

-mstack-align, -mno-stack-align, -mdata-align, -mno-data-align, -mconst-align, -mno-const-align¶: These options (no- options) arrange (eliminate arrangements) for the stack frame, individual data and constants to be aligned for the maximum single data access size for the chosen CPU model. The default is to arrange for 32-bit alignment. ABI details such as structure layout are not affected by these options.

-m32-bit, -m16-bit, -m8-bit¶: Similar to the stack- data- and const-align options above, these options arrange for stack frame, writable data and constants to all be 32-bit, 16-bit or 8-bit aligned. The default is 32-bit alignment.

-mno-prologue-epilogue, -mprologue-epilogue¶: With -mno-prologue-epilogue, the normal function prologue and epilogue which set up the stack frame are omitted and no return instructions or return sequences are generated in the code. Use this option only together with visual inspection of the compiled code: no warnings or errors are generated when call-saved registers must be saved, or storage for local variables needs to be allocated.

-mno-gotplt, -mgotplt¶: With -fpic and -fPIC, don't generate (do generate) instruction sequences that load addresses for functions from the PLT part of the GOT rather than (traditional on other architectures) calls to the PLT. The default is -mgotplt.

-melf¶: Legacy no-op option only recognized with the cris-axis-elf and cris-axis-linux-gnu targets.

-mlinux¶: Legacy no-op option only recognized with the cris-axis-linux-gnu target.

-sim¶: This option, recognized for the cris-axis-elf target, arranges to link with input-output functions from a simulator library. Code, initialized data and zero-initialized data are allocated consecutively.

-sim2¶: Like -sim, but pass linker options to locate initialized data at 0x40000000 and zero-initialized data at 0x80000000.

:: _cr16-options:

CR16 Options¶

These options are defined specifically for the CR16 ports.

-mmac¶: Enable the use of multiply-accumulate instructions. Disabled by default.

-mcr16cplus, -mcr16c¶: Generate code for the CR16C+ or CR16C architecture. The CR16C+ architecture is the default.

-msim¶: Link with the library libsim.a, which is compatible with the simulator. Applicable to the ELF compiler only.

-mint32¶: Choose integer type as 32-bit wide.

-mbit-ops¶: Generates sbit/cbit instructions for bit manipulations.

-mdata-model=model¶: Choose a data model. The choices for model are near, far or medium. medium is default. However, far is not valid with -mcr16c, as the CR16C architecture does not support the far data model.

:: _darwin-options:

Darwin Options¶

These options are defined for all architectures running the Darwin operating system.

FSF GCC on Darwin does not create fat object files; it creates an object file for the single architecture that GCC was built to target. Apple's GCC on Darwin does create fat files if multiple -arch options are used; it does so by running the compiler or linker multiple times and joining the results together with lipo.

The subtype of the file created (like ppc7400 or ppc970 or i686) is determined by the flags that specify the ISA that GCC is targeting, like -mcpu or -march. The -force_cpusubtype_ALL option can be used to override this.

The Darwin tools vary in their behavior when presented with an ISA mismatch. The assembler, as, only permits instructions to be used that are valid for the subtype of the file it is generating, so you cannot put 64-bit instructions in a ppc750 object file. The linker for shared libraries, /usr/bin/libtool, fails and prints an error if asked to create a shared library with a less restrictive subtype than its input files (for instance, trying to put a ppc970 object file in a ppc7400 library). The linker for executables, ld, quietly gives the executable the most restrictive subtype of any of its input files.

-Fdir, -F¶

Add the framework directory dir to the head of the list of directories to be searched for header files. These directories are interleaved with those specified by -I options and are scanned in a left-to-right order.

A framework directory is a directory with frameworks in it. A framework is a directory with a Headers and/or PrivateHeaders directory contained directly in it that ends in .framework. The name of a framework is the name of this directory excluding the .framework. Headers associated with the framework are found in one of those two directories, with Headers being searched first. A subframework is a framework directory that is in a framework's Frameworks directory. Includes of subframework headers can only appear in a header of a framework that contains the subframework, or in a sibling subframework header. Two subframeworks are siblings if they occur in the same framework. A subframework should not have the same name as a framework; a warning is issued if this is violated. Currently a subframework cannot have subframeworks; in the future, the mechanism may be extended to support this. The standard frameworks can be found in /System/Library/Frameworks and /Library/Frameworks. An example include looks like #include <Framework/header.h>, where Framework denotes the name of the framework and header.h is found in the PrivateHeaders or Headers directory.

-iframeworkdir, -iframework¶: Like -F except the directory is treated as a system directory. The main difference between -iframework and -F is that with -iframework the compiler does not warn about constructs contained within header files found via dir. This option is valid only for the C family of languages.
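The mapping from an include name to a file inside a framework can be sketched in C as follows. This is an illustration of the naming rule described above, not GCC's actual lookup code; the function name framework_header_path and its parameters are hypothetical, and subframeworks are not handled.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: map an include name "Framework/header.h" to the
   candidate path "<dir>/Framework.framework/<subdir>/header.h", where
   subdir is "Headers" or "PrivateHeaders".
   Returns 0 on success, -1 if the name has no framework part or the
   output buffer is too small. */
int framework_header_path(const char *dir, const char *include_name,
                          const char *subdir, char *out, size_t out_size)
{
    const char *slash = strchr(include_name, '/');
    if (slash == NULL || slash == include_name)
        return -1;
    int n = snprintf(out, out_size, "%s/%.*s.framework/%s/%s",
                     dir, (int)(slash - include_name), include_name,
                     subdir, slash + 1);
    return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}
```

A caller modelling the search order in the text would try subdir "Headers" first and "PrivateHeaders" second for each framework directory in turn.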

-gused¶: Emit debugging information for symbols that are used. For stabs debugging format, this enables -feliminate-unused-debug-symbols. This option is on by default.

-gfull¶: Emit debugging information for all symbols and types.

-mmacosx-version-min=version¶

The earliest version of MacOS X that this executable will run on is version. Typical values of version include 10.1, 10.2, and 10.3.9.

If the compiler was built to use the system's headers by default, then the default for this option is the system version on which the compiler is running, otherwise the default is to make choices that are compatible with as many systems and code bases as possible.

-mkernel¶: Enable kernel development mode. The -mkernel option sets -static, -fno-common, -fno-use-cxa-atexit, -fno-exceptions, -fno-non-call-exceptions, -fapple-kext, -fno-weak and -fno-rtti where applicable. This mode also sets -mno-altivec, -msoft-float, -fno-builtin and -mlong-branch for PowerPC targets.

-mone-byte-bool¶

Override the defaults for bool so that sizeof(bool)==1. By default sizeof(bool) is 4 when compiling for Darwin/PowerPC and 1 when compiling for Darwin/x86, so this option has no effect on x86.

Warning: The -mone-byte-bool switch causes GCC to generate code that is not binary compatible with code generated without that switch. Using this switch may require recompiling all other modules in a program, including system libraries. Use this switch to conform to a non-default data model.
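The binary-compatibility warning can be made concrete with a small C sketch that reports the size of bool and of a structure containing bool members. The names struct flags, bool_size and flags_size are hypothetical and chosen only for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* A structure whose layout depends on sizeof(bool). */
struct flags {
    bool ready;
    bool done;
};

/* Size of bool as seen by this translation unit. */
size_t bool_size(void)
{
    return sizeof(bool);
}

/* Size of a structure holding two bools; it changes with the data model. */
size_t flags_size(void)
{
    return sizeof(struct flags);
}
```

If one module is compiled with -mone-byte-bool and another without it on Darwin/PowerPC, the two would disagree about sizeof(struct flags) and about member offsets, which is why mixing them is unsafe.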

-mfix-and-continue, -ffix-and-continue, -findirect-data¶: Generate code suitable for fast turnaround development, such as to allow GDB to dynamically load .o files into already-running programs. -findirect-data and -ffix-and-continue are provided for backwards compatibility.

-all_load¶: Loads all members of static archive libraries. See man ld(1) for more information.

-arch_errors_fatal¶: Cause the errors having to do with files that have the wrong architecture to be fatal.

-bind_at_load¶: Causes the output file to be marked such that the dynamic linker will bind all undefined references when the file is loaded or launched.

-bundle¶: Produce a Mach-o bundle format file. See man ld(1) for more information.

-bundle_loader executable, -bundle_loader¶: This option specifies the executable that will load the build output file being linked. See man ld(1) for more information.

-dynamiclib¶: When passed this option, GCC produces a dynamic library instead of an executable when linking, using the Darwin libtool command.

-force_cpusubtype_ALL¶: This causes GCC's output file to have the ALL subtype, instead of one controlled by the -mcpu or -march option.

-allowable_client client_name, -allowable_client, -client_name, -compatibility_version, -current_version, -dead_strip, -dependency-file, -dylib_file, -dylinker_install_name, -dynamic, -exported_symbols_list, -filelist, -flat_namespace, -force_flat_namespace, -headerpad_max_install_names, -image_base, -init, -install_name, -keep_private_externs, -multi_module, -multiply_defined, -multiply_defined_unused, -noall_load, -no_dead_strip_inits_and_terms, -nofixprebinding, -nomultidefs, -noprebind, -noseglinkedit, -pagezero_size, -prebind, -prebind_all_twolevel_modules, -private_bundle, -read_only_relocs, -sectalign, -sectobjectsymbols, -whyload, -seg1addr, -sectcreate, -sectorder, -segaddr, -segs_read_only_addr, -segs_read_write_addr, -seg_addr_table, -seg_addr_table_filename, -seglinkedit, -segprot, -single_module, -static, -sub_library, -sub_umbrella, -twolevel_namespace, -umbrella, -undefined, -unexported_symbols_list, -weak_reference_mismatches, -whatsloaded¶: These options are passed to the Darwin linker. The Darwin linker man page describes them in detail.

:: _dec-alpha-options:

DEC Alpha Options¶

These -m options are defined for the DEC Alpha implementations:

-mno-soft-float, -msoft-float¶

Use (do not use) the hardware floating-point instructions for floating-point operations. When -msoft-float is specified, functions in libgcc.a are used to perform floating-point operations. Unless they are replaced by routines that emulate the floating-point operations, or compiled in such a way as to call such emulation routines, these routines issue floating-point operations. If you are compiling for an Alpha without floating-point operations, you must ensure that the library is built so as not to call them.

Note that Alpha implementations without floating-point operations are required to have floating-point registers.

-mfp-regs, -mno-fp-regs¶

Generate code that uses (does not use) the floating-point register set. -mno-fp-regs implies -msoft-float. If the floating-point register set is not used, floating-point operands are passed in integer registers as if they were integers and floating-point results are passed in $0 instead of $f0. This is a non-standard calling sequence, so any function with a floating-point argument or return value called by code compiled with -mno-fp-regs must also be compiled with that option.

A typical use of this option is building a kernel that does not use, and hence need not save and restore, any floating-point registers.
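What it means for a floating-point operand to travel "as if it were an integer" can be sketched in C by reinterpreting the bits of a double as a 64-bit integer. This is only an illustration of the bit pattern involved, assuming a 64-bit IEEE double; the helper name double_bits is hypothetical, and the code does not itself change any calling convention.

```c
#include <stdint.h>
#include <string.h>

/* Return the 64-bit pattern of a double, i.e. the value an integer
   register would hold if the double were passed unchanged in it. */
uint64_t double_bits(double d)
{
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);  /* well-defined reinterpretation */
    return bits;
}
```

For example, under the IEEE double assumption, double_bits(1.0) is 0x3FF0000000000000.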

-mieee¶: The Alpha architecture implements floating-point hardware optimized for maximum performance. It is mostly compliant with the IEEE floating-point standard. However, for full compliance, software assistance is required. This option generates fully IEEE-compliant code except that the inexact-flag is not maintained (see below). If this option is turned on, the preprocessor macro _IEEE_FP is defined during compilation. The resulting code is less efficient but is able to correctly support denormalized numbers and exceptional IEEE values such as not-a-number and plus/minus infinity. Other Alpha compilers call this option -ieee_with_no_inexact.

-mieee-with-inexact¶: This is like -mieee except the generated code also maintains the IEEE inexact-flag. Turning on this option causes the generated code to implement fully-compliant IEEE math. In addition to _IEEE_FP, _IEEE_FP_EXACT is defined as a preprocessor macro. On some Alpha implementations the resulting code may execute significantly slower than the code generated by default. Since there is very little code that depends on the inexact-flag, you should normally not specify this option. Other Alpha compilers call this option -ieee_with_inexact.
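Source code can detect which of these modes is active through the preprocessor macros named above. The following C sketch assumes nothing beyond those two macros; the function names ieee_mode and half_of_smallest_normal are hypothetical.

```c
#include <float.h>

/* 0: neither macro is defined, 1: only _IEEE_FP is defined,
   2: _IEEE_FP_EXACT is defined (as with -mieee-with-inexact). */
int ieee_mode(void)
{
#if defined(_IEEE_FP_EXACT)
    return 2;
#elif defined(_IEEE_FP)
    return 1;
#else
    return 0;
#endif
}

/* Halve the smallest normal double. With full IEEE support the
   result is a nonzero denormalized number. */
double half_of_smallest_normal(void)
{
    volatile double x = DBL_MIN;  /* volatile discourages constant folding */
    return x / 2.0;
}
```

On targets other than Alpha neither macro is expected to be defined, so ieee_mode() returns 0 there.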

-mfp-trap-mode=trap-mode¶

This option controls what floating-point related traps are enabled. Other Alpha compilers call this option -fptm trap-mode. The trap mode can be set to one of four values:

n: This is the default (normal) setting. The only traps that are enabled are the ones that cannot be disabled in software (e.g., division by zero trap).
u: In addition to the traps enabled by n, underflow traps are enabled as well.
su: Like u, but the instructions are marked to be safe for software completion (see Alpha architecture manual for details).
sui: Like su, but inexact traps are enabled as well.

-mfp-rounding-mode=rounding-mode¶

Selects the IEEE rounding mode. Other Alpha compilers call this option -fprm rounding-mode. The rounding-mode can be one of:

n: Normal IEEE rounding mode. Floating-point numbers are rounded towards the nearest machine number or towards the even machine number in case of a tie.
m: Round towards minus infinity.
c: Chopped rounding mode. Floating-point numbers are rounded towards zero.
d: Dynamic rounding mode. A field in the floating-point control register (fpcr, see Alpha architecture reference manual) controls the rounding mode in effect. The C library initializes this register for rounding towards plus infinity. Thus, unless your program modifies the fpcr, d corresponds to round towards plus infinity.
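The rounding directions listed above can be illustrated with exact integer arithmetic. The C sketch below rounds a quotient num/den to an integer in each direction; it is not how the hardware or GCC implements rounding. The function name round_quotient is hypothetical, the letters 'n', 'm' and 'c' mirror the option values, and 'p' is an invented label for rounding towards plus infinity (it is not an option value).

```c
/* Round the exact quotient num/den (den > 0) to an integer.
   mode: 'n' nearest (ties to even), 'm' minus infinity, 'c' chopped,
         'p' plus infinity (hypothetical label, not an option value). */
long round_quotient(long num, long den, char mode)
{
    long q = num / den;   /* C integer division truncates toward zero */
    long r = num % den;   /* remainder carries the sign of num */
    if (r == 0)
        return q;         /* exact result: every mode agrees */

    long away = (num < 0) ? q - 1 : q + 1;  /* neighbour farther from zero */
    long twice = 2 * (r < 0 ? -r : r);      /* compare 2|r| with den */

    switch (mode) {
    case 'c': return q;
    case 'm': return (num < 0) ? away : q;
    case 'p': return (num > 0) ? away : q;
    default:
        if (twice != den)
            return (twice > den) ? away : q;
        return (q % 2 == 0) ? q : away;     /* tie: pick the even neighbour */
    }
}
```

For the value 5/2, mode 'n' gives 2 (the even neighbour), 'm' and 'c' give 2, and 'p' gives 3; for -5/2, 'm' gives -3 while 'c' gives -2.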

-mtrap-precision=trap-precision¶

In the Alpha architecture, floating-point traps are imprecise. This means that without software assistance it is impossible to recover from a floating-point trap, and program execution normally needs to be terminated. GCC can generate code that can assist operating system trap handlers in determining the exact location that caused a floating-point trap. Depending on the requirements of an application, different levels of precision can be selected:

p: Program precision. This option is the default and means a trap handler can only identify which program caused a floating-point exception.
f: Function precision. The trap handler can determine the function that caused a floating-point exception.
i: Instruction precision. The trap handler can determine the exact instruction that caused a floating-point exception.

Other Alpha compilers provide the equivalent options called -scope_safe and -resumption_safe.

-mieee-conformant¶: This option marks the generated code as IEEE conformant. You must not use this option unless you also specify -mtrap-precision=i and either -mfp-trap-mode=su or -mfp-trap-mode=sui. Its only effect is to emit the line .eflag 48 in the function prologue of the generated assembly file.

-mbuild-constants¶

Normally GCC examines a 32- or 64-bit integer constant to see if it can construct it from smaller constants in two or three instructions. If it cannot, it outputs the constant as a literal and generates code to load it from the data segment at run time.

Use this option to require GCC to construct all integer constants using code, even if it takes more instructions (the maximum is six).

You typically use this option to build a shared library dynamic loader. Itself a shared library, it must relocate itself in memory before it can find the variables and constants in its own data segment.
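The idea of constructing a constant in code rather than loading it from the data segment can be sketched in C. The function below assembles a 64-bit value from four 16-bit pieces using only shifts and ORs; it is an illustration only, the name build_constant64 is hypothetical, and the actual Alpha instruction sequences GCC emits are not shown here and may differ.

```c
#include <stdint.h>

/* Assemble a 64-bit constant from four 16-bit pieces, most
   significant piece first, without reading any stored literal. */
uint64_t build_constant64(uint16_t p3, uint16_t p2, uint16_t p1, uint16_t p0)
{
    uint64_t v = p3;
    v = (v << 16) | p2;
    v = (v << 16) | p1;
    v = (v << 16) | p0;
    return v;
}
```

Because no memory load from the data segment is involved, a sequence of this kind still works before a dynamic loader has relocated itself.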

-mbwx, -mno-bwx, -mcix, -mno-cix, -mfix, -mno-fix, -mmax, -mno-max¶: Indicate whether GCC should generate code to use the optional BWX, CIX, FIX and MAX instruction sets. The default is to use the instruction sets supported by the CPU type specified via -mcpu= option or that of the CPU on which GCC was built if none is specified.

-mfloat-vax, -mfloat-ieee¶: Generate code that uses (does not use) VAX F and G floating-point arithmetic instead of IEEE single and double precision.

-mexplicit-relocs, -mno-explicit-relocs¶: Older Alpha assemblers provided no way to generate symbol relocations except via assembler macros. Use of these macros does not allow optimal instruction scheduling. GNU binutils as of version 2.12 supports a new syntax that allows the compiler to explicitly mark which relocations should apply to which instructions. This option is mostly useful for debugging, as GCC detects the capabilities of the assembler when it is built and sets the default accordingly.

-msmall-data, -mlarge-data¶

When -mexplicit-relocs is in effect, static data is accessed via gp-relative relocations. When -msmall-data is used, objects 8 bytes long or smaller are placed in a small data area (the .sdata and .sbss sections) and are accessed via 16-bit relocations off of the $gp register. This limits the size of the small data area to 64KB, but allows the variables to be directly accessed via a single instruction.

The default is -mlarge-data. With this option the data area is limited to just below 2GB. Programs that require more than 2GB of data must use malloc or mmap to allocate the data in the heap instead of in the program's data segment.

When generating code for shared libraries, -fpic implies -msmall-data and -fPIC implies -mlarge-data.
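The two limits mentioned for -msmall-data (objects of at most 8 bytes, reached through a 16-bit relocation) can be written down as a short C sketch. The function names are hypothetical, and treating the 16-bit displacement as signed is an assumption made here; either way, 16 bits span 65536 bytes, which matches the 64KB limit.

```c
#include <stddef.h>

/* Per the text, objects 8 bytes long or smaller go in the small data area. */
int qualifies_for_small_data(size_t size)
{
    return size <= 8;
}

/* Does an offset from $gp fit in a 16-bit displacement?
   Assumption: the displacement is signed, so the range is
   -32768..32767, a total span of 65536 bytes (64KB). */
int fits_disp16(long offset)
{
    return offset >= -32768 && offset <= 32767;
}
```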

-msmall-text, -mlarge-text¶

When -msmall-text is used, the compiler assumes that the code of the entire program (or shared library) fits in 4MB, and is thus reachable with a branch instruction. When -msmall-data is used, the compiler can assume that all local symbols share the same $gp value, and thus reduce the number of instructions required for a function call from 4 to 1.

The default is -mlarge-text.

-mcpu=cpu_type¶

Set the instruction set and instruction scheduling parameters for machine type cpu_type. You can specify either the EV style name or the corresponding chip number. GCC supports scheduling parameters for the EV4, EV5 and EV6 family of processors and chooses the default values for the instruction set from the processor you specify. If you do not specify a processor type, GCC defaults to the processor on which the compiler was built.

Supported values for cpu_type are

ev4, ev45, 21064: Schedules as an EV4 and has no instruction set extensions.
ev5, 21164: Schedules as an EV5 and has no instruction set extensions.
ev56, 21164a: Schedules as an EV5 and supports the BWX extension.
pca56, 21164pc, 21164PC: Schedules as an EV5 and supports the BWX and MAX extensions.
ev6, 21264: Schedules as an EV6 and supports the BWX, FIX, and MAX extensions.
ev67, 21264a: Schedules as an EV6 and supports the BWX, CIX, FIX, and MAX extensions.

Native toolchains also support the value native, which selects the best architecture option for the host processor. -mcpu=native has no effect if GCC does not recognize the processor.

-mtune=cpu_type¶

Set only the instruction scheduling parameters for machine type cpu_type. The instruction set is not changed.

Native toolchains also support the value native, which selects the best architecture option for the host processor. -mtune=native has no effect if GCC does not recognize the processor.

-mmemory-latency=time¶

Sets the latency the scheduler should assume for typical memory references as seen by the application. This number is highly dependent on the memory access patterns used by the application and the size of the external cache on the machine.

Valid options for time are

number: A decimal number representing clock cycles.
L1, L2, L3, main: The compiler contains estimates of the number of clock cycles for typical EV4 & EV5 hardware for the Level 1, 2 & 3 caches (also called Dcache, Scache, and Bcache), as well as to main memory. Note that L3 is only valid for EV5.

:: _fr30-options:

FR30 Options¶

These options are defined specifically for the FR30 port.

-msmall-model¶: Use the small address space model. This can produce smaller code, but it does assume that all symbolic values and addresses fit into a 20-bit range.

-mno-lsim¶: Assume that runtime support has been provided and so there is no need to include the simulator library (libsim.a) on the linker command line.

:: _frv-options:

FRV Options¶

-mgpr-32¶: Only use the first 32 general-purpose registers.

-mgpr-64¶: Use all 64 general-purpose registers.

-mfpr-32¶: Use only the first 32 floating-point registers.

-mfpr-64¶: Use all 64 floating-point registers.

-mhard-float¶: Use hardware instructions for floating-point operations.

-msoft-float¶: Use library routines for floating-point operations.

-malloc-cc¶: Dynamically allocate condition code registers.

-mfixed-cc¶: Do not try to dynamically allocate condition code registers, only use icc0 and fcc0.

-mdword¶: Change the ABI to use double word instructions.

-mno-dword¶: Do not use double word instructions.

-mdouble¶: Use floating-point double instructions.

-mno-double¶: Do not use floating-point double instructions.

-mmedia¶: Use media instructions.

-mno-media¶: Do not use media instructions.

-mmuladd¶: Use multiply and add/subtract instructions.

-mno-muladd¶: Do not use multiply and add/subtract instructions.

-mfdpic¶: Select the FDPIC ABI, which uses function descriptors to represent pointers to functions. Without any PIC/PIE-related options, it implies -fPIE. With -fpic or -fpie, it assumes GOT entries and small data are within a 12-bit range from the GOT base address; with -fPIC or -fPIE, GOT offsets are computed with 32 bits. With a bfin-elf target, this option implies -msim.

-minline-plt¶: Enable inlining of PLT entries in function calls to functions that are not known to bind locally. It has no effect without -mfdpic. It's enabled by default if optimizing for speed and compiling for shared libraries (i.e., -fPIC or -fpic), or when an optimization option such as -O3 or above is present in the command line.

-mTLS¶: Assume a large TLS segment when generating thread-local code.

-mtls¶: Do not assume a large TLS segment when generating thread-local code.

-mgprel-ro¶: Enable the use of GPREL relocations in the FDPIC ABI for data that is known to be in read-only sections. It's enabled by default, except for -fpic or -fpie: even though it may help make the global offset table smaller, it trades 1 instruction for 4. With -fPIC or -fPIE, it trades 3 instructions for 4, one of which may be shared by multiple symbols, and it avoids the need for a GOT entry for the referenced symbol, so it's more likely to be a win. If it is not, -mno-gprel-ro can be used to disable it.

-multilib-library-pic¶: Link with the (library, not FD) pic libraries. It's implied by -mlibrary-pic, as well as by -fPIC and -fpic without -mfdpic. You should never have to use it explicitly.

-mlinked-fp¶: Follow the EABI requirement of always creating a frame pointer whenever a stack frame is allocated. This option is enabled by default and can be disabled with -mno-linked-fp.

-mlong-calls¶: Use indirect addressing to call functions outside the current compilation unit. This allows the functions to be placed anywhere within the 32-bit address space.

-malign-labels¶: Try to align labels to an 8-byte boundary by inserting NOPs into the previous packet. This option only has an effect when VLIW packing is enabled. It doesn't create new packets; it merely adds NOPs to existing ones.

-mlibrary-pic¶: Generate position-independent EABI code.

-macc-4¶: Use only the first four media accumulator registers.

-macc-8¶: Use all eight media accumulator registers.

-mpack¶: Pack VLIW instructions.

-mno-pack¶: Do not pack VLIW instructions.

-mno-eflags¶: Do not mark ABI switches in e_flags.

-mcond-move¶

Enable the use of conditional-move instructions (default).