.. Copyright (C) 2016-2023 Free Software Foundation, Inc. Originally contributed by David Malcolm This is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program. If not, see . Inside :program:`cc1` --------------------- GCC is a collection of compilers for different languages, which can generate code for a wide variety of different CPU architectures. GCC currently supports 10 source languages, and 55 target CPU architectures. We don't want to have to write 550 different compilers, so, as is common, GCC is structured into three parts: * source language "front ends" - one for each source language, taking the source language as input, using it to generating an "intermediate represention" (IR) of the user's code in a language-independent internal format * target-specific "back ends" - one for each target, taking the intermediate representation as input, and generating assembler for that target * the so-called "middle end", sitting between these, which optimizes the code, so that the back end has better code to work with So :program:`cc1` built for ``x86_64`` has the C frontend, the shared optimizing code, and the x86_64 backend:: +-----+ +-| cc1 |--------------------------------------------------+ | +-----+ | | | | C frontend Optimizer x86_64 backend | C source ====+==============> IR =============> IR ===================+==> x86_64 asm | | +----------------------------------------------------------+ Sadly the above picture is an idealized picture, in that the code is rather messier than that in practice. It's also over-simplified, in that GCC actually has several intermediate representations, which we'll talk about in detail below. * if you're interested in adding new warnings to GCC, you'll probably want to just look at the frontends for now, and to merely skim the sections below. * if you want to make GCC generate better code, you'll want to look at the middle-end and backends, and find out more about gcc's intermediate representations by reading the section below * if you're interested in adding support to GCC for a new kind of CPU, you'll want to look mostly at the backend/RTL sections below How does :program:`cc1` turn C into assembler? ********************************************** When figuring out how the compiler works, it's best to start with a really simple source file - in particular, one with doesn't ``#include`` any header files. Our earlier example used :file:``, so let's make an even simpler :file:`test.c`: .. code-block:: c int test (int a, int b, int c, int d) { if (a) return (b - c) * d; else return (c - b) * d; } Compiling it to assembler, with some optimization (:option:`-O2`) and enabling verbose assembler output (:option:`-fverbose-asm`): .. code-block:: sh $ gcc -S test.c -O2 -fverbose-asm gives this :file:`test.s`: .. code-block:: asm .file "test.c" ; ...omitting dump of options for brevity... .text .p2align 4 .globl test .type test, @function test: .LFB0: .cfi_startproc # test.c:2: { movl %ecx, %eax # tmp97, d # test.c:3: if (a) testl %edi, %edi # tmp94 je .L2 #, # test.c:4: return (b - c) * d; subl %edx, %esi # c, tmp89 # test.c:4: return (b - c) * d; imull %esi, %eax # tmp89, ret .p2align 4,,10 .p2align 3 .L2: # test.c:6: return (c - b) * d; subl %esi, %edx # b, tmp90 # test.c:6: return (c - b) * d; imull %edx, %eax # tmp90, # test.c:7: } ret .cfi_endproc .LFE0: .size test, .-test .ident "GCC: (GNU) 10.3.1 20210422 (Red Hat 10.3.1-1)" .section .note.GNU-stack,"",@progbits where :program:`cc1` has converted the C code into a pair of subtractions (:code:`subl`), and a pair of multiplies (:code:`imull`) that populate the ``%eax`` register, used for the return value. You can see GCC's intermediate representations using GCC's dump options. If we add :option:`-fdump-tree-all -fdump-ipa-all -fdump-rtl-all` to the above command line, giving: .. code-block:: sh gcc -S test.c -O2 -fverbose-asm -fdump-tree-all -fdump-ipa-all -fdump-rtl-all then these dump options lead :program:`cc1` to emit *many* dump files (192 in the following example): .. code-block:: sh $ ls test.c test.c.101t.alias test.c.238r.vregs test.c.000i.cgraph test.c.102t.retslot test.c.239r.into_cfglayout test.c.000i.ipa-clones test.c.103t.fre3 test.c.240r.jump test.c.000i.type-inheritance test.c.104t.mergephi2 test.c.241r.subreg1 test.c.004t.original test.c.105t.thread1 test.c.242r.dfinit test.c.005t.gimple test.c.106t.vrp1 test.c.243r.cse1 test.c.007t.omplower test.c.107t.dce2 test.c.244r.fwprop1 test.c.008t.lower test.c.108t.stdarg test.c.245r.cprop1 test.c.011t.eh test.c.109t.cdce test.c.246r.pre test.c.013t.cfg test.c.110t.cselim test.c.248r.cprop2 test.c.015t.ompexp test.c.111t.copyprop1 test.c.251r.ce1 test.c.018i.visibility test.c.112t.ifcombine test.c.252r.reginfo test.c.019i.build_ssa_passes test.c.113t.mergephi3 test.c.253r.loop2 test.c.020t.fixup_cfg1 test.c.114t.phiopt2 test.c.254r.loop2_init test.c.021t.ssa test.c.115t.tailr2 test.c.255r.loop2_invariant test.c.023t.nothrow test.c.116t.ch2 test.c.258r.loop2_done test.c.024i.opt_local_passes test.c.117t.cplxlower1 test.c.261r.cprop3 test.c.025t.fixup_cfg2 test.c.118t.sra test.c.262r.stv1 test.c.026t.local-fnsummary1 test.c.119t.thread2 test.c.263r.cse2 test.c.027t.einline test.c.120t.dom2 test.c.264r.dse1 test.c.028t.early_optimizations test.c.121t.copyprop2 test.c.265r.fwprop2 test.c.029t.objsz1 test.c.122t.isolate-paths test.c.267r.init-regs test.c.030t.ccp1 test.c.123t.dse2 test.c.268r.ud_dce test.c.031t.forwprop1 test.c.124t.reassoc1 test.c.269r.combine test.c.032t.ethread test.c.125t.dce3 test.c.271r.stv2 test.c.033t.esra test.c.126t.forwprop3 test.c.272r.ce2 test.c.034t.ealias test.c.127t.phiopt3 test.c.273r.jump_after_combine test.c.035t.fre1 test.c.128t.ccp3 test.c.274r.bbpart test.c.036t.evrp test.c.129t.sincos test.c.275r.outof_cfglayout test.c.037t.mergephi1 test.c.130t.bswap test.c.276r.split1 test.c.038t.dse1 test.c.131t.laddress test.c.277r.subreg3 test.c.039t.cddce1 test.c.132t.lim2 test.c.279r.mode_sw test.c.040t.phiopt1 test.c.133t.walloca2 test.c.280r.asmcons test.c.041t.tailr1 test.c.134t.pre test.c.285r.ira test.c.042t.switchconv test.c.135t.sink test.c.286r.reload test.c.044t.profile_estimate test.c.139t.dce4 test.c.288r.postreload test.c.045t.local-pure-const1 test.c.140t.fix_loops test.c.290r.split2 test.c.046t.fnsplit test.c.171t.no_loop test.c.291r.ree test.c.047t.release_ssa test.c.174t.veclower21 test.c.292r.cmpelim test.c.048t.local-fnsummary2 test.c.175t.switchlower1 test.c.293r.pro_and_epilogue test.c.049i.remove_symbols test.c.177t.reassoc2 test.c.294r.dse2 test.c.061i.targetclone test.c.178t.slsr test.c.295r.csa test.c.065i.free-fnsummary1 test.c.181t.fre4 test.c.296r.jump2 test.c.071i.whole-program test.c.182t.thread3 test.c.297r.compgotos test.c.072i.profile_estimate test.c.183t.dom3 test.c.299r.peephole2 test.c.073i.icf test.c.184t.strlen1 test.c.300r.ce3 test.c.074i.devirt test.c.185t.thread4 test.c.302r.cprop_hardreg test.c.075i.cp test.c.186t.vrp2 test.c.303r.rtl_dce test.c.076i.sra test.c.187t.copyprop5 test.c.304r.bbro test.c.079i.fnsummary test.c.188t.wrestrict test.c.305r.split3 test.c.080i.inline test.c.189t.dse3 test.c.306r.sched2 test.c.081i.pure-const test.c.190t.cddce3 test.c.308r.stack test.c.082i.free-fnsummary2 test.c.191t.forwprop4 test.c.309r.alignments test.c.083i.static-var test.c.192t.phiopt4 test.c.311r.mach test.c.084i.single-use test.c.193t.fab1 test.c.312r.barriers test.c.085i.comdats test.c.194t.widening_mul test.c.317r.shorten test.c.086i.materialize-all-clones test.c.195t.store-merging test.c.318r.nothrow test.c.088i.simdclone test.c.196t.tailc test.c.319r.dwarf2 test.c.089t.fixup_cfg3 test.c.197t.dce7 test.c.320r.final test.c.094t.ccp2 test.c.198t.crited1 test.c.321r.dfinish test.c.096t.cunrolli test.c.200t.uncprop1 test.c.322t.statistics test.c.097t.backprop test.c.201t.local-pure-const2 test.c.323t.earlydebug test.c.098t.phiprop test.c.234t.nrv test.c.324t.debug test.c.099t.forwprop2 test.c.235t.optimized test.s test.c.100t.objsz2 test.c.237r.expand You'll see that they're of the form :file:`test.c.` followed by a 3 digit number, followed by "t", "i", or "r", then a suffix. The precise numbering and suffixes of dump files varies from release to release of GCC, and the subset that gets emitted will vary depending on the optimization option you choose - there were 192 in the above example (GCC 10, with :option:`-O2`). The dump files show the state of GCC's intermediate representation of the code at each "optimization pass". The numbering roughly corresponds to a time-ordering of the states within the compiler, so that e.g. :file:`test.c.004t.original` shows the initial state of the IR coming out of the C frontend, whilst :file:`test.c.320r.final` shows the final state as assembler is written out. Beware, though that the "i" dumps are numbered out-of-order relative to the other "t" and "r" passes. At a high level, :program:`cc1` works as follows. Lexing ****** First the input source is "tokenized", so that the stream of input characters is divided into a stream of tokens. This is called "lexing", and largely implemented in gcc in `libcpp `_ (which also implements the preprocessor - hence the name) so that e.g. we go from the sequence of characters: .. code-block:: none return (b - c) * d; to the sequence of tokens: .. code-block:: none CPP_KEYWORD(RID_RETURN) CPP_OPEN_PAREN CPP_NAME("b") CPP_MINUS CPP_NAME("c") CPP_CLOSE_PAREN CPP_MULT CPP_NAME("d") CPP_SEMICOLON annotated with information about where in the user's source they occurred. Parsing and the :c:type:`tree` type *********************************** Next the frontend parses the tokens from a flat stream into a tree-like structure reflecting the grammar of the language (or complains about syntax errors or type errors, and bails out). Most warnings are implemented here, so if you're interested in adding new warnings, this is the place to look. This stage uses gcc's :c:type:`tree` type. There may be frontend-specific kinds of node, in the tree but the frontend will convert these to a generic form, so that after each frontend the middle end "sees" a tree representation that we call `generic `_ (unless the frontend gave up due to a sufficiently serious error in the user's code). You can see the "generic" representation in the :file:`test.c.004t.original` dump: .. code-block:: c ;; Function test (null) ;; enabled by -tree-original { if (a != 0) { return (b - c) * d; } else { return (c - b) * d; } } In this example, the dump of the tree IR closely resembles the original C code, but sometimes you will see control flow expressed via "goto" statements that go to numbered labels, and temporary variables introduced by the frontend. If we're running under the debugger (see :ref:`debugging`), we can see the tree for a function body like this:: (gdb) call debug_tree(fndecl->function_decl.saved_tree) > side-effects body side-effects arg:0 arg:0 arg:1 test.c:3:7 start: test.c:3:7 finish: test.c:3:7> arg:1 side-effects arg:0 side-effects arg:0 arg:1 arg:0 arg:0 arg:1 test.c:4:15 start: test.c:4:12 finish: test.c:4:18> arg:1 test.c:4:20 start: test.c:4:12 finish: test.c:4:22> test.c:4:20 start: test.c:4:12 finish: test.c:4:22> test.c:4:20 start: test.c:4:12 finish: test.c:4:22> arg:2 side-effects arg:0 side-effects arg:0 arg:1 arg:0 arg:0 arg:1 test.c:6:15 start: test.c:6:12 finish: test.c:6:18> arg:1 test.c:6:20 start: test.c:6:12 finish: test.c:6:22> test.c:6:20 start: test.c:6:12 finish: test.c:6:22> test.c:6:20 start: test.c:6:12 finish: test.c:6:22> test.c:3:6 start: test.c:3:6 finish: test.c:3:6> block public static QI test.c:1:5 align:8 warn_if_not_align:0 context initial result arguments struct-function 0x7fffea3f9000>> test.c:2:1 start: test.c:2:1 finish: test.c:2:1> where for example: * `cond_expr` is the conditional expression, with three arguments: * `ne_expr` is a "not-equal expression" for ``a != 0`` * each `return_expr` is one of the two return expressions * `parm_decl` is a parameter declaration (such as ``a``) * `integer_cst` is an integer constant (as opposed to a cast), such as ``0`` gimple ****** The tree-based IR can contain arbitrarily-complicated nested expressions, which is relatively easy for the frontends to generate, but difficult for the optimizer to work with, so GCC almost immediately converts it into a form named "gimple", in which compound expressions such as: .. code-block:: c (b - c) * d get flattened into a series of assignments to temporary variables. We can see the initial form of the gimple in the :file:`test.c.005t.gimple` dump: .. code-block:: c test (int a, int b, int c, int d) { int D.1938; if (a != 0) goto ; else goto ; : _1 = b - c; D.1938 = d * _1; // predicted unlikely by early return (on trees) predictor. return D.1938; : _2 = c - b; D.1938 = d * _2; // predicted unlikely by early return (on trees) predictor. return D.1938; } Note how the if/else control flow has become "goto" statements, and how the "gimplifier" has flattened: .. code-block:: c (b - c) * d into assignments to two temporaries (named ``_1`` and ``D.1938``): .. code-block:: c _1 = b - c; D.1938 = d * _1; This gives us a sequence of gimple statements, some of which are labels, and some of which ``goto`` those labels. The official documentation on gimple is `here `_. gimple with a CFG ***************** Although some optimization passes do work on this "gimple with labels" representation, it is almost immediately converted to a Control Flow Graph (CFG), a directed graph of "basic blocks" - sequences of statements with no control flow, where the control flow is expressed as the edges between the basic blocks. This can be seen in the :file:`test.c.013t.cfg` dump: .. code-block:: c ;; Function test (test, funcdef_no=0, decl_uid=1933, cgraph_uid=1, symbol_order=0) ;; 1 loops found ;; ;; Loop 0 ;; header 0, latch 1 ;; depth 0, outer -1 ;; nodes: 0 1 2 3 4 5 ;; 2 succs { 3 4 } ;; 3 succs { 5 } ;; 4 succs { 5 } ;; 5 succs { 1 } test (int a, int b, int c, int d) { int D.1938; : if (a != 0) goto ; [INV] else goto ; [INV] : _1 = b - c; D.1938 = d * _1; // predicted unlikely by early return (on trees) predictor. goto ; [INV] : _2 = c - b; D.1938 = d * _2; // predicted unlikely by early return (on trees) predictor. : return D.1938; } You can see the basic blocks via e.g. the ```` headers. There is also now a single ``return`` statement from the function; the multiple ``return`` statements are now all expressed by assigning to a temporary (``D.1938``), and then a ``goto`` to the basic block containing the return statement. If you add the ``-graph`` suffix to the ``dump`` command line options: .. code-block:: sh $ gcc -S test.c -O2 -fverbose-asm \ -fdump-tree-all-graph -fdump-ipa-all-graph -fdump-rtl-all-graph then in addition to the dump files listed above, :program:`cc1` will also generate :file:`.dot` files, suitable for use with GraphViz. My favorite :file:`.dot` file viewer is `xdot `_, which shows :file:`test.c.013t.cfg.dot` as follows: .. image:: images/test.c.013t.cfg.dot-in-xdot.png which makes it easy to see the individual basic blocks, the statements within them, and the control flow linking them. gimple-SSA ********** After a few more optimization passes, the gimple-cfg IR is then converted to Static Single Assignment form (SSA). SSA form is commonly used inside compilers, as it makes many kinds of optimization much easier to implement. In SSA, every local variable is only ever assigned to once; if there are multiple assignments to a local variable, it gets split up into multiple versions. In our example, you can see the SSA form of the IR in :file:`test.c.021t.ssa`: .. code-block:: c ;; Function test (test, funcdef_no=0, decl_uid=1933, cgraph_uid=1, symbol_order=0) test (int a, int b, int c, int d) { int _1; int _2; int _3; int _8; int _9; : if (a_4(D) != 0) goto ; [INV] else goto ; [INV] : _1 = b_6(D) - c_5(D); _9 = d_7(D) * _1; // predicted unlikely by early return (on trees) predictor. goto ; [INV] : _2 = c_5(D) - b_6(D); _8 = d_7(D) * _2; // predicted unlikely by early return (on trees) predictor. : # _3 = PHI <_9(3), _8(4)> return _3; } and (in dot form) as :file:`test.c.021t.ssa.dot`: .. image:: images/test.c.021t.ssa.dot-in-xdot.png You can see that the single temporary ``D.1938`` from the earlier form of the IR has been split into three separate temporaries, where the two assignments: .. code-block:: c D.1938 = d * _1; and: .. code-block:: c D.1938 = d * _2; have now become these two assignments to separate temporaries: .. code-block:: c _9 = d_7(D) * _1; and: .. code-block:: c _8 = d_7(D) * _2; and at the point where control flow merges, we have a special construct called a "phi node" which assigns to the new temporary ``_3`` from either one of the ``_9`` or ``_8``, depending on whether control flow came from block 3 or block 4: .. code-block:: c # _3 = PHI <_9(3), _8(4)> You can see that the parameters ``b`` and ``c`` from the earlier form of the IR have also been numbered, so that the SSA form captures e.g. that we're accessing ``b_6(D)``, meaning version 6 of parameter ``b``, where the ``(D)`` means the initial value at the function entry: if code wrote to one of these parameters, the SSA form would have a different numbered version of it after the write. The official documentation for GCC's gimple-ssa form is `here `_. Once we're in gimple-SSA form, there are almost 200 optimization passes, which can be roughly divided into: * "intraprocedural" passes. These work on one function at a time. They have a "t" code in their dump file. For example, :file:`test.c.175t.switchlower` is the dump file for an optimization pass which converts gimple ``switch`` statements into lower-level gimple statements and control flow (which doesn't do anything in our example above, as it doesn't have any switch statements; try writing a simple C source file with a switch statement and see what it does) * "interprocedural passes" which consider all of the functions at once, such as which functions call which other functions. These have an "i" code in their dump file. An example is :file:`test.c.080i.inline` though given that our example has only one function, the interprocedural passes won't do anything useful The full set of optimizations passes can be see in GCC's source tree in the file `gcc/passes.def `_ After about 200 gimple optimizations passes, we're done with the gimple-SSA form; its final state can be seen in :file:`test.c.235t.optimized`: .. code-block:: c ;; Function test (test, funcdef_no=0, decl_uid=1933, cgraph_uid=1, symbol_order=0) test (int a, int b, int c, int d) { int _1; int _2; int _3; int _8; int _9; [local count: 1073741824]: if (a_4(D) != 0) goto ; [50.00%] else goto ; [50.00%] [local count: 536870913]: _1 = b_6(D) - c_5(D); _9 = _1 * d_7(D); goto ; [100.00%] [local count: 536870913]: _2 = c_5(D) - b_6(D); _8 = _2 * d_7(D); [local count: 1073741824]: # _3 = PHI <_9(3), _8(4)> return _3; } and :file:`test.c.235t.optimized.dot`: .. image:: images/test.c.235t.optimized.dot-in-xdot.png For our simple example, this hasn't been changed much since the initial conversion to SSA form; it's gained some estimates about how many times each basic block will be run (in lieu of real profiling data). RTL *** At this point, the gimple is converted to Register Transfer Language (RTL), a much lower-level representation of the code, which will allow us to eventually go all the way to assembler. The conversion happens in an optimization pass called "expand"; we can see the initial RTL form of the code in the :file:`test.c.237r.expand` dump file: .. code-block:: lisp ;; Function test (test, funcdef_no=0, decl_uid=1933, cgraph_uid=1, symbol_order=0) ;; Generating RTL for gimple basic block 2 ;; Generating RTL for gimple basic block 3 ;; Generating RTL for gimple basic block 4 ;; Generating RTL for gimple basic block 5 try_optimize_cfg iteration 1 Merging block 3 into block 2... Merged blocks 2 and 3. Merged 2 and 3 without moving. Redirecting jump 14 from 6 to 7. Merging block 6 into block 5... Merged blocks 5 and 6. Merged 5 and 6 without moving. Removing jump 22. try_optimize_cfg iteration 2 ;; ;; Full RTL generated for this function: ;; (note 1 0 7 NOTE_INSN_DELETED) (note 7 1 2 2 [bb 2] NOTE_INSN_BASIC_BLOCK) (insn 2 7 3 2 (set (reg/v:SI 85 [ a ]) (reg:SI 5 di [ a ])) "test.c":2:1 -1 (nil)) (insn 3 2 4 2 (set (reg/v:SI 86 [ b ]) (reg:SI 4 si [ b ])) "test.c":2:1 -1 (nil)) (insn 4 3 5 2 (set (reg/v:SI 87 [ c ]) (reg:SI 1 dx [ c ])) "test.c":2:1 -1 (nil)) (insn 5 4 6 2 (set (reg/v:SI 88 [ d ]) (reg:SI 2 cx [ d ])) "test.c":2:1 -1 (nil)) (note 6 5 9 2 NOTE_INSN_FUNCTION_BEG) (insn 9 6 10 2 (set (reg:CCZ 17 flags) (compare:CCZ (reg/v:SI 85 [ a ]) (const_int 0 [0]))) "test.c":3:6 -1 (nil)) (jump_insn 10 9 11 2 (set (pc) (if_then_else (eq (reg:CCZ 17 flags) (const_int 0 [0])) (label_ref 16) (pc))) "test.c":3:6 -1 (int_list:REG_BR_PROB 536870916 (nil)) -> 16) (note 11 10 12 4 [bb 4] NOTE_INSN_BASIC_BLOCK) (insn 12 11 13 4 (parallel [ (set (reg:SI 89) (minus:SI (reg/v:SI 86 [ b ]) (reg/v:SI 87 [ c ]))) (clobber (reg:CC 17 flags)) ]) "test.c":4:15 -1 (nil)) (insn 13 12 14 4 (parallel [ (set (reg:SI 84 [ ]) (mult:SI (reg:SI 89) (reg/v:SI 88 [ d ]))) (clobber (reg:CC 17 flags)) ]) "test.c":4:20 -1 (nil)) (jump_insn 14 13 15 4 (set (pc) (label_ref:DI 24)) "test.c":4:20 737 {jump} (nil) -> 24) (barrier 15 14 16) (code_label 16 15 17 5 2 (nil) [1 uses]) (note 17 16 18 5 [bb 5] NOTE_INSN_BASIC_BLOCK) (insn 18 17 19 5 (parallel [ (set (reg:SI 90) (minus:SI (reg/v:SI 87 [ c ]) (reg/v:SI 86 [ b ]))) (clobber (reg:CC 17 flags)) ]) "test.c":6:15 -1 (nil)) (insn 19 18 24 5 (parallel [ (set (reg:SI 84 [ ]) (mult:SI (reg:SI 90) (reg/v:SI 88 [ d ]))) (clobber (reg:CC 17 flags)) ]) "test.c":6:20 -1 (nil)) (code_label 24 19 27 7 1 (nil) [1 uses]) (note 27 24 25 7 [bb 7] NOTE_INSN_BASIC_BLOCK) (insn 25 27 26 7 (set (reg/i:SI 0 ax) (reg:SI 84 [ ])) "test.c":7:1 -1 (nil)) (insn 26 25 0 7 (use (reg/i:SI 0 ax)) "test.c":7:1 -1 (nil)) and :file:`test.c.237r.expand.dot`: .. image:: images/test.c.237r.expand.dot-in-xdot.png The RTL form of the IR is much closer to assembler: whereas gimple works in terms of variables of specific data types, RTL instructions work in terms of low-level operations on an arbitrary number of registers of specific bit sizes. There are about 100 RTL optimization passes, which solve problems such as: * implementing function call/return, parameter passing, and the stack of frames, in terms of what actually exists at the CPU level (the "calling conventions" of an ABI) * using the registers actually available on the CPU, rather than blithely assuming that there's an arbitrary number of registers for every function, a process called register allocation * using the instructions and addressing modes actually available on the CPU, rather than assuming an ideal set of combinations * various optimizations, such as scheduling instructions so that they run efficiently on the target CPU (e.g. handling delay slots) * converting the CFG that RTL inherited from gimple into a flat series of instructions connected by jumps (honoring constraints such as limitations on how many bytes a jump instruction can go) The official documentation for RTL is `here `_. The "final" form of RTL *********************** Eventually the RTL form is suitable for output in assembler, in an optimization pass called "final" (which, annoyingly, is no longer the final pass); the :file:`test.c.320r.final` dump file looks like this: .. code-block:: lisp ;; Function test (test, funcdef_no=0, decl_uid=1933, cgraph_uid=1, symbol_order=0) test Dataflow summary: ;; fully invalidated by EH 0 [ax] 1 [dx] 2 [cx] 4 [si] 5 [di] 8 [st] 9 [st(1)] 10 [st(2)] 11 [st(3)] 12 [st(4)] 13 [st(5)] 14 [st(6)] 15 [st(7)] 17 [flags] 18 [fpsr] 20 [xmm0] 21 [xmm1] 22 [xmm2] 23 [xmm3] 24 [xmm4] 25 [xmm5] 26 [xmm6] 27 [xmm7] 28 [mm0] 29 [mm1] 30 [mm2] 31 [mm3] 32 [mm4] 33 [mm5] 34 [mm6] 35 [mm7] 36 [r8] 37 [r9] 38 [r10] 39 [r11] 44 [xmm8] 45 [xmm9] 46 [xmm10] 47 [xmm11] 48 [xmm12] 49 [xmm13] 50 [xmm14] 51 [xmm15] 52 [xmm16] 53 [xmm17] 54 [xmm18] 55 [xmm19] 56 [xmm20] 57 [xmm21] 58 [xmm22] 59 [xmm23] 60 [xmm24] 61 [xmm25] 62 [xmm26] 63 [xmm27] 64 [xmm28] 65 [xmm29] 66 [xmm30] 67 [xmm31] 68 [k0] 69 [k1] 70 [k2] 71 [k3] 72 [k4] 73 [k5] 74 [k6] 75 [k7] ;; hardware regs used 7 [sp] ;; regular block artificial uses 7 [sp] ;; eh block artificial uses 7 [sp] 16 [argp] ;; entry block defs 0 [ax] 1 [dx] 2 [cx] 4 [si] 5 [di] 7 [sp] 20 [xmm0] 21 [xmm1] 22 [xmm2] 23 [xmm3] 24 [xmm4] 25 [xmm5] 26 [xmm6] 27 [xmm7] 36 [r8] 37 [r9] ;; exit block uses 0 [ax] 7 [sp] ;; regs ever live 0 [ax] 1 [dx] 2 [cx] 4 [si] 5 [di] 17 [flags] ;; ref usage r0={4d,5u} r1={2d,3u} r2={1d,1u} r4={2d,3u} r5={1d,1u} r7={1d,4u} r17={5d,1u} r20={1d} r21={1d} r22={1d} r23={1d} r24={1d} r25={1d} r26={1d} r27={1d} r36={1d} r37={1d} ;; total ref usage 44{26d,18u,0e} in 11{11 regular + 0 call} insns. (note 1 0 7 NOTE_INSN_DELETED) (note 7 1 48 2 [bb 2] NOTE_INSN_BASIC_BLOCK) (note 48 7 2 2 NOTE_INSN_PROLOGUE_END) (note 2 48 6 2 NOTE_INSN_DELETED) (note 6 2 5 2 NOTE_INSN_FUNCTION_BEG) (insn:TI 5 6 9 2 (set (reg/v:SI 0 ax [orig:88 d ] [88]) (reg:SI 2 cx [97])) "test.c":2:1 67 {*movsi_internal} (expr_list:REG_DEAD (reg:SI 2 cx [97]) (nil))) (insn 9 5 10 2 (set (reg:CCZ 17 flags) (compare:CCZ (reg:SI 5 di [94]) (const_int 0 [0]))) "test.c":3:6 7 {*cmpsi_ccno_1} (expr_list:REG_DEAD (reg:SI 5 di [94]) (nil))) (jump_insn 10 9 11 2 (set (pc) (if_then_else (eq (reg:CCZ 17 flags) (const_int 0 [0])) (label_ref 16) (pc))) "test.c":3:6 736 {*jcc} (expr_list:REG_DEAD (reg:CCZ 17 flags) (int_list:REG_BR_PROB 536870916 (nil))) -> 16) (note 11 10 12 3 [bb 3] NOTE_INSN_BASIC_BLOCK) (insn:TI 12 11 13 3 (parallel [ (set (reg:SI 4 si [89]) (minus:SI (reg/v:SI 4 si [orig:86 b ] [86]) (reg/v:SI 1 dx [orig:87 c ] [87]))) (clobber (reg:CC 17 flags)) ]) "test.c":4:15 254 {*subsi_1} (expr_list:REG_DEAD (reg/v:SI 1 dx [orig:87 c ] [87]) (expr_list:REG_UNUSED (reg:CC 17 flags) (nil)))) (insn:TI 13 12 52 3 (parallel [ (set (reg:SI 0 ax [orig:84 ] [84]) (mult:SI (reg/v:SI 0 ax [orig:88 d ] [88]) (reg:SI 4 si [89]))) (clobber (reg:CC 17 flags)) ]) "test.c":4:20 378 {*mulsi3_1} (expr_list:REG_DEAD (reg:SI 4 si [89]) (expr_list:REG_UNUSED (reg:CC 17 flags) (nil)))) (insn 52 13 45 3 (use (reg/i:SI 0 ax)) -1 (nil)) (jump_insn:TI 45 52 46 3 (simple_return) "test.c":4:20 767 {simple_return_internal} (nil) -> simple_return) (barrier 46 45 16) (code_label 16 46 17 4 2 (nil) [1 uses]) (note 17 16 18 4 [bb 4] NOTE_INSN_BASIC_BLOCK) (insn:TI 18 17 19 4 (parallel [ (set (reg:SI 1 dx [90]) (minus:SI (reg/v:SI 1 dx [orig:87 c ] [87]) (reg/v:SI 4 si [orig:86 b ] [86]))) (clobber (reg:CC 17 flags)) ]) "test.c":6:15 254 {*subsi_1} (expr_list:REG_DEAD (reg/v:SI 4 si [orig:86 b ] [86]) (expr_list:REG_UNUSED (reg:CC 17 flags) (nil)))) (insn:TI 19 18 26 4 (parallel [ (set (reg:SI 0 ax [orig:84 ] [84]) (mult:SI (reg/v:SI 0 ax [orig:88 d ] [88]) (reg:SI 1 dx [90]))) (clobber (reg:CC 17 flags)) ]) "test.c":6:20 378 {*mulsi3_1} (expr_list:REG_DEAD (reg:SI 1 dx [90]) (expr_list:REG_UNUSED (reg:CC 17 flags) (nil)))) (insn 26 19 55 4 (use (reg/i:SI 0 ax)) "test.c":7:1 -1 (nil)) (note 55 26 50 4 NOTE_INSN_EPILOGUE_BEG) (jump_insn:TI 50 55 51 4 (simple_return) "test.c":7:1 767 {simple_return_internal} (nil) -> simple_return) (barrier 51 50 47) (note 47 51 0 NOTE_INSN_DELETED) There isn't a corresponding :file:`.dot` dump file, as by this point the CFG has been flattened away into a stream of instructions. There's a fairly direct correspondence between the above and the generated assembler file, although you have to know a bit about RTL to see it. For example if we just look at the ``insn`` instructions, we can see that e.g. the final multiply instruction above: .. code-block:: lisp (insn:TI 19 18 26 4 (parallel [ (set (reg:SI 0 ax [orig:84 ] [84]) (mult:SI (reg/v:SI 0 ax [orig:88 d ] [88]) (reg:SI 1 dx [90]))) (clobber (reg:CC 17 flags)) ]) "test.c":6:20 378 {*mulsi3_1} (expr_list:REG_DEAD (reg:SI 1 dx [90]) (expr_list:REG_UNUSED (reg:CC 17 flags) (nil)))) i.e. "set %eax to the result of multipying %eax and %edx, clobbering the CC register" becomes: .. code-block:: asm imull %edx, %eax # tmp90, in the output :file:`test.s` assembler file.