Assembly language for Power Architecture, Part 3
A closer look at branching instructions and registers
About this series: The POWER5 processor is a 64-bit workhorse used in a variety of settings.
Branching registers
Branches in PowerPC make use of three special-purpose registers: the condition register, the count register, and the link register.
The condition register
The condition register consists conceptually of eight fields. A field is a segment of four bits used to store status information about the results of an instruction. Two of the fields are somewhat special-purpose and will be covered shortly; the remaining fields are available for general use. The fields are named cr0 through cr7.
The first field, cr0, is used for the results of fixed-point computation instructions, which use non-immediate operands (with a few exceptions). The result of the computation is compared with zero, and the appropriate bits are set (negative, zero, or positive). To indicate in a computational instruction that you want it to set cr0, you simply add a period (.) to the end of the instruction. For example, add 4, 5, 6 adds register 5 to register 6 and stores the result in register 4, without setting any status bits in cr0. However, add. 4, 5, 6 does the same thing, but sets the bits in cr0 based on the computed value. cr0 is also the default field for use on compare instructions.
The second field (called cr1) is used by floating-point instructions using the period after the instruction name. Floating-point computation is outside the scope of this article.
Each field has four bits. The usage of those bits varies with the instruction being used. Here are their possible uses (floating-point uses are listed but not described):
Condition register field bits
Bit | Mnemonic | Fixed-point comparison | Fixed-point computation | Floating-point comparison | Floating-point computation |
---|---|---|---|---|---|
0 | lt | Less than | Negative | Less than | Exception summary |
1 | gt | Greater than | Positive | Greater than | Enabled exception summary |
2 | eq | Equal | Zero | Equal | Invalid operation exception summary |
3 | so | Summary overflow | Summary overflow | Unordered | Overflow exception |
Later you will see how to access these fields both implicitly and directly.
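To make the fixed-point computation column of the table concrete, here is a small C model (not PowerPC code) of what an instruction with the period suffix leaves in cr0. The function name and the CR_* constants are invented for this sketch, and the summary overflow bit is simply passed in as a flag rather than derived from the processor's overflow state.

```c
#include <stdint.h>

/* Values of the four bits inside one condition register field,
 * leftmost bit (bit 0, lt) first. Names are hypothetical. */
enum { CR_LT = 8, CR_GT = 4, CR_EQ = 2, CR_SO = 1 };

/* Model of the cr0 contents after a record-form instruction such as
 * "add.": the 64-bit result is compared with zero as a signed value,
 * and the summary overflow flag is copied into the last bit. */
static unsigned cr0_after_record_form(int64_t result, int summary_overflow)
{
    unsigned field;

    if (result < 0)
        field = CR_LT;      /* negative */
    else if (result > 0)
        field = CR_GT;      /* positive */
    else
        field = CR_EQ;      /* zero */

    if (summary_overflow)
        field |= CR_SO;
    return field;
}
```

For example, cr0_after_record_form(-5, 0) yields 8, meaning that only the lt bit is set.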
The condition register can be loaded to or from a general-purpose register using mtcr, mtcrf, and mfcr. mtcr moves a specified general-purpose register to the condition register. mfcr moves the condition register to a general-purpose register. mtcrf loads the condition register from a general-purpose register, but only the fields specified by an 8-bit mask, which is the first operand.
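The effect of the mtcrf mask is easier to see in a model. The C sketch below is an illustration, not the article's assembly listing; mtcrf_model is a made-up name. It treats the condition register as a 32-bit value with cr0 in the leftmost four bits, and it assumes the usual mask layout in which the most significant mask bit (0x80) selects cr0 and the least significant (0x01) selects cr7.

```c
#include <stdint.h>

/* Model of "mtcrf mask, rS": copy only the selected 4-bit fields from
 * the low 32 bits of a general-purpose register into the condition
 * register image and return the new image. */
static uint32_t mtcrf_model(uint32_t cr, uint8_t mask, uint64_t gpr)
{
    uint32_t source = (uint32_t)gpr;    /* only the low word takes part */

    for (int field = 0; field < 8; field++) {
        if (mask & (0x80u >> field)) {
            /* cr0 occupies the leftmost nibble, cr7 the rightmost. */
            uint32_t nibble = 0xFu << (28 - 4 * field);
            cr = (cr & ~nibble) | (source & nibble);
        }
    }
    return cr;
}
```

With a mask of 0xFF every field is replaced, which is what mtcr does.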
Here are some examples:
Listing 1. Condition register transfer examples
The count and link registers
The link register (called LR) is a special-purpose register that holds return addresses from branch instructions. All branch instructions can be told to set the link register, which places the address of the instruction immediately following the branch into the link register (for conditional branches, the architecture does this whether or not the branch is actually taken). You tell a branch instruction to set the link register by appending the letter l to the end of the instruction. For instance, b is an unconditional branch instruction, and bl is an unconditional branch instruction that sets the link register.
The count register (called CTR) is a special-purpose register designed to hold loop counters. Special branch instructions can decrement the count register and/or conditionally branch depending on whether CTR has reached zero.
Both the link and count registers can be used as a branch destination. bctr branches to the address specified in the count register, and blr branches to the address specified in the link register.
The link and count registers can also be loaded and copied from general-purpose registers. For the link register, mtlr moves a given register value to the link register, and mflr moves a value from the link register to a general-purpose register. mtctr and mfctr do the same for the count register.
Unconditional branching
Unconditional branches in the PowerPC instruction set use the I-Form instruction format:
I-Form instruction format
- Bits 0-5: Opcode
- Bits 6-29: Absolute or relative branch address
- Bit 30: Absolute address bit -- If this bit is set, the address is interpreted as an absolute address, otherwise it is interpreted as a relative address
- Bit 31: Link bit -- If this bit is set, the instruction sets the link register with the address of the next instruction
As mentioned earlier, adding the letter l onto a branch instruction causes the link bit to be set, so that the 'return address' (the instruction after the branch) is stored in the link register. If you affix the letter a at the end (it comes after the l, if that is used), then the address specified is an absolute address (this is not often used in user-level code, because it limits the branch destinations too much).
Listing 2 illustrates unconditional branches, and then exits (enter as branch_example.s):
Listing 2. Unconditional branching examples
Assemble, link, and run it like this:
as -a64 branch_example.s -o branch_example.o
ld -melf64ppc branch_example.o -o branch_example
./branch_example
Notice that the targets for both b and ba are specified the same way in assembly language, despite the fact that they are encoded differently in the instruction. The assembler and linker take care of converting the target address into a relative or absolute address for you.
Conditional branching
Comparing registers
The cmp instruction is used to compare registers with other registers or immediate operands, and set the appropriate status bits in the condition register. By default, fixed-point compare instructions use cr0 to store the result, but the field can also be specified as an optional first operand. Compare instructions are written as in Listing 3:
Listing 3. Examples of compare instructions
As you can see, the d specifies the operands as doublewords while the w specifies the operands as words. The i indicates that the last operand is an immediate value instead of a register, and the l tells the processor to do unsigned (also called logical) comparisons instead of signed comparisons.
Each of these instructions sets the appropriate bits in the condition register (as outlined earlier in the article), which can then be used by a conditional branch instruction.
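The difference between the signed, logical, doubleword, and word variants shows up clearly when the same bit patterns are compared in different ways. The following C model is only a sketch: the function names mirror the mnemonics but are otherwise invented, the so bit is left out, and the casts rely on the two's complement conversions that mainstream compilers provide.

```c
#include <stdint.h>

/* Result bits of one condition register field (so omitted). */
enum { CR_LT = 8, CR_GT = 4, CR_EQ = 2 };

/* cmpd: both operands are treated as signed 64-bit doublewords. */
static unsigned cmpd_model(uint64_t a, uint64_t b)
{
    int64_t sa = (int64_t)a, sb = (int64_t)b;
    return sa < sb ? CR_LT : sa > sb ? CR_GT : CR_EQ;
}

/* cmpld: the "l" makes the comparison logical, that is, unsigned. */
static unsigned cmpld_model(uint64_t a, uint64_t b)
{
    return a < b ? CR_LT : a > b ? CR_GT : CR_EQ;
}

/* cmpw: only the low 32 bits of each operand take part, as signed words. */
static unsigned cmpw_model(uint64_t a, uint64_t b)
{
    int32_t sa = (int32_t)(uint32_t)a, sb = (int32_t)(uint32_t)b;
    return sa < sb ? CR_LT : sa > sb ? CR_GT : CR_EQ;
}
```

A register holding all one bits compares as less than 1 with cmpd_model (it is -1 when signed) but as greater than 1 with cmpld_model.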
Basics of conditional branching
Conditional branches are a lot more flexible than unconditional branches, but that flexibility comes at the cost of branchable distance. Conditional branches use the B-Form instruction format:
The B-Form instruction format
- Bits 0-5: Opcode
- Bits 6-10: Specifies the options used regarding how the bit is tested, whether and how the count register is involved, and any branch prediction hints (called the BO field)
- Bits 11-15: Specifies the bit in the condition register to test (called the BI field)
- Bits 16-29: Absolute or relative address
- Bit 30: Addressing mode -- when set to 0 the specified address is considered a relative address; when set to 1 the address is considered an absolute address
- Bit 31: Link bit -- when set to 1 the link register is set to the address following the current instruction; when set to 0 the link register is not set
As you can see, a full 10 bits are used to specify the branch mode and condition, which limits the address field to only 14 bits. Because instructions are word-aligned, those 14 bits cover a span of 16K instructions, which puts the target within roughly 32 KB of the branch in either direction. This is usable for small jumps within a function, but not much else. To conditionally call a function outside of this range, the code would need to do a conditional branch to an instruction containing an unconditional branch to the right location.
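The reach of each branch form follows directly from the field widths. A 14-bit field with two implied zero bits gives a signed 16-bit byte offset, that is, 16K instruction slots spanning -32768 to +32764 bytes; the 24-bit field of the unconditional branch gives a signed 26-bit offset. The C sketch below (function names are hypothetical) checks whether a relative branch can reach a target directly.

```c
#include <stdbool.h>
#include <stdint.h>

/* Byte distance from the branch to its target; may be negative. */
static int64_t branch_delta(uint64_t branch_addr, uint64_t target_addr)
{
    return (int64_t)(target_addr - branch_addr);
}

/* Conditional (B-Form) branch: 14-bit field, two implied zero bits. */
static bool conditional_branch_reaches(uint64_t branch_addr, uint64_t target_addr)
{
    int64_t delta = branch_delta(branch_addr, target_addr);
    return delta % 4 == 0 && delta >= -32768 && delta <= 32764;
}

/* Unconditional (I-Form) branch: 24-bit field, two implied zero bits. */
static bool unconditional_branch_reaches(uint64_t branch_addr, uint64_t target_addr)
{
    int64_t delta = branch_delta(branch_addr, target_addr);
    return delta % 4 == 0 && delta >= -33554432 && delta <= 33554428;
}
```

When conditional_branch_reaches is false but unconditional_branch_reaches is true, the two-step approach applies: a conditional branch to a nearby unconditional branch.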
The basic forms of the conditional branch look like this:
bc BO, BI, address
bcl BO, BI, address
bca BO, BI, address
bcla BO, BI, address
In this basic form, BO and BI are numbers. Thankfully, the extended mnemonics (described in the first article) of the PowerPC instruction set come to the rescue again, so we can avoid having to memorize all of the field numbers and what they mean. Like unconditional branches, appending an l to the instruction name sets the link register and appending an a makes the instruction use absolute addressing instead of relative addressing.
For a simple compare and branch if equal, the basic form (not using the extended mnemonics) looks like this:
Listing 4. Basic form of the conditional branch
bc stands for 'branch conditionally.' The 12 (the BO operand) means to branch if the given condition register bit is set, with no branch prediction hint, and 2 (the BI operand) is the bit of the condition register to test (it is the equal bit of cr0). Now, very few people, especially beginners, are going to be able to remember all of the branch code numbers and condition register bit numbers, nor would it be useful. The extended mnemonics make the code clearer for reading, writing, and debugging.
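To see where the two numbers land in the instruction word, here is a C sketch that packs a B-Form branch by hand. encode_b_form is a made-up helper, and the layout assumed is the standard one: primary opcode 16, five bits of BO, five bits of BI, a 14-bit displacement field with two implied zero bits, then the addressing mode and link bits.

```c
#include <stdint.h>

/* Build the 32-bit word for bc, bcl, bca, or bcla.
 * bo, bi:       the 5-bit BO and BI operands.
 * displacement: byte offset or address, assumed to be a multiple of 4
 *               that fits in 16 signed bits.
 * aa, lk:       addressing mode and link bits (0 or 1). */
static uint32_t encode_b_form(unsigned bo, unsigned bi,
                              int32_t displacement, int aa, int lk)
{
    const uint32_t opcode = 16u;
    uint32_t displacement_field = (uint32_t)displacement & 0x0000FFFCu;

    return (opcode << 26)
         | ((uint32_t)(bo & 0x1Fu) << 21)
         | ((uint32_t)(bi & 0x1Fu) << 16)
         | displacement_field
         | ((uint32_t)(aa & 1) << 1) | (uint32_t)(lk & 1);
}
```

With these assumptions, a bc with BO 12 and BI 2 that skips forward two instructions encodes as 0x41820008.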
There are several different ways to specify the extended mnemonics. The way we will concentrate on combines the instruction name and the instruction's BO operand (specifying the mode). The simplest ones are bt and bf. bt branches if the given bit of the condition register is true, and bf branches if the given bit of the condition register is false. In addition, the condition register bit can be specified with mnemonics as well. If you specify 4*cr3+eq this will test bit 2 of cr3 (the 4* is there because each field is four bits wide). The available mnemonics for each bit of the bit fields were given earlier in the description of the condition register. If you only specify the bit without specifying the field, the instruction will default to cr0.
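The arithmetic behind an operand such as 4*cr3+eq is simple enough to spell out. In this C sketch (the names are invented for illustration) the condition register is modeled as a 32-bit value whose bit 0 is the leftmost bit, matching the numbering used in this article.

```c
#include <stdint.h>

/* Positions of the four bits inside one field. */
enum cr_bit { LT = 0, GT = 1, EQ = 2, SO = 3 };

/* Bit number for an operand like 4*cr3+eq: four bits per field,
 * plus the position of the bit within the field. */
static unsigned cr_bit_index(unsigned field, enum cr_bit bit)
{
    return 4u * field + (unsigned)bit;
}

/* Test one bit of a 32-bit condition register image; index 0 is the
 * most significant bit. */
static int cr_bit_is_set(uint32_t cr, unsigned index)
{
    return (int)((cr >> (31u - index)) & 1u);
}
```

Here cr_bit_index(3, EQ) is 14, so a branch that tests 4*cr3+eq examines bit 14 of the condition register, while a bare eq is bit 2 in cr0.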
Here are some examples:
Listing 5. Simple conditional branches
Another set of extended mnemonics combines the instruction, the BO operand, and the condition bit (but not the field). These use what are more-or-less 'traditional' mnemonics for various kinds of common conditional branches. For example, bne my_destination (branch if not equal to my_destination) is equivalent to bf eq, my_destination (branch if the eq bit is false to my_destination). To use a different condition register field with this set of mnemonics, simply specify the field in the operand before the target address, such as bne cr4, my_destination. These are the branch mnemonics following this pattern: blt (less than), ble (less than or equal), beq (equal), bge (greater than or equal), bgt (greater than), bnl (not less than), bne (not equal), bng (not greater than), bso (summary overflow), bns (not summary overflow), bun (unordered - floating-point specific), and bnu (not unordered - floating-point specific).
All of the mnemonics and extended mnemonics can have l and/or a affixed to them to enable the link register or absolute addressing, respectively.
Using the extended mnemonics allows a much more readable and writable programming style. For the more advanced conditional branches, the extended mnemonics are more than just helpful; they are essential.
Additional condition register features
Because the condition register has multiple fields, different computations and comparisons can use different fields, and then logical operations can be used to combine the conditions together. All of the logical operations have the following form: cr<opname> target_bit, operand_bit_1, operand_bit_2. For example, to do a logical and on the eq bit of cr2 and the lt bit of cr7, and have it stored in the eq bit of cr0, you would write: crand 4*cr0+eq, 4*cr2+eq, 4*cr7+lt.
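A model in C may help show what crand does to the register as a whole. This is a sketch under the same convention as before (a 32-bit condition register image whose bit 0 is the leftmost bit); the helper names are hypothetical.

```c
#include <stdint.h>

/* Read one bit of the condition register image (bit 0 is leftmost). */
static int get_cr_bit(uint32_t cr, unsigned index)
{
    return (int)((cr >> (31u - index)) & 1u);
}

/* Return a copy of the image with one bit forced to the given value. */
static uint32_t set_cr_bit(uint32_t cr, unsigned index, int value)
{
    uint32_t mask = 1u << (31u - index);
    return value ? (cr | mask) : (cr & ~mask);
}

/* Model of "crand target, a, b": only the target bit changes. */
static uint32_t crand_model(uint32_t cr, unsigned target, unsigned a, unsigned b)
{
    return set_cr_bit(cr, target, get_cr_bit(cr, a) && get_cr_bit(cr, b));
}
```

For the example above, the three operands work out to bits 2, 10, and 28, so the call would be crand_model(cr, 2, 10, 28).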
You can copy one condition register field to another using mcrf. To copy cr4 to cr1 you would write mcrf cr1, cr4.
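In terms of the 32-bit register image, a field copy moves one group of four bits onto another and leaves everything else alone. The following C sketch (mcrf_model is an invented name) assumes cr0 occupies the leftmost four bits.

```c
#include <stdint.h>

/* Model of "mcrf dst, src": copy field src over field dst. */
static uint32_t mcrf_model(uint32_t cr, unsigned dst, unsigned src)
{
    unsigned dst_shift = 28u - 4u * dst;    /* cr0 is the leftmost nibble */
    unsigned src_shift = 28u - 4u * src;
    uint32_t value = (cr >> src_shift) & 0xFu;

    return (cr & ~(0xFu << dst_shift)) | (value << dst_shift);
}
```

So mcrf cr1, cr4 corresponds to mcrf_model(cr, 1, 4); the source field keeps its value.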
The branch instructions can also give hints to the branch processor for branch prediction. On most conditional branch instructions, appending a + to the instruction will signal to the branch processor that this branch will probably be taken. Appending a - to the instruction will signal that this branch will probably not be taken. However, this is usually not necessary, as the branch processor in the POWER5 CPU is usually able to do branch prediction quite well.
Using the count register
The count register is a special-purpose register used as a loop counter. The BO operand of the conditional branch (controlling the mode) can be used, in addition to specifying how to test condition register bits, to decrement and test the count register. There are two operations you can do with the count register:
- decrement the count register and branch if it becomes zero
- decrement the count register and branch if it becomes nonzero
These count register operations can either be used on their own or in conjunction with a condition register test.
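The way BO combines the two tests can be written down as a short decision function. The C model below is a sketch, not processor documentation: bc_taken is a made-up name, and it assumes the usual meaning of the BO bits, read from the most significant bit down (ignore the condition, the condition value to branch on, leave the count register alone, branch on count equal to zero), with the last bit acting as a hint and not modeled here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Decide whether a conditional branch is taken.
 * bo:     the 5-bit BO operand.
 * cr_bit: current value of the condition register bit named by BI.
 * ctr:    the count register; decremented here when BO asks for it. */
static bool bc_taken(unsigned bo, bool cr_bit, uint64_t *ctr)
{
    bool ignore_condition = (bo & 16u) != 0;
    bool branch_if_true   = (bo & 8u) != 0;
    bool leave_ctr_alone  = (bo & 4u) != 0;
    bool branch_if_zero   = (bo & 2u) != 0;

    if (!leave_ctr_alone)
        *ctr -= 1;                      /* decrement happens first */

    bool ctr_ok  = leave_ctr_alone || ((*ctr != 0) != branch_if_zero);
    bool cond_ok = ignore_condition || (cr_bit == branch_if_true);
    return ctr_ok && cond_ok;           /* both tests must pass */
}
```

Under this model, BO 12 is a plain "branch if the bit is true", BO 16 is "decrement and branch if nonzero", and BO 0 combines a decrement with "branch if nonzero and the bit is false".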
In the extended mnemonics, the count register semantics are specified by adding either dz or dnz immediately after the b. Any additional condition or instruction modifier is added after that. So, to have a loop repeat 100 times, you would load the count register with the number 100, and use bdnz to control the loop. Here is how the code would look:
Listing 6. Counter-controlled loop example
You can also combine the counter test with other tests. For instance, a loop might need to have an early exit condition. The following code demonstrates an early exit condition when register 24 is equal to register 28.
Listing 7. Count register combined branch example
So, rather than having to add an additional conditional branch instruction, all that is needed is the comparison instruction, and the conditional branch is merged into the loop counter branch.
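The control flow of such a merged branch can be mirrored in C. The sketch below is an illustration of the idea only, with invented names, and is not a translation of the listing: the loop ends either when the counter runs out or when the comparison reports equality, and a single loop condition plays the role of the combined branch (a bdnzf-style branch on the eq bit).

```c
#include <stddef.h>
#include <stdint.h>

/* Examine at most `limit` elements, stopping early when one equals
 * `wanted`. Returns how many loop iterations ran. */
static size_t scan_until_match(const uint64_t *values, size_t limit,
                               uint64_t wanted)
{
    size_t iterations = 0;
    uint64_t ctr = limit;       /* plays the role of the count register */
    int eq;

    if (ctr == 0)
        return 0;               /* a zero count would wrap around otherwise */

    do {
        eq = (values[iterations] == wanted);    /* the compare instruction */
        iterations++;
        /* One test does both jobs: the counter is always decremented,
         * and the loop repeats only if it is nonzero and eq is false. */
    } while (--ctr != 0 && !eq);

    return iterations;
}
```

Note the guard for a zero count: a decrement-and-branch loop entered with the counter at zero would wrap around instead of stopping.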
Putting it together
Now we will put this information to practical use.
The first program is a rewrite of the maximum value program we entered in the first article, updated according to what we have learned. The first version used a register to hold the current address being read from, and the code used indirect addressing to load the value. This program instead uses an indexed-indirect addressing mode, with a register for the base address and a register for the index. In addition, rather than the index starting at zero and going forward, the index will count from the end to the beginning in order to save an extra compare instruction. The decrement can implicitly set the condition register (as opposed to an explicit compare with zero), which can then be used by a conditional branch instruction. Here is the new version (enter as max_enhanced.s):
Listing 8. Maximum value program enhanced version
Assemble, link, and execute as before:
The loop in this program is approximately 15% faster than the loop in the first article because (a) we've shaved off several instructions from the main loop by using the condition register to detect the end of the list when we decrement register 5 and (b) the program is using different condition register fields for the comparison (so that the result of the decrement can be held for later).
Note that using the link register in the call to set_new_maximum is not strictly necessary. It would have worked just as well to set the return address explicitly rather than using the link register. However, this gives a good example of link register usage.
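For readers who find the control flow easier to follow in a higher-level language, here is the same counting-down idea sketched in C. It is an approximation of the approach, not a translation of the listing: the function name is invented, the elements are assumed to be unsigned 64-bit values, and the maximum is assumed to start at zero.

```c
#include <stddef.h>
#include <stdint.h>

/* Find the largest element by walking the array from the end to the
 * beginning, using a base pointer plus an index. */
static uint64_t max_counting_down(const uint64_t *values, size_t count)
{
    uint64_t maximum = 0;
    size_t index = count;

    /* Testing the index against zero doubles as the end-of-list check,
     * so no separate comparison against the array length is needed. */
    while (index != 0) {
        index--;
        if (values[index] > maximum)
            maximum = values[index];    /* corresponds to set_new_maximum */
    }
    return maximum;
}
```

In the assembly version the decrement itself sets the condition register, so the explicit test against zero in the C loop costs no extra instruction there.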
A quick introduction to simple functions
The PowerPC ABI is fairly complex, and will be covered in much greater detail in the next article. However, for functions which do not themselves call any functions and follow a few easy rules, the PowerPC ABI provides a greatly simplified function-call mechanism.
In order to qualify for the simplified ABI, your function must obey the following rules:
- It must not call any other function.
- It may only modify registers 3 through 12.
- It may only modify condition register fields cr0, cr1, cr5, cr6, and cr7.
- It must not alter the link register, unless it restores it before calling blr to return.
When functions are called, parameters are sent in registers, starting with register 3 and going through register 10, depending on the number of parameters. When the function returns, the return value must be stored in register 3.
So let's rewrite our maximum value program as a function, and call it from C.
The parameters we should pass are the pointer to the array as the first parameter (register 3), and the size of the array as the second parameter (register 4). Then, the maximum value will be placed into register 3 for the return value.
So here is our program, reformulated as a function (enter as max_function.s):
Listing 9. The maximum value program as a function
This is very similar to the earlier version, with the main exceptions being:
- The initial conditions are passed through parameters instead of hardcoded.
- The register usage within the function was modified to match the layout of the passed parameters.
- The extraneous usage of the link register for set_new_maximum was removed in order to preserve the link register's contents.
The C language data type the program is working with is unsigned long long. This is quite cumbersome to write, so it would be better to typedef this as something like uint64. Then, the prototype for the function would be:
Here is a short driver program to test our new function (enter as use_max.c):
Listing 10. Simple C program using the maximum value function
To compile and run this program, simply do:
Notice that since we are actually doing formatted printing now instead of returning the value to the shell, we can make use of the entire 64-bit size of the array elements.
Simple function calls are very cheap as far as performance goes. The simplified function call ABI is fully standard, and provides an easy way to get started writing mixed-language programs that need the speed of custom assembly language in their core loops, and the expressiveness and ease of use of higher-level languages for the rest.
Conclusion
Knowing the ins and outs of the branch processor helps to write more efficient PowerPC code. Using the various condition register fields enables the programmer to save and combine conditions in interesting ways. Using the count register helps code efficient loops. Simple functions can enable even the novice programmer to write useful assembly language functions for use by a higher-level language program.
In the next article, I'll cover the PowerPC ABI for function calls, and you will learn all about how the stack functions on PowerPC platforms.
Related topics
- Read all articles in this series.
- Read all of Jon Bartlett's articles on developerWorks.
- The PowerPC Architecture Book part 1 is a nice reference of all of the instructions and extended mnemonics available for 32-bit and 64-bit PowerPC programming.
- The 64-bit PowerPC ELF Standard contains the ABI specifications for function call interfaces.