DesktopLinuxAsm - Tips
Coding Tips
The following coding tips focus on reducing code size.
In some cases larger code fragments run faster, but in
general smaller code tends to be faster. This is
constantly debated, and often the only way to be sure
is to measure execution speed. One likely reason is that
the processor cache can hold more short code, so it may
run faster than a longer sequence built from individually
faster instructions.
1. setting registers
One of the most common operations in assembly is to move
a constant value into a register. Often the value moved
is zero. Here is the obvious instruction to load zero:
B800000000 mov eax,0
The nasm-generated machine code (on the left) shows that this
instruction is five bytes long and uses the operation code
"B8". Another way to do this is:
31C0 xor eax,eax
This is only two bytes long, but unlike mov it modifies the flags.
Another common value to load into registers is -1. Once
again the typical way to do this is:
B8ffffffff mov eax,-1
Another way to generate -1 using one less byte is:
31C0 xor eax,eax
F7D0 not eax
An even shorter way (three bytes) to generate -1 is:
83C8FF or eax,byte -1
If we want small values from -128 to 127, the following
macro loads them in only 3 bytes. "push byte" sign-extends
its 8-bit value to 32 bits, and push/pop leave the flags
unchanged:
%macro _mov 2
push byte %2
pop %1
%endmacro
_mov eax,2 ;example of _mov macro usage
6A02 <1> push byte %2
58 <1> pop %1
Note: To keep code simple, use macro names that clearly
describe the operation performed. This makes reading the
source code easier, but beware: debuggers that disassemble
the code only show the push and pop.
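To see which constants the _mov macro can produce, here is a small C model of "push byte" followed by "pop". It is only a sketch: the function name push_byte_pop is made up, and it assumes 32-bit code, where the 8-bit immediate is sign-extended before it is pushed.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of "push byte imm8 / pop reg" in 32-bit code.
   The CPU sign-extends the 8-bit immediate before pushing it, so the
   register can only receive a value in the range -128..127. */
static uint32_t push_byte_pop(int8_t imm8)
{
    int32_t widened = imm8;      /* sign extension to 32 bits */
    return (uint32_t)widened;    /* the bit pattern left in the register */
}
```

For example, a byte of C8h (200) is pushed as -56, so values outside -128..127 still need a normal mov.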
2. checking a register value
A very common operation is to check whether a register is zero. The
obvious way to do this is:
83F800 cmp eax,byte 0
740B je match
A better way to do this and save one byte is:
09C0 or eax,eax
7410 jz match
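An even better way (depending upon the design) is to keep
the value in ecx and use "jecxz" or "loop", which need no
separate test. "jecxz" jumps when ecx is zero; "loop"
decrements ecx and jumps while the result is not zero: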
E312 jecxz match
E210 loop match
The loop instruction is considered slow and is avoided by
some programmers, so if speed is important, do some
code timing. I've not found the use of "loop" to be
slow.
The "dec" and "inc" instructions are one byte long in
32-bit code and provide an alternative way to check
whether a register holds 1 or -1. Here are the traditional
test for 1 and the "dec" test:
83F801 cmp eax,byte 1
740B je match
The "dec" test (modifies the register)
48 dec eax ;set zero flag if eax=1
7408 jz match ;jmp if eax was = 1
To check for -1 use:
40 inc eax ;set zero flag if eax=-1
7400 jz match
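Because 32-bit registers wrap around, "dec" produces zero only when the register held 1, and "inc" produces zero only when it held -1 (FFFFFFFFh). The C sketch below models that with unsigned 32-bit arithmetic; the function names are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Model of "dec eax / jz match": ZF is set when the decremented value is 0. */
static bool dec_sets_zf(uint32_t eax)
{
    eax -= 1u;            /* unsigned arithmetic wraps like the register */
    return eax == 0u;
}

/* Model of "inc eax / jz match": ZF is set when the incremented value is 0. */
static bool inc_sets_zf(uint32_t eax)
{
    eax += 1u;            /* 0xFFFFFFFF + 1 wraps to 0 */
    return eax == 0u;
}
```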
3. Register math
The LEA instruction (load effective address) can be used to multiply
registers and add values without changing the flags. Simple multiplies
are usually better done with "MUL" or one of the shift instructions.
8D0400 lea eax,[eax*2] ;eax * 2
8D0440 lea eax,[eax+eax*2] ;eax * 3
8D048500000000 lea eax,[eax*4] ;eax * 4
8D0480 lea eax,[eax*4+eax] ;eax * 5
8D04C500000000 lea eax,[eax*8] ;eax * 8
8D04C0 lea eax,[eax*8+eax] ;eax * 9
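Note that the forms without a base register ([eax*4], [eax*8])
need a 4-byte displacement, which makes them 7 bytes long.
If we want to add in a constant value or another register,
LEA becomes a good choice: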
8D8418F4010000 lea eax,[eax+ebx+500]
Multiplies can also be done with shifts and adds, as
follows. Each fragment starts from the original value
in eax, and the ones that need a copy use ebx:
D1E0 shl eax,1 ;eax * 2
89C3 mov ebx,eax
D1E0 shl eax,1
01D8 add eax,ebx ;eax * 3
C1E002 shl eax,2 ;eax * 4
89C3 mov ebx,eax
C1E002 shl eax,2
01D8 add eax,ebx ;eax * 5
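The shift/add and LEA fragments can be checked against plain multiplication with a C model like the one below. The function names are hypothetical, and uint32_t stands in for a 32-bit register, so results wrap modulo 2^32 just as they do in eax.

```c
#include <assert.h>
#include <stdint.h>

/* Model of "mov ebx,eax / shl eax,1 / add eax,ebx": eax * 3 */
static uint32_t times3_shift_add(uint32_t eax)
{
    uint32_t ebx = eax;   /* keep a copy of the original value */
    eax <<= 1;            /* eax * 2 */
    return eax + ebx;     /* eax * 2 + eax = eax * 3 */
}

/* Model of "mov ebx,eax / shl eax,2 / add eax,ebx": eax * 5 */
static uint32_t times5_shift_add(uint32_t eax)
{
    uint32_t ebx = eax;
    eax <<= 2;            /* eax * 4 */
    return eax + ebx;     /* eax * 4 + eax = eax * 5 */
}

/* Model of "lea eax,[eax*8+eax]": index * scale + base */
static uint32_t times9_lea(uint32_t eax)
{
    return eax * 8u + eax;
}

/* Model of "lea eax,[eax+ebx+500]": base + index + displacement */
static uint32_t lea_add(uint32_t eax, uint32_t ebx)
{
    return eax + ebx + 500u;
}
```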
4. Avoiding branches
Programs execute faster if they do not have to jump
to a new location very often, because a mispredicted
conditional jump stalls the processor pipeline. This
suggests we use decisions that do not involve the
conditional jump instructions. There are several
techniques to do this using "SBB", "AND", "XOR" and "OR".
Here is one example:
choose register value without branch
if (eax != 0) eax = ebx; else eax = ecx;
3D01000000 cmp eax,1
19C0 sbb eax,eax
21C1 and ecx,eax
35FFFFFFFF xor eax,-1
21D8 and eax,ebx
09C8 or eax,ecx
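The disadvantage of the above code is complexity. It
makes reading code more difficult and in most
cases isn't necessary. Note also that it overwrites
ecx, and that the 5-byte "xor eax,-1" could be
replaced by the 2-byte "not eax" (F7D0).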
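A line-by-line C model of the branch-free listing may make the mask trick easier to follow. This is a sketch with a made-up function name; each C statement mirrors the instruction named in its comment, using uint32_t for the registers.

```c
#include <assert.h>
#include <stdint.h>

/* Step-by-step model of the branch-free listing for
       if (eax != 0) eax = ebx; else eax = ecx;
   Like the listing, it also overwrites ecx (a local copy here). */
static uint32_t select_no_branch(uint32_t eax, uint32_t ebx, uint32_t ecx)
{
    uint32_t carry = (eax < 1u);  /* cmp eax,1  : CF=1 only when eax == 0 */
    eax = 0u - carry;             /* sbb eax,eax: 0xFFFFFFFF if CF, else 0 */
    ecx &= eax;                   /* and ecx,eax: keep ecx only if eax was 0 */
    eax ^= 0xFFFFFFFFu;           /* xor eax,-1 : invert the mask */
    eax &= ebx;                   /* and eax,ebx: keep ebx only if eax was not 0 */
    return eax | ecx;             /* or eax,ecx : the unselected term is 0 */
}
```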
5. Creating registers
Programs run a lot faster if all data is kept in registers.
This is easy for simple loops, but what if we run out of
registers? Our options are:
1. Free up a register by pushing it on the stack.
2. Free up a register by moving it to memory.
3. Use a CPU special register that is dedicated
for other purposes.
4. Split a register into two or more registers.
Options 3 and 4 are seldom used, but option
4 has promise. The general registers eax,
ebx, ecx, edx can be split into byte or word
registers, and ebp, esi, edi can be split into
word registers. If our application only needs
16 bits for some values, we can define as many
as 14 word registers (two in each of eax, ebx,
ecx, edx, ebp, esi and edi).
splitting a register into two 16-bit registers:
0FC8 bswap eax
; work with "ax #1"
0FC8 bswap eax
; work with "ax #2"
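The bswap trick can be modeled in C to confirm that the two words stay independent. The helpers bswap32, set_ax and get_ax are hypothetical stand-ins for the instruction and for 16-bit access to ax, not library calls.

```c
#include <assert.h>
#include <stdint.h>

/* Model of the "bswap" instruction (486 and later): reverse the four bytes. */
static uint32_t bswap32(uint32_t x)
{
    return (x >> 24) | ((x >> 8) & 0x0000FF00u) |
           ((x << 8) & 0x00FF0000u) | (x << 24);
}

/* Hypothetical helper: write the low word ("ax"), keeping the upper half. */
static uint32_t set_ax(uint32_t eax, uint16_t value)
{
    return (eax & 0xFFFF0000u) | value;
}

/* Hypothetical helper: read the low word ("ax"). */
static uint16_t get_ax(uint32_t eax)
{
    return (uint16_t)(eax & 0xFFFFu);
}
```

One design note: while the register is in its normal orientation, the second word sits in the upper half of eax with its bytes reversed. That only matters if eax is also used as a 32-bit value; "rol eax,16" (C1C010) swaps the halves without reversing bytes, at the cost of one extra byte.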
6. setting flags
- mov instructions do not set flags
- the lea instruction does not set flags
- to set the flags from a register's value, use "or eax,eax"
- the direction flag is initially clear (as set by "cld"), and
this is assumed by most AsmLib functions.
7. looping
- loops are most efficient if the loop-back test is at the end.
- the "loop" instruction decrements a count in "ecx" and
jumps while the count is not zero
- "jecxz" is often a good way to create loops using "ecx"
as a flag, or to skip a loop whose count is zero
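A C model of the usual ecx loop shape shows why "jecxz" and "loop" pair well. The function name run_loop is hypothetical; the comments show the assembly each line stands for. Without the jecxz guard, a starting count of 0 would be decremented to FFFFFFFFh by "loop", and the body would run 2^32 times.

```c
#include <assert.h>
#include <stdint.h>

/* Model of a bottom-tested "loop" guarded by "jecxz":
           jecxz done      ; skip the body when the count is 0
       top:
           (body)
           loop top        ; ecx = ecx - 1, jump to top while ecx != 0
       done:
   Returns how many times the body ran. */
static uint32_t run_loop(uint32_t ecx)
{
    uint32_t iterations = 0u;
    if (ecx == 0u)             /* jecxz done */
        return iterations;
    do {
        iterations++;          /* (body) */
    } while (--ecx != 0u);     /* loop top */
    return iterations;
}
```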
8. divide error
Dividing by zero, or a division whose quotient will not fit
in eax, causes a divide error. For an unsigned "div" both
cases can be detected beforehand by the following code.
3B15[87000000] cmp edx,[divisor]
730A jnb error
F735[87000000] div dword [divisor]
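Why does one compare catch both errors? An unsigned div of edx:eax faults when the quotient does not fit in 32 bits, which happens exactly when edx >= divisor, and a divisor of 0 is caught because every edx is >= 0. The C sketch below (function names hypothetical) compares the one-compare test with a direct 64-bit calculation. It models unsigned "div" only; signed "idiv" needs a different check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Model of "cmp edx,[divisor] / jnb error": true when "div" would fault. */
static bool div_would_fault(uint32_t edx, uint32_t divisor)
{
    return edx >= divisor;   /* also true when divisor == 0 */
}

/* Reference check with 64-bit arithmetic: does the quotient fit in eax? */
static bool quotient_overflows(uint32_t edx, uint32_t eax, uint32_t divisor)
{
    if (divisor == 0u)
        return true;         /* divide by zero */
    uint64_t dividend = ((uint64_t)edx << 32) | eax;
    return dividend / divisor > 0xFFFFFFFFu;
}
```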
9. Set register to state of carry flag
The following code sets eax to -1 if eax is below ebx
(an unsigned compare, which sets the carry flag) and
to 0 otherwise. This is useful in setting a flag without
using a conditional jmp.
39D8 cmp eax,ebx
19C0 sbb eax,eax ;eax=result
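Here is a C model of the compare and sbb pair, with uint32_t standing in for the registers and a made-up function name. It shows that the result depends on an unsigned "below" test, not on equality.

```c
#include <assert.h>
#include <stdint.h>

/* Model of "cmp eax,ebx / sbb eax,eax".
   cmp sets the carry flag when eax < ebx as unsigned numbers; sbb then
   computes eax - eax - CF, which is 0 or -1 (0xFFFFFFFF). */
static uint32_t carry_mask(uint32_t eax, uint32_t ebx)
{
    uint32_t cf = (eax < ebx) ? 1u : 0u;   /* cmp eax,ebx */
    return eax - eax - cf;                 /* sbb eax,eax */
}
```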
10. Set edx to 0 or -1
Often we know what is in eax and need a constant in edx. The
"cdq" instruction copies the sign bit of eax into all of edx:
B805000000 mov eax,5
99 cdq ;set edx to 0 if eax is zero or positive
B8FFFFFFFF mov eax,-1
99 cdq ;set edx to -1 if eax negative
If we wanted to clear both eax and edx, the shortest code
(three bytes, one less than two xor instructions) is:
31C0 xor eax,eax
99 cdq
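The effect of cdq can be written in C as a test of the sign bit. This sketch (function name hypothetical) returns the value cdq leaves in edx; together, edx:eax then hold eax sign-extended to 64 bits, which is why cdq usually comes before a signed idiv.

```c
#include <assert.h>
#include <stdint.h>

/* Model of "cdq": copy the sign bit of eax into every bit of edx. */
static uint32_t cdq_edx(uint32_t eax)
{
    return (eax & 0x80000000u) ? 0xFFFFFFFFu : 0u;
}
```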
11. Coding Style
The use of structures allows variables to be kept on
the stack, and structures are essential in describing data
records. The following code defines a structure and
sets up a stack frame to hold it (nasm's endstruc also
defines animal_size, which equals animal_struc_size here):
struc animal
.dog resd 1
.cat resd 1
animal_struc_size:
endstruc
start:
81EC08000000 sub esp,animal_struc_size ;make room on stack
C7042401000000 mov [esp+animal.dog],dword 1 ;initialize dog
C744240402000000 mov [esp+animal.cat],dword 2 ;initialize cat
(program body here)
81C408000000 add esp,animal_struc_size ;destroy struc on stack
C3 ret
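For comparison, the same record can be described as a C struct; offsetof and sizeof report the offsets that nasm assigns to animal.dog, animal.cat and animal_struc_size. This is a sketch: make_animal is a hypothetical function, and where a C compiler actually places such a local variable in the stack frame is up to the compiler.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* C counterpart of "struc animal": two 32-bit fields at offsets 0 and 4. */
struct animal {
    uint32_t dog;   /* .dog resd 1 */
    uint32_t cat;   /* .cat resd 1 */
};

/* Fill the structure the way the listing does after "sub esp,8". */
static struct animal make_animal(void)
{
    struct animal a;
    a.dog = 1u;     /* mov [esp+animal.dog],dword 1 */
    a.cat = 2u;     /* mov [esp+animal.cat],dword 2 */
    return a;
}
```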
12. Avoiding spaghetti
Complexity and spaghetti code increase if we:
1. jump back often
2. use a lot of pushes and pops
3. fail to document register states with comments
4. use large blocks of code rather than small blocks
with inputs and outputs identified.
13. Converting C to Asm
It is easy to convert most C programs to nasm assembler using
the AsmSrc program, but the best results are obtained if the
compiler was told to include debug information. Here are some
tips for converting C programs:
- After generating source, strip the library information off.
- Add a _start label at the entry point and make it global.
- Assemble the program and fix any errors.
- Test the program and get it working using Asmbug.
- Create a structure describing the stack frame.
- Replace all the "ebp+xx" references with structure references.