Self-modifying code

In computer science, self-modifying code is code that deliberately alters its own instructions while it is executing.

Self-modifying code is straightforward to write in assembly language. It is also supported by some high-level language interpreters, such as those for SNOBOL4 and Lisp. It is more difficult to implement in compilers, but compilers such as Clipper and Spitbol make a fair attempt at it. Batch scripting languages often involve self-modifying code as well.

Reconfigurable computing can be regarded as self-modifying hardware, since it blurs the border between software and hardware.

=Assembly style self-modifying code=

Self-modifying code in assembly language is used for various purposes:

# Optimisation of a state-dependent loop.
# Run-time code generation, or specialisation of an algorithm at run time or load time (popular, for example, in real-time graphics).
# Altering the inlined state of an object, or simulating the high-level construct of a closure.
# Patching of subroutine call addresses, as is usually done when a DLL is loaded. Whether this counts as self-modifying code is a matter of terminology.

The second and third kinds are probably the ones most often used in high-level languages such as Lisp as well.
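The two high-level kinds can be sketched in Python. Python is used purely for illustration, and the names `make_power` and `make_counter` are hypothetical. `make_power` generates and compiles specialised source code at run time with the built-in `compile` and `exec`. `make_counter` keeps its changing state in a closure instead of patching instructions.

```python
def make_power(n: int):
    """Kind 2, run-time code generation: build x**n with the
    multiplications fully unrolled for this particular n."""
    if n < 0:
        raise ValueError("n must be non-negative")
    body = " * ".join(["x"] * n) if n > 0 else "1"
    source = f"def power(x):\n    return {body}\n"
    namespace = {}
    exec(compile(source, f"<power_{n}>", "exec"), namespace)
    return namespace["power"]


def make_counter(step: int):
    """Kind 3, closure: the running total lives in captured state
    instead of being patched into the function's instructions."""
    count = 0

    def counter() -> int:
        nonlocal count
        count += step
        return count

    return counter


cube = make_power(3)      # generated source: return x * x * x
print(cube(2))            # 8
tick = make_counter(5)
tick()
print(tick())             # 10
```

The generated function contains no loop at all. That is the point of specialising at run time: the work of interpreting `n` is done once, when the code is built.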

== Self-modifying code used to optimize a state-dependent loop ==

Pseudo-code example:

    repeat N times {
        if STATE is 1
            increase A by one
        else
            decrease A by one
        do something with A
    }

Self-modifying code in this case would simply be a matter of rewriting the loop like this:

    repeat N times {
        increase A by one
        do something with A
    }
    when STATE has to switch {
        replace the opcode "increase" above with the opcode to decrease
    }
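Here is a toy Python model showing that the rewritten loop computes the same thing as the original. It is a sketch, not real machine code. The loop body is stored as a mutable list of symbolic opcodes that an interpreter executes. `set_state`, a hypothetical name, plays the role of the "when STATE has to switch" handler. The `if` chain inside `PatchedLoop.run` stands in for the processor's instruction decoder, not for the STATE test, which has disappeared from the loop.

```python
class BranchyLoop:
    """Reference version: tests STATE on every iteration."""

    def __init__(self) -> None:
        self.state, self.a, self.trace = 1, 0, []

    def set_state(self, state: int) -> None:
        self.state = state

    def run(self, n: int) -> None:
        for _ in range(n):
            if self.state == 1:
                self.a += 1
            else:
                self.a -= 1
            self.trace.append(self.a)        # "do something with A"


class PatchedLoop:
    """Self-modifying version: the loop body is data that is rewritten
    only when STATE switches, so the loop itself contains no test."""

    def __init__(self) -> None:
        self.body = ["INC", "USE"]           # correct for STATE == 1
        self.a, self.trace = 0, []

    def set_state(self, state: int) -> None:
        self.body[0] = "INC" if state == 1 else "DEC"   # patch the opcode

    def run(self, n: int) -> None:
        for _ in range(n):
            for op in self.body:             # execute the (possibly patched) body
                if op == "INC":
                    self.a += 1
                elif op == "DEC":
                    self.a -= 1
                else:                        # "USE": do something with A
                    self.trace.append(self.a)


for loop in (BranchyLoop(), PatchedLoop()):
    loop.run(3)
    loop.set_state(0)
    loop.run(2)
    print(loop.trace)                        # [1, 2, 3, 2, 1] for both
```

The patched version pays for a state change once, inside `set_state`. The branchy version tests STATE on every iteration.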

Note that toggling between the two opcodes can be written as a single XOR of the byte at the opcode's address with the value opcodeOf(inc) xor opcodeOf(dec).
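With real x86 encodings the toggle is a single byte operation. In 16-bit mode, `inc ax` is the one-byte opcode 0x40 and `dec ax` is 0x48, so the XOR mask is 0x08. In 32-bit mode the same bytes encode `inc eax` and `dec eax`. In 64-bit mode these bytes are REX prefixes, so the trick does not apply there. The Python sketch below applies the mask to a `bytearray` standing in for code memory. The helper name and the offset are hypothetical.

```python
INC_AX = 0x40               # "inc ax" in 16-bit mode (inc eax in 32-bit mode)
DEC_AX = 0x48               # "dec ax" in 16-bit mode (dec eax in 32-bit mode)
TOGGLE = INC_AX ^ DEC_AX    # opcodeOf(inc) xor opcodeOf(dec) == 0x08


def toggle_opcode(code: bytearray, addr: int) -> None:
    """Flip the instruction at `addr` between inc ax and dec ax."""
    code[addr] ^= TOGGLE


code = bytearray([0x90, INC_AX, 0x90])   # nop; inc ax; nop
OPCODE_ADDR = 1                          # hypothetical offset of the patched byte
toggle_opcode(code, OPCODE_ADDR)
print(hex(code[OPCODE_ADDR]))            # 0x48  (now dec ax)
toggle_opcode(code, OPCODE_ADDR)
print(hex(code[OPCODE_ADDR]))            # 0x40  (back to inc ax)
```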

Whether this is worthwhile depends, of course, on the value of N and on how often the state changes.

== Attitudes towards self-modifying code ==

Some argue that self-modifying code should be avoided when a viable alternative exists, because it can be difficult to understand and maintain.

Others see it simply as doing at run time what a programmer would otherwise do while editing the code (in the example above, replacing a line or keyword).

Self-modifying code was used in the early days of computing to save memory, which was scarce. It was also used to implement subroutine calls and returns when the instruction set offered only simple branch or skip instructions for controlling the flow of execution. (This is still relevant, at least in theory, for certain ultra-RISC architectures. One such design has only a single instruction, with three operands: subtract-and-branch-if-negative.)
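One way such calls can work is sketched below with a toy machine in Python whose only control-flow instruction is an unconditional JMP. The caller first stores a jump back to itself into the last instruction of the subroutine, then jumps to the subroutine. The subroutine's final JMP therefore returns to whichever caller patched it most recently. The instruction set, the addresses and the names are hypothetical.

```python
def run(memory: list) -> int:
    """Toy machine: memory holds instructions, and JMP is the only way
    to change the flow of control. Returns the accumulator on HALT."""
    acc, pc = 0, 0
    while True:
        op, *args = memory[pc]
        pc += 1
        if op == "ADD":
            acc += args[0]
        elif op == "STORE":             # write a word (here, an instruction)
            addr, word = args
            memory[addr] = word
        elif op == "JMP":
            pc = args[0]
        elif op == "HALT":
            return acc
        else:
            raise ValueError(f"unknown opcode {op!r}")


SUB = 10                                 # hypothetical subroutine address
program = [("HALT",)] * (SUB + 2)
program[0] = ("STORE", SUB + 1, ("JMP", 2))   # plant return address 2
program[1] = ("JMP", SUB)                     # "call" the subroutine
program[2] = ("STORE", SUB + 1, ("JMP", 4))   # plant return address 4
program[3] = ("JMP", SUB)                     # "call" it again
program[4] = ("HALT",)
program[SUB] = ("ADD", 5)                     # subroutine body: acc += 5
program[SUB + 1] = ("JMP", None)              # return jump, patched by callers

print(run(program))                           # 10
```

Each call site plants its own return address in the same location. For that reason, a subroutine linked this way cannot call itself recursively without extra bookkeeping.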

== Self-modifying code used as camouflage ==

Self-modifying code was used to hide copy-protection instructions in 1980s DOS-based games. The floppy disk access instruction int 0x13 would not appear in the executable program's image on disk; instead, it was written into the executable's memory image after the program had started running.
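The following Python sketch illustrates the idea under a stated assumption. The bytes of `int 0x13` (CD 13 in x86 machine code) are stored XOR-ed with a constant, so they never appear in the file, and the program restores them in its own memory at start-up. The key, the layout and the function names are hypothetical. The text above does not say how the games actually disguised the instruction.

```python
INT_13H = bytes([0xCD, 0x13])   # x86 machine code for "int 0x13"
KEY = 0x5A                      # hypothetical obfuscation key


def disguise(code: bytes) -> bytes:
    """XOR every byte with KEY; applying it twice restores the original."""
    return bytes(b ^ KEY for b in code)


# What is stored on disk: nop, <disguised int 0x13>, nop
file_image = bytes([0x90]) + disguise(INT_13H) + bytes([0x90])
assert INT_13H not in file_image          # a static scan of the file misses it


def load_and_patch(image: bytes) -> bytearray:
    """Copy the image into 'memory', then let the running program restore
    the real instruction in its own memory image before it is reached."""
    memory = bytearray(image)
    memory[1:3] = disguise(memory[1:3])
    return memory


print(load_and_patch(file_image).hex())   # 90cd1390
```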

Self-modifying code is also sometimes used by programs that do not want to reveal their presence, such as computer viruses and some shellcode. Viruses and shellcode that use self-modifying code mostly do so in combination with polymorphic code. Polymorphic viruses are sometimes called primitive self-mutators. Modifying a piece of running code is also a step in certain attacks, such as buffer overflow exploits.

== Operating systems and self-modifying code ==

Because of the security implications of self-modifying code, some operating systems go to considerable lengths to rule it out. Recent versions of OpenBSD, for instance, have a feature known as W^X (for "write xor execute"). Under W^X, a given memory page may be writable or executable, but not both. Versions of OpenBSD with W^X do not allow alteration of memory pages that contain executable code, so programs that depend on rewriting their own machine code cannot run in such an environment.
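The W^X rule can be modelled in a few lines of Python. The `Page` class and its methods are hypothetical stand-ins for the page permissions an operating system manages, for example with `mprotect` on POSIX systems. Under the rule, a page may be writable or executable but never both. Code that rewrites itself in place is refused, while a program that writes code first and then switches the page to read+execute still works.

```python
from enum import Flag, auto


class Prot(Flag):
    READ = auto()
    WRITE = auto()
    EXEC = auto()


class WXViolation(Exception):
    """Raised when an operation would break the write-xor-execute rule."""


class Page:
    """Toy model of one memory page under a W^X policy."""

    def __init__(self, size: int = 16) -> None:
        self.data = bytearray(size)
        self.prot = Prot.READ | Prot.WRITE      # starts writable, not executable

    def protect(self, prot: Prot) -> None:
        # The W^X policy: writable or executable, never both at once.
        if Prot.WRITE in prot and Prot.EXEC in prot:
            raise WXViolation("page cannot be writable and executable")
        self.prot = prot

    def write(self, offset: int, payload: bytes) -> None:
        if Prot.WRITE not in self.prot:
            raise WXViolation("write to a non-writable page")
        self.data[offset:offset + len(payload)] = payload

    def execute(self) -> bytes:
        if Prot.EXEC not in self.prot:
            raise WXViolation("execute from a non-executable page")
        return bytes(self.data)   # stand-in for jumping into the page


page = Page()
page.write(0, bytes([0x90, 0xC3]))           # emit code: nop; ret
page.protect(Prot.READ | Prot.EXEC)          # then make it executable
page.execute()                               # allowed
try:
    page.write(0, bytes([0x40]))             # in-place self-modification
except WXViolation as err:
    print("refused:", err)
```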

Some people have gone so far as to build hardware that cannot modify code, which makes self-modifying code impossible. Such systems are immune to buffer-overflow exploits that inject new machine code. If the hardware prevents software from writing to any location that it later executes, the only code that can ever run is the original code (in ROM) physically installed in the computer.

However, the whole point of an operating system (and of the von Neumann architecture in general) is to load new programs into memory and then execute them. Critics therefore argue that blocking self-modifying code is overzealous: it removes a useful tool simply because that tool can also be used by viruses.

Nevertheless, most central processing units are used in embedded systems, such as keyboards, that cannot modify their own code.

== Just-in-time compilers ==

Just-in-time compilers for Java and other programming languages often compile short blocks of machine code and then execute them immediately.
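The compile-then-run pattern can be sketched with a toy JIT for a tiny, hypothetical stack bytecode. A real JIT emits native machine code into executable memory. This Python sketch swaps that for Python source compiled with the built-in `compile`, which keeps the example portable while preserving the shape: translate once, then run the generated code directly. `interpret` is the reference interpreter for comparison.

```python
def interpret(code, x):
    """Reference interpreter: dispatch on every instruction, every call."""
    stack = []
    for op, *args in code:
        if op == "PUSH":
            stack.append(args[0])
        elif op == "ARG":
            stack.append(x)
        elif op in ("ADD", "MUL"):
            b, a = stack.pop(), stack.pop()
            stack.append(a + b if op == "ADD" else a * b)
        else:
            raise ValueError(f"unknown opcode {op!r}")
    return stack.pop()


def jit(code):
    """Toy JIT: translate the bytecode once into Python source, compile
    it, and return a function that runs without per-instruction dispatch."""
    stack = []                       # holds source-code fragments, not values
    for op, *args in code:
        if op == "PUSH":
            stack.append(repr(args[0]))
        elif op == "ARG":
            stack.append("x")
        elif op in ("ADD", "MUL"):
            b, a = stack.pop(), stack.pop()
            symbol = "+" if op == "ADD" else "*"
            stack.append(f"({a} {symbol} {b})")
        else:
            raise ValueError(f"unknown opcode {op!r}")
    source = f"def compiled(x):\n    return {stack.pop()}\n"
    namespace = {}
    exec(compile(source, "<jit>", "exec"), namespace)   # generate, then run
    return namespace["compiled"]


program = [("PUSH", 2), ("ARG",), ("MUL",), ("PUSH", 3), ("ADD",)]   # 2*x + 3
fast = jit(program)
print(interpret(program, 5), fast(5))   # 13 13
```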

== Interaction of cache and self-modifying code ==

In some cases self-modifying code runs more slowly on modern processors. This is because a modern processor usually keeps blocks of code in its CPU cache. Each time the program rewrites part of itself, the rewritten part must be loaded into the cache again, which causes a delay.

Because of this cache invalidation, self-modifying code on modern processors is usually faster only when the modification occurs rarely, as with the state switch inside an inner loop described above. The consideration is not unique to processors with an instruction cache: on any processor, rewriting code has a cost.
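The trade-off can be illustrated with a toy instruction-cache model in Python. This is a counting sketch only, and `ToyICache` and `simulate` are hypothetical names. In the model, every instruction has its own cache entry, and a write to a cached address simply invalidates it. Real caches work on lines of many bytes, and real processors may also have to discard instructions already in flight.

```python
class ToyICache:
    """Counts (re)loads into a toy instruction cache in which every code
    address has its own entry and a write to that address evicts it."""

    def __init__(self) -> None:
        self.cached = set()
        self.refills = 0

    def fetch(self, addr: int) -> None:
        if addr not in self.cached:
            self.refills += 1          # miss: load from memory (slow path)
            self.cached.add(addr)

    def write(self, addr: int) -> None:
        self.cached.discard(addr)      # modified code must be reloaded


def simulate(iterations: int, patch_every: int, loop_len: int = 4) -> int:
    """Run a loop of `loop_len` instructions, rewriting instruction 0
    every `patch_every` iterations; return the number of cache refills."""
    cache = ToyICache()
    for i in range(iterations):
        if i > 0 and i % patch_every == 0:
            cache.write(0)             # the self-modifying store
        for addr in range(loop_len):
            cache.fetch(addr)
    return cache.refills


print(simulate(1000, 500), simulate(1000, 1))   # 5 1003
```

With a rare patch, the loop pays for four initial loads plus one reload. Patching on every iteration forces a reload each time, which is where self-modification stops paying off.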

Many processors, particularly older ones such as the Intel 8086, fetch machine code ahead of executing it. If an instruction that is too close to the instruction pointer is modified, such a processor does not notice the change and instead executes the code as it was before the modification. See prefetch input queue (PIQ).

=Example NASM-syntax self-modifying x86-assembly algorithm that determines the size of the Prefetch Input Queue=

        bits 16
        org 0x100                      ; assumption: DOS .COM program, so CS = DS and
                                       ; the stores below land in the code being run
    code_starts_here:
        xor bx, bx                     ; zero register bx (16-bit real mode can only
                                       ; index memory through bx, bp, si or di, not cx)
        xor ax, ax                     ; zero register ax
        mov dx, cs                     ; change dx to edx for protected mode
        mov [code_segment], dx         ; patch the segment of the far jump below (edx here too)
    around:
        cmp ax, 1                      ; check whether ax has been altered
        je found_size
        mov byte [nop_field+bx], 0x90  ; 0x90 = opcode nop (no operation): undo the last patch
        inc bx
        db 0xEA                        ; 0xEA = opcode far jump (flushes the prefetch queue)
        dw flush_queue                 ; followed by the offset (real mode = dw, protected mode = dd)
    code_segment:
        dw 0                           ; and then the code segment (filled in above)
    flush_queue:
        mov byte [nop_field+bx], 0x40  ; 0x40 = opcode inc ax (increase ax)
    nop_field:
        times 256 nop
        jmp around
    found_size:
        ;
        ; register bx now contains the size of the PIQ.
        ; This code is for real mode, but it could easily be changed to run in
        ; protected mode as well: change the dw for the offset to dd, and change
        ; dx to edx at the top. (dw and dx = 16-bit addressing, dd and edx =
        ; 32-bit addressing)
        ;

In essence, the code changes its own execution flow and determines the size of the PIQ by brute force. It asks: how far ahead of the current instruction must the code be changed for the change to take effect? If the modified byte is too near (it is already in the PIQ), the update has no effect. If it is far enough away, the modified instruction executes, and the program has found the size of the processor's PIQ. If this code is executed in protected mode, the operating system must not perform a context switch during the measurement, or the program may return the wrong value.
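The brute-force search can be replayed in a Python simulation under several simplifying assumptions. The queue is modelled as a snapshot of the next `queue_size` bytes taken just before the self-modifying store. Every instruction in the field is one byte long, and timing is ignored. The function names are hypothetical. The queue sizes passed in, 6 and 4, correspond to the 8086's 6-byte and the 8088's 4-byte queues. Because distances are counted from the first byte of the NOP field, the result need not match the register value of the real routine byte for byte.

```python
NOP, INC_AX = 0x90, 0x40   # one-byte x86 opcodes: nop, inc ax


def run_field(field: bytearray, patch_at: int, queue_size: int) -> int:
    """Execute a field of one-byte instructions just after a queue flush.

    The first `queue_size` bytes are fetched before the self-modifying
    store happens, so a patch inside that window is not seen.
    Returns the final value of the toy AX register."""
    prefetched = bytes(field[:queue_size])   # snapshot taken before the store
    field[patch_at] = INC_AX                 # the self-modifying store
    ax = 0
    for i, byte in enumerate(field):
        op = prefetched[i] if i < queue_size else byte
        if op == INC_AX:
            ax += 1
    field[patch_at] = NOP                    # undo the patch for the next try
    return ax


def measure_piq(queue_size: int, field_len: int = 256) -> int:
    """Brute force, as in the assembly: move the patch further ahead
    until it takes effect; that distance is the queue size."""
    field = bytearray([NOP] * field_len)
    for distance in range(field_len):
        if run_field(field, distance, queue_size) == 1:
            return distance
    raise RuntimeError("queue is longer than the NOP field")


print(measure_piq(6), measure_piq(4))   # 6 4
```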

= See also =

  • Self-replication
  • Quine