Soft error

In electronics and computing, an error is a signal or datum which is wrong. Errors may be caused by a defect, usually understood to be either a mistake in design or construction, or a broken component. A soft error is also a signal or datum which is wrong, but it is not assumed to imply such a mistake or breakage. After observing a soft error, there is no implication that the system is any less reliable than before.

A soft error can always be recovered by rewriting correct data in place of the erroneous data. Highly reliable systems use error correction to correct soft errors on the fly. However, in many systems it may be impossible to determine the correct data, or even to discover that an error is present at all. In addition, the system may already have experienced an outage before the correction can occur. In this case, the recovery procedure must include a reboot.

Soft errors involve changes to data (the electrons in a storage circuit, for example) but not changes to the physical circuit itself (the atoms). If the data is rewritten, the circuit will work perfectly again.

Soft errors can occur on transmission lines, in logic, in magnetic storage, and elsewhere, but are most commonly known in semiconductor storage.

=Causes of Soft Errors=

==Package Decay==

Soft errors became widely known with the introduction of dynamic RAM in the 1970s. In these early devices, chip packaging materials contained small amounts of radioactive contaminants. Very low decay rates are needed to avoid excess soft errors, and chip companies have occasionally suffered problems with contamination ever since. It is extremely hard to maintain the material purity needed.

Package radioactive decay usually causes a soft error by alpha particle emission. The positively charged alpha particle travels through the semiconductor and disturbs the distribution of electrons there. If the disturbance is large enough, a digital signal can change from a 0 to a 1 or vice versa. In combinational logic, this effect is transient, perhaps lasting a fraction of a nanosecond, which is why soft errors in combinational logic have mostly gone unnoticed. In storage elements such as latches and random access memory, even this transient upset can become stored for an indefinite time, to be read out later. Thus, designers are usually much more aware of the problem in storage circuits.

==Critical Charge==

Whether a circuit experiences a soft error depends on the energy of the incoming particle, the geometry of the impact, and the design of the logic circuit. Logic circuits with higher capacitance and higher logic voltages are less likely to suffer an error. This combination of capacitance and voltage is described by the critical charge parameter, Qcrit, the minimum electron charge disturbance needed to change the logic level. A higher Qcrit means fewer soft errors. Unfortunately, a higher Qcrit also means a slower logic gate and a higher power dissipation. Reduction in chip feature size and supply voltage, desirable for many reasons, decreases Qcrit. Thus, the importance of soft errors increases as chip technology advances.
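The trend described above can be illustrated with a rough sketch. It assumes the common first-order approximation that Qcrit is the product of node capacitance and logic voltage; the text only says Qcrit combines the two, so the formula, the function names and the example capacitances and voltages are all illustrative assumptions rather than figures for any real process.

```python
ELECTRON_CHARGE_C = 1.602176634e-19  # charge of one electron, in coulombs


def critical_charge(capacitance_f, voltage_v):
    """First-order estimate (an assumption): Qcrit ~ C * V, in coulombs."""
    return capacitance_f * voltage_v


def charge_in_electrons(charge_c):
    """Express a charge as an equivalent number of electrons."""
    return charge_c / ELECTRON_CHARGE_C


# Hypothetical nodes: an older, larger cell and a newer, scaled-down cell.
older_node = critical_charge(capacitance_f=50e-15, voltage_v=5.0)
newer_node = critical_charge(capacitance_f=2e-15, voltage_v=1.0)

for name, q in (("older", older_node), ("newer", newer_node)):
    print(f"{name}: {q * 1e15:.1f} fC, about {charge_in_electrons(q):.0f} electrons")
```

Under these made-up numbers the scaled-down node tolerates a far smaller charge disturbance, which is the sense in which shrinking feature size and supply voltage raise the importance of soft errors.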

==Cosmic Rays==

Once the electronics industry had learnt to control package contaminants, it became clear that other causes were also at work. James F. Ziegler led a program of work at IBM which culminated in the publication of a number of papers (Ziegler and Lanford, 1979) demonstrating that cosmic rays could also cause soft errors. Indeed, in modern devices, cosmic rays are the predominant cause. Many different particles can be present in cosmic rays, but the main cause of soft errors seems to be neutrons. Neutrons are uncharged and cannot disturb the electron distribution on their own, but they can undergo neutron capture by the nucleus of an atom in a chip, producing an unstable isotope which then causes a soft error when it decays, producing an alpha particle.

Cosmic ray flux depends on altitude. Burying a system in a cave reduces the rate of cosmic-ray-induced soft errors to a negligible level. In the lower levels of the atmosphere, the flux increases by a factor of about 2.2 for every 1000 m (1.3 for every 1000 ft) increase in altitude above sea level. Computers operated on top of mountains, or in aircraft, experience an order of magnitude higher rate of soft errors compared to sea level. This is in contrast to package decay induced soft errors, which do not change with location.
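The altitude scaling quoted above can be worked through numerically. This is a minimal sketch that takes the factor of 2.2 per 1000 m at face value and applies it as a simple exponential; the function name is hypothetical, and the rule only applies to the lower levels of the atmosphere.

```python
def relative_flux(altitude_m, factor_per_1000_m=2.2):
    """Cosmic ray flux relative to sea level, using the rule of thumb
    of a fixed multiplicative factor per 1000 m of altitude."""
    return factor_per_1000_m ** (altitude_m / 1000.0)


# Per-1000-ft factor implied by the per-1000-m factor (1000 ft = 304.8 m).
print(f"per 1000 ft: x{relative_flux(304.8):.2f}")

# A mountain-top site at 3000 m compared with sea level.
print(f"at 3000 m:  x{relative_flux(3000):.1f}")
```

At 3000 m the factor is 2.2 cubed, a little over 10, which matches the order-of-magnitude increase mentioned for mountain-top computers.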

It happens that one isotope of boron, boron-10, captures neutrons very efficiently and then emits an alpha particle. It has a very high neutron cross section. Boron is used in borophosphosilicate glass, a glass used to cover silicon dies to protect them. In critical designs, depleted boron, consisting almost entirely of boron-11, is used to avoid this effect and therefore to reduce the soft error rate. Boron-11 is a by-product of the nuclear power industry.

==Other Causes==

Soft errors can also be caused by random noise or signal integrity problems.

=Designing Around Soft Errors=

==Avoiding soft errors==

A designer can attempt to minimise the rate of soft errors by judicious device design, choosing the right semiconductor, package and substrate materials, and the right device geometry. Often, however, this is limited by the need to reduce device size and voltage, to increase operating speed and to reduce power dissipation. The susceptibility of devices to upsets is described in the industry using the JEDEC JESD-89 standard.

==Correcting soft errors==

Designers can choose to accept that soft errors will occur, and design systems with appropriate error detection and correction to recover gracefully. Typically, a semiconductor memory design might use forward error correction, incorporating redundant data into each word to create an error correcting code. Alternatively, roll-back error correction can be used, detecting the soft error with an error-detecting code such as parity, and rewriting correct data from another source. This technique is often used for write-through cache memories.
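As an illustration of forward error correction, the sketch below uses a Hamming(7,4) code, which adds three redundant bits to a four-bit word and can correct any single flipped bit. It is chosen here only because it is small; the text does not say which code any particular memory uses, and real memories typically protect much wider words. All function names are hypothetical.

```python
def hamming74_encode(data):
    """Encode 4 data bits as a 7-bit codeword laid out as
    positions 1..7 = p1 p2 d1 p3 d2 d3 d4."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_correct(codeword):
    """Return (corrected codeword, syndrome). The syndrome is the
    1-based position of the flipped bit, or 0 if no error was seen."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s3
    if syndrome:
        c[syndrome - 1] ^= 1  # rewrite the erroneous bit
    return c, syndrome


def hamming74_decode(codeword):
    """Correct a single-bit error, if any, and return the 4 data bits."""
    c, _ = hamming74_correct(codeword)
    return [c[2], c[4], c[5], c[6]]


stored = hamming74_encode([1, 0, 1, 1])
upset = list(stored)
upset[4] ^= 1  # simulate a soft error flipping one stored bit
print(hamming74_decode(upset))
```

Roll-back correction would instead store only a detection code such as a parity bit and, on a mismatch, reload the word from another copy of the data.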

Soft errors in logic circuits other than memory are sometimes detected and corrected using the techniques of fault tolerant design.

Traditionally, dynamic random access memory (DRAM) has had the most attention in the quest to reduce or work around soft errors, because DRAM has made up the majority of the susceptible device surface area in desktop and server computer systems (hence the prevalence of ECC RAM in server computers). Hard figures for DRAM susceptibility are difficult to come by, and vary considerably across designs, fabrication processes, and manufacturers. In 1980s-technology 256 kilobit DRAMs, a single alpha particle could flip clusters of five or six bits. Modern DRAMs have much smaller feature sizes, so the deposition of a similar amount of charge could easily cause many more bits to flip.

The design of error detection and correction circuits is helped by the fact that soft errors are usually localised to a very small area of a chip. Usually, only one cell of a memory is affected, although high energy events can cause a multi-cell upset. Conventional memory layout usually places one bit from each of many different correction words adjacent to one another on a chip. So, even a multi-cell upset leads only to a number of separate single-bit upsets in multiple correction words, rather than a multi-bit upset in a single correction word. An error correcting code therefore needs only to cope with a single bit in error in each correction word in order to cope with all likely soft errors. The term multi-cell is used for upsets affecting multiple cells of a memory, whatever correction words those cells happen to fall in. Multi-bit is used when multiple bits in a single correction word are in error.
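The effect of this layout can be sketched with a toy model. It assumes a single row of cells in which physically adjacent cells either belong to different correction words (interleaved) or to the same word (contiguous); the row size, word count and function names are hypothetical and not taken from any real memory design.

```python
def interleaved_word(cell_index, num_words):
    """Interleaved layout: neighbouring cells belong to different words."""
    return cell_index % num_words


def contiguous_word(cell_index, bits_per_word):
    """Contiguous layout: each word occupies a run of neighbouring cells."""
    return cell_index // bits_per_word


def errors_per_word(upset_cells, cell_to_word):
    """Count how many flipped bits land in each correction word."""
    counts = {}
    for cell in upset_cells:
        word = cell_to_word(cell)
        counts[word] = counts.get(word, 0) + 1
    return counts


# A multi-cell upset hitting three physically adjacent cells.
upset_cells = [10, 11, 12]

interleaved = errors_per_word(upset_cells, lambda c: interleaved_word(c, 8))
contiguous = errors_per_word(upset_cells, lambda c: contiguous_word(c, 8))

print("interleaved:", interleaved)  # one error in each of three words
print("contiguous: ", contiguous)   # three errors in a single word
```

In the interleaved case every affected word holds a single-bit error that a single-error-correcting code can repair, whereas the contiguous case produces a multi-bit upset in one word.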

= See also =

  • Single event upset
  • Radiation hardening

= External links =

  • [http://www.tezzaron.com/about/papers/Soft%20Errors%201_1%20secure.pdf Soft Errors in Electronic Memory - A White Paper] - A good summary paper with many references - Tezzaron Jan 2004
  • [http://www-1.ibm.com/servers/eserver/pseries/campaigns/chipkill.pdf Benefits of Chipkill-Correct ECC for PC Server Main Memory] - A 1997 discussion of SDRAM reliability - some interesting information on soft errors from cosmic rays, especially with respect to error-correcting code schemes
  • [http://www.edn.com/article/CA454636.html Soft errors impact on system reliability] - Ritesh Mastipuram and Edwin C Wee, Cypress Semiconductor, 2004
  • [http://www.nepp.nasa.gov/DocUploads/40D7D6C9-D5AA-40FC-829DC2F6A71B02E9/Scal-00.pdf Scaling and Technology Issues for Soft Error Rates] - A Johnston - 4th Annual Research Conference on Reliability Stanford University, October 2000
  • [http://www.rcnp.osaka-u.ac.jp/~annurep/2001/genkou/sec3/kobayashi.pdf Evaluation of LSI Soft Errors Induced by Terrestrial Cosmic rays and Alpha Particles] - H. Kobayashi, K. Shiraishi, H. Tsuchiya, H. Usuki (all of Sony), and Y. Nagai, K. Takahisa (Osaka University), 2001.

= References =

    Ziegler, J. F. and W. A. Lanford, Effect of Cosmic Rays on Computer Memories, Science, 206, 776 (1979).