Shift-JIS

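Shift_JIS (SJIS) is a character encoding for the Japanese language. Its single-byte characters 0x00 to 0x7F match ASCII, except for a yen sign at 0x5C and an overline at 0x7E in place of the ASCII character set's backslash and tilde respectively.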
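The two single-byte substitutions can be written as a small lookup. This is a minimal sketch of the strict reading given above; the names `JIS_ROMAN_OVERRIDES` and `decode_single_byte` are illustrative, and real converters differ on whether 0x5C maps to U+00A5 or U+005C.

```python
# The two byte values where the 7-bit range departs from ASCII.
JIS_ROMAN_OVERRIDES = {
    0x5C: "\u00a5",  # YEN SIGN instead of backslash
    0x7E: "\u203e",  # OVERLINE instead of tilde
}


def decode_single_byte(b):
    """Decode one byte in 0x00-0x7F under the strict reading (illustrative helper)."""
    if not 0x00 <= b <= 0x7F:
        raise ValueError("expected a 7-bit single-byte value")
    # Every other value decodes exactly as in ASCII.
    return JIS_ROMAN_OVERRIDES.get(b, chr(b))
```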

Shift_JIS requires an 8-bit medium for transmission. However, unlike the competing 8-bit format Extended Unix Code (EUC), Shift_JIS only guarantees that the first byte of a double-byte character has its high bit set (0x80 or above); the second byte can be either high or low. This makes reliable Shift_JIS detection difficult.
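A low second byte also affects plain byte searches. The sketch below uses the katakana character encoded as 0x83 0x5C, whose second byte collides with the single-byte value 0x5C. The lead-byte ranges 0x81-0x9F and 0xE0-0xFC are an assumption chosen to cover common extensions, and the function names are illustrative.

```python
def is_lead_byte(b):
    """Assumed lead-byte ranges: 0x81-0x9F and 0xE0-0xFC."""
    return 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xFC


def split_characters(data):
    """Split Shift_JIS bytes into one- and two-byte units, scanning from the start."""
    units, i = [], 0
    while i < len(data):
        # A lead byte consumes the following byte, whatever its value.
        step = 2 if is_lead_byte(data[i]) and i + 1 < len(data) else 1
        units.append(data[i:i + step])
        i += step
    return units


sample = b"\x83\x5cA"              # 0x83 0x5C (one katakana character), then ASCII "A"
naive_hit = b"\x5c" in sample      # a raw byte search finds 0x5C
units = split_characters(sample)   # boundaries are only known by scanning from the start
real_hit = b"\x5c" in units        # no standalone 0x5C character exists
```

Because a second byte can look like a single-byte character, character boundaries cannot be determined by inspecting a byte in isolation.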

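For a double-byte JIS sequence <math>j_1 j_2</math>, the transformation to the corresponding Shift_JIS bytes <math>s_1 s_2</math> is:

:<math>33 \le j_1 \le 94 \Rightarrow s_1 = \left\lfloor \frac{j_1 + 1}{2} \right\rfloor + 112</math>
:<math>95 \le j_1 \le 126 \Rightarrow s_1 = \left\lfloor \frac{j_1 + 1}{2} \right\rfloor + 176</math>
:<math>j_1 \mbox{ is odd } \Rightarrow s_2 = j_2 + 31 + \left\lfloor \frac{j_2}{96} \right\rfloor</math>
:<math>j_1 \mbox{ is even } \Rightarrow s_2 = j_2 + 126</math>

The first case yields lead bytes 0x81 to 0x9F and the second 0xE0 to 0xEF; the extra term in the odd case makes the second byte skip 0x7F.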
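The transformation can be sketched in a few lines of Python. This version assumes integer (floor) division and a split between 94 and 95 for the first JIS byte, which is what places lead bytes in 0x81-0x9F and 0xE0-0xEF; the function name `jis_to_sjis` is illustrative.

```python
def jis_to_sjis(j1, j2):
    """Convert one JIS X 0208 byte pair (each 0x21-0x7E) to Shift_JIS bytes."""
    if not (0x21 <= j1 <= 0x7E and 0x21 <= j2 <= 0x7E):
        raise ValueError("JIS bytes must be in the range 0x21-0x7E")
    # Two consecutive JIS rows share one lead byte; rows above 94 jump to 0xE0.
    s1 = (j1 + 1) // 2 + (112 if j1 <= 94 else 176)
    if j1 % 2 == 1:
        # Odd first byte: second byte lands in 0x40-0x9E, skipping 0x7F.
        s2 = j2 + 31 + j2 // 96
    else:
        # Even first byte: second byte lands in 0x9F-0xFC.
        s2 = j2 + 126
    return bytes([s1, s2])
```

For example, hiragana "a" is 0x24 0x22 in JIS X 0208, and `jis_to_sjis(0x24, 0x22)` returns `b"\x82\xa0"`, which Python's built-in `shift_jis` codec decodes back to the same character.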

Many different versions of Shift_JIS exist. There are two areas for expansion. Firstly, JIS X 0208 does not fill the whole 94x94 space encoded for it in Shift_JIS, so there is room for more characters there; these are really extensions to JIS X 0208 rather than to Shift_JIS itself. The most popular such extension is Windows-31J (otherwise known as Code Page 932), popularized by Microsoft. Secondly, Shift_JIS has more encoding space than is needed for JIS X 0201 and JIS X 0208, and this space can be, and is, used for yet more characters. For example, the space with lead bytes 0xF5 to 0xF9 is used by Japanese mobile phone operators for pictographs in email (KDDI goes further and defines hundreds more in the space with lead bytes 0xF3 and 0xF4).

Beyond even this there have been numerous minor variations made on Shift_JIS, with individual characters here and there altered. Most of these extensions and variants have no IANA registration, so there is much scope for confusion if the extensions are used. Microsoft Code Page 932 is registered separately from Shift_JIS.

IBM 943 has the same extensions as Code Page 932.

= See also =

  • Japanese language and computers
  • Mojibake

= External links =

  • [http://lfw.org/text/jp.html Ping: Japanese text encoding]
  • [http://www.rikai.com/library/kanjitables/kanji_codes.sjis.shtml Shift-JIS] A table of the non-ASCII part of the codeset.
  • [http://mail.apps.ietf.org/ietf/charsets/msg00616.html Proposal for clarification of the difference between Shift_JIS and Windows-31J at IANA]
  • [http://www.iana.org/assignments/character-sets IANA assignments for character sets]
  • [http://www.microsoft.com/globaldev/reference/dbcs/932.htm Microsoft's definition of Code Page 932]
  • [http://publibn.boulder.ibm.com/doc_link/en_US/a_doc_lib/aixprggd/genprogc/codeset_over.htm#HDRMGC0DAN IBM Code Page description page] Includes a brief description of where all the IBM 943 extensions came from.
  • Forms of Shift-JIS in ICU (International Components for Unicode):
      • [http://www.ibm.com/software/globalization/icu/demo/convertersconv=ibm-942 ibm-942 (sjis78)]
      • [http://www.ibm.com/software/globalization/icu/demo/convertersconv=ibm-943 ibm-943 (contains the \u00A5 \x5C mapping)]
      • [http://www.ibm.com/software/globalization/icu/demo/convertersconv=Shift_JIS Shift_JIS (contains the \u005C \x5C mapping)]