My Profile
Active Members
TodayLast 7 Days
more...
Awards & Gifts
Online Exams
Fresher Jobs
Our fresher job section is exclusively for fresh graduates! Find jobs for freshers in major Indian
cities including Bangalore, Chennai, Hyderabad, Pune or Kochi
Resources
Find educational articles, blogs, discussion threads and other resources.
Colleges
Find details about any college in India or search for courses.
Paid Surveys
|
Uni code
Posted Date: 04 Mar 2008 Resource Type: Articles/Knowledge Sharing Category: General
|
Posted By: Aparanjitha Member Level: Gold Rating: Points: 4
|
|
|
|
3.1 Unicode Programs are written using the Unicode character set. Information about this character set and its associated character encodings may be found at:
http://www.unicode.org
The Java platform tracks the Unicode specification as it evolves. The precise version of Unicode used by a given release is specified in the documentation of the class Character.
Versions of the Java programming language prior to 1.1 used Unicode version 1.1.5. Upgrades to newer versions of the Unicode Standard occurred in JDK 1.1 (to Unicode 2.0), JDK 1.1.7 (to Unicode 2.1), J2SE 1.4 (to Unicode 3.0), and J2SE 5.0 (to Unicode 4.0).
The Unicode standard was originally designed as a fixed-width 16-bit character encoding. It has since been changed to allow for characters whose representation requires more than 16 bits. The range of legal code points is now U+0000 to U+10FFFF, using the hexadecimal U+n notation. Characters whose code points are greater than U+FFFF are called supplementary characters. To represent the complete range of characters using only 16-bit units, the Unicode standard defines an encoding called UTF-16. In this encoding, supplementary characters are represented as pairs of 16-bit code units, the first from the high-surrogates range, (U+D800 to U+DBFF), the second from the low-surrogates range (U+DC00 to U+DFFF). For characters in the range U+0000 to U+FFFF, the values of code points and UTF-16 code units are the same.
The Java programming language represents text in sequences of 16-bit code units, using the UTF-16 encoding. A few APIs, primarily in the Character class, use 32-bit integers to represent code points as individual entities. The Java platform provides methods to convert between the two representations.
This book uses the terms code point and UTF-16 code unit where the representation is relevant, and the generic term character where the representation is irrelevant to the discussion.
Except for comments (§3.7), identifiers, and the contents of character and string literals (§3.10.4, §3.10.5), all input elements (§3.5) in a program are formed only from ASCII characters (or Unicode escapes (§3.3) which result in ASCII characters). ASCII (ANSI X3.4) is the American Standard Code for Information Interchange. The first 128 characters of the Unicode character encoding are the ASCII characters.
|
Responses
|
No responses found. Be the first to respond and make money from revenue sharing program.
|
|
|