Character Codes   
 
Latin 1 Characters

This chart shows the effects of numeric ampersand entities on your browser. To use these characters in your own HTML files, put the appropriate number into &#__; (e.g. "&#163;" for the British pound (currency) sign), or, for the 8-bit alphabetic characters, use the alternative standard HTML 2.0 entity in parentheses on the right. (These are the only non-numeric character entities defined in HTML 2.0, except for "&amp;", "&lt;", and "&gt;", which should be used to escape the characters & < > in an HTML file, and "&quot;" to escape a double-quote character in an attribute value.)
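
As a small illustration of the numeric form, the following Python sketch builds a numeric reference from a character, decodes one back, and applies the four escapes mentioned above. The helper name numeric_entity is made up for this example; html.escape and html.unescape come from the Python standard library.

```python
import html

def numeric_entity(ch: str) -> str:
    """Return the decimal numeric character reference for one character."""
    return f"&#{ord(ch)};"

print(numeric_entity("£"))          # &#163;
print(html.unescape("&#163;"))      # £

# html.escape covers the special characters & < > and the double quote.
print(html.escape('a < b & "c"'))   # a &lt; b &amp; &quot;c&quot;
```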

If the right column looks the same as the left column, you're losing the eighth bit somewhere. If the characters in the right column don't match their descriptions, then your browser is translating incorrectly between ISO 8859-1 Latin 1 and your platform's native character set.
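
The first symptom follows from the chart's layout: each row pairs a 7-bit code with the code exactly 128 higher, so clearing the eighth bit turns the right-hand character into its left-hand neighbour. A minimal Python sketch of that effect (the function name strip_eighth_bit is an assumption for illustration):

```python
def strip_eighth_bit(code: int) -> int:
    """Clear bit 7, as a channel that only passes 7 bits would."""
    return code & 0x7F

for code in (161, 163, 233):
    low = strip_eighth_bit(code)
    print(code, "->", low, repr(chr(low)))
# 161 -> 33 '!'
# 163 -> 35 '#'
# 233 -> 105 'i'
```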

Finally, note that positions 127-159 are not displayable characters in ISO 8859-1 Latin 1, and are not part of any HTML standard, so HTML code such as "&#153;" is incorrect and will be displayed differently in browsers on different platforms (often in ways that you did not intend). See the Unicode chart below for the (future) correct way of displaying the characters that occupy positions 130-159 in Microsoft Windows -- including such typographical niceties as "curly" quotes, dashes, ellipses, and the trademark symbol.
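
The platform dependence can be seen by decoding the same byte under two character sets. This Python sketch uses the standard "cp1252" and "latin-1" codecs; the variable names are only for illustration.

```python
raw = bytes([153])

as_cp1252 = raw.decode("cp1252")    # Microsoft Windows interpretation
as_latin1 = raw.decode("latin-1")   # ISO 8859-1 interpretation

print(repr(as_cp1252), ord(as_cp1252))   # '™' 8482
print(repr(as_latin1), ord(as_latin1))   # '\x99' 153
```

Only the Windows reading yields the trademark sign; under ISO 8859-1 the same position is a non-displayable control code.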

The following chart only tests the ISO 8859-1 compliance of your browser's non-proportional font.

 32      160    Non-breaking space
 33  !   161  ¡ Inverted exclamation
 34  "   162  ¢ Cent sign
 35  #   163  £ Pound sterling
 36  $   164  ¤ General currency sign
 37  %   165  ¥ Yen sign
 38  &   166  ¦ Broken vertical bar
 39  '   167  § Section sign
 40  (   168  ¨ Umlaut (dieresis)
 41  )   169  © Copyright
 42  *   170  ª Feminine ordinal
 43  +   171  « Left angle quote, guillemotleft
 44  ,   172  ¬ Not sign
 45  -   173    Soft hyphen
 46  .   174  ® Registered trademark
 47  /   175  ¯ Macron accent
 48  0   176  ° Degree sign
 49  1   177  ± Plus or minus
 50  2   178  ² Superscript two
 51  3   179  ³ Superscript three
 52  4   180  ´ Acute accent
 53  5   181  µ Micro sign
 54  6   182  ¶ Paragraph sign
 55  7   183  · Middle dot
 56  8   184  ¸ Cedilla
 57  9   185  ¹ Superscript one
 58  :   186  º Masculine ordinal
 59  ;   187  » Right angle quote, guillemotright
 60  <   188  ¼ Fraction one-fourth
 61  =   189  ½ Fraction one-half
 62  >   190  ¾ Fraction three-fourths
 63  ?   191  ¿ Inverted question mark
 64  @   192  À Capital A, grave accent ("&Agrave;")
 65  A   193  Á Capital A, acute accent ("&Aacute;")
 66  B   194  Â Capital A, circumflex accent ("&Acirc;")
 67  C   195  Ã Capital A, tilde ("&Atilde;")
 68  D   196  Ä Capital A, dieresis or umlaut mark ("&Auml;")
 69  E   197  Å Capital A, ring ("&Aring;")
 70  F   198  Æ Capital AE diphthong (ligature) ("&AElig;")
 71  G   199  Ç Capital C, cedilla ("&Ccedil;")
 72  H   200  È Capital E, grave accent ("&Egrave;")
 73  I   201  É Capital E, acute accent ("&Eacute;")
 74  J   202  Ê Capital E, circumflex accent ("&Ecirc;")
 75  K   203  Ë Capital E, dieresis or umlaut mark ("&Euml;")
 76  L   204  Ì Capital I, grave accent ("&Igrave;")
 77  M   205  Í Capital I, acute accent ("&Iacute;")
 78  N   206  Î Capital I, circumflex accent ("&Icirc;")
 79  O   207  Ï Capital I, dieresis or umlaut mark ("&Iuml;")
 80  P   208  Ð Capital Eth, Icelandic ("&ETH;")
 81  Q   209  Ñ Capital N, tilde ("&Ntilde;")
 82  R   210  Ò Capital O, grave accent ("&Ograve;")
 83  S   211  Ó Capital O, acute accent ("&Oacute;")
 84  T   212  Ô Capital O, circumflex accent ("&Ocirc;")
 85  U   213  Õ Capital O, tilde ("&Otilde;")
 86  V   214  Ö Capital O, dieresis or umlaut mark ("&Ouml;")
 87  W   215  × Multiply sign
 88  X   216  Ø Capital O, slash ("&Oslash;")
 89  Y   217  Ù Capital U, grave accent ("&Ugrave;")
 90  Z   218  Ú Capital U, acute accent ("&Uacute;")
 91  [   219  Û Capital U, circumflex accent ("&Ucirc;")
 92  \   220  Ü Capital U, dieresis or umlaut mark ("&Uuml;")
 93  ]   221  Ý Capital Y, acute accent ("&Yacute;")
 94  ^   222  Þ Capital THORN, Icelandic ("&THORN;")
 95  _   223  ß Small sharp s, German (sz ligature) ("&szlig;")
 96  `   224  à Small a, grave accent ("&agrave;")
 97  a   225  á Small a, acute accent ("&aacute;")
 98  b   226  â Small a, circumflex accent ("&acirc;")
 99  c   227  ã Small a, tilde ("&atilde;")
100  d   228  ä Small a, dieresis or umlaut mark ("&auml;")
101  e   229  å Small a, ring ("&aring;")
102  f   230  æ Small ae diphthong (ligature) ("&aelig;")
103  g   231  ç Small c, cedilla ("&ccedil;")
104  h   232  è Small e, grave accent ("&egrave;")
105  i   233  é Small e, acute accent ("&eacute;")
106  j   234  ê Small e, circumflex accent ("&ecirc;")
107  k   235  ë Small e, dieresis or umlaut mark ("&euml;")
108  l   236  ì Small i, grave accent ("&igrave;")
109  m   237  í Small i, acute accent ("&iacute;")
110  n   238  î Small i, circumflex accent ("&icirc;")
111  o   239  ï Small i, dieresis or umlaut mark ("&iuml;")
112  p   240  ð Small eth, Icelandic ("&eth;")
113  q   241  ñ Small n, tilde ("&ntilde;")
114  r   242  ò Small o, grave accent ("&ograve;")
115  s   243  ó Small o, acute accent ("&oacute;")
116  t   244  ô Small o, circumflex accent ("&ocirc;")
117  u   245  õ Small o, tilde ("&otilde;")
118  v   246  ö Small o, dieresis or umlaut mark ("&ouml;")
119  w   247  ÷ Division sign
120  x   248  ø Small o, slash ("&oslash;")
121  y   249  ù Small u, grave accent ("&ugrave;")
122  z   250  ú Small u, acute accent ("&uacute;")
123  {   251  û Small u, circumflex accent ("&ucirc;")
124  |   252  ü Small u, dieresis or umlaut mark ("&uuml;")
125  }   253  ý Small y, acute accent ("&yacute;")
126  ~   254  þ Small thorn, Icelandic ("&thorn;")
         255  ÿ Small y, dieresis or umlaut mark ("&yuml;")
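
For readers who want to look up the named form programmatically, here is a minimal Python sketch using the standard library's html.entities.codepoint2name table. The helper named_entity is hypothetical. Note that Python's table follows the HTML 4 entity lists, so it also carries names for positions 160-191, which go beyond the HTML 2.0 names discussed in this chart.

```python
from html.entities import codepoint2name

def named_entity(code: int) -> str:
    """Return the named entity for a code point, e.g. 192 -> '&Agrave;'."""
    return f"&{codepoint2name[code]};"

for code in (192, 223, 233, 255):
    print(code, chr(code), named_entity(code))
# 192 À &Agrave;
# 223 ß &szlig;
# 233 é &eacute;
# 255 ÿ &yuml;
```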

Unicode: The correct way to display "smart quotes", the trademark symbol, etc.

Some commonly-desired characters, such as the trademark symbol, as well as such typographical niceties as "curly" quotes, dashes, and ellipses, are not part of the ISO 8859-1 character set, and so cannot be displayed properly in HTML 2.0. If you put a raw 8-bit character in your file and intend it to be understood with a non-ISO 8859-1 meaning, or put a numeric entity reference between 128 and 159 there (such as "&#153;"), then this is incorrect HTML, which will not display as you intended on browsers on other platforms, and maybe not even on other browsers on the same platform -- even when it "looks right" in your own browser.

One correct way to specify such characters in more recent versions of HTML (starting with the "Cougar" proposal -- now superseded by the proposed HTML 4.0 standard -- and/or "internationalized HTML" as specified in RFC 2070) is to use numeric entities greater than 255, which refer to positions in the Unicode character set, as outlined in the Usenet posting below. Unfortunately, these are only beginning to be implemented in some newer browser versions at this moment, but will become more widely implemented in the future. (You can see whether your own browser understands these entities by looking at the third column of the table below.)

(See also http://www.w3.org/pub/WWW/TR/WD-entities (from the "Cougar" draft) or http://www.w3.org/TR/WD-html40-970708/sgml/HTMLmisc.ent (HTML 4.0) for relevant entity lists in the proposed HTML standards.)

[Question: Is "&#146;" valid HTML or not?]

Positions 128-159 (also known as the "C1" range) are not used for displayable characters in ISO 8859-1 or Unicode, the character sets of HTML. MS-Windows uses a superset of ANSI/ISO 8859-1, known to experts as "Code Page 1252 (CP1252)", a Microsoft-specific character set with additional characters in the 128-159 range.

All the CP1252 characters are also available in Unicode. For example, the CP1252 character 146 that you mentioned (RIGHT SINGLE QUOTATION MARK) has the Unicode number 8217; therefore you should use this number in order to conform to the HTML standard. Modern HTML browsers like Netscape 4.0 understand Unicode, and will automatically convert the Unicode character &#8217; back into the character 146 on MS-Windows machines, and into the appropriate character on other systems.
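
One way to apply this advice mechanically is sketched below in Python: encode the text as ISO 8859-1 and let the standard "xmlcharrefreplace" error handler turn everything outside that set into a decimal numeric reference. The function name to_latin1_html is an assumption for illustration, and the sketch does not escape & < >.

```python
def to_latin1_html(text: str) -> str:
    """Replace every character outside ISO 8859-1 with a numeric reference."""
    return text.encode("latin-1", "xmlcharrefreplace").decode("latin-1")

print(to_latin1_html("It’s a “test”"))   # It&#8217;s a &#8220;test&#8221;
print(to_latin1_html("café"))            # café (already in Latin 1, unchanged)
```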

The official CP1252<->Unicode conversion table is printed in the Unicode 2.0 standard, for instance, and is available in the file ucs-map-cp1252. [See also the file ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP1252.TXT at the official Unicode site.]

The CP1252 characters that are not part of ANSI/ISO 8859-1, and that should therefore always be encoded as Unicode characters greater than 255, are the following:

 Windows   Unicode    Char.
  char.   HTML code   test         Description of Character
  -----     -----     ---          ------------------------
ALT-0130   &#8218;   ‚    Single Low-9 Quotation Mark
ALT-0131   &#402;    ƒ    Latin Small Letter F With Hook
ALT-0132   &#8222;   „    Double Low-9 Quotation Mark
ALT-0133   &#8230;   …    Horizontal Ellipsis
ALT-0134   &#8224;   †    Dagger
ALT-0135   &#8225;   ‡    Double Dagger
ALT-0136   &#710;    ˆ    Modifier Letter Circumflex Accent
ALT-0137   &#8240;   ‰    Per Mille Sign
ALT-0138   &#352;    Š    Latin Capital Letter S With Caron
ALT-0139   &#8249;   ‹    Single Left-Pointing Angle Quotation Mark
ALT-0140   &#338;    Œ    Latin Capital Ligature OE
ALT-0145   &#8216;   ‘    Left Single Quotation Mark
ALT-0146   &#8217;   ’    Right Single Quotation Mark
ALT-0147   &#8220;   “    Left Double Quotation Mark
ALT-0148   &#8221;   ”    Right Double Quotation Mark
ALT-0149   &#8226;   •    Bullet
ALT-0150   &#8211;   –    En Dash
ALT-0151   &#8212;   —    Em Dash
ALT-0152   &#732;    ˜    Small Tilde
ALT-0153   &#8482;   ™    Trade Mark Sign
ALT-0154   &#353;    š    Latin Small Letter S With Caron
ALT-0155   &#8250;   ›    Single Right-Pointing Angle Quotation Mark
ALT-0156   &#339;    œ    Latin Small Ligature OE
ALT-0159   &#376;    Ÿ    Latin Capital Letter Y With Diaeresis
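
A table like the one above can be regenerated from a CP1252 mapping. The following Python sketch relies on the standard "cp1252" codec and the unicodedata module; the generator name cp1252_extras is made up for this example. Python's codec also defines positions 128, 142, and 158, which do not appear in the table above, and it raises an error for positions it leaves unassigned, which the sketch skips.

```python
import unicodedata

def cp1252_extras():
    """Yield (position, unicode_number, char, name) for the CP1252
    positions 128-159 that Python's "cp1252" codec defines."""
    for position in range(128, 160):
        try:
            ch = bytes([position]).decode("cp1252")
        except UnicodeDecodeError:
            continue  # position is unassigned in this codec
        yield position, ord(ch), ch, unicodedata.name(ch)

for position, number, ch, name in cp1252_extras():
    print(f"ALT-0{position}   &#{number};   {ch}    {name.title()}")
```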

1998 - 2014 psacake.com