A widely supported and preferred character encoding system.
For a computer to display letters (or any other text characters), it needs to enumerate them, that is, create an index of the characters it knows how to display. These indexes are known as character sets. Choosing the right one matters most to users running WordPress in a language other than English.
The most widely used family of these character sets is iso-8859, with iso-8859-1 and iso-8859-15 being the most common members; they are also known as Latin1 and Latin9. Compared with iso-8859-1, iso-8859-15 adds the euro sign and a few characters used in French, Finnish and Estonian. These character sets use 8 bits (a single byte) for each character, allowing for 256 different values (255, not counting null). However, Latin-based languages aren’t the only ones in the world (think Japanese or Hebrew), and 256 characters aren’t nearly enough to cover them all.
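As a small illustration of what a single-byte character set means in practice, the following Python sketch decodes the same byte with two members of the iso-8859 family and then checks whether some text fits in iso-8859-1 at all. The helper name `fits_in` is made up for this example; the codec names are the ones shipped with Python's standard library.

```python
# One byte, two character sets: the same value maps to different characters.
raw = bytes([0xA4])

as_latin1 = raw.decode("iso-8859-1")   # generic currency sign
as_latin9 = raw.decode("iso-8859-15")  # euro sign

print(as_latin1, as_latin9)


def fits_in(text: str, charset: str) -> bool:
    """Return True if every character of text exists in charset.

    Hypothetical helper for this example, not part of any library.
    """
    try:
        text.encode(charset)
        return True
    except UnicodeEncodeError:
        return False


# French fits in Latin1; Hebrew has no slot in its 256-entry table.
print(fits_in("café", "iso-8859-1"))
print(fits_in("שלום", "iso-8859-1"))
```

The point of the sketch is that a byte on its own carries no meaning: the character set chosen decides which character it stands for, and any character missing from that set simply cannot be stored.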
There is a far larger index of characters known as Unicode. Unicode has so many characters that sometimes more than 16 bits (2 bytes!) are required to represent them. Furthermore, the first 128 characters of Unicode are the same as ASCII, and the first 256 are the same as the most widely used character set, iso-8859-1. To store Unicode characters as bytes, UTF, the Unicode Transformation Format, was created. UTF uses different numbers of bits for different characters and allows for the entire range of Unicode to be used. What you should probably know is:
- UTF-8 is a variable-width type of UTF: each character takes at least 8 bits (1 byte) and at most 32 bits (4 bytes). There are also UTF-16 and UTF-32.
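- If your document contains only plain ASCII characters (unaccented Latin letters, digits and basic punctuation), it is already valid UTF-8 and you don’t need to change anything. Accented characters stored in a Latin-based encoding such as iso-8859-1 use different bytes in UTF-8, so such a document needs to be converted.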
- A single UTF document can be in various languages with no need to switch encodings halfway through.
- External links: Joel Spolsky on Unicode
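The points in the list above can be checked with a short Python sketch. It is only an illustration using the standard library's built-in codecs; the sample strings and variable names are chosen for this example.

```python
# UTF-8 is variable-width: different characters need 1 to 4 bytes.
samples = ["A", "é", "€", "日", "😀"]
byte_lengths = {}
for ch in samples:
    encoded = ch.encode("utf-8")
    byte_lengths[ch] = len(encoded)
    print(f"U+{ord(ch):04X} -> {len(encoded)} byte(s): {encoded.hex()}")

# Pure ASCII text is byte-for-byte identical in UTF-8.
ascii_same = "Hello".encode("ascii") == "Hello".encode("utf-8")

# Accented Latin1 text is not: the accented letter is one byte in
# iso-8859-1 but two bytes in UTF-8, so the file must be converted.
latin1_bytes = "café".encode("iso-8859-1")
utf8_bytes = "café".encode("utf-8")
converted = latin1_bytes.decode("iso-8859-1").encode("utf-8")

# One UTF-8 document can mix languages without switching encodings.
mixed = "Hello, שלום, こんにちは"
round_trip = mixed.encode("utf-8").decode("utf-8")

print(ascii_same, len(latin1_bytes), len(utf8_bytes), round_trip == mixed)
```

Converting an existing document amounts to the `decode` then `encode` step shown for `converted`: read the bytes using the encoding they were written in, then write them back out as UTF-8.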