Memory & Storage: Key Facts

UTF-8, UTF-16, UTF-32: Encoding Methods

Part of Character Sets, GCSE Computer Science

These key facts cover UTF-8, UTF-16 and UTF-32 encoding methods, section 6 of 10 in Character Sets (Memory & Storage) for GCSE Computer Science. The topic appears less often in exams, but it can still be a useful differentiator on mixed-topic papers. Use these facts to connect the idea to the wider topic before moving on to the 15 exam-style questions and 18 flashcards.



Unicode Transformation Formats (UTF):

Unicode defines WHAT number (code point) each character has, for example U+0041 for 'A'. A UTF defines HOW those code points are stored as bytes.
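To make the WHAT/HOW split concrete, here is a minimal Python sketch (assuming Python 3.9+). The helper names `code_point` and `encoded_bytes` are hypothetical, chosen for illustration. The big-endian codec names `utf-16-be` and `utf-32-be` are used so Python does not prepend a byte order mark, which keeps the output to just the character's bytes.

```python
def code_point(ch: str) -> str:
    """Return the Unicode code point of one character, e.g. 'U+0041'."""
    return f"U+{ord(ch):04X}"


def encoded_bytes(ch: str) -> dict[str, str]:
    """Show the bytes each UTF encoding uses to store the same code point."""
    return {enc: ch.encode(enc).hex(" ") for enc in ("utf-8", "utf-16-be", "utf-32-be")}


print(code_point("A"))     # U+0041 (the WHAT: same in every encoding)
print(encoded_bytes("A"))  # {'utf-8': '41', 'utf-16-be': '00 41', 'utf-32-be': '00 00 00 41'}
```

The code point never changes; only the number of bytes used to store it does.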

UTF-8 (Most Common):

  • Variable length: 1 to 4 bytes per character
  • ASCII compatible: the 128 ASCII characters keep their ASCII values and still use 1 byte each
  • English text: 1 byte per character (same size as ASCII)
  • Accented letters: 2 bytes (é, ñ, ü)
  • Chinese/Japanese/Korean: usually 3 bytes per character
  • Emoji: most use 4 bytes
  • Advantages: Efficient for English, backward compatible with ASCII
  • Disadvantage: most Chinese, Japanese and Korean characters take 3 bytes, 1.5× the 2 bytes they need in UTF-16
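The 1-to-4-byte rule comes from fixed bit patterns: the leading bits of the first byte say how many bytes follow. A minimal Python sketch of that rule, with a hypothetical `utf8_encode` helper, checked against Python's built-in encoder. As a simplifying assumption it does not reject invalid inputs (surrogate values U+D800 to U+DFFF, or anything above U+10FFFF).

```python
def utf8_encode(cp: int) -> bytes:
    """Encode one code point as UTF-8 (no validation of invalid code points)."""
    if cp < 0x80:                       # 1 byte:  0xxxxxxx (identical to ASCII)
        return bytes([cp])
    if cp < 0x800:                      # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6),
                      0x80 | (cp & 0x3F)])
    if cp < 0x10000:                    # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx (up to U+10FFFF)
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


for ch in ["A", "é", "中", "😀"]:
    encoded = utf8_encode(ord(ch))
    assert encoded == ch.encode("utf-8")  # agrees with Python's own encoder
    print(ch, len(encoded), "byte(s)")    # A 1, é 2, 中 3, 😀 4
```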

UTF-16:

  • Variable length: 2 or 4 bytes per character
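  • Most characters: 2 bytes, i.e. everything from U+0000 to U+FFFF (the Basic Multilingual Plane), including most Chinese, Japanese and Korean
  • Emoji & rare characters (above U+FFFF): 4 bytes, stored as a surrogate pair of two 16-bit units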
  • Use case: Windows internals, Java, JavaScript strings
  • Advantage: more compact than UTF-8 for most Chinese, Japanese and Korean text
  • Disadvantage: English takes 2× the space of ASCII/UTF-8 (2 bytes per character instead of 1)
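A surrogate pair splits a code point above U+FFFF into two 16-bit units: subtract 0x10000, then put the top 10 bits into a "high" unit (0xD800 upwards) and the bottom 10 bits into a "low" unit (0xDC00 upwards). A minimal Python sketch, where `utf16_code_units` is a hypothetical helper name; the final loop compares whole-string sizes to show the English and Chinese trade-off described above.

```python
def utf16_code_units(cp: int) -> list[int]:
    """Return the 16-bit code units UTF-16 uses for one code point."""
    if cp < 0x10000:                    # Basic Multilingual Plane: one 2-byte unit
        return [cp]
    offset = cp - 0x10000               # a 20-bit value, split 10 bits + 10 bits
    high = 0xD800 + (offset >> 10)      # high (lead) surrogate
    low = 0xDC00 + (offset & 0x3FF)     # low (trail) surrogate
    return [high, low]


for ch in ["A", "中", "😀"]:
    units = utf16_code_units(ord(ch))
    print(ch, [f"{u:04X}" for u in units], len(units) * 2, "bytes")
# 😀 prints ['D83D', 'DE00'] 4 bytes

# Whole-string sizes in bytes: UTF-8 vs UTF-16
for sample in ["Hello", "你好"]:
    print(sample, len(sample.encode("utf-8")), len(sample.encode("utf-16-be")))
# Hello 5 10   (English doubles in UTF-16)
# 你好 6 4     (3 bytes vs 2 bytes per character)
```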

UTF-32:

  • Fixed length: Exactly 4 bytes per character (always)
  • Advantage: Simple: every character is the same size, so character n always starts at byte 4 × n (easy indexing)
  • Disadvantage: Wastes space: 'A' takes 4 bytes (0x00000041)
  • Use case: Internal processing where speed matters more than space
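Because every UTF-32 character is exactly 4 bytes, finding character n is a single calculation rather than a scan from the start, as UTF-8 and UTF-16 require. A minimal Python sketch with a hypothetical `char_at` helper; "character" here means one code point, so a letter built from a base letter plus a combining accent would count as two.

```python
def char_at(data: bytes, n: int) -> str:
    """Return character n from UTF-32 (big-endian) bytes by direct offset."""
    start = 4 * n                        # fixed width: no need to scan earlier bytes
    return data[start:start + 4].decode("utf-32-be")


text = "Aé中😀"
data = text.encode("utf-32-be")
print(len(data))         # 16 (4 characters x 4 bytes, even for 'A')
print(char_at(data, 2))  # 中
```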


Practice Questions for Character Sets

How many bits does standard ASCII use to represent each character?

  • A. 4 bits
  • B. 7 bits
  • C. 8 bits
  • D. 16 bits
1 mark (foundation)

Explain why using Unicode to store a text file produces a larger file than using ASCII to store the same text.

3 marks (standard)
