UTF-8, UTF-16, UTF-32: Encoding Methods

These key facts cover the UTF-8, UTF-16 and UTF-32 encoding methods, which form section 6 of 10 in the Character Sets topic (Memory & Storage, GCSE Computer Science). The topic appears less often in exams, but it can still be a useful differentiator on mixed-topic papers.

Unicode Transformation Formats (UTF):

Unicode defines WHAT each character's code (its code point) is. A UTF defines HOW to store that code as bytes.
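The split between the code and its stored bytes can be seen in a minimal Python sketch. Python is an assumption here (the notes do not name a language), and the character é is only an illustrative choice. The "-be" codec names are used so that no byte-order mark is added to the output.

```python
# Minimal sketch: the code point is the WHAT, the encoded bytes are the HOW.
ch = "\u00e9"                      # é, written as an escape so the file stays ASCII

code_point = ord(ch)               # Unicode code point: 233, i.e. U+00E9

# The big-endian ("-be") codecs leave out the byte-order mark that
# plain "utf-16" / "utf-32" would add at the start of the output.
utf8_bytes = ch.encode("utf-8")
utf16_bytes = ch.encode("utf-16-be")
utf32_bytes = ch.encode("utf-32-be")

print(f"U+{code_point:04X}")       # U+00E9
print(utf8_bytes.hex())            # c3a9      (2 bytes)
print(utf16_bytes.hex())           # 00e9      (2 bytes)
print(utf32_bytes.hex())           # 000000e9  (4 bytes)
```

The code point is the same in every case; only the byte pattern used to store it changes.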

UTF-8 (Most Common):

Variable length: 1 to 4 bytes per character
ASCII compatible: ASCII characters still use 1 byte (efficient!)
English text: 1 byte per character (same size as ASCII)
Accented letters: 2 bytes (é, ñ, ü)
Chinese/Japanese: 3 bytes per character (for most common characters)
Emoji: 4 bytes (most emoji)
Advantages: Efficient for English, backward compatible with ASCII
Disadvantage: Chinese/Japanese/Korean text takes 3 bytes per character, 1.5× the 2 bytes that UTF-16 uses
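The byte counts in the list above can be checked with a short Python sketch. The helper name utf8_length and the sample characters are illustrative assumptions, not part of the original notes.

```python
def utf8_length(ch: str) -> int:
    """Return how many bytes one character occupies when encoded as UTF-8."""
    return len(ch.encode("utf-8"))

# Sample characters (illustrative choices), written as escapes.
samples = {
    "A (U+0041)": "A",
    "e-acute (U+00E9)": "\u00e9",
    "CJK character (U+4E2D)": "\u4e2d",
    "grinning face emoji (U+1F600)": "\U0001F600",
}

sizes = {name: utf8_length(ch) for name, ch in samples.items()}
for name, size in sizes.items():
    print(f"{name}: {size} byte(s)")

# ASCII compatibility: plain English text gives identical bytes in both.
same_as_ascii = "Hello".encode("utf-8") == "Hello".encode("ascii")
print(same_as_ascii)   # True
```

The last check is what "backward compatible with ASCII" means in practice: a file containing only ASCII characters is already valid UTF-8.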

UTF-16:

Variable length: 2 or 4 bytes per character
Most characters: 2 bytes (including Chinese, Japanese, Korean)
Emoji & rare characters: 4 bytes (stored as a surrogate pair of two 2-byte units)
Use case: Windows internals, Java, JavaScript strings
Advantage: Efficient for Asian languages
Disadvantage: English text takes 2× the space it needs in ASCII/UTF-8
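A minimal Python sketch of how UTF-16 splits text into 2-byte units, including a surrogate pair, and how its size compares with UTF-8. The helper name utf16_units and the sample strings are hypothetical choices for illustration.

```python
def utf16_units(text: str) -> list[int]:
    """Split text into its 16-bit UTF-16 code units (big-endian, no byte-order mark)."""
    data = text.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]

# One 2-byte unit for a common CJK character ...
print([hex(u) for u in utf16_units("\u4e2d")])        # ['0x4e2d']
# ... but two units (a surrogate pair, 4 bytes) for an emoji.
print([hex(u) for u in utf16_units("\U0001F600")])    # ['0xd83d', '0xde00']

# Size comparison for a two-character Chinese string and a five-letter English word.
chinese = "\u4e2d\u6587"
english = "Hello"
chinese_sizes = (len(chinese.encode("utf-8")), len(chinese.encode("utf-16-be")))
english_sizes = (len(english.encode("utf-8")), len(english.encode("utf-16-be")))
print(chinese_sizes)   # (6, 4)
print(english_sizes)   # (5, 10)
```

The two tuples show the trade-off from both sides: the Chinese sample is smaller in UTF-16, while the English sample doubles in size.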

UTF-32:

Fixed length: Exactly 4 bytes per character (always)
Advantage: Simple - every character same size, easy indexing
Disadvantage: Wastes space - 'A' takes 4 bytes (0x00000041)
Use case: Internal processing where speed matters more than space
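The "easy indexing" point can be sketched in Python: because every character is exactly 4 bytes, character number n always starts at byte 4 × n. The helper name char_at and the sample string are assumptions made for this example.

```python
def char_at(data: bytes, index: int) -> str:
    """Return character number `index` from UTF-32 (big-endian) encoded bytes.

    Every character is exactly 4 bytes, so its position is simply index * 4.
    """
    start = index * 4
    return data[start:start + 4].decode("utf-32-be")

# Four characters of very different kinds: A, é, a CJK character, an emoji.
text = "A\u00e9\u4e2d\U0001F600"
data = text.encode("utf-32-be")     # "-be" avoids adding a byte-order mark

print(len(data))                    # 16 bytes: 4 characters x 4 bytes each
print(data[:4].hex())               # 00000041, the letter A padded to 4 bytes

third = char_at(data, 2)            # jump straight to byte 8, no scanning needed
print(f"U+{ord(third):04X}")        # U+4E2D
```

With UTF-8 or UTF-16 the same lookup would need a scan from the start, because earlier characters may have different lengths.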
