This knowledge organiser is the final section (11 of 11) of the Character Sets topic in 3.3 Data Representation for GCSE Computer Science. The topic comes up less often than others, but it can still be a useful differentiator on mixed-topic papers. Use this summary to connect the idea to the wider topic before moving on to practice questions and flashcards.

Knowledge Organiser: Character Sets

Key Terms

Character set: A defined list of characters and their binary codes
ASCII: American Standard Code for Information Interchange — 7-bit, 128 characters
Unicode: Universal character set with over 140,000 characters (the total grows with each new version), covering the world's major writing systems
UTF-8: Variable-length encoding (1–4 bytes); web standard for Unicode
UTF-16: 2 or 4 bytes per character; used internally by Windows and Java
Extended ASCII: 8-bit version with 256 characters, adds accented letters
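A quick way to check the byte counts in these key terms is to let Python encode a few characters and count the bytes. This is an illustrative sketch: the sample characters and the helper name byte_lengths are chosen for this example, and it relies only on Python's built-in str.encode. It uses "utf-16-le" rather than "utf-16" because the plain "utf-16" codec adds a 2-byte byte order mark to the front of the output, which would inflate the count.

```python
# Compare how many bytes the same character needs in ASCII, UTF-8 and UTF-16.
# Sample characters chosen for illustration: A, e-acute, euro sign, grinning-face emoji.
SAMPLES = ["A", "\u00e9", "\u20ac", "\U0001F600"]

def byte_lengths(ch):
    """Hypothetical helper: map each encoding name to a byte count (None if not encodable)."""
    sizes = {}
    for encoding in ("ascii", "utf-8", "utf-16-le"):
        try:
            sizes[encoding] = len(ch.encode(encoding))
        except UnicodeEncodeError:
            sizes[encoding] = None  # the character is outside this character set
    return sizes

for ch in SAMPLES:
    # Print the Unicode code point (e.g. U+0041) rather than the raw character,
    # so the output displays correctly in any terminal.
    print(f"U+{ord(ch):04X}", byte_lengths(ch))
```

Running it shows "A" taking 1 byte in ASCII and UTF-8, é and € failing in ASCII altogether, and the emoji needing 4 bytes in both UTF-8 and UTF-16.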

Must-Know Facts

ASCII is 7-bit (NOT 8-bit) — 128 characters total
ASCII code for 'A' = 65; 'a' = 97; '0' = 48; space = 32
Lowercase letter = uppercase + 32 (e.g. 'A'=65, 'a'=97)
Unicode covers the characters of almost all written languages, plus symbols and emoji
UTF-8 is backward compatible with ASCII (same codes 0–127)
More characters in Unicode = more bits needed for some characters = potentially larger file sizes
UTF-8 dominant on the web (used by 95%+ of websites)
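The ASCII facts above can be checked with Python's built-in ord() (character to code) and chr() (code to character). This is a minimal sketch; the helper names ascii_code and to_lowercase are hypothetical, written only for this example. The "07b" format prints each code as 7 binary digits, which is all standard ASCII needs.

```python
# Explore ASCII codes with Python's built-in ord() and chr().

def ascii_code(ch):
    """Hypothetical helper: return the ASCII code of one character, checking it fits in 7 bits."""
    code = ord(ch)
    if code > 127:  # 7 bits hold 2**7 = 128 values: codes 0 to 127
        raise ValueError(f"{ch!r} is not an ASCII character")
    return code

def to_lowercase(ch):
    """Hypothetical helper: convert an uppercase letter A-Z to lowercase by adding 32."""
    if "A" <= ch <= "Z":
        return chr(ascii_code(ch) + 32)
    return ch  # anything that is not an uppercase letter is returned unchanged

for ch in ["A", "a", "0", " "]:
    # Show the character, its denary code and its 7-bit binary code.
    print(repr(ch), ascii_code(ch), format(ascii_code(ch), "07b"))

print(to_lowercase("A"), to_lowercase("Q"), to_lowercase("7"))
```

In binary, 'A' is 1000001 and 'a' is 1100001: adding 32 simply switches on the 32s column, which is always 0 for the uppercase letters A to Z.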

Key Concepts

Why ASCII is limited: Only 128 characters, enough for unaccented English text but not for accented letters or scripts such as Greek, Arabic or Chinese
Why Unicode was created: Single universal standard replacing hundreds of incompatible code pages
Character set vs encoding: Unicode is the set; UTF-8/16/32 are ways to encode it in bytes
File size trade-off: Unicode files can be larger than ASCII because each character may need more bytes
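To see the difference between a character set and an encoding, the sketch below takes one Unicode code point, the euro sign U+20AC, and shows the bytes each UTF encoding produces for it, then compares the size of a short sentence in each encoding. The sample sentence is made up for illustration. The "-le" (little-endian) codec names are used so that Python adds no byte order mark, and bytes.hex() with a separator needs Python 3.8 or later.

```python
# One code point from the Unicode character set, several possible byte encodings.
euro = "\u20ac"  # the euro sign, code point U+20AC

print(f"Code point: U+{ord(euro):04X} (denary {ord(euro)})")
for encoding in ("utf-8", "utf-16-le", "utf-32-le"):
    data = euro.encode(encoding)
    print(encoding, data.hex(" "), f"({len(data)} bytes)")

# File size trade-off: the same text stored with different encodings.
text = "Caf\u00e9 costs \u20ac3"  # "Café costs €3", sample text for illustration
for encoding in ("utf-8", "utf-16-le", "utf-32-le"):
    print(encoding, len(text.encode(encoding)), "bytes")
```

For the 13-character sample, UTF-8 uses 16 bytes (only é and € need extra bytes), UTF-16 uses 26 and UTF-32 uses 52, so the encoding, not just the character set, decides the file size.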

Common Mistakes

Saying ASCII is 8-bit: Standard ASCII uses 7 bits (128 characters) — Extended ASCII uses 8 bits (256 characters); exam questions often test this distinction
Confusing Unicode with UTF-8: Unicode is the character set (the list of characters and code points); UTF-8, UTF-16, and UTF-32 are encoding schemes that store Unicode in binary
Saying Unicode always uses more storage than ASCII: UTF-8 encodes the first 128 characters identically to ASCII using just 1 byte — it only uses more bytes for characters beyond the ASCII range
Forgetting that numbers and symbols have ASCII codes too: ASCII encodes digits 0–9, punctuation, and control characters as well as letters — every character a computer handles has a numeric code
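Two of these mistakes can be checked directly in Python. The sketch below shows that encoding plain English text as UTF-8 gives exactly the same bytes as ASCII, and that the character "7" is stored as code 55, not the number 7. The sample message and the helper name digit_value are hypothetical, chosen for illustration.

```python
# Mistake check 1: for plain English text, UTF-8 produces exactly the same bytes as ASCII.
message = "Hello, World! 123"  # sample text for illustration
ascii_bytes = message.encode("ascii")
utf8_bytes = message.encode("utf-8")
print(ascii_bytes == utf8_bytes, len(ascii_bytes), len(utf8_bytes))

# Mistake check 2: digits and punctuation have codes too.
def digit_value(ch):
    """Hypothetical helper: turn a digit character '0'-'9' into its number."""
    if not "0" <= ch <= "9":
        raise ValueError(f"{ch!r} is not a digit character")
    return ord(ch) - ord("0")  # subtract the code of '0' (48)

print(ord("7"), digit_value("7"), ord("!"), ord(","))
```

Subtracting the code for '0' works because the digits 0 to 9 have consecutive codes 48 to 57.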
