Every time you send a message, save a photo, or stream a video, your computer performs a remarkable transformation. It takes the rich, complex world of human experience and converts it into something starkly simple: ones and zeros. This process, known as data representation, is the foundation of all computing. Understanding it isn't just academic curiosity; it's the key to writing better code, optimizing storage, and truly grasping how digital systems work.
Let's demystify how computers represent information, moving from the basic building blocks to the sophisticated structures that power modern applications.
Why Computers Need a Special Language
Computers are electronic machines at their core. They process information using billions of tiny switches called transistors that can only exist in two states: on or off, high voltage or low voltage, true or false. This binary nature isn't a limitation we work around; it's a feature we leverage.
Binary representation offers distinct advantages:
- Reliability: Distinguishing between two states is far more reliable than interpreting multiple voltage levels
- Simplicity: Electronic circuits are cheaper to build and less prone to error when dealing with binary logic
- Universality: Any information can be encoded in binary, from simple numbers to complex neural network weights
The Binary Number System
Before diving into how computers represent text or images, we need to understand their native number system: binary (base-2).
Counting in Two Digits
While humans use decimal (base-10) with digits 0 through 9, computers use only 0 and 1. Each position in a binary number represents a power of 2, not 10.
| Decimal | Binary | Calculation |
|---|---|---|
| 0 | 0 | — |
| 1 | 1 | — |
| 2 | 10 | 1×2¹ + 0×2⁰ |
| 5 | 101 | 1×2² + 0×2¹ + 1×2⁰ |
| 10 | 1010 | 1×2³ + 0×2² + 1×2¹ + 0×2⁰ |
| 255 | 11111111 | Sum of 2⁷ through 2⁰ |
A single binary digit is called a bit (binary digit). Eight bits form a byte, which can represent 256 different values (2⁸). This is why you often see file sizes measured in bytes, kilobytes (KB), megabytes (MB), and beyond.
Practical Conversion
Converting decimal to binary involves repeatedly dividing by 2 and tracking remainders. For example, converting 13 to binary:
- 13 ÷ 2 = 6 remainder 1
- 6 ÷ 2 = 3 remainder 0
- 3 ÷ 2 = 1 remainder 1
- 1 ÷ 2 = 0 remainder 1
Reading the remainders from bottom to top: 1101.
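The divide-and-track-remainders procedure above translates directly into code. Here is a minimal Python sketch (the function name `to_binary` is our own choice, not a standard library function):

```python
def to_binary(n: int) -> str:
    """Convert a non-negative integer to a binary string by repeated division by 2."""
    if n == 0:
        return "0"
    bits = []
    while n > 0:
        bits.append(str(n % 2))  # each remainder becomes the next bit
        n //= 2
    return "".join(reversed(bits))  # remainders are read bottom to top

print(to_binary(13))  # 1101, matching the worked example
print(bin(13)[2:])    # Python's built-in conversion agrees
```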
Modern programming languages handle these conversions automatically, but understanding the underlying mechanics helps when debugging bitwise operations or optimizing memory usage.
Representing Text: Character Encoding
Computers don't understand letters; they understand numbers. Character encoding maps numbers to characters, creating a shared language between humans and machines.
ASCII: The Foundation
Developed in the 1960s, ASCII (American Standard Code for Information Interchange) assigned 128 unique codes to characters using 7 bits:
- Codes 0–31: Control characters (newline, tab, carriage return)
- Codes 32–126: Printable characters (letters, numbers, punctuation)
- Code 127: Delete character
For example:
- 'A' = 65 (binary: 1000001)
- 'a' = 97 (binary: 1100001)
- '0' = 48 (binary: 0110000)
Notice that uppercase and lowercase letters differ by exactly 32. This isn't a coincidence; it's a deliberate design that simplifies case conversion through bitwise operations.
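Because 32 is a single power of two (bit 5), toggling case is just setting or clearing one bit. A quick Python illustration:

```python
# Bit 5 (value 32) distinguishes uppercase from lowercase in ASCII.
print(chr(ord('A') | 32))    # setting the bit gives 'a'
print(chr(ord('a') & ~32))   # clearing the bit gives 'A'
print(ord('a') - ord('A'))   # 32, the difference noted above
```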
The Unicode Revolution
ASCII's 128 characters couldn't represent world languages, mathematical symbols, or emojis. Unicode solved this by assigning a unique number (code point) to every character across all writing systems.
UTF-8, the dominant encoding scheme, cleverly maintains backward compatibility with ASCII while supporting over a million characters:
- 1 byte for ASCII characters (codes 0–127)
- 2 bytes for extended Latin, Arabic, Hebrew
- 3 bytes for the rest of the Basic Multilingual Plane (including most CJK characters)
- 4 bytes for historical scripts, emojis, and rare symbols
This variable-length approach means English text remains compact while supporting global languages. A string like "Hello" consumes 5 bytes, while "你好" requires 6 bytes (3 bytes per character).
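You can verify these byte counts directly, since Python strings encode to UTF-8 bytes on demand. A small sketch:

```python
# Each string's UTF-8 byte length depends on which code points it contains.
for text in ("Hello", "你好", "é", "😀"):
    encoded = text.encode("utf-8")
    print(f"{text!r}: {len(encoded)} bytes -> {encoded.hex()}")
# "Hello" is 5 bytes (pure ASCII), "你好" is 6 (3 per character),
# "é" is 2 (extended Latin), and the emoji is 4.
```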
Numerical Data Types
Computers distinguish between different kinds of numbers through data types, each with specific memory requirements and ranges.
Integers
Integer types store whole numbers without fractional components:
| Type | Size | Range |
|---|---|---|
| Byte | 8 bits | 0 to 255 (unsigned) or -128 to 127 (signed) |
| Short | 16 bits | -32,768 to 32,767 |
| Integer | 32 bits | -2,147,483,648 to 2,147,483,647 |
| Long | 64 bits | ±9.2 × 10¹⁸ |
Signed integers typically use two's complement representation, in which the most significant bit indicates sign. This elegant system allows addition and subtraction to use the same circuitry regardless of sign, simplifying processor design.
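Python integers are arbitrary-precision, but we can emulate an 8-bit register by masking with `0xFF`. This sketch (the helper name is our own) shows the two's-complement bit patterns and why signed and unsigned addition share circuitry:

```python
def to_twos_complement(value: int, bits: int = 8) -> str:
    """Return the two's-complement bit pattern of a signed integer."""
    return format(value & (2**bits - 1), f"0{bits}b")

print(to_twos_complement(5))    # 00000101
print(to_twos_complement(-5))   # 11111011

# Adding the raw bit patterns (and keeping 8 bits) gives the right answer:
print((0b00000101 + 0b11111011) & 0xFF)  # 0, i.e. 5 + (-5) = 0
```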
Floating-Point Numbers
Real numbers with decimal points use floating-point representation, modeled after scientific notation. The IEEE 754 standard defines:
- Single precision (float): 32 bits (1 sign bit, 8 exponent bits, 23 mantissa bits)
- Double precision (double): 64 bits (1 sign bit, 11 exponent bits, 52 mantissa bits)
A number like 123.45 becomes approximately 1.2345 × 10² in scientific notation, which computers store in binary scientific notation.
Critical insight for developers: Floating-point arithmetic is inherently approximate. The decimal 0.1 cannot be represented exactly in binary floating-point, leading to familiar quirks:
```
0.1 + 0.2 ≠ 0.3   // Actually 0.30000000000000004
```
For financial calculations requiring exact precision, languages offer decimal or fixed-point types that store numbers as integers with a separate scale factor.
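Python's standard `decimal` module is one such type; this sketch contrasts it with binary floating point:

```python
from decimal import Decimal

# Binary floats accumulate representation error:
print(0.1 + 0.2)         # 0.30000000000000004
print(0.1 + 0.2 == 0.3)  # False

# Decimal stores base-10 digits exactly (note: constructed from strings):
print(Decimal("0.1") + Decimal("0.2"))                    # 0.3
print(Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))  # True
```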
Representing Images
Digital images are grids of colored points called pixels. How we encode each pixel's color determines image quality, file size, and editing flexibility.
Raster Images: Bitmap and Beyond
Bitmap (BMP) files store colors directly, typically using 24 bits per pixel: 8 bits for red, 8 for green, and 8 for blue (RGB). This supports 16.7 million colors (2²⁴) but creates enormous files. A 1920×1080 image requires roughly 6 MB uncompressed.
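The "roughly 6 MB" figure follows from simple arithmetic, which this sketch reproduces:

```python
width, height = 1920, 1080
bytes_per_pixel = 3  # 8 bits each for red, green, and blue

size_bytes = width * height * bytes_per_pixel
print(size_bytes)                  # 6,220,800 bytes
print(round(size_bytes / 1e6, 2))  # about 6.22 MB uncompressed
```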
Compression algorithms reduce this dramatically:
- Lossless compression (PNG): Identifies patterns and redundant data, preserving every pixel exactly
- Lossy compression (JPEG): Discards information human eyes barely perceive, achieving 10:1 or greater compression ratios
Color Depth and Transparency
Modern formats support various color depths:
- 8-bit: 256 colors (limited palette)
- 24-bit: True color (16.7 million colors)
- 32-bit: True color plus 8-bit alpha channel for transparency
The alpha channel enables overlay effects, allowing web designers to place images over varying backgrounds seamlessly.
Vector Graphics
Unlike raster images, vector graphics (SVG) store mathematical descriptions of shapes, lines, and curves. They scale infinitely without pixelation because the computer recalculates the image at each size. Logos and icons typically use vector formats, while photographs remain raster due to their complexity.
Audio Representation
Sound is continuous waveforms, but computers require discrete data. Pulse Code Modulation (PCM) solves this through sampling:
- Sampling rate: How often we measure the waveform per second (CD quality uses 44,100 samples/second)
- Bit depth: Precision of each measurement (16 bits provides 65,536 possible values)
Higher sampling rates capture higher frequencies; greater bit depth improves dynamic range (quietest to loudest sounds). Professional audio often uses 96 kHz sampling with 24-bit depth, though this creates significantly larger files.
Compression formats like MP3 and AAC apply psychoacoustic models, removing sounds masked by louder frequencies or outside human hearing range (typically 20 Hz to 20 kHz). A 5-minute uncompressed CD audio file exceeds 50 MB; MP3 compression reduces this to roughly 5 MB with perceptually minimal quality loss.
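The uncompressed figure is easy to check from the sampling parameters (assuming stereo CD audio, as is standard):

```python
sample_rate = 44_100  # samples per second (CD quality)
bit_depth = 16        # bits per sample
channels = 2          # stereo
seconds = 5 * 60      # a 5-minute track

size_bytes = sample_rate * (bit_depth // 8) * channels * seconds
print(size_bytes)                  # 52,920,000 bytes
print(round(size_bytes / 1e6, 1))  # about 52.9 MB, matching "exceeds 50 MB"
```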
Video: Combining Everything
Video represents the most complex data type, combining multiple image frames with synchronized audio. Uncompressed high-definition video generates gigabytes per minute, making compression essential.
Modern codecs like H.264, H.265 (HEVC), and AV1 employ sophisticated techniques:
- Temporal compression: Storing only differences between frames rather than complete images
- Motion compensation: Tracking how objects move across frames
- Spatial compression: Applying image compression within each frame
A two-hour 4K movie might require 200 GB uncompressed but streams smoothly at 15 GB with modern compression, a 13:1 reduction with remarkable quality retention.
Data Structures: Organizing Information
Raw binary becomes useful through data structures that organize related information:
Arrays and Lists
Sequential collections where elements share the same type. Arrays offer O(1) access to any element by index but require contiguous memory.
Records and Objects
Structs or objects group heterogeneous data (name, age, email) into single units, with each field stored at a known offset from the structure's start address.
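Python's `struct` module makes these fixed offsets visible. In this sketch, the record layout (a 32-bit id, 16-bit age, 64-bit float score) is an illustrative choice, not a standard format:

```python
import struct

# '<' = little-endian, standard sizes, no padding:
# i = 32-bit int, h = 16-bit int, d = 64-bit double
layout = "<ihd"
record = struct.pack(layout, 1001, 34, 98.5)

print(struct.calcsize(layout))        # 14 bytes total (4 + 2 + 8)
print(struct.unpack(layout, record))  # (1001, 34, 98.5)

# The age field always lives at byte offset 4 from the record's start:
print(struct.unpack_from("<h", record, 4))  # (34,)
```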
Pointers and References
Rather than copying large data structures, computers store memory addresses pointing to where data resides. This enables efficient passing of complex objects and dynamic data structures like linked lists and trees.
Understanding memory layout helps optimize performance. Data stored contiguously in memory is accessed faster thanks to CPU caching, while scattered data triggers slower main-memory fetches.
Practical Implications for Developers
Understanding data representation directly impacts your work:
Database Design: Choosing appropriate column types prevents wasted storage and improves query speed. A TINYINT (1 byte) suffices for boolean flags, while VARCHAR adapts to variable-length strings better than fixed-length CHAR.
Network Protocols: Data transmitted between systems must use agreed-upon formats. JSON and Protocol Buffers specify exactly how to encode numbers, strings, and structures into transmittable bytes.
Security: Buffer overflow vulnerabilities occur when programs write beyond allocated memory boundaries. Understanding how integers wrap around (e.g., 255 + 1 = 0 in 8-bit unsigned arithmetic) helps prevent calculation errors in critical systems.
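Python integers never wrap on their own, but masking with `0xFF` emulates the 8-bit unsigned behavior described above:

```python
value = 255
wrapped = (value + 1) & 0xFF  # keep only the low 8 bits, as 8-bit hardware would
print(wrapped)                # 0: the counter has silently wrapped around
```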
Performance: Bitwise operations (AND, OR, XOR, shifts) manipulate data at the binary level, enabling fast flag checking, permission systems, and graphics processing.
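A common pattern is packing boolean permissions into one integer. This sketch uses hypothetical flag names of our own choosing:

```python
# One bit per permission (hypothetical flags for illustration):
READ, WRITE, EXECUTE = 0b001, 0b010, 0b100

perms = READ | WRITE           # grant read and write with one OR
print(bool(perms & WRITE))     # True: checking a flag is a single AND
print(bool(perms & EXECUTE))   # False: execute was never granted

perms &= ~WRITE                # revoke write by clearing its bit
print(bool(perms & WRITE))     # False
```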
Data representation bridges the gap between human meaning and machine capability. Every photo you share, every song you stream, every transaction you process relies on these fundamental encoding schemes.
As computing evolves, representation schemes adapt. Quantum computing introduces qubits that exist in superposition, potentially revolutionizing how we encode and process information. Neural networks use specialized tensor formats optimized for parallel processing on GPUs.
Yet the core principle remains unchanged: transforming the infinite complexity of our world into the elegant simplicity of binary, then back again. Mastering this transformation doesn't just make you a better programmer; it gives you deeper insight into the digital infrastructure shaping modern civilization.
Next Steps: Try implementing a simple base converter in your preferred language, or examine the hexadecimal representation of image files using a hex editor. Seeing these abstractions in action cements theoretical understanding into practical skill.
Found this helpful? Explore more tutorials at blog.ongoro.top and subscribe for weekly deep dives into computing fundamentals.