George Ongoro.

© 2026 George Ongoro. All rights reserved.


    Data Representation in Computers

    March 17, 2026 · 9 min read

    Every time you send a message, save a photo, or stream a video, your computer performs a remarkable transformation. It takes the rich, complex world of human experience and converts it into something starkly simple: ones and zeros. This process, known as data representation, is the foundation of all computing. Understanding it isn't just academic curiosity; it's the key to writing better code, optimizing storage, and truly grasping how digital systems work.

    Let's demystify how computers represent information, moving from the basic building blocks to the sophisticated structures that power modern applications.

    Why Computers Need a Special Language

    Computers are electronic machines at their core. They process information using billions of tiny switches called transistors that can only exist in two states: on or off, high voltage or low voltage, true or false. This binary nature isn't a limitation we work around; it's a feature we leverage.

    Binary representation offers distinct advantages:

    • Reliability: Distinguishing between two states is far more reliable than interpreting multiple voltage levels
    • Simplicity: Electronic circuits are cheaper to build and less prone to error when dealing with binary logic
    • Universality: Any information can be encoded in binary, from simple numbers to complex neural network weights

    The Binary Number System

    Before diving into how computers represent text or images, we need to understand their native number system: binary (base-2).

    Counting in Two Digits

    While humans use decimal (base-10) with digits 0 through 9, computers use only 0 and 1. Each position in a binary number represents a power of 2, not 10.

    Decimal | Binary   | Calculation
    --------|----------|---------------------------
    0       | 0        | —
    1       | 1        | —
    2       | 10       | 1×2¹ + 0×2⁰
    5       | 101      | 1×2² + 0×2¹ + 1×2⁰
    10      | 1010     | 1×2³ + 0×2² + 1×2¹ + 0×2⁰
    255     | 11111111 | Sum of 2⁷ through 2⁰

    A single binary digit is called a bit (binary digit). Eight bits form a byte, which can represent 256 different values (2⁸). This is why you often see file sizes measured in bytes, kilobytes (KB), megabytes (MB), and beyond.

    Practical Conversion

    Converting decimal to binary involves repeatedly dividing by 2 and tracking remainders. For example, converting 13 to binary:

    1. 13 ÷ 2 = 6 remainder 1
    2. 6 ÷ 2 = 3 remainder 0
    3. 3 ÷ 2 = 1 remainder 1
    4. 1 ÷ 2 = 0 remainder 1

    Reading the remainders from bottom to top: 1101.
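    The division-and-remainder procedure above translates directly into a few lines of code; here is a minimal sketch in Python:

```python
def to_binary(n: int) -> str:
    """Convert a non-negative integer to a binary string
    by repeatedly dividing by 2 and collecting remainders."""
    if n == 0:
        return "0"
    bits = []
    while n > 0:
        bits.append(str(n % 2))  # the remainder becomes the next bit
        n //= 2
    return "".join(reversed(bits))  # read remainders bottom to top

print(to_binary(13))   # 1101
print(to_binary(255))  # 11111111
```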

    Modern programming languages handle these conversions automatically, but understanding the underlying mechanics helps when debugging bitwise operations or optimizing memory usage.

    Representing Text: Character Encoding

    Computers don't understand letters; they understand numbers. Character encoding maps numbers to characters, creating a shared language between humans and machines.

    ASCII: The Foundation

    Developed in the 1960s, ASCII (American Standard Code for Information Interchange) assigned 128 unique codes to characters using 7 bits:

    • Codes 0–31: Control characters (newline, tab, carriage return)
    • Codes 32–126: Printable characters (letters, numbers, punctuation)
    • Code 127: Delete character

    For example:

    • 'A' = 65 (binary: 1000001)
    • 'a' = 97 (binary: 1100001)
    • '0' = 48 (binary: 0110000)

    Notice that uppercase and lowercase letters differ by exactly 32. This isn't a coincidence; it's designed to simplify case conversion through bitwise operations.
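    Because 32 is a power of two (bit 5 of the code), ASCII case conversion is a single bitwise operation. A quick Python sketch, valid only for the ASCII letters A–Z and a–z:

```python
# Bit 5 (value 32, 0b100000) is the only difference between
# an uppercase ASCII letter and its lowercase counterpart.
def to_upper(ch: str) -> str:
    return chr(ord(ch) & ~0b100000)  # clear bit 5 -> uppercase

def to_lower(ch: str) -> str:
    return chr(ord(ch) | 0b100000)   # set bit 5 -> lowercase

print(to_upper('a'))  # A
print(to_lower('A'))  # a
```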

    The Unicode Revolution

    ASCII's 128 characters couldn't represent world languages, mathematical symbols, or emojis. Unicode solved this by assigning a unique number (code point) to every character across all writing systems.

    UTF-8, the dominant encoding scheme, cleverly maintains backward compatibility with ASCII while supporting over a million characters:

    • 1 byte for ASCII characters (codes 0–127)
    • 2 bytes for extended Latin, Arabic, Hebrew
    • 3 bytes for Basic Multilingual Plane (most common characters)
    • 4 bytes for historical scripts, emojis, and rare symbols

    This variable-length approach means English text remains compact while supporting global languages. A string like "Hello" consumes 5 bytes, while "你好" requires 6 bytes (3 bytes per character).
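    You can verify these byte counts directly; in Python, encoding a string to UTF-8 yields the raw bytes:

```python
# Byte counts for strings in different scripts under UTF-8.
for text in ["Hello", "你好", "👋"]:
    encoded = text.encode("utf-8")
    print(f"{text!r}: {len(text)} code point(s) -> {len(encoded)} bytes")
```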

    Numerical Data Types

    Computers distinguish between different kinds of numbers through data types, each with specific memory requirements and ranges.

    Integers

    Integer types store whole numbers without fractional components:

    Type    | Size    | Range
    --------|---------|--------------------------------------------
    Byte    | 8 bits  | 0 to 255 (unsigned) or −128 to 127 (signed)
    Short   | 16 bits | −32,768 to 32,767
    Integer | 32 bits | −2,147,483,648 to 2,147,483,647
    Long    | 64 bits | ±9.2 × 10¹⁸

    Signed integers dedicate the most significant bit to the sign, typically using two's complement representation. This elegant system allows addition and subtraction to use the same circuitry regardless of sign, simplifying processor design.
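    A short sketch using Python's struct module shows two's complement in action: the same eight bits read as −1 when interpreted signed and 255 when interpreted unsigned, and plain binary addition modulo 256 handles the negative operand:

```python
import struct

# Pack -1 as a signed byte, then reinterpret the same bits unsigned.
raw = struct.pack("b", -1)             # "b" = signed 8-bit integer
(unsigned,) = struct.unpack("B", raw)  # "B" = unsigned 8-bit integer
print(f"{unsigned:08b}")  # 11111111 -> the same bits mean 255 unsigned

# Two's complement lets one adder serve both signs:
# 5 + (-1) comes out to 4 via ordinary addition modulo 256.
print((5 + 0b11111111) % 256)  # 4
```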

    Floating-Point Numbers

    Real numbers with decimal points use floating-point representation, modeled after scientific notation. The IEEE 754 standard defines:

    • Single precision (float): 32 bits (1 sign bit, 8 exponent bits, 23 mantissa bits)
    • Double precision (double): 64 bits (1 sign bit, 11 exponent bits, 52 mantissa bits)

    A number like 123.45 becomes approximately 1.2345 × 10² in scientific notation, which computers store in binary scientific notation.

    Critical insight for developers: Floating-point arithmetic is inherently approximate. The decimal 0.1 cannot be represented exactly in binary floating-point, leading to familiar quirks:

    0.1 + 0.2 ≠ 0.3  // Actually 0.30000000000000004
    

    For financial calculations requiring exact precision, languages offer decimal or fixed-point types that store numbers as integers with a separate scale factor.
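    Both the quirk and the fix are easy to demonstrate; in Python, Decimal is the standard library's exact decimal type:

```python
import math
from decimal import Decimal

print(0.1 + 0.2)        # 0.30000000000000004
print(0.1 + 0.2 == 0.3)  # False

# Exact decimal arithmetic for money-like values:
print(Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))  # True

# For floats, compare with a tolerance instead of strict equality:
print(math.isclose(0.1 + 0.2, 0.3))  # True
```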

    Representing Images

    Digital images are grids of colored points called pixels. How we encode each pixel's color determines image quality, file size, and editing flexibility.

    Raster Images: Bitmap and Beyond

    Bitmap (BMP) files store colors directly, typically using 24 bits per pixel: 8 bits for red, 8 for green, and 8 for blue (RGB). This supports 16.7 million colors (2²⁴) but creates enormous files. A 1920×1080 image requires roughly 6 MB uncompressed.
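    That 6 MB figure is straightforward back-of-the-envelope arithmetic:

```python
width, height = 1920, 1080
bytes_per_pixel = 3  # 8 bits each for red, green, and blue

size_bytes = width * height * bytes_per_pixel
print(size_bytes)  # 6220800 bytes, roughly 6 MB uncompressed
```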

    Compression algorithms reduce this dramatically:

    • Lossless compression (PNG): Identifies patterns and redundant data, preserving every pixel exactly
    • Lossy compression (JPEG): Discards information human eyes barely perceive, achieving 10:1 or greater compression ratios

    Color Depth and Transparency

    Modern formats support various color depths:

    • 8-bit: 256 colors (limited palette)
    • 24-bit: True color (16.7 million colors)
    • 32-bit: True color plus 8-bit alpha channel for transparency

    The alpha channel enables overlay effects, allowing web designers to place images over varying backgrounds seamlessly.

    Vector Graphics

    Unlike raster images, vector graphics (SVG) store mathematical descriptions of shapes, lines, and curves. They scale infinitely without pixelation because the computer recalculates the image at each size. Logos and icons typically use vector formats, while photographs remain raster due to their complexity.

    Audio Representation

    Sound is a continuous waveform, but computers require discrete data. Pulse Code Modulation (PCM) solves this through sampling:

    1. Sampling rate: How often we measure the waveform per second (CD quality uses 44,100 samples/second)
    2. Bit depth: Precision of each measurement (16 bits provides 65,536 possible values)

    Higher sampling rates capture higher frequencies; greater bit depth improves dynamic range (quietest to loudest sounds). Professional audio often uses 96 kHz sampling with 24-bit depth, though this creates significantly larger files.

    Compression formats like MP3 and AAC apply psychoacoustic models, removing sounds masked by louder frequencies or outside human hearing range (typically 20 Hz to 20 kHz). A 5-minute uncompressed CD audio file exceeds 50 MB; MP3 compression reduces this to roughly 5 MB with perceptually minimal quality loss.
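    The 50 MB figure follows directly from the PCM parameters for a 5-minute CD-quality stereo track:

```python
sample_rate = 44_100   # samples per second (CD quality)
bytes_per_sample = 2   # 16-bit depth
channels = 2           # stereo
seconds = 5 * 60       # a 5-minute track

size_bytes = sample_rate * bytes_per_sample * channels * seconds
print(size_bytes)  # 52920000 bytes, just over 50 MB uncompressed
```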

    Video: Combining Everything

    Video represents the most complex data type, combining multiple image frames with synchronized audio. Uncompressed high-definition video generates gigabytes per minute, making compression essential.

    Modern codecs like H.264, H.265 (HEVC), and AV1 employ sophisticated techniques:

    • Temporal compression: Storing only differences between frames rather than complete images
    • Motion compensation: Tracking how objects move across frames
    • Spatial compression: Applying image compression within each frame

    A two-hour 4K movie might require 200 GB uncompressed but streams smoothly at 15 GB with modern compression, a 13:1 reduction with remarkable quality retention.

    Data Structures: Organizing Information

    Raw binary becomes useful through data structures that organize related information:

    Arrays and Lists

    Sequential collections where elements share the same type. Arrays offer O(1) access to any element by index but require contiguous memory.

    Records and Objects

    Structs or objects group heterogeneous data (name, age, email) into single units, with each field stored at a known offset from the structure's start address.

    Pointers and References

    Rather than copying large data structures, computers store memory addresses pointing to where data resides. This enables efficient passing of complex objects and dynamic data structures like linked lists and trees.

    Understanding memory layout helps optimize performance. Data stored contiguously in memory is accessed faster thanks to CPU cache efficiency, while scattered data triggers slower main-memory retrieval.

    Practical Implications for Developers

    Understanding data representation directly impacts your work:

    Database Design: Choosing appropriate column types prevents wasted storage and improves query speed. A TINYINT (1 byte) suffices for boolean flags, while VARCHAR adapts to variable-length strings better than fixed-length CHAR.

    Network Protocols: Data transmitted between systems must use agreed-upon formats. JSON and Protocol Buffers specify exactly how to encode numbers, strings, and structures into transmittable bytes.

    Security: Buffer overflow vulnerabilities occur when programs write beyond allocated memory boundaries. Understanding how integers wrap around (e.g., 255 + 1 = 0 in 8-bit unsigned arithmetic) helps prevent calculation errors in critical systems.
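    The wraparound can be emulated in Python, whose integers are otherwise arbitrary precision, by reducing modulo 2⁸:

```python
# Emulate 8-bit unsigned arithmetic: values wrap modulo 2**8.
counter = 255
counter = (counter + 1) % 256
print(counter)  # 0 -> 255 + 1 wraps around to 0
```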

    Performance: Bitwise operations (AND, OR, XOR, shifts) manipulate data at the binary level, enabling fast flag checking, permission systems, and graphics processing.
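    As a small illustration, a Unix-style permission set fits in three bits of a single integer, with each operation a single instruction:

```python
# Permission flags packed into one integer, one bit each.
READ, WRITE, EXECUTE = 0b100, 0b010, 0b001

perms = READ | WRITE           # grant read and write
print(bool(perms & READ))      # True  (flag check with AND)
print(bool(perms & EXECUTE))   # False

perms |= EXECUTE               # set a flag with OR
perms &= ~WRITE                # clear a flag with AND NOT
print(f"{perms:03b}")          # 101 -> read and execute remain
```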

    Data representation bridges the gap between human meaning and machine capability. Every photo you share, every song you stream, every transaction you process relies on these fundamental encoding schemes.

    As computing evolves, representation schemes adapt. Quantum computing introduces qubits that exist in superposition, potentially revolutionizing how we encode and process information. Neural networks use specialized tensor formats optimized for parallel processing on GPUs.

    Yet the core principle remains unchanged: transforming the infinite complexity of our world into the elegant simplicity of binary, then back again. Mastering this transformation doesn't just make you a better programmer; it gives you deeper insight into the digital infrastructure shaping modern civilization.

    Next Steps: Try implementing a simple base converter in your preferred language, or examine the hexadecimal representation of image files using a hex editor. Seeing these abstractions in action cements theoretical understanding into practical skill.

    Found this helpful? Explore more tutorials at blog.ongoro.top and subscribe for weekly deep dives into computing fundamentals.

    George Ongoro

    Blog Author & Software Engineer

    I'm George Ongoro, a passionate software engineer focusing on full-stack development. This blog is where I share insights, engineering deep dives, and personal growth stories. Let's build something great!
