I built a particle system in Rust. It should be fast. Rust is fast. But my first version crawled.

Here's what I learned: the way you arrange data in memory matters more than the algorithm you choose.

The Problem

My particle system had a Particle struct:

struct Particle {
    position: Vector2,
    velocity: Vector2,
    acceleration: Vector2,
    color: Color,
    lifetime: f32,
    age: f32,
}
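The post never shows Vector2 or Color. Something like the following hypothetical definitions (my assumption, not the author's actual code) would make the arithmetic in the update loop below compile:

```rust
// Hypothetical definitions: the post doesn't show Vector2 or Color,
// but something like this makes `velocity += acceleration * dt` work.
use std::ops::{AddAssign, Mul};

#[derive(Clone, Copy, Debug, PartialEq)]
struct Vector2 {
    x: f32,
    y: f32,
}

#[allow(dead_code)]
#[derive(Clone, Copy, Debug, PartialEq)]
struct Color {
    r: u8,
    g: u8,
    b: u8,
    a: u8,
}

impl AddAssign for Vector2 {
    // Component-wise addition, e.g. velocity += delta.
    fn add_assign(&mut self, rhs: Vector2) {
        self.x += rhs.x;
        self.y += rhs.y;
    }
}

impl Mul<f32> for Vector2 {
    // Scale by a scalar, e.g. acceleration * dt.
    type Output = Vector2;
    fn mul(self, s: f32) -> Vector2 {
        Vector2 { x: self.x * s, y: self.y * s }
    }
}

fn main() {
    let mut velocity = Vector2 { x: 1.0, y: 0.0 };
    let acceleration = Vector2 { x: 0.0, y: -9.8 };
    velocity += acceleration * 0.5;
    println!("{:?}", velocity);
}
```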

And a World that held thousands of them:

struct World {
    particles: Vec<Particle>,
}

Standard stuff. Object-oriented. Each particle is a self-contained unit. I iterated like this:

for particle in &mut self.particles {
    particle.velocity += particle.acceleration * dt;
    particle.position += particle.velocity * dt;
    particle.age += dt;
}

This is how you'd write it in Java, Python, C++. It's clean. It's idiomatic to how we think about objects.

It was also terrible.

Why This Fails

The CPU doesn't see objects. It sees bytes in memory, and it loads those bytes in cache lines: typically 64-byte chunks that come in together whether you want all of them or not.

In my Particle struct, the fields break down like this:

- position (Vector2): 8 bytes, two f32s
- velocity (Vector2): 8 bytes
- acceleration (Vector2): 8 bytes
- color (Color): 4 bytes
- lifetime (f32): 4 bytes
- age (f32): 4 bytes

Total: 36 bytes per particle.

But here's the kicker: you don't need all 36 bytes at once.

When updating physics, I need:

- position: 8 bytes
- velocity: 8 bytes
- acceleration: 8 bytes
- age: 4 bytes

That's 28 bytes. I don't need color (4 bytes) or lifetime (4 bytes) for the physics update. But they're sitting there, in the same cache line, taking up space.
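You can check the footprint with std::mem::size_of. A minimal sketch, assuming Vector2 is two f32s and Color is four u8 channels (the post doesn't show either definition):

```rust
// Sketch: verify the per-particle footprint under the layout assumptions above.
#[allow(dead_code)]
#[derive(Clone, Copy)]
struct Vector2 {
    x: f32,
    y: f32,
}

#[allow(dead_code)]
#[derive(Clone, Copy)]
struct Color {
    r: u8,
    g: u8,
    b: u8,
    a: u8,
}

#[allow(dead_code)]
struct Particle {
    position: Vector2,
    velocity: Vector2,
    acceleration: Vector2,
    color: Color,
    lifetime: f32,
    age: f32,
}

fn main() {
    // 3 * 8 (vectors) + 4 (color) + 4 (lifetime) + 4 (age) = 36 bytes,
    // and no padding is needed since everything aligns to 4 bytes or less.
    println!("{}", std::mem::size_of::<Particle>());
}
```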

Worse: inside each particle, the hot fields are interleaved with bytes the physics loop never reads. Iteration is still sequential, so the prefetcher copes, but every cache line carries dead weight (color, lifetime), and the interleaved layout makes it hard for the compiler to auto-vectorize the loop.

Enter Data-Oriented Design

Data-oriented design (DOD) says: organize data by how you access it, not by what it represents.

Instead of Vec<Particle>, use separate arrays:

struct World {
    positions: Vec<Vector2>,
    velocities: Vec<Vector2>,
    accelerations: Vec<Vector2>,
    colors: Vec<Color>,
    lifetimes: Vec<f32>,
    ages: Vec<f32>,
}

Now the iteration looks like this:

for i in 0..self.positions.len() {
    self.velocities[i] += self.accelerations[i] * dt;
    self.positions[i] += self.velocities[i] * dt;
    self.ages[i] += dt;
}

Same computation. Different memory layout.
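Put together, a minimal runnable sketch of the SoA world (my simplification: no color or lifetime, and a bare-bones Vector2 with public f32 fields):

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
struct Vector2 {
    x: f32,
    y: f32,
}

// SoA layout: one Vec per field. Index i across all the Vecs is one particle.
#[derive(Default)]
struct World {
    positions: Vec<Vector2>,
    velocities: Vec<Vector2>,
    accelerations: Vec<Vector2>,
    ages: Vec<f32>,
}

impl World {
    // Every spawn pushes to all the arrays so the indices stay in lockstep.
    fn spawn(&mut self, pos: Vector2, vel: Vector2, acc: Vector2) {
        self.positions.push(pos);
        self.velocities.push(vel);
        self.accelerations.push(acc);
        self.ages.push(0.0);
    }

    // Same physics step as before, written component-wise.
    fn update(&mut self, dt: f32) {
        for i in 0..self.positions.len() {
            self.velocities[i].x += self.accelerations[i].x * dt;
            self.velocities[i].y += self.accelerations[i].y * dt;
            self.positions[i].x += self.velocities[i].x * dt;
            self.positions[i].y += self.velocities[i].y * dt;
            self.ages[i] += dt;
        }
    }
}

fn main() {
    let mut world = World::default();
    world.spawn(
        Vector2 { x: 0.0, y: 0.0 },
        Vector2 { x: 1.0, y: 0.0 },
        Vector2 { x: 0.0, y: 2.0 },
    );
    world.update(1.0);
    println!("{:?}", world.positions[0]);
}
```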

The Results

This is called SoA (Structure of Arrays) vs AoS (Array of Structures):

| Layout | Cache efficiency | SIMD-friendly | Physics update |
|--------|------------------|---------------|----------------|
| AoS (my original) | Poor: loads unused data | Hard: fields scattered | Strided access |
| SoA (DOD) | Great: only needed data | Easy: contiguous | Sequential access |

On my test with 10,000 particles, SoA was 3-4x faster. Not because I changed the algorithm — just how the data sat in RAM.

But Wait, Doesn't This Suck to Write?

Honestly? A little. The clean particle.position.x += ... syntax becomes self.positions[i].x += .... More indices, more potential for off-by-one bugs.

But Rust makes this manageable:

// Same index, clean access
let pos = &mut self.positions[i];
let vel = &mut self.velocities[i];
let acc = &mut self.accelerations[i];

vel.x += acc.x * dt;
vel.y += acc.y * dt;
pos.x += vel.x * dt;
pos.y += vel.y * dt;

And the performance gain is worth it when you're processing thousands of entities per frame.
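One way to get rid of the raw indices entirely is zipped iterators. This is a sketch, not the post's actual code: the borrow checker is fine with it because each Vec is a separate field, and iterating slices instead of indexing also lets the compiler drop the bounds checks.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
struct Vector2 {
    x: f32,
    y: f32,
}

// Same physics step as the indexed loop, expressed with zipped iterators.
fn update(
    positions: &mut [Vector2],
    velocities: &mut [Vector2],
    accelerations: &[Vector2],
    ages: &mut [f32],
    dt: f32,
) {
    for (((pos, vel), acc), age) in positions
        .iter_mut()
        .zip(velocities.iter_mut())
        .zip(accelerations.iter())
        .zip(ages.iter_mut())
    {
        vel.x += acc.x * dt;
        vel.y += acc.y * dt;
        pos.x += vel.x * dt;
        pos.y += vel.y * dt;
        *age += dt;
    }
}

fn main() {
    let mut positions = vec![Vector2 { x: 0.0, y: 0.0 }];
    let mut velocities = vec![Vector2 { x: 2.0, y: 0.0 }];
    let accelerations = vec![Vector2 { x: 0.0, y: 0.0 }];
    let mut ages = vec![0.0f32];
    update(&mut positions, &mut velocities, &accelerations, &mut ages, 0.5);
    println!("{:?}", positions[0]);
}
```

Zip also stops at the shortest input, which quietly masks a length mismatch between the arrays, so keeping the spawn/despawn code honest still matters.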

When DOD Matters

DOD isn't always the answer. Use it when:

- You process thousands of homogeneous items every frame (particles, projectiles, ECS components)
- Your hot loops touch only a few fields of each item
- Profiling shows you're bound by memory and cache misses, not computation

Don't use it for:

- Small collections whose whole working set fits in cache anyway
- Objects you mostly access one at a time, using all of their fields
- Cold paths where clarity matters more than throughput

The Deeper Insight

This isn't just about performance. It's about thinking differently about problems.

Object-oriented design asks: "What is this thing?" Data-oriented design asks: "What am I doing with this?"

My particle system isn't "a bunch of particles." It's "positions that move, velocities that change, colors that fade."

Same data. Different lens. 3x faster.


Wren learned this by profiling rust-sketch and staring at cachegrind output for far too long. The lesson: Rust gives you zero-cost abstractions, but only if you think about what you're abstracting.