Python has owned data science for twenty years. NumPy, Pandas, Scikit-learn — the ecosystem is unmatched. So why are people starting to whisper about Rust?
The same thing that drew people to Python in the 90s: productivity. But a different kind.
The Python Problem
Python is slow. Everyone knows this. The GIL rules out true multi-threaded parallelism for CPU-bound code. Memory usage balloons on big datasets. And let's be honest — debugging a Pandas chain that fails on row 10 million is nobody's idea of a good time.
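Rust has no GIL, so CPU-bound work parallelizes directly with the standard library. A minimal sketch using `std::thread::scope` to sum chunks of a vector on separate threads:

```rust
use std::thread;

fn parallel_sum(data: &[u64], threads: usize) -> u64 {
    // Split the slice into roughly equal chunks and sum each on its own thread.
    let chunk_size = data.len().div_ceil(threads).max(1);
    thread::scope(|s| {
        let handles: Vec<_> = data
            .chunks(chunk_size)
            .map(|chunk| s.spawn(move || chunk.iter().sum::<u64>()))
            .collect();
        // Join every thread and combine the partial sums.
        handles.into_iter().map(|h| h.join().unwrap()).sum()
    })
}

fn main() {
    let data: Vec<u64> = (1..=1_000_000).collect();
    // All four worker threads run truly in parallel; no interpreter lock.
    println!("{}", parallel_sum(&data, 4)); // 500000500000
}
```

Scoped threads borrow `data` directly, so there's no copying and no `Arc` needed; the compiler guarantees the threads finish before the borrow ends.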
The solutions have been:
- Cython — write Python that compiles to C. Works, but now you're maintaining two languages.
- Numba — JIT-compile your hot loops. Fragile, and easy to break once you stray outside the supported patterns.
- Polars — rewrite in Rust, give it a Python API. It's faster, but you're still writing Python.
Polars is the interesting one. It's Rust under the hood, but it still feels like Python. And it's fast — sometimes 10x faster than Pandas on the same hardware.
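Part of Polars' speed comes from lazy evaluation: a query is built first, then optimized and executed in one pass over the data. Rust's iterators work the same way, which is one reason the model maps so naturally onto the language. A toy illustration using plain std iterators (not the Polars API):

```rust
fn main() {
    let values = vec![120, 80, 250, 90, 310];

    // Nothing runs yet: `filter` and `map` just build up a lazy pipeline.
    let pipeline = values.iter().filter(|&&v| v > 100).map(|&v| v * 2);

    // Work happens only here, in a single pass, with no intermediate Vec.
    let result: Vec<i32> = pipeline.collect();
    println!("{:?}", result); // [240, 500, 620]
}
```

Contrast with eager Pandas, where each step in a chain materializes a full intermediate DataFrame.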
Enter Ply: Full-Stack Rust Data Science
From TWiR 641 comes Ply, described as "building apps in Rust shouldn't be this hard." It's not a library — it's a full data science environment in Rust.
The pitch: skip Python entirely. Do your ETL, analysis, and modeling in Rust.
```rust
use ply::prelude::*;

fn main() -> Result<()> {
    // Load data
    let df = CsvReader::from_path("data.csv")?
        .infer_schema()
        .load()?;

    // Transform
    let result = df
        .filter(col("value").gt(100))?
        .group_by(["category"])?
        .agg([
            col("value").mean().alias("avg"),
            col("value").count().alias("count"),
        ])?;

    // Save
    result.to_csv("output.csv")?;
    Ok(())
}
```
This isn't pseudocode. This is real, compilable Rust. And it's fast.
Why Now?
Three things converged:
- The performance ceiling hit — Python can't hide behind "fast enough" anymore when you're processing billions of rows
- Rust got usable — crates like Polars, Burn, and Candle proved Rust could do ML/data work
- The ecosystem matured — DataFusion for SQL queries, Arrow for columnar data, ndarray for n-dimensional arrays
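"Columnar" is the key idea behind Arrow: all values for one column sit contiguously in memory, so a scan over a single column never touches the others. A toy struct-of-arrays sketch in plain Rust (not the Arrow API) to show the difference:

```rust
// Row-oriented: each record interleaves all fields in memory.
#[allow(dead_code)]
struct RowRecord {
    category: String,
    value: f64,
}

// Column-oriented (Arrow-style): each column is one contiguous buffer.
struct ColumnarTable {
    category: Vec<String>,
    value: Vec<f64>,
}

impl ColumnarTable {
    // Summing one column walks a single dense Vec<f64>:
    // cache-friendly and trivially vectorizable.
    fn sum_values(&self) -> f64 {
        self.value.iter().sum()
    }
}

fn main() {
    let table = ColumnarTable {
        category: vec!["a".into(), "b".into(), "a".into()],
        value: vec![1.5, 2.5, 4.0],
    };
    println!("{}", table.sum_values()); // 8
}
```

Real Arrow adds a standardized memory layout on top of this idea, so DataFusion, Polars, and Python libraries can all share buffers without copying.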
The Real Advantage Nobody Talks About
It's not raw speed. It's safety at scale.
When your pipeline crashes at 3am on a 500GB dataset, Rust gives you:
- No surprise GC pauses — memory is freed deterministically through ownership, not by a garbage collector
- Fearless refactoring — the compiler catches your mistakes
- Single binary deployment — no environment hell
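The "crashes at 3am" point is really about error handling: a malformed record in Rust surfaces as a typed `Result` at the parse site, not as an exception ten frames deep. A minimal sketch parsing "category,value" lines with `?` (a hypothetical record format, standard library only):

```rust
// Parse a "category,value" line into a typed record.
// Any malformed field becomes an Err the caller must handle.
fn parse_line(line: &str) -> Result<(String, f64), String> {
    let (category, value) = line
        .split_once(',')
        .ok_or_else(|| format!("missing comma in {line:?}"))?;
    let value: f64 = value
        .trim()
        .parse()
        .map_err(|e| format!("bad value in {line:?}: {e}"))?;
    Ok((category.to_string(), value))
}

fn main() {
    // The compiler forces every call site to confront failure explicitly.
    println!("{:?}", parse_line("widgets,19.99")); // Ok(("widgets", 19.99))
    println!("{:?}", parse_line("widgets"));       // Err with a precise message
}
```

Because the failure mode is part of the function signature, refactoring the pipeline can't silently drop an error path — the compiler flags every caller that stops handling the `Err` case.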
Python's "fast to write, slow to run" is becoming "slow to write, slow to run." Rust flips that.
What This Means for You
If you're starting a new data project:
- Exploration: Python/Pandas is still faster for prototyping
- Production pipeline: Consider Polars or even pure Rust
- Training models: Burn and Candle are viable and maturing fast
The future isn't "Python or Rust." It's both — with Rust where performance matters, Python where exploration does.
But if you're building tools for data scientists? You're competing with the entire Python ecosystem. Better to be 10x faster and 100x more reliable.
Written as a working Rust developer who's watched this space evolve. The answer to "should I use Rust for data science?" is "it depends." The answer to "should I learn Rust for data science?" is yes.