Python has owned data science for twenty years. NumPy, Pandas, Scikit-learn — the ecosystem is unmatched. So why are people starting to whisper about Rust?
The same thing that drew people to Python in the 90s: productivity. But a different kind.
The Python Problem
Python is slow. Everyone knows this. The GIL rules out true multi-threaded parallelism for CPU-bound code. Memory usage balloons on big datasets. And let's be honest — debugging a Pandas chain that fails on row 10 million is nobody's idea of a good time.
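Rust has no GIL, so CPU-bound work parallelizes directly with the standard library. A minimal sketch using `std::thread::scope` to sum chunks of a vector on separate threads:

```rust
use std::thread;

fn parallel_sum(data: &[u64], threads: usize) -> u64 {
    // Split the slice into roughly equal chunks and sum each on its own thread.
    let chunk_size = data.len().div_ceil(threads).max(1);
    thread::scope(|s| {
        let handles: Vec<_> = data
            .chunks(chunk_size)
            .map(|chunk| s.spawn(move || chunk.iter().sum::<u64>()))
            .collect();
        // Join every thread and combine the partial sums.
        handles.into_iter().map(|h| h.join().unwrap()).sum()
    })
}

fn main() {
    let data: Vec<u64> = (1..=1_000_000).collect();
    // All four worker threads run truly in parallel; no interpreter lock.
    println!("{}", parallel_sum(&data, 4)); // 500000500000
}
```

Scoped threads borrow `data` directly, so there's no copying and no `Arc` needed; the compiler guarantees the threads finish before the borrow ends.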
The solutions have been:
- Cython — write Python that compiles to C. Works, but now you're maintaining two languages.
- Numba — JIT-compile your hot loops. Fragile, and easy to break once you stray outside the supported patterns.
- Polars — rewrite in Rust, give it a Python API. It's faster, but you're still writing Python.
Polars is the interesting one. It's Rust under the hood, but it still feels like Python. And it's fast — sometimes 10x faster than Pandas on the same hardware.
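Part of Polars' speed comes from lazy evaluation: a query is built first, then optimized and executed in one pass over the data. Rust's iterators work the same way, which is one reason the model maps so naturally onto the language. A toy illustration using plain std iterators (not the Polars API):

```rust
fn main() {
    let values = vec![120, 80, 250, 90, 310];

    // Nothing runs yet: `filter` and `map` just build up a lazy pipeline.
    let pipeline = values.iter().filter(|&&v| v > 100).map(|&v| v * 2);

    // Work happens only here, in a single pass, with no intermediate Vec.
    let result: Vec<i32> = pipeline.collect();
    println!("{:?}", result); // [240, 500, 620]
}
```

Contrast with eager Pandas, where each step in a chain materializes a full intermediate DataFrame.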
Enter Ply: Full-Stack Rust Data Science
From TWiR 641 comes Ply, described as "building apps in Rust shouldn't be this hard." It's not a library — it's a full data science environment in Rust.
The pitch: skip Python entirely. Do your ETL, analysis, and modeling in Rust.
```rust
use ply::prelude::*;

fn main() -> Result<()> {
    // Load data
    let df = CsvReader::from_path("data.csv")?
        .infer_schema()
        .load()?;

    // Transform
    let result = df
        .filter(col("value").gt(100))?
        .group_by(["category"])?
        .agg([
            col("value").mean().alias("avg"),
            col("value").count().alias("count"),
        ])?;

    // Save
    result.to_csv("output.csv")?;
    Ok(())
}
```
This isn't pseudocode. This is real, compilable Rust. And it's fast.
Why Now?
Three things converged:
- The performance ceiling hit — Python can't hide behind "fast enough" anymore when you're processing billions of rows
- Rust got usable — crates like Polars, Burn, and Candle proved Rust could do ML/data work
- The ecosystem matured — DataFusion for SQL queries, Arrow for columnar data, ndarray for n-dimensional arrays
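"Columnar" is the key idea behind Arrow: all values for one column sit contiguously in memory, so a scan over a single column never touches the others. A toy struct-of-arrays sketch in plain Rust (not the Arrow API) to show the difference:

```rust
// Row-oriented: each record interleaves all fields in memory.
#[allow(dead_code)]
struct RowRecord {
    category: String,
    value: f64,
}

// Column-oriented (Arrow-style): each column is one contiguous buffer.
struct ColumnarTable {
    category: Vec<String>,
    value: Vec<f64>,
}

impl ColumnarTable {
    // Summing one column walks a single dense Vec<f64>:
    // cache-friendly and trivially vectorizable.
    fn sum_values(&self) -> f64 {
        self.value.iter().sum()
    }
}

fn main() {
    let table = ColumnarTable {
        category: vec!["a".into(), "b".into(), "a".into()],
        value: vec![1.5, 2.5, 4.0],
    };
    println!("{}", table.sum_values()); // 8
}
```

Real Arrow adds a standardized memory layout on top of this idea, so DataFusion, Polars, and Python libraries can all share buffers without copying.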
The Real Advantage Nobody Talks About
It's not raw speed. It's safety at scale.
When your pipeline crashes at 3am on a 500GB dataset, Rust gives you:
- No surprise GC pauses — memory is freed deterministically through ownership, not by a garbage collector
- Fearless refactoring — the compiler catches your mistakes
- Single binary deployment — no environment hell
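The "crashes at 3am" point is really about error handling: a malformed record in Rust surfaces as a typed `Result` at the parse site, not as an exception ten frames deep. A minimal sketch parsing "category,value" lines with `?` (a hypothetical record format, standard library only):

```rust
// Parse a "category,value" line into a typed record.
// Any malformed field becomes an Err the caller must handle.
fn parse_line(line: &str) -> Result<(String, f64), String> {
    let (category, value) = line
        .split_once(',')
        .ok_or_else(|| format!("missing comma in {line:?}"))?;
    let value: f64 = value
        .trim()
        .parse()
        .map_err(|e| format!("bad value in {line:?}: {e}"))?;
    Ok((category.to_string(), value))
}

fn main() {
    // The compiler forces every call site to confront failure explicitly.
    println!("{:?}", parse_line("widgets,19.99")); // Ok(("widgets", 19.99))
    println!("{:?}", parse_line("widgets"));       // Err with a precise message
}
```

Because the failure mode is part of the function signature, refactoring the pipeline can't silently drop an error path — the compiler flags every caller that stops handling the `Err` case.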
Python's "fast to write, slow to run" is becoming "slow to write, slow to run." Rust flips that.
What This Means for You
If you're starting a new data project:
- Exploration: Python/Pandas is still faster for prototyping
- Production pipeline: Consider Polars or even pure Rust
- Training models: Burn and Candle are viable and maturing fast
The future isn't "Python or Rust." It's both — with Rust where performance matters, Python where exploration does.
But if you're building tools for data scientists? You're competing with the entire Python ecosystem. Better to be 10x faster and 100x more reliable.
Written as a working Rust developer who's watched this space evolve. The answer to "should I use Rust for data science?" is "it depends." The answer to "should I learn Rust for data science?" is yes.