Data-oriented Design

2026-03-15 00:35

Status: #child

Tags: #software #philosophy

It's all about the data

It's worth internalizing the idea that no software application is anything without its data. Even instructions are data: they take up memory, use up bandwidth, and can be transformed, loaded, saved, and constructed.

The second thing to understand in the data-oriented paradigm is that this data needs to be transformed, and that the transformation has to run on something. That something can be abstract, like a VM running on unknown hardware, or it can be a known compute unit that the code runs on directly.

Data-oriented Design

Data-oriented design is the practice of designing software by developing transformations for well-formed data, where the criteria for well-formed are guided by the target hardware and by the patterns and types of transforms that need to operate on it.
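
As a rough illustration of what "well-formed for the transform" can mean, here is a minimal C++ sketch. The names `Particles` and `integrate`, and the fields themselves, are assumptions made up for this note and are not taken from any particular engine. The transform only needs positions and velocities, so each field lives in its own contiguous array and the unrelated field never gets pulled into the cache.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical example data: each field is stored in its own contiguous array
// (structure of arrays), so a transform reads only the fields it needs.
struct Particles {
    std::vector<float> x;    // positions
    std::vector<float> vx;   // velocities
    std::vector<int> color;  // unrelated to movement; integrate() never touches it
};

// The transform: advance every position by velocity * dt.
// It walks two dense float arrays linearly, front to back.
void integrate(Particles& p, float dt) {
    assert(p.x.size() == p.vx.size());
    for (std::size_t i = 0; i < p.x.size(); ++i) {
        p.x[i] += p.vx[i] * dt;
    }
}
```

Whether this layout is actually the right one depends on the target hardware and on which transforms dominate; if most transforms needed every field of a particle at once, a different layout could be the better fit.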

Therefore, one can define software engineering as the ability to control and design flow paths for said data with an understanding of the hardware on which we do these flows/transformations.

Data is not the problem domain

People dislike data-oriented design because it seems like the antithesis of other software design paradigms, which incorporate the problem domain into the software being written. It is not obvious how one can build any useful software while thinking in terms of bytes and CPU pipelines.

The reality is that meaning can be applied to data to create information. This can be useful when dealing with ambiguous data like a force or a distance; one would ideally provide context by putting such data inside a class next to related data. This, however, has its downsides: one eventually starts putting data so deep within classes that forgetting its impact becomes inevitable.

Example:
Many developers who could have used a 2D/3D grid system chose instead to keep the object paradigm for each entity in the map. This crime, committed by a lot of devs, ends up producing an object for every tile.

This grid system is extremely easy to lay out contiguously: we can create a vector of x*y tiles and index into it with y*width + x (y*256 + x for a grid that is 256 tiles wide). Hardware deals well with this data-oriented model of the grid. Now imagine instead making an object called Tile and giving that object a position to represent its location in the grid...

The latter design is pretty common but should be considered a crime against hardware. Programmers who do this are not necessarily dumb; they think about the huge number of tiles the grid would need and decide to create tiles on demand at runtime instead of allocating the full grid up front. However, most of the grids they deal with are much better off as contiguous bytes, especially since the pointer chasing that comes with the object-oriented design is extremely expensive.
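
The contiguous grid described above could look roughly like the following C++ sketch. `TileGrid`, its members, and the choice of one byte per tile are assumptions for illustration only. The point is that a tile's position is implied by its index, so no per-tile object or stored coordinate is needed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical tile grid stored as one contiguous block of bytes.
struct TileGrid {
    std::size_t width;
    std::size_t height;
    std::vector<std::uint8_t> tiles;  // one byte per tile, row-major

    // Allocates the full grid up front, with every tile set to 0.
    TileGrid(std::size_t w, std::size_t h)
        : width(w), height(h), tiles(w * h, 0) {}

    // Row-major index: y * width + x (y * 256 + x when width is 256).
    std::size_t index(std::size_t x, std::size_t y) const {
        assert(x < width && y < height);
        return y * width + x;
    }

    std::uint8_t get(std::size_t x, std::size_t y) const {
        return tiles[index(x, y)];
    }

    void set(std::size_t x, std::size_t y, std::uint8_t value) {
        tiles[index(x, y)] = value;
    }
};
```

With these assumptions, a 256 by 256 grid occupies 65,536 bytes in a single allocation, and iterating over a row touches consecutive bytes instead of following one pointer per tile.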

Another example from the game industry is that programmers tend to use an instance for each and every item in the game:

for (int i = 0; i < 100; i++)  
{  
	inventory.push_back(new Arrow());  
}

Instead, they could simply keep track of the number of arrows in the inventory.
Note that there is sometimes a reason to keep pre-created instances around: creating and destroying arrows at runtime can be pretty expensive.
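
The counting alternative can be sketched as below. This is a minimal sketch, assuming arrows in the inventory are interchangeable and carry no per-arrow state; the `Inventory` type and its member names are hypothetical.

```cpp
#include <cstdint>

// Hypothetical inventory that stores how many arrows the player has
// instead of one heap-allocated Arrow object per arrow.
struct Inventory {
    std::uint32_t arrow_count = 0;

    // Adding 100 arrows is one addition, not 100 allocations.
    void add_arrows(std::uint32_t n) { arrow_count += n; }

    // Tries to consume one arrow; returns false when none are left.
    bool use_arrow() {
        if (arrow_count == 0) {
            return false;
        }
        --arrow_count;
        return true;
    }
};
```

If arrows in flight need their own state (position, velocity), that state can be created only for the arrows that have actually been fired, while the inventory itself stays a single integer.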

Another issue that we can see in games is the monolithic design that comes as a result of putting everything in the player object.

Again, this seems simpler at first, since it allows programmers to make a ton of assumptions, but it becomes the source of a lot of bugs and hard-to-untangle code as the in-game logic grows.

Summary
