2024-07-01T12:00:00Z

Book Review: Data-Oriented Design


Author: Richard Fabian
Publish Date: 2018
Subtitle: Software Engineering for Limited Resources and Short Schedules
Worth it? Yes

Why I Picked The Book Up

Like many Computer Science grads, I had a university education that, in terms of programming, was essentially OOP with a whirlwind tour of some other paradigms. In my case those were functional (Haskell) and logic (Prolog), but we did not linger on them for long. I feel very lucky that since leaving university I have been able to work in other styles, namely procedural C when writing a cross-platform backend for video conferencing software, and now Go for database tooling.

And like many Computer Science grads, my understanding of computers straight out of university was theoretical and quite out of date. I knew a bit about caches, von Neumann architecture and pipelining but nothing that could be applied in the real world. If you had asked me to explain how any of those things might affect the performance of a program I probably would have stared blankly at you.

This is where Data-Oriented Design comes in. It is about thinking through and laying out the data that makes up your system in the way that best suits the target hardware. That means considering data size but also data access patterns. It is a term I first heard when following the development of Zig's self-hosted compiler. In fact the author of Zig, Andrew Kelley, has a talk on DOD which is where I first heard of Richard Fabian's book.

Given my enjoyment of Zig and the referenced talk, and the feeling that although I knew my way around a few performance tools I didn't really know why something was fast or slow, this book seemed an obvious pick.

Review

The book starts off by defining DOD and with a bit of a rallying cry against OOP. The author has a background in game development (more on this later) and gives an example of how a 2D grid game becomes expensive in OOP - a 128x128 map would give 16,384 tiles which is a lot if each is an object. Meanwhile if you limit yourself to at most 256 kinds of tile (i.e. each tile is representable by a single byte) then your map will only take up 16KiB in memory.
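To make that arithmetic concrete, here is a rough Go sketch of my own (not code from the book). The TileObject fields are invented purely for illustration; the only thing taken from the example above is the 128x128 grid and the one-byte-per-tile idea.

```go
package main

import (
	"fmt"
	"unsafe"
)

const mapSide = 128

// TileObject is a hypothetical object-per-tile layout; the fields are
// invented for illustration and do not come from the book.
type TileObject struct {
	Name     string
	Walkable bool
	Sprite   []byte
}

// TileMap stores one byte per cell: an index into a shared table of at
// most 256 kinds of tile.
type TileMap [mapSide * mapSide]uint8

// ByteMapBytes returns the in-memory size of the one-byte-per-cell map.
func ByteMapBytes() int {
	return int(unsafe.Sizeof(TileMap{}))
}

// ObjectMapBytes returns the size of the tile structs alone if every cell
// held its own TileObject (anything they point to would add more).
func ObjectMapBytes() int {
	return mapSide * mapSide * int(unsafe.Sizeof(TileObject{}))
}

func main() {
	fmt.Println("cells:", mapSide*mapSide)
	fmt.Println("byte map:", ByteMapBytes(), "bytes")
	fmt.Println("object map:", ObjectMapBytes(), "bytes before any pointed-to data")
}
```

The exact size of the object version depends on the platform and on what you put in the struct, but the byte map is always 16,384 bytes.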

The next chapter was where things really got interesting to me as it shows how DOD and relational databases are similar. Essentially the process you take when designing a schema - called normalisation - is equally applicable to your data in memory. By splitting your data up in this way it becomes more tightly packed and can be operated on as a group, something CPUs love. This way of presenting DOD is not something I have seen discussed elsewhere and I think it is a nice entry point for people struggling to grasp the concepts. Later on there is also a section on how you can use indexes to help speed up your data processing - more databasey stuff.
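As a sketch of what that might look like in Go (my own illustration, with made-up table and field names rather than anything from the book), you can think of each slice as a column and each struct as a table. An entity only gets a row in the tables that apply to it, and each operation walks just the columns it needs.

```go
package main

import "fmt"

// PositionTable holds one row per entity that has a position.
// All names here are hypothetical, chosen only for illustration.
type PositionTable struct {
	EntityID []uint32
	X, Y     []float32
}

// HealthTable holds one row per entity that can take damage.
type HealthTable struct {
	EntityID []uint32
	HP       []int32
}

// MoveAll shifts every position by (dx, dy) in one pass over packed columns.
func MoveAll(p *PositionTable, dx, dy float32) {
	for i := range p.X {
		p.X[i] += dx
		p.Y[i] += dy
	}
}

// DamageAll subtracts amount from every row of the health table.
func DamageAll(h *HealthTable, amount int32) {
	for i := range h.HP {
		h.HP[i] -= amount
	}
}

func main() {
	pos := PositionTable{
		EntityID: []uint32{1, 2, 3},
		X:        []float32{0, 1, 2},
		Y:        []float32{0, 0, 0},
	}
	// Only entity 2 can take damage, so only it has a health row.
	hp := HealthTable{EntityID: []uint32{2}, HP: []int32{10}}

	MoveAll(&pos, 1, 0)
	DamageAll(&hp, 3)
	fmt.Println(pos.X, hp.HP)
}
```

The EntityID column plays the role of a foreign key: it is what lets you join a health row back to its position row when you need both.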

The other chapters go through various DOD techniques such as:

  • Avoiding ifs by having multiple collections - e.g. if enemies can be dead or alive having two separate lists rather than a boolean
  • Helping the branch predictor - e.g. if doing work on some data but not the other can you make sure there are long runs of data you do the work on, perhaps by sorting?
  • Cache line utilisation - data on modern CPUs is loaded in chunks at least the length of a cache line. If your data is less than a cache line long you have some extra space which it might be beneficial to use
  • SIMD
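To illustrate the first technique in the list, here is a minimal Go sketch of my own (the types and method names are hypothetical, not the book's). The flagged version checks a boolean for every enemy on every update; the split version keeps alive and dead enemies in separate slices so the update loop has no branch at all.

```go
package main

import "fmt"

// Enemy is a hypothetical game entity used only for illustration.
type Enemy struct {
	X, Speed float32
}

// FlaggedEnemy is the branchy layout: one list, with a boolean per enemy.
type FlaggedEnemy struct {
	Enemy
	Alive bool
}

// UpdateFlagged has to test the flag for every element it visits.
func UpdateFlagged(es []FlaggedEnemy, dt float32) {
	for i := range es {
		if es[i].Alive {
			es[i].X += es[i].Speed * dt
		}
	}
}

// Enemies is the split layout: which slice an enemy is in says whether
// it is alive, so no flag is stored.
type Enemies struct {
	Alive, Dead []Enemy
}

// Update moves every living enemy; there is no per-element condition.
func (e *Enemies) Update(dt float32) {
	for i := range e.Alive {
		e.Alive[i].X += e.Alive[i].Speed * dt
	}
}

// Kill moves enemy i from Alive to Dead using swap-remove, which does
// not preserve the order of the Alive slice.
func (e *Enemies) Kill(i int) {
	e.Dead = append(e.Dead, e.Alive[i])
	last := len(e.Alive) - 1
	e.Alive[i] = e.Alive[last]
	e.Alive = e.Alive[:last]
}

func main() {
	es := Enemies{Alive: []Enemy{{X: 0, Speed: 2}, {X: 5, Speed: 2}, {X: 9, Speed: 2}}}
	es.Kill(0)
	es.Update(0.5)
	fmt.Println(len(es.Alive), len(es.Dead), es.Alive)
}
```

The trade-off is that the state change (Kill) does a little more work so that the hot loop (Update) does less, which is usually the right way round if updates happen far more often than deaths.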

Overall I found the explanations of all these techniques easy to follow and well motivated by code examples - there are a lot of examples! You get a real sense from the book that what the author is recommending is hard-won experience, not simply advice. I am particularly grateful for the twelfth chapter called "In Practice", where the author gives real world examples he has faced.

For me, however, the topic of the examples is one of the places the book falls down. As I mentioned, the author is a game developer and the examples, as well as a lot of the content, are geared towards that domain. The introduction and the first chapter are perfectly up-front about this, but I would say the blurb and packaging of the book are not. To be perfectly honest, I am not sure I would have picked it up if it had been marketed as such, which would have been a shame; there is a lot of technical information in this book that will serve you well outside of games.

Another place where I think the book does itself a disservice is in how polemical it is. You really get the impression that the author has been burnt by OOP and its performance problems before. Although in small quantities that would give the book a bit of edge, it is overused here in my opinion. Both the introduction and the first chapter rag on OOP a fair bit and then the fourteenth chapter is dedicated entirely to the problems with how games are developed today (mostly OOP). In my opinion the author, when explaining the techniques, does more than enough to justify them without needing to labour the point.

If You Liked This, I Recommend

  • Computer Enhance - a video course by Casey Muratori on performance-aware programming which has a lot of overlap with DOD. There have been times when I have not been a fan, but Casey's best qualities as an enthusiastic expert and educator are on display here.
  • Mike Acton's Substack - again more videos but this time Mike is directly applying DOD. He takes a game made in Unity, grabs the data and starts normalising it. Don't be put off by the thumbnails of spreadsheets because what he manages to do here is really impressive.
  • Andrew Kelley's Presentation - mentioned earlier.