Optimizing Software Occlusion Culling – index
In January of 2013, some nice folks at Intel released a Software Occlusion Culling demo with full source code. I spent about two weekends playing around with the code, and after realizing that it made a great example for various things I’d been meaning to write about for a long time, spent the next few weeks churning out blog posts about it. This is the resulting series.
Here’s the list of posts (the series is now finished):
- “Write combining is not your friend”, on typical write combining issues when writing graphics code.
- “A string processing rant”, a slightly over-the-top post that starts with some bad string processing habits and ends in a rant about what a complete minefield the standard C/C++ string processing functions and classes are whenever non-ASCII character sets are involved.
- “Cores don’t like to share”, on some very common pitfalls when running multiple threads that share memory.
- “Fixing cache issues, the lazy way”. You could redesign your system to be more cache-friendly – but when you don’t have the time or the energy, you could also just do this.
- “Frustum culling: turning the crank” – on the other hand, if you do have the time and energy, might as well do it properly.
- “The barycentric conspiracy” is a lead-in to some in-depth posts on the triangle rasterizer that’s at the heart of Intel’s demo. It’s also a gripping tale of triangles, Möbius, and a plot centuries in the making.
- “Triangle rasterization in practice” – how to build your own precise triangle rasterizer and not die trying.
- “Optimizing the basic rasterizer”, because this is real time, not amateur hour.
- “Depth buffers done quick, part 1” – at last, looking at (and optimizing) the depth buffer rasterizer in Intel’s example.
- “Depth buffers done quick, part 2” – optimizing some more!
- “The care and feeding of worker threads, part 1” – this project uses multi-threading; time to look into what these threads are actually doing.
- “The care and feeding of worker threads, part 2” – more on scheduling.
- “Reshaping dataflows” – using global knowledge to perform local code improvements.
- “Speculatively speaking” – on store forwarding and speculative execution, using the triangle binner as an example.
- “Mopping up” – a bunch of things that didn’t fit anywhere else.
- “The Reckoning” – in which a lesson is learned, but the damage is irreversible.
All the code is available on GitHub; there are various branches corresponding to various (simultaneous) tracks of development, including a lot of experiments that didn’t pan out. The articles all reference the blog branch, which contains only the changes I talk about in the posts – i.e. the stuff I judged to be actually useful.
Special thanks to Doug McNabb and Charu Chandrasekaran at Intel for publishing the example with full source code and a permissive license, and for saying “yes” when I asked them whether they were okay with me writing about my findings in this way!

To the extent possible under law, Fabian Giesen has waived all copyright and related or neighboring rights to Optimizing Software Occlusion Culling.