Memory Subsystem Optimizations


In this blog I wrote 18 blog posts about memory subsystem optimizations. By memory subsystem optimizations I mean optimizations that aim to make software faster by using the memory subsystem better. Most of them apply to software that works with large datasets, but some apply to software that works with any data, regardless of its size.

Do you need to discuss a performance problem in your project? Or maybe you want a vectorization training for yourself or your team? Contact us, or follow us on LinkedIn, Twitter or Mastodon and get notified as soon as new content becomes available.

Here is a list of all the posts we covered on Johnny's Software Lab:

- Decreasing Total Memory Accesses. We speed up software by keeping data in registers instead of reloading it from the memory subsystem several times. Posts: "Decreasing the Number of Memory Accesses 1/2", "Decreasing the Number of Memory Accesses: The Compiler's Secret Life 2/2".
- Changing the Data Access Pattern to Increase Locality. By changing our data access pattern we increase the chance that our data is in the fastest level of the data cache. Post: "For Software Performance, the Way Data is Accessed Matters!".
- Changing the Data Layout: Classes. Selecting a proper class data layout can improve software performance. Post: "Software Performance and Class Layout".
- Changing the Data Layout: Data Structures. By changing the data layout of common data structures, such as linked lists, trees or hash maps, we can improve their performance. Post: "Faster hash maps, binary trees etc. through data layout modification".
- Decreasing the Dataset Size. Memory efficiency can be improved by decreasing the dataset size, which brings speed improvements as well. Post: "Memory consumption, dataset size and performance: how does it all relate?".
- Changing the Memory Layout. Whereas data layout is determined at compile time, memory layout is determined by the system allocator at runtime. We examine how changing the memory layout using custom allocators influences software performance. Post: "Performance Through Memory Layout".
- Increasing instruction-level parallelism. Some code cannot utilize the memory subsystem fully because of instruction dependencies. We investigate techniques that break those dependencies and improve performance. Posts: "Instruction-level parallelism in practice: speeding up memory-bound programs with low ILP", "Hiding Memory Latency With In-Order CPU Cores OR How Compilers Optimize Your Code".
- Software prefetching for random data accesses. Explicit software prefetches tell the hardware that you will access a certain piece of data soon. When used wisely, they can improve software performance. Post: "The pros and cons of explicit software prefetching".
- Decreasing TLB cache misses. The TLB is a small cache that speeds up the translation of virtual to physical memory addresses; in some cases it is the reason for poor performance. We investigate techniques for decreasing TLB cache misses. Post: "Speeding Up Translation of Virtual To Physical Memory Addresses: TLB and Huge Pages".
- Saving the memory subsystem bandwidth. In some cases we don't care about software performance, but we do care about being a good neighbor. We investigate techniques that make our software consume the least possible amount of memory subsystem resources. Post: "Frugal Programming: Saving Memory Subsystem Bandwidth".
- Branch prediction and data caches. We investigate the delicate interplay between branch prediction and the memory subsystem. Post: "Unexpected Ways Memory Subsystem Interacts with Branch Prediction".
- Multithreading and the Memory Subsystem. We investigate how the memory subsystem behaves in the presence of multithreading and how that affects software speed. Post: "Multithreading and the Memory Subsystem".
- Low-latency applications. In some cases we care more about low latency than about high throughput. We investigate techniques aimed at improving latency, either by modifying our programs or by reconfiguring the system. Posts: "Latency-Sensitive Applications and the Memory Subsystem: Keeping the Data in the Cache", "Latency-Sensitive Application and the Memory Subsystem Part 2: Memory Management Mechanisms".
- Measuring Memory Subsystem Performance. We talk about the tools and metrics you can use to understand what is going on in the memory subsystem. Post: "Measuring Memory Subsystem Performance".
- Other topics. A few remaining topics related to memory subsystem optimizations that didn't fit any of the other categories. Post: "Memory Subsystem Optimizations – The Remaining Topics".

Short illustrative sketches of several of these techniques follow at the end of this post.

Any feedback on the material covered in these posts will be highly appreciated.
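To make the "Decreasing Total Memory Accesses" entry concrete, here is a minimal C++ sketch (illustrative, not taken from the linked posts) of the same reduction written twice: once updating the result through a pointer on every iteration, and once accumulating in a local variable so the running total can stay in a register.

```cpp
#include <cstddef>

// If *out may alias data, the compiler has to reload and store *out on
// every iteration, so each element costs extra memory traffic.
void sum_through_pointer(const int* data, std::size_t n, int* out) {
    *out = 0;
    for (std::size_t i = 0; i < n; ++i)
        *out += data[i];              // load *out, add, store *out
}

// Accumulating in a local variable lets the total live in a register;
// memory is touched once per element plus one final store.
void sum_in_register(const int* data, std::size_t n, int* out) {
    int acc = 0;
    for (std::size_t i = 0; i < n; ++i)
        acc += data[i];               // only data[i] is loaded
    *out = acc;
}
```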
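The "Changing the Data Access Pattern to Increase Locality" entry is the classic loop-order example. A rough sketch, assuming a row-major matrix stored in a flat vector:

```cpp
#include <cstddef>
#include <vector>

// Inner loop walks down a column: consecutive accesses are `cols` doubles
// apart, so almost every access touches a new cache line.
double sum_column_major(const std::vector<double>& m,
                        std::size_t rows, std::size_t cols) {
    double s = 0.0;
    for (std::size_t c = 0; c < cols; ++c)
        for (std::size_t r = 0; r < rows; ++r)
            s += m[r * cols + c];
    return s;
}

// Inner loop walks along a row: accesses are sequential, so each cache
// line is fully used before moving on.
double sum_row_major(const std::vector<double>& m,
                     std::size_t rows, std::size_t cols) {
    double s = 0.0;
    for (std::size_t r = 0; r < rows; ++r)
        for (std::size_t c = 0; c < cols; ++c)
            s += m[r * cols + c];
    return s;
}
```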
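For "Changing the Data Layout: Classes", the simplest illustration is member ordering and padding. The sizes in the comments assume a typical 64-bit ABI; this is an illustrative sketch, not code from the linked post.

```cpp
#include <cstdint>

struct Padded {            // typically 24 bytes: each 1-byte field is
    std::uint8_t  flag;    // followed by 7 bytes of padding so that `id`
    std::uint64_t id;      // stays 8-byte aligned
    std::uint8_t  kind;
};

struct Reordered {         // typically 16 bytes: small members grouped
    std::uint64_t id;      // after the large one, less padding, more
    std::uint8_t  flag;    // objects per cache line
    std::uint8_t  kind;
};

// Holds on common 64-bit ABIs (x86-64, AArch64).
static_assert(sizeof(Reordered) < sizeof(Padded),
              "reordering members shrinks the object");
```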
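For "Decreasing the Dataset Size", one common trick is shrinking the width of stored values. A hedged sketch: if all records live in one array, a 32-bit offset carries the same information as a 64-bit pointer at half the size.

```cpp
#include <cstdint>
#include <vector>

struct Record { double value; };

// 8 bytes per index entry.
using PointerIndex = std::vector<const Record*>;

// 4 bytes per index entry: half the cache and memory footprint for the
// same information, as long as there are fewer than 2^32 records.
using OffsetIndex = std::vector<std::uint32_t>;

double lookup(const std::vector<Record>& records, std::uint32_t offset) {
    return records[offset].value;
}
```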
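For "Changing the Memory Layout", here is a minimal bump (arena) allocator sketch. It only shows the idea that nodes allocated together end up next to each other in memory; a production allocator needs error handling, destruction, and thread safety that are omitted here.

```cpp
#include <cstddef>
#include <cstdint>
#include <new>
#include <vector>

class Arena {
public:
    explicit Arena(std::size_t bytes) : buffer_(bytes), offset_(0) {}

    // Bump-pointer allocation: returns nullptr when the arena is exhausted.
    void* allocate(std::size_t size, std::size_t align) {
        std::size_t aligned = (offset_ + align - 1) & ~(align - 1);
        if (aligned + size > buffer_.size()) return nullptr;
        offset_ = aligned + size;
        return buffer_.data() + aligned;
    }

private:
    std::vector<std::uint8_t> buffer_;
    std::size_t offset_;
};

struct Node { int value; Node* next; };

// Nodes are placed back-to-back in the arena, so walking the list touches
// consecutive cache lines instead of jumping around a fragmented heap.
Node* build_list(Arena& arena, int count) {
    Node* head = nullptr;
    for (int i = 0; i < count; ++i) {
        void* mem = arena.allocate(sizeof(Node), alignof(Node));
        if (!mem) break;
        head = new (mem) Node{i, head};
    }
    return head;
}
```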
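The "Increasing instruction-level parallelism" entry can be illustrated with the usual multiple-accumulator reduction. A sketch, with the caveat that splitting a floating-point sum changes rounding slightly:

```cpp
#include <cstddef>

// One dependency chain: every add must wait for the previous one, so the
// CPU cannot overlap many outstanding loads with useful work.
double sum_single_chain(const double* a, std::size_t n) {
    double s = 0.0;
    for (std::size_t i = 0; i < n; ++i)
        s += a[i];
    return s;
}

// Four independent chains: several loads and adds can be in flight at once.
double sum_four_chains(const double* a, std::size_t n) {
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += a[i];
        s1 += a[i + 1];
        s2 += a[i + 2];
        s3 += a[i + 3];
    }
    for (; i < n; ++i)    // remainder
        s0 += a[i];
    return (s0 + s1) + (s2 + s3);
}
```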
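For "Software prefetching for random data accesses", a sketch of a gather loop that asks for an element a few iterations ahead. __builtin_prefetch is a GCC/Clang builtin, and the prefetch distance of 16 is an assumption that needs tuning per workload.

```cpp
#include <cstddef>
#include <cstdint>

double gather_with_prefetch(const double* values,
                            const std::uint32_t* idx, std::size_t n) {
    constexpr std::size_t kDistance = 16;   // assumed; tune per workload
    double s = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        if (i + kDistance < n)
            // Start fetching the element needed kDistance iterations from
            // now (read access, low temporal-locality hint).
            __builtin_prefetch(&values[idx[i + kDistance]], 0, 1);
        s += values[idx[i]];
    }
    return s;
}
```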
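For "Decreasing TLB cache misses", a Linux-only sketch that hints the kernel to back a large allocation with transparent huge pages, so the buffer needs far fewer TLB entries. MADV_HUGEPAGE is a best-effort hint and requires transparent huge pages to be enabled on the system.

```cpp
#include <cstddef>
#include <cstdlib>
#include <sys/mman.h>   // madvise, MADV_HUGEPAGE (Linux only)

void* allocate_huge_page_friendly(std::size_t bytes) {
    void* p = nullptr;
    // Align the region to 2 MB so it can be remapped with 2 MB pages.
    if (posix_memalign(&p, 2 * 1024 * 1024, bytes) != 0)
        return nullptr;
    madvise(p, bytes, MADV_HUGEPAGE);   // hint only; may be ignored
    return p;                           // caller releases with free()
}
```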
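For "Multithreading and the Memory Subsystem", false sharing is the easiest effect to show. A sketch, assuming 64-byte cache lines:

```cpp
#include <atomic>

// All four counters share one or two cache lines; when different threads
// increment different counters, the line still ping-pongs between cores.
struct PackedCounters {
    std::atomic<long> value[4];
};

// One counter per (assumed) 64-byte cache line: threads no longer
// invalidate each other's lines.
struct alignas(64) PaddedCounter {
    std::atomic<long> value;
};

struct PaddedCounters {
    PaddedCounter counter[4];
};

void hot_loop(PaddedCounters& counters, int thread_id, long iterations) {
    for (long i = 0; i < iterations; ++i)
        counters.counter[thread_id].value.fetch_add(
            1, std::memory_order_relaxed);
}
```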
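For the "Low-latency applications" entries, a Linux/POSIX sketch of the kind of preparation such programs do: locking pages into RAM and pre-faulting a buffer before the latency-critical path runs. This is illustrative rather than taken from the linked posts; mlockall may require privileges or a raised RLIMIT_MEMLOCK.

```cpp
#include <cstddef>
#include <vector>
#include <sys/mman.h>   // mlockall, MCL_CURRENT, MCL_FUTURE (POSIX)

std::vector<char> prepare_buffer(std::size_t bytes) {
    // Best effort: keep current and future pages resident so the hot path
    // never waits for the pager.
    mlockall(MCL_CURRENT | MCL_FUTURE);

    // Constructing the vector zero-fills it, which faults every page in
    // now, off the latency-critical path, instead of on first use.
    return std::vector<char>(bytes);
}
```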
