High-Performance Cache Design
The AMD Athlon processor's high-performance cache architecture includes an integrated dual-ported 128KB split-L1 cache with separate snoop port; either an integrated full-speed, 16-way set-associative, 256KB L2 cache or an L2 controller for an external L2 cache of 512KB to 8MB, built from industry-standard SDR or DDR SRAMs and accessed over a 72-bit (64-bit data + 8-bit ECC) interface; and a multi-level split 512-entry Translation Lookaside Buffer (TLB).
The AMD Athlon processor's large, integrated, full-speed L1 cache comprises separate 64KB, two-way set-associative instruction and data caches. At 128KB in total, it is four times the size of the Pentium III processor's 32KB L1 cache. Because the larger L1 cache keeps more instructions and data local to the processor, applications running on AMD Athlon processors perform significantly faster. The data cache is divided into eight banks to provide maximum parallelism when running multiple applications, and it supports concurrent accesses by two 64-bit loads or stores. The instruction cache contains predecode data to assist multiple, high-performance instruction decoders. Both the instruction and data caches are dual-ported and have dedicated snoop ports, which keep system coherency traffic, common in systems with many devices, from interfering with application performance.
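As a rough illustration of what "64KB, two-way set-associative" implies, the sketch below derives the number of sets and splits an address into tag, set index, and byte offset. The 64-byte line size is an assumption (the text above does not state it), and the names CacheGeometry and L1_DATA are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class CacheGeometry:
    size_bytes: int
    ways: int
    line_bytes: int  # assumption: the line size is not given in the text

    @property
    def num_sets(self) -> int:
        # Capacity = sets * ways * line size, solved for the number of sets.
        return self.size_bytes // (self.ways * self.line_bytes)

    def split(self, address: int) -> Tuple[int, int, int]:
        """Return (tag, set_index, byte_offset) for an address."""
        offset = address % self.line_bytes
        set_index = (address // self.line_bytes) % self.num_sets
        tag = address // (self.line_bytes * self.num_sets)
        return tag, set_index, offset


# 64KB, two-way data cache from the text; 64-byte lines are assumed.
L1_DATA = CacheGeometry(size_bytes=64 * 1024, ways=2, line_bytes=64)

print(L1_DATA.num_sets)        # 512 sets under the assumed line size
print(L1_DATA.split(0x12345))  # (2, 141, 5)
```

Under these assumptions each set holds two lines, so two addresses with the same set index can be cached at once; a third forces a replacement.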
The new AMD Athlon processor with performance-enhancing cache memory includes an integrated, full-speed, 16-way set-associative, 256KB L2 cache. Previous AMD Athlon processors contain an L2 controller which operates at the maximum frequency compatible with the latest industry-standard SRAMs. Because the L2 cache is integrated onto the processor, it always operates at the same frequency as the processor, minimizing any delays incurred waiting for data from a slower external bus. The new AMD Athlon processor's L2 cache is 16-way set-associative, twice the associativity of the Intel Pentium III processor's L2 cache (16-way vs. 8-way). Higher associativity reduces conflict evictions, so more local application data resides in the high-speed L2 cache memory instead of system memory, which improves application performance. Finally, the integrated L2 cache tags improve performance by quickly indicating whether critical application data is located within the L2 cache. Integrated tags are especially important for processors that use external SRAMs for the L2 cache. If the processor determines early enough that application data does not reside in the L2 cache, it can immediately request the data from the slower system memory, instead of first checking an external L2 cache and only then requesting the data from system memory.
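The effect of associativity on conflict misses can be sketched with a toy simulator. This is not a model of either processor: the cache holds only 16 lines, the LRU replacement policy is an assumption (the text does not name one), and count_misses is a hypothetical helper. It compares an 8-way and a 16-way cache of equal capacity on a trace of 12 lines that all map to the same set in the 8-way layout.

```python
from collections import OrderedDict


def count_misses(line_numbers, total_lines, ways):
    """Count misses for a set-associative cache with LRU replacement (assumed)."""
    num_sets = total_lines // ways
    # One OrderedDict per set; insertion order tracks recency of use.
    sets = [OrderedDict() for _ in range(num_sets)]
    misses = 0
    for line in line_numbers:
        cache_set = sets[line % num_sets]
        if line in cache_set:
            cache_set.move_to_end(line)  # hit: mark as most recently used
        else:
            misses += 1
            if len(cache_set) >= ways:
                cache_set.popitem(last=False)  # evict least recently used
            cache_set[line] = True
    return misses


# Ten passes over 12 even-numbered lines (0, 2, ..., 22).
trace = [line for _ in range(10) for line in range(0, 24, 2)]

misses_8way = count_misses(trace, total_lines=16, ways=8)
misses_16way = count_misses(trace, total_lines=16, ways=16)
print(misses_8way, misses_16way)  # 120 12
```

With 8 ways the 12 lines compete for one 8-entry set and every access misses, while the 16-way layout holds all 12 lines and misses only on first touch. Real workloads show smaller differences; the trace here is chosen to make the conflict effect visible.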
The robust multi-level, split, 512-entry TLB significantly improves the performance of systems configured with large physical memory or storage, typical of server environments, by caching the address translation information that operating systems and application software use when accessing that memory or storage. Thus, the cache architecture of the AMD Athlon processor enables high instruction-execution rates by minimizing effective memory latency and system snoop responses, and it provides large spatial locality of data for transaction-based applications and multiprocessing operating systems. The architecture also supports high-bandwidth data transfers to and from the execution resources, and it contributes to significant performance gains and extremely fast operation of data-intensive software programs.
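One way to put the TLB size in perspective is its "reach", the amount of memory whose translations can be cached at once. The sketch below assumes 4KB pages and treats all 512 entries as independent mappings; both are simplifying assumptions, since the text describes the TLB as split and multi-level without giving the breakdown. The helper tlb_reach_bytes is hypothetical.

```python
def tlb_reach_bytes(entries: int, page_bytes: int) -> int:
    """Memory covered when every TLB entry maps a distinct page."""
    return entries * page_bytes


TLB_ENTRIES = 512        # from the text
PAGE_BYTES = 4 * 1024    # assumption: standard 4KB x86 pages

reach = tlb_reach_bytes(TLB_ENTRIES, PAGE_BYTES)
print(reach // (1024 * 1024), "MB")  # 2 MB under these assumptions
```

Accesses that stay within the covered pages avoid a page-table walk, which is why a larger TLB helps software with large memory footprints.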
The AMD Athlon processor's cache architecture is the first to incorporate a system-based MOESI (Modified, Owned, Exclusive, Shared, Invalid) cache control protocol for x86 multiprocessing platforms. Because the system logic manages memory coherency throughout the system by specifying all cache state transitions, using either a MESI or MOESI cache coherency protocol, and by filtering out unnecessary processor snoops, AMD Athlon processors are designed to deliver exceptional performance in both uniprocessor and multiprocessor system configurations. The AMD Athlon processor cache architecture also supports error correction code (ECC) protection, which is a required feature for high reliability of business desktop systems, workstations, and servers. Thus, the AMD Athlon processor's cache architecture provides the features required for high-performance computing from desktop to server configurations.
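For readers unfamiliar with MOESI, the sketch below shows a simplified, textbook-style state machine for a single cache line as seen by one cache. It is an assumption-laden illustration, not the system-directed protocol described above, in which the system logic specifies the transitions. The names State, Event, and next_state are hypothetical.

```python
from enum import Enum


class State(Enum):
    MODIFIED = "M"   # only copy, dirty
    OWNED = "O"      # dirty, shared with others; this cache supplies the data
    EXCLUSIVE = "E"  # only copy, clean
    SHARED = "S"     # one of several copies
    INVALID = "I"    # no valid copy


class Event(Enum):
    LOCAL_READ = 1   # this processor reads the line
    LOCAL_WRITE = 2  # this processor writes the line
    SNOOP_READ = 3   # another agent reads the line
    SNOOP_WRITE = 4  # another agent writes or invalidates the line


def next_state(state: State, event: Event, others_have_copy: bool = False) -> State:
    """Textbook MOESI transition for one line (simplified sketch)."""
    if event is Event.LOCAL_WRITE:
        return State.MODIFIED  # other copies are invalidated first
    if event is Event.SNOOP_WRITE:
        return State.INVALID
    if event is Event.LOCAL_READ:
        if state is State.INVALID:
            return State.SHARED if others_have_copy else State.EXCLUSIVE
        return state
    # Remaining case: SNOOP_READ
    if state is State.MODIFIED:
        return State.OWNED  # keep the dirty line and supply it to the reader
    if state is State.EXCLUSIVE:
        return State.SHARED
    return state


print(next_state(State.MODIFIED, Event.SNOOP_READ))  # State.OWNED
```

The Owned state is what distinguishes MOESI from MESI: a modified line can be shared with another cache without first being written back to system memory.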