Smart Memories

Trends in VLSI technology scaling demand that future computing devices be narrowly focused to achieve high performance and high efficiency, yet also target the high volumes and low costs of widely applicable general-purpose designs. To address these conflicting requirements, we propose a modular reconfigurable architecture called Smart Memories, targeted at computing needs in the 0.1 μm technology generation. A Smart Memories chip is made up of many processing tiles, each containing local memory, local interconnect, and a processor core. For efficient computation under a wide class of possible applications, the memories, the wires, and the computational model can all be altered to match the applications. To show the applicability of this design, two very different machines at opposite ends of the architectural spectrum, the Imagine stream processor and the Hydra speculative multiprocessor, are mapped onto the Smart Memories computing substrate. Simulations of the mappings show that the Smart Memories architecture can successfully map these architectures with only modest performance degradation.
INTRODUCTION

The continued scaling of integrated circuit fabrication technology will dramatically affect the architecture of future computing systems. Scaling will make computation cheaper, smaller, and lower power, thus enabling more sophisticated computation in a growing number of embedded applications. This spread of low-cost, low-power computing can easily be seen in today’s wired (e.g. gigabit Ethernet or DSL) and wireless communication devices, gaming consoles, and handheld PDAs. These new applications have different characteristics from today’s standard workloads, often containing highly data-parallel streaming behavior. While the applications will demand ever-growing compute performance, power efficiency (ops/W) and computational efficiency (ops/$) are also paramount; therefore, designers have created narrowly focused custom silicon solutions to meet these needs.

However, the scaling of process technologies makes the construction of custom solutions increasingly difficult due to the increasing complexity of the desired devices. While designer productivity has improved over time, and technologies like system-on-a-chip help to manage complexity, each generation of complex machines is more expensive to design than the previous one. High non-recurring fabrication costs (e.g. mask generation) and long chip manufacturing delays mean that designs must be all the more carefully validated, further increasing the design costs. Thus, these large complex chips are only cost-effective if they can be sold in large volumes. This need for a large market runs counter to the drive for efficient, narrowly focused, custom hardware solutions.

To fill the need for widely applicable computing designs, a number of more general-purpose processors are targeted at a class of problems, rather than at specific applications. TriMedia, Equator, Mpact, IRAM, and many other projects are all attempts to create general-purpose computing engines for multimedia applications. However, these attempts to create more universal computing elements have some limitations. First, these machines have been optimized for applications where the parallelism can be expressed at the instruction level using either VLIW or vector engines. They would not be very efficient for applications that lack parallelism at this level but have, for example, thread-level parallelism. Second, their globally shared resource models (shared multi-ported registers and memory) will be increasingly difficult to implement in future technologies in which on-chip communication costs are appreciable. Finally, since these machines are generally compromise solutions between true signal processing engines and general-purpose processors, their efficiency at doing either task suffers.
