Influence of Caching and Encoding on Power Dissipation of System-Level Buses for Embedded Systems

William Fornaciari (1), Donatella Sciuto (1), Cristina Silvano (2)
(1) Politecnico di Milano, Dip. di Elettronica e Informazione, P.zza L. Da Vinci 32, 20133 Milano, Italy.
(2) CEFRIEL, via Fucini 2, 20133 Milano (MI), Italy.

Abstract

This paper proposes a methodology to evaluate the effects of encodings on the power consumption of system-level buses in the presence of multi-level cache memories. The proposed model can consider any cache configuration in terms of size, associativity and block size. It also includes the most widely adopted power-oriented encoding techniques for data and address buses. Experimental results show how the proposed model can be effectively adopted to configure the memory hierarchy and the system bus architecture from the power point of view.

1. System-level power model

The proposed model is composed of three main sub-models: the memory hierarchy, the bus encoder, and the address and data stream generator, which have been integrated in an object-oriented software tool written in C++. The models can be used as basic blocks of different types of system architectures, ranging from dedicated systems to general-purpose computer systems.

The memory hierarchy model consists of a multi-level storage hierarchy of on-chip and off-chip caches. Each level of the hierarchy can be organized as a single unified cache or split into two separate caches for instructions and data. The cache model is configurable in terms of cache size, block size, degree of associativity, write strategy and replacement policy. The write strategy can be write-through or write-back. On a write miss, both the write-allocate and no-write-allocate options are available. For set-associative or fully associative caches, the block replacement policy can be random or LRU.
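The parameters listed above can be made concrete with a minimal single-level cache sketch. It is an illustration only, not the tool described in the paper: the names `CacheConfig` and `SetAssociativeCache` are hypothetical, and the sketch fixes LRU replacement with a no-write-allocate policy, whereas the paper's model also supports random replacement, write-allocate and write-back.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical names; the paper does not list the classes of its C++ tool.
struct CacheConfig {
    std::uint32_t cacheBytes;  // total capacity in bytes, e.g. 4096 for 4KB
    std::uint32_t blockBytes;  // block size in bytes, e.g. 8 for a 64-bit block
    std::uint32_t ways;        // degree of associativity (1 = direct mapped)
};

class SetAssociativeCache {
public:
    explicit SetAssociativeCache(const CacheConfig& cfg)
        : cfg_(cfg),
          sets_(setCount(cfg)),
          lines_(static_cast<std::size_t>(sets_) * cfg.ways) {}

    // Returns true on a hit. A read miss loads the block over the least
    // recently used line of the set; a write miss allocates nothing.
    bool access(std::uint32_t address, bool isWrite) {
        ++clock_;
        const std::uint32_t block = address / cfg_.blockBytes;
        const std::uint32_t set = block % sets_;
        const std::uint32_t tag = block / sets_;
        Line* const base = &lines_[static_cast<std::size_t>(set) * cfg_.ways];
        Line* victim = base;
        for (std::uint32_t w = 0; w < cfg_.ways; ++w) {
            Line& line = base[w];
            if (line.valid && line.tag == tag) {
                line.lastUse = clock_;
                ++hits_;
                return true;
            }
            // Invalid lines keep lastUse == 0, so they are chosen first.
            if (line.lastUse < victim->lastUse) victim = &line;
        }
        ++misses_;
        if (!isWrite) {
            victim->valid = true;
            victim->tag = tag;
            victim->lastUse = clock_;
        }
        return false;
    }

    double missRate() const {
        const std::uint64_t total = hits_ + misses_;
        return total == 0 ? 0.0
                          : static_cast<double>(misses_) / static_cast<double>(total);
    }

private:
    struct Line {
        bool valid = false;
        std::uint32_t tag = 0;
        std::uint64_t lastUse = 0;
    };

    static std::uint32_t setCount(const CacheConfig& c) {
        assert(c.blockBytes > 0 && c.ways > 0 &&
               c.cacheBytes >= c.blockBytes * c.ways);
        return c.cacheBytes / (c.blockBytes * c.ways);
    }

    CacheConfig cfg_;
    std::uint32_t sets_;
    std::vector<Line> lines_;
    std::uint64_t clock_ = 0;
    std::uint64_t hits_ = 0;
    std::uint64_t misses_ = 0;
};
```

Feeding the same address trace to instances built with different `CacheConfig` values gives the kind of miss-rate comparison across cache size, block size and associativity that the model is meant to support.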

To evaluate the bus encoding effects on power consumption, the bus encoder model can be inserted either on the interface from the processor to the first level of the memory hierarchy or between any adjacent levels of the memory hierarchy. The model implements the most common power-oriented bus encoding techniques, such as Gray, Bus-Invert, T0, T0_BI, Dual_T0 and Dual_T0_BI. The encoding schemes can be applied to both the data and address buses.
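As an illustration of three of the schemes named above, the following sketch counts the bus-line transitions produced by a stream of 32-bit values under binary, Gray, Bus-Invert and T0 encoding. It is a simplified reading of those techniques, not the paper's encoder model: the function names are hypothetical, the Bus-Invert decision looks only at the data lines, and the redundant INV and INC lines are each counted as one extra bus line.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Word = std::uint32_t;

// Number of bus lines that toggle between two consecutive bus values.
inline int hamming(Word a, Word b) {
    return static_cast<int>(std::bitset<32>(a ^ b).count());
}

inline Word toGray(Word b) { return b ^ (b >> 1); }

// Unencoded (binary) bus: sum of Hamming distances along the stream.
long binaryTransitions(const std::vector<Word>& s) {
    long total = 0;
    for (std::size_t i = 1; i < s.size(); ++i) total += hamming(s[i - 1], s[i]);
    return total;
}

// Gray-coded bus: consecutive integers differ in exactly one line.
long grayTransitions(const std::vector<Word>& s) {
    long total = 0;
    for (std::size_t i = 1; i < s.size(); ++i)
        total += hamming(toGray(s[i - 1]), toGray(s[i]));
    return total;
}

// Bus-Invert: if more than half of the lines would toggle, send the
// complemented word and signal it on the extra INV line.
long busInvertTransitions(const std::vector<Word>& s) {
    if (s.empty()) return 0;
    long total = 0;
    Word bus = s[0];
    bool inv = false;
    for (std::size_t i = 1; i < s.size(); ++i) {
        const bool newInv = hamming(bus, s[i]) > 16;
        const Word next = newInv ? ~s[i] : s[i];
        total += hamming(bus, next) + (newInv != inv ? 1 : 0);
        bus = next;
        inv = newInv;
    }
    return total;
}

// T0: for an in-sequence address the bus is frozen and the extra INC line
// is asserted; otherwise the new address is driven and INC is released.
long t0Transitions(const std::vector<Word>& s, Word stride) {
    assert(stride != 0);
    if (s.empty()) return 0;
    long total = 0;
    Word bus = s[0];
    bool inc = false;
    for (std::size_t i = 1; i < s.size(); ++i) {
        if (s[i] == s[i - 1] + stride) {
            total += inc ? 0 : 1;
            inc = true;
        } else {
            total += hamming(bus, s[i]) + (inc ? 1 : 0);
            bus = s[i];
            inc = false;
        }
    }
    return total;
}
```

On a purely sequential stream such as 0, 1, ..., 7 the binary bus toggles 11 lines, the Gray bus 7, and the T0 bus a single line (INC), which is why these codes target address buses rather than data buses.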

The address and data stream generator aims at analyzing the system-level bus behavior by using address and data streams derived either by tracing a real microprocessor or by using a stream generator to simulate the execution of a generic program on a microprocessor. More specifically, the address generator models the processor-to-memory communication taking into account the spatial and temporal locality of memory references. The current version of the generator includes a generic load/store RISC architecture, although to derive the experimental results we refer to the instruction set of a 32-bit ARM7TDMI. In our model, we assume that the memory address spaces for data and instructions are separate. The address sequence is generated from a user-specified instruction mix, for which we can specify: the format and the execution frequency of each instruction class; the addressing modes of each instruction and their execution frequencies; and the execution rate of conditional branches.
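A heavily reduced sketch of such a generator is given below. It models only the instruction address stream and only two of the parameters mentioned above, the share of conditional branches and their taken rate. Everything in it is an assumption made for illustration: the names `StreamParams` and `generateInstructionAddresses`, the fixed 4-byte instruction size, and the uniformly random branch targets.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical parameter set; the paper's generator is considerably richer
// (instruction formats, addressing modes, separate data address space).
struct StreamParams {
    double branchFraction;    // share of instructions that are conditional branches
    double takenRate;         // probability that such a branch is taken
    std::uint32_t codeBytes;  // size of the instruction address space
};

// Produces `count` instruction fetch addresses. Sequential fetches advance
// by 4 bytes; a taken branch jumps to a random word-aligned target.
std::vector<std::uint32_t> generateInstructionAddresses(const StreamParams& p,
                                                        std::size_t count,
                                                        std::uint32_t seed) {
    assert(p.codeBytes >= 4 && p.codeBytes % 4 == 0);
    std::mt19937 rng(seed);
    std::bernoulli_distribution isBranch(p.branchFraction);
    std::bernoulli_distribution isTaken(p.takenRate);
    std::uniform_int_distribution<std::uint32_t> target(0, p.codeBytes / 4 - 1);

    std::vector<std::uint32_t> addresses;
    addresses.reserve(count);
    std::uint32_t pc = 0;
    for (std::size_t i = 0; i < count; ++i) {
        addresses.push_back(pc);
        if (isBranch(rng) && isTaken(rng)) {
            pc = 4 * target(rng);          // break in spatial locality
        } else {
            pc = (pc + 4) % p.codeBytes;   // sequential fetch, wraps at the end
        }
    }
    return addresses;
}
```

Raising `branchFraction` or `takenRate` shortens the sequential runs in the stream, which is the property that sequence-based encodings and caches are sensitive to.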

2. The simulation methodology

In this section, we describe the simulation methodology used to profile the power consumption of an embedded system consisting of the 32-bit low-power ARM7TDMI processor and a multi-level memory hierarchy with a 32-bit address bus and a 32-bit data bus. The reference architecture is composed of the 33 MHz ARM7TDMI processor and its main memory, interfaced through 66 MHz, 60 pF buses. Starting from this reference architecture, four different system configurations have been analyzed to evaluate:

• the bus encoding effects without cache (CASE1);
• the cache effects without bus encoding (CASE2);
• the combined effects of an on-chip processor bus encoder and an off-chip cache memory (CASE3);
• the combined effects of an off-chip processor bus encoder followed by an off-chip cache memory (CASE4).

In the simulation, we used a generated stream of 100 000 instructions and a memory hierarchy consisting of a first-level off-chip cache adopting write-through, no-write-allocate and random block replacement policies. Results concerning CASE1 have been presented in [2]. CASE2 aims at studying the effects of the off-chip first-level cache, whose parameters vary from 4KB to 32KB for the cache size, from 32 to 256 bits for the block size, and over 1, 2, 4 and 8 ways for the associativity. We analyzed the miss rate versus cache size for four different block sizes and degrees of associativity. As expected, the miss rate decreases as each of these three parameters increases. Power behaves similarly: as more memory requests are satisfied directly by the cache, the number of references to the main memory decreases. Consequently, traffic drops considerably on the cache-to-memory bus, which has to switch a larger capacitance (60 pF) than the processor-to-cache bus (10 pF). Due to space limitations, only the diagrams for 2-way set-associative caches are reported in the following (fig. 1 and fig. 2).
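The link between bus traffic, line capacitance and power can be sketched with the usual switched-capacitance model, P = 0.5 · α · C · Vdd² · f, where α is the average number of lines toggling per transfer. The paper does not state which power equation or supply voltage it uses, so both the model and the 3.3 V value in the usage note are assumptions. The function names are hypothetical.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Average number of bus lines toggling per transfer in a stream of
// 32-bit bus values.
double averageTransitions(const std::vector<std::uint32_t>& stream) {
    if (stream.size() < 2) return 0.0;
    std::uint64_t toggles = 0;
    for (std::size_t i = 1; i < stream.size(); ++i)
        toggles += std::bitset<32>(stream[i - 1] ^ stream[i]).count();
    return static_cast<double>(toggles) /
           static_cast<double>(stream.size() - 1);
}

// Dynamic bus power under the assumed model P = 0.5 * alpha * C * Vdd^2 * f.
// capacitanceFarads is the load per bus line, transfersPerSecond the rate at
// which new values are driven on the bus.
double busPowerWatts(double avgTransitions, double capacitanceFarads,
                     double vddVolts, double transfersPerSecond) {
    assert(avgTransitions >= 0.0 && capacitanceFarads >= 0.0 &&
           transfersPerSecond >= 0.0);
    return 0.5 * avgTransitions * capacitanceFarads * vddVolts * vddVolts *
           transfersPerSecond;
}
```

For example, with an assumed 3.3 V supply, one toggling line per transfer at 66 MHz costs about 21.6 mW on a 60 pF line and about 3.6 mW on a 10 pF line. The factor of six between the two loads is why removing traffic from the cache-to-memory bus dominates the savings.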

![Figure 1. Power for address bus vs cache size for a 2-way set-associative cache and four different block sizes.](image1)

![Figure 2. Power for data bus vs cache size for a 2-way set-associative cache and four different block sizes.](image2)

Power decreases on both the address and data buses as the cache size grows. A power reduction for larger block sizes, however, can be observed only on the address bus, whilst on the data bus the power is almost invariant with block size. With larger blocks, more consecutive addresses are loaded into the cache, so the average number of transitions (and hence the power) on the address bus decreases. The data bus behaves differently, since the data values of consecutive memory locations are distributed randomly; hence its power is almost the same for any block size. A comparison with the bus power dissipated by the reference architecture (197.64 mW and 330.53 mW for the address and data bus, respectively) shows that the memory hierarchy brings not only performance advantages but also power savings, and that the reduction increases for larger cache sizes. These results do not consider the internal power dissipation of the cache array, so the effective reduction could be partly offset by the cache internal power.

For CASE3, the bus encoder is implemented on-chip, whilst an off-chip L1 cache is provided, whose size varies from 4KB to 32KB, with a degree of associativity of 1, 2, 4 or 8 ways and a block size of 64 bits. The power versus the cache size for a 2-way set-associative cache and several bus encodings is reported in fig. 3 and fig. 4 for the address and data bus, respectively.

Concerning the address bus, the power dissipation is considerably reduced by adopting the Gray, Dual_T0 and Dual_T0_BI schemes. The average percentages of power saved by the encodings with respect to the reference architecture are reported in table 1 for four cache sizes. For each encoding technique, larger savings can be obtained for larger caches, since the number of accesses to the main memory decreases.

Finally, table 2 reports the average percentage of power saved on the address bus by the adopted encodings with respect to the binary encoding (same as CASE2) for four cache sizes. For the data bus, the power saving with respect to the binary code is very limited (approximately 4% for BI), while the power is almost invariant for the other encodings. These results rule out practical application of the studied encoding methods to the data bus.
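The two kinds of figures reported here, savings with respect to the reference architecture and savings with respect to the binary encoding, are related multiplicatively. The helpers below sketch that arithmetic, assuming the usual definition of a percentage saving. The function names are hypothetical and the numbers in the usage note are made up for illustration, not taken from the tables.

```cpp
#include <cassert>

// Percentage of power saved by a configuration relative to a baseline.
double savingPercent(double baselinePower, double measuredPower) {
    assert(baselinePower > 0.0);
    return 100.0 * (baselinePower - measuredPower) / baselinePower;
}

// Combines the saving of the cache alone (binary code vs. the reference
// architecture) with the saving of an encoding vs. the binary code into
// the overall saving vs. the reference architecture.
double combinedSavingPercent(double cacheSavingPercent,
                             double encodingSavingPercent) {
    const double remaining = (1.0 - cacheSavingPercent / 100.0) *
                             (1.0 - encodingSavingPercent / 100.0);
    return 100.0 * (1.0 - remaining);
}
```

As a made-up example, a cache that alone saves 50% of the bus power, combined with an encoding that saves 20% with respect to binary, yields an overall saving of 60% rather than 70%, because the encoding acts only on the power that the cache has left.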

The analysis of CASE4 is similar to that carried out for CASE2. Details and diagrams for all the presented cases can be found in [4].

![Figure 3. Power for address bus vs cache size for a 2-way set-associative cache and several bus encodings.](image3)

![Figure 4. Power for data bus vs cache size for a 2-way set-associative cache and several bus encodings.](image4)

<table>
<thead>
<tr>
<th>Cache Size</th>
<th>Binary</th>
<th>Gray</th>
<th>BI</th>
<th>T0</th>
<th>T0_BI</th>
<th>Dual_T0</th>
<th>Dual_T0_BI</th>
</tr>
</thead>
<tbody>
<tr>
<td>4KB</td>
<td>72.76</td>
<td>60.72</td>
<td>73.28</td>
<td>73.46</td>
<td></td>
<td></td>
<td>74.40</td>
</tr>
<tr>
<td>8KB</td>
<td>72.76</td>
<td>60.72</td>
<td>73.28</td>
<td>73.46</td>
<td></td>
<td></td>
<td>74.40</td>
</tr>
<tr>
<td>16KB</td>
<td>72.76</td>
<td>60.72</td>
<td>73.28</td>
<td>73.46</td>
<td></td>
<td></td>
<td>74.40</td>
</tr>
<tr>
<td>32KB</td>
<td>72.76</td>
<td>60.72</td>
<td>73.28</td>
<td>73.46</td>
<td></td>
<td></td>
<td>74.40</td>
</tr>
</tbody>
</table>

Table 1. Average power saving on address bus for different bus encodings vs. the reference architecture for 4 cache sizes.

<table>
<thead>
<tr>
<th>Cache Size</th>
<th>Binary</th>
<th>Gray</th>
<th>BI</th>
<th>T0</th>
<th>T0_BI</th>
<th>Dual_T0</th>
<th>Dual_T0_BI</th>
</tr>
</thead>
<tbody>
<tr>
<td>4KB</td>
<td></td>
<td>19.32</td>
<td>1.22</td>
<td>2.27</td>
<td>3.25</td>
<td>33.09</td>
<td>35.50</td>
</tr>
<tr>
<td>8KB</td>
<td></td>
<td>24.48</td>
<td>1.54</td>
<td>2.62</td>
<td>4.11</td>
<td>41.97</td>
<td>44.98</td>
</tr>
<tr>
<td>16KB</td>
<td></td>
<td>28.65</td>
<td>1.80</td>
<td>3.30</td>
<td>4.81</td>
<td>49.96</td>
<td>52.83</td>
</tr>
<tr>
<td>32KB</td>
<td></td>
<td>29.25</td>
<td>1.84</td>
<td>3.36</td>
<td>4.91</td>
<td>50.58</td>
<td>53.74</td>
</tr>
</tbody>
</table>

Table 2. Average power saving on address bus for different bus encodings vs. the binary encoding for 4 cache sizes.

3. Conclusions and future work

The aim of this work has been to evaluate the effects on power of bus encoding schemes in the presence of multi-level cache memories. Current effort is devoted to analyzing high-end general-purpose systems targeting the PowerPC architecture, where the presence of virtual memory, as well as a finer-grain model of the contribution of the memory arrays, has to be taken into account.

4. References


