Issues in designing a multicore processor

1) Cache related issues:

i) Amount of cache: The amount of cache a multicore processor needs depends on the application. Applications with high data reuse benefit from a large cache. A larger cache raises the hit rate and so improves performance, but it also costs more in die area and power, and a larger cache generally has a longer access latency.

ii) Number of cache levels: Another issue is how many cache levels the multicore processor should have. The cores do not all need the same number of levels. The number of levels is decided mainly by how far away main memory is, that is, how many cycles a main-memory access takes. The more cycles an access takes, the more cache levels are justified to hide that latency.

iii) Deterministic/nondeterministic performance: On-chip storage can be managed either by hardware, as a tagged cache, or by software, as an explicitly addressed local memory. A hardware-managed cache stores its tags on the same die automatically, which reduces the area left for data and computation, and because hits and misses depend on run-time behavior its performance is nondeterministic. A software-managed local memory holds data that the program streams in explicitly. It stores no tags, so more of the same area is available for storage, and its access time is predictable. However, software management is more complex to program, so the choice between the two depends on the application.

iv) Reusability: Another cache-related issue is how to reuse cache space. Each core has only a limited amount of private or shared cache, so space must be reclaimed according to how frequently each block is accessed: rarely used blocks are replaced to make room for frequently used ones. The better the cache space is reused, the fewer accesses go to slower memory and the faster the computation.

v) Hit/miss rates: If data requested during computation is found in the core's cache, the access is a hit; otherwise it is a miss. The hit rate should be as high as possible, so that access time is reduced and computation is fast. The hit rate depends on the cache size, the write policy and block replacement policy used by the cache, and the number of cache levels.
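
The link between cache size, replacement policy and hit rate can be illustrated with a small simulation. The sketch below is not taken from any particular processor: it assumes a fully associative cache with least-recently-used (LRU) replacement, and the access trace, the function names and the cycle counts (4 cycles for a hit, 100 cycles for a miss penalty) are hypothetical values chosen only for illustration.

```python
from collections import OrderedDict


def lru_hit_rate(trace, capacity):
    """Return the fraction of accesses in `trace` that hit an LRU cache
    holding at most `capacity` blocks (assumed fully associative)."""
    cache = OrderedDict()  # block address -> True, ordered oldest to newest
    hits = 0
    for block in trace:
        if block in cache:
            hits += 1
            cache.move_to_end(block)       # mark as most recently used
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)  # evict the least recently used block
            cache[block] = True
    return hits / len(trace) if trace else 0.0


def average_access_time(hit_rate, hit_cycles, miss_penalty_cycles):
    """Average cycles per access for a single cache level."""
    return hit_cycles + (1.0 - hit_rate) * miss_penalty_cycles


# Hypothetical trace: a loop that touches the same 4 blocks, 25 times over.
trace = [0, 1, 2, 3] * 25
small = lru_hit_rate(trace, capacity=2)  # cache smaller than the working set
large = lru_hit_rate(trace, capacity=4)  # cache that holds the working set

print("hit rate, 2 blocks:", small)
print("hit rate, 4 blocks:", large)
print("avg cycles, 2 blocks:", average_access_time(small, 4, 100))
print("avg cycles, 4 blocks:", average_access_time(large, 4, 100))
```

With this trace the 2-block cache never hits, because each block is evicted just before it is needed again, while the 4-block cache misses only on the first pass and reaches a hit rate of 0.96. Real workloads are far less regular, so the numbers show only the direction of the effect.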

2) Selection of core used: An application distributes its tasks among the cores of the processor so that the tasks can run efficiently. Two issues have to be dealt with here: selecting a type of core that suits the tasks assigned to it, and deciding the number of cores needed for a particular application.

3) Consistency among cores: Since each core has its own private cache, the copies of a data item held in different caches may not be the same. Consistency among caches, called cache coherence, ensures that every core sees a single image of the data stored in memory. Cache coherence can be implemented with either broadcast coherence or directory-based coherence. In broadcast coherence, only one core at a time is allowed to perform an operation. When a write occurs, an invalidate message is sent to all the other cores, and the write is performed only after all of them have acknowledged it. All other operations are delayed until this write completes, which provides a strongly consistent environment among the cores. In directory coherence, a directory records which memory addresses are held by which caches. Each address is assigned a home node, where its portion of the directory is stored. Whenever a request occurs, the requesting core queries the home node of that address to find the set of cores holding the cache block. The requesting core then obtains permission from all the other cores holding that block. This scheme allows read and write requests to proceed in parallel and is therefore suitable for weak consistency models. Broadcast coherence is practical only for processors with a small number of cores, whereas directory coherence is used for processors with a large number of cores.
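
The directory scheme described above can be sketched as a table that maps each block address to the set of cores holding a copy. This is a deliberately simplified illustration, not a full protocol: the `Directory` class and its method names are hypothetical, and home nodes, message passing, acknowledgements and block states are all left out.

```python
class Directory:
    """Toy directory: maps a block address to the set of cores caching it."""

    def __init__(self):
        self.sharers = {}  # block address -> set of core ids holding a copy

    def read(self, core, addr):
        # A read adds the requesting core to the block's sharer set.
        self.sharers.setdefault(addr, set()).add(core)

    def write(self, core, addr):
        # A write invalidates every other copy, so only the writer
        # holds the block afterwards. Returns the invalidated core ids.
        invalidated = self.sharers.get(addr, set()) - {core}
        self.sharers[addr] = {core}
        return sorted(invalidated)


directory = Directory()
for core in (0, 1, 2):
    directory.read(core, 0x40)   # three cores read the same block

invalidated = directory.write(1, 0x40)  # core 1 writes the block
print("invalidated cores:", invalidated)
print("remaining sharers:", directory.sharers[0x40])
```

Only the cores listed in the directory entry receive an invalidate, here cores 0 and 2. That is what lets the scheme scale to many cores, whereas a broadcast scheme would send the message to every core.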

