Issues in designing a multicore processor
1)
Cache related issues:
i)
Amount of cache: The cache size a multicore processor needs is application
dependent. Applications with high data reuse benefit from a large cache. A
larger cache generally raises the hit rate and so improves performance, but it
costs more die area and power, and a very large cache also tends to have a
longer access latency.
ii)
Number of cache levels: Another design decision is how many cache levels the
processor should have. The cores need not all have the same number of cache
levels. The number of levels is driven mainly by how far away the main memory
is, that is, how many cycles a main memory access takes. The more cycles it
takes, the more cache levels are worthwhile to hide that latency.
iii)
Deterministic/nondeterministic performance:
On-chip storage can be either a hardware-managed cache or a software-managed
local memory. A hardware-managed cache keeps its tags on the same die, which
takes up area that could otherwise be used for computation or storage, and
because any access may hit or miss, its performance is nondeterministic. A
local memory whose contents are placed explicitly by software stores no tags,
so the same area provides more storage and the access time is predictable.
However, the latter is more complex to program, so the choice between the two
depends on what the application needs for good performance.
iv)
Reusability: Another cache related issue is how to reuse the cache space
effectively. Since each core has only a limited amount of private or shared
cache, the space has to be reused according to how frequently the cached data
are accessed. The more the cached data are reused, the fewer accesses go to
main memory and the faster the accesses become.
v)
Hit/miss rates: If data requested during computation is found in the core's
cache, the access is a hit; otherwise it is a miss. A high hit rate (and hence
a low miss rate) is always preferred, because it reduces the average access
time and speeds up the computation. The hit rate depends on the size of the
caches, the write policy and block replacement policy used by the cache, and
the number of cache levels.
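The dependence of hit rate on cache size and reuse can be illustrated with a
small simulation. This is only a sketch under stated assumptions: the cache is
fully associative with LRU replacement, the access trace is a made-up loop over
8 blocks, and the hit time (2 cycles) and miss penalty (100 cycles) are
hypothetical numbers, not measurements of any real processor.

```python
from collections import OrderedDict


def lru_hit_rate(trace, capacity):
    """Replay block addresses through a fully associative LRU cache
    that holds `capacity` blocks and return the fraction of hits."""
    cache = OrderedDict()  # keys are block addresses, oldest first
    hits = 0
    for block in trace:
        if block in cache:
            hits += 1
            cache.move_to_end(block)       # mark as most recently used
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)  # evict the least recently used
            cache[block] = True
    return hits / len(trace)


def average_access_time(hit_rate, hit_cycles, miss_penalty_cycles):
    """Average access time = hit time + miss rate * miss penalty."""
    return hit_cycles + (1.0 - hit_rate) * miss_penalty_cycles


# Hypothetical trace: a loop sweeping over 8 blocks, repeated 50 times.
trace = list(range(8)) * 50

for capacity in (4, 8, 16):
    rate = lru_hit_rate(trace, capacity)
    cycles = average_access_time(rate, hit_cycles=2, miss_penalty_cycles=100)
    print(f"capacity={capacity:2d}  hit rate={rate:.2f}  avg access={cycles:.1f} cycles")
```

For this particular trace, a cache smaller than the 8-block working set misses
on every access, while growing the cache beyond 8 blocks brings no further
improvement, which is one way to see why the right cache size is application
dependent.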
2)
Selection of core used: An application distributes its tasks among the cores of
the processor so that the tasks can perform their computation efficiently. Two
issues have to be dealt with here: which type of core is suited to the tasks
assigned to it, and how many cores a particular application needs.
3)
Consistency among cores: Since each core has its own private cache, the copies
of a data item held in different caches may not be the same. Consistency among
the caches, called cache coherence, ensures that all the cores of the processor
see a single image of the data stored in memory. Cache coherence can be
implemented either via broadcast coherence or directory based coherence. In
broadcast coherence only one core is allowed to perform an operation at a time.
When a write occurs, an invalidate message is sent to all the other cores, and
the write is performed only when all of them have acknowledged it. All other
operations are delayed until this write is performed, which provides a strongly
consistent environment among the cores of the processor. In directory
coherence, a directory records which memory address is held by which cache.
Each address is assigned a home node where its portion of the directory is
stored. Whenever a request occurs, the requesting core queries the home node of
that address to find the set of cores holding that cache block. The requesting
core then gets permission from all the other cores holding the block. This
scheme allows read and write requests to proceed in parallel and is hence
suitable for weakly consistent models. Broadcast coherence is practical only
for processors with a small number of cores, whereas directory coherence is
used for processors with a large number of cores.
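The difference in invalidation traffic between the two schemes can be sketched
as follows. This is a toy model rather than a real protocol: the `Directory`
class, the core ids and the block address `0x40` are hypothetical, the lookup
message to the home node and the acknowledgements are not counted, and no cache
states are tracked.

```python
class Directory:
    """Toy directory: for each block address, the set of cores caching it."""

    def __init__(self):
        self.sharers = {}  # block address -> set of core ids holding a copy

    def read(self, core, block):
        """Record that `core` now holds a shared copy of `block`."""
        self.sharers.setdefault(block, set()).add(core)

    def write(self, core, block):
        """Invalidate every other copy and make `core` the only holder.
        Returns the number of invalidate messages sent."""
        others = self.sharers.get(block, set()) - {core}
        self.sharers[block] = {core}
        return len(others)


def broadcast_invalidations(num_cores):
    """With broadcast coherence a write notifies every other core."""
    return num_cores - 1


directory = Directory()
directory.read(core=3, block=0x40)
directory.read(core=7, block=0x40)

sent = directory.write(core=3, block=0x40)  # only core 7 holds another copy
print("directory invalidations:", sent)
print("broadcast invalidations on 64 cores:", broadcast_invalidations(64))
```

In this model the directory sends invalidations only to the cores that actually
hold the block, while the broadcast cost grows with the total number of cores,
which is the scaling argument made above.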
