Memory System
Basic Concepts
The maximum size of the memory that can be used in any computer is determined by the addressing scheme.
16-bit addresses: 2^16 = 64K memory locations
Most modern computers are byte addressable.
Big-endian & Little-endian Assignment (big-endian stores the most significant byte of a word at the lowest address; little-endian stores the least significant byte there).
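A quick way to observe a machine’s byte ordering is to examine the byte stored at the lowest address of a multi-byte word; a minimal C sketch (the variable names are illustrative):

    #include <stdio.h>
    #include <stdint.h>

    int main(void) {
        uint32_t word = 0x01020304;
        uint8_t *bytes = (uint8_t *)&word;   /* byte at the lowest address */
        /* Big-endian machines store the most significant byte (0x01) first;
           little-endian machines store the least significant byte (0x04) first. */
        printf("%s-endian\n", bytes[0] == 0x01 ? "big" : "little");
        return 0;
    }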
Traditional Architecture
Figure: Connection of the memory to the processor.
Some Basic Concepts
“Block transfer” – bulk data transfer
Memory access time – the time between the initiation of an access and the availability of the data
Memory cycle time – the minimum delay between the initiation of two successive memory operations
RAM – any location can be accessed for a Read or Write operation in some fixed amount of time that is independent of the location’s address.
Cache memory
Virtual memory, memory management unit
Internal Organization of Memory Chips
Figure: Organization of bit cells in a memory chip.
Semiconductor RAM Memories
A Memory Chip
Figure: Organization of a 1K × 1 memory chip.
Static Memories
•The circuits are capable of retaining their state as long as power is applied.
Figure: A static RAM cell.
•CMOS cell: low power consumption
Figure: An example of a CMOS memory cell.
Asynchronous DRAMs
- Static RAMs are fast, but they take more chip area and are more expensive. Dynamic RAMs (DRAMs) are cheap and area-efficient, but they cannot retain their state indefinitely – they need to be refreshed periodically.
Figure: A single-transistor dynamic memory cell.
A Dynamic Memory Chip
Figure: Internal organization of a 2M × 8 dynamic memory chip.
Synchronous DRAMs
•The operations of SDRAM are controlled by a clock signal.
Figure: Synchronous DRAM.
Figure: Burst read of length 4 in an SDRAM.
•No CAS pulses are needed in burst operation.
•Refresh circuits are included (a refresh is performed every 64 ms).
•Clock frequency > 100 MHz
•Intel PC100 and PC133 bus standards
Synchronous DRAMs
•The choice of a RAM chip for a given application depends on several factors: cost, speed, power consumption, and size.
•SRAMs are faster, but more expensive and of smaller capacity.
•DRAMs are slower, but cheaper and of larger capacity.
•Which one for cache and main memory, respectively? (SRAM for cache; DRAM for main memory.)
•Refresh overhead – suppose an SDRAM whose cells are organized in 8K (= 8192) rows, and 4 clock cycles are needed to refresh each row; then it takes 8192 × 4 = 32,768 cycles to refresh all rows. At a clock rate of 133 MHz this takes 32,768 / (133 × 10^6) ≈ 246 × 10^-6 s = 0.246 ms. With a typical refresh period of 64 ms, the refresh overhead is 0.246 / 64 ≈ 0.0038, i.e., less than 0.4% of the total time available for accessing the memory.
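The refresh-overhead arithmetic above can be checked with a few lines of C (the parameters are the example’s assumed values):

    #include <stdio.h>

    int main(void) {
        double rows = 8192.0;            /* 8K rows of cells */
        double cycles_per_row = 4.0;     /* clock cycles to refresh one row */
        double clock_hz = 133e6;         /* 133 MHz clock */
        double refresh_period_s = 64e-3; /* all rows refreshed every 64 ms */

        double refresh_cycles = rows * cycles_per_row;       /* 32,768 cycles */
        double refresh_time_s = refresh_cycles / clock_hz;   /* ~0.246 ms */
        double overhead = refresh_time_s / refresh_period_s; /* ~0.0038 */

        printf("refresh time = %.3f ms, overhead = %.2f%%\n",
               refresh_time_s * 1e3, overhead * 100.0);
        return 0;
    }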
Memory Controller
Figure: Use of a memory controller.
Read-Only Memory
Figure: A ROM cell.
•Volatile / non-volatile memory
•ROM
•PROM: programmable ROM
•EPROM: erasable, reprogrammable ROM
•EEPROM: can be programmed and erased electrically
Flash Memory
•Similar to EEPROM
•Difference: it is only possible to write an entire block of cells, not a single cell
•Low power
•Use in portable equipment
•Implementation of such modules
–Flash cards
–Flash drives
Memory Hierarchy
•Fastest access is to the data held in processor registers. Registers are at the top of the memory hierarchy.
•A relatively small amount of memory can be implemented on the processor chip. This is the processor cache.
•Two levels of cache: the Level 1 (L1) cache is on the processor chip; the Level 2 (L2) cache sits between the main memory and the processor.
•The next level is the main memory, implemented as SIMMs. Much larger, but much slower than cache memory.
•The next level is magnetic disks: a huge amount of inexpensive storage.
Cache Memory
The cache memory stores a reasonable number of blocks at a given time, but this number is small compared to the total number of blocks available in main memory.
The cache control hardware decides which block should be removed to create space for the new block that contains the referenced word.
The collection of rules for making this decision is called the replacement algorithm.
The cache control circuitry determines whether the requested word currently exists in the cache.
If the data is in the cache, it is called a Read or Write hit.
If the data is not present in the cache, a Read miss or Write miss occurs.
Cache Memories
•Effectiveness of cache is based on a property of computer programs called locality of reference.
•Most of a program’s time is spent in loops or procedures that are called repeatedly. The remainder of the program is accessed infrequently.
•Temporal locality – a recently executed instruction is likely to be executed again soon.
•Spatial locality – instructions in close proximity to a recently executed instruction are likely to be executed soon.
•Based on locality of reference:
–Temporal
•Recently executed instructions are likely to be executed again soon.
–Spatial
•Instructions in close proximity to a recently executed instruction (with respect to an address) are also likely to be executed soon.
•Cache block – a set of contiguous address locations (cache block = cache line)
Conceptual Operation of Cache
•Memory control circuitry is designed to take advantage of locality of reference.
•Temporal
–Whenever an item of information (instruction or data) is first needed, it should be brought into the cache, where it will hopefully remain until it is needed again.
•Spatial
–Instead of fetching just one item from the main memory to the cache, it is useful to fetch several items that reside at adjacent addresses as well.
•A set of contiguous addresses is called a block
–cache block or cache line
•Write-through protocol
–Cache and main memory are updated simultaneously.
•Write-back protocol
–Update the word in the cache and mark its block with an associated flag bit (dirty or modified bit).
–Main memory is updated later, when the block containing this marked word is to be removed from the cache to make room for a new block.
Write Protocols
•Write-through
–Simpler, but results in unnecessary Write operations in main memory when a cache word is updated several times during its cache residency.
•Write-back
–Can also result in unnecessary Write operations, because when a cache block is written back to memory, all words of the block are written back, even if only a single word has changed while the block was in the cache.
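A minimal C sketch of the two write policies, assuming a toy cache-line structure (CacheLine, write_through, write_back, and evict are illustrative names, not from the source):

    #include <stdint.h>
    #include <string.h>

    #define WORDS_PER_BLOCK 16

    typedef struct {
        uint32_t tag;
        int      valid;
        int      dirty;                  /* used only by write-back */
        uint32_t data[WORDS_PER_BLOCK];
    } CacheLine;

    static uint32_t main_memory[1 << 16];   /* toy 64K-word main memory */

    /* Write-through: update cache and main memory simultaneously. */
    void write_through(CacheLine *line, uint32_t addr, uint32_t value) {
        line->data[addr % WORDS_PER_BLOCK] = value;
        main_memory[addr] = value;
    }

    /* Write-back: update only the cache and set the dirty bit;
       memory is updated when the block is eventually evicted. */
    void write_back(CacheLine *line, uint32_t addr, uint32_t value) {
        line->data[addr % WORDS_PER_BLOCK] = value;
        line->dirty = 1;
    }

    /* On eviction, a dirty block must be copied back to memory. */
    void evict(CacheLine *line, uint32_t block_base_addr) {
        if (line->dirty)
            memcpy(&main_memory[block_base_addr], line->data,
                   sizeof line->data);
        line->valid = 0;
        line->dirty = 0;
    }

The dirty bit is what lets write-back defer the memory update: only blocks that were actually modified are ever written back.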
Mapping Algorithms
•The processor does not need to know explicitly that there is a cache.
•For each Read or Write operation, the cache control circuitry determines whether the requested word currently exists in the cache (a hit).
•If the information is in the cache for a read, main memory is not involved. For write operations, the system can use either the write-through protocol or the write-back protocol.
Mapping Functions
•Specification of correspondence between the main memory blocks and those in cache.
•Hit or Miss
–Write-through protocol
–Write-back protocol (uses dirty bit)
–Read miss
–Load-through or early restart on read miss
–Write miss
Read Protocols
•Read miss
–Addressed word is not in the cache.
–The block of words containing the requested word is copied from main memory into the cache.
–After the entire block is loaded into the cache, the particular word is forwarded to the processor.
–Alternatively, the word may be sent to the processor as soon as it is read from main memory (load-through or early restart); this reduces the processor’s wait time but requires more complex circuitry.
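A sketch of read-miss handling with load-through, using the same toy main-memory array as the earlier write-policy sketch (handle_read_miss is an illustrative name):

    #include <stdint.h>

    #define WORDS_PER_BLOCK 16

    static uint32_t main_memory[1 << 16];   /* toy 64K-word main memory */

    /* Read miss with load-through (early restart): the requested word is
       forwarded to the processor as soon as it is read from main memory,
       while the rest of the block continues to fill the cache line. */
    uint32_t handle_read_miss(uint32_t line[WORDS_PER_BLOCK],
                              uint32_t block_base, unsigned wanted) {
        uint32_t early = 0;
        for (unsigned i = 0; i < WORDS_PER_BLOCK; i++) {
            line[i] = main_memory[block_base + i];
            if (i == wanted)
                early = line[i];   /* forward immediately; don't wait */
        }
        return early;
    }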
Write Miss
•If the addressed word is not in the cache for a write operation, a write miss occurs.
•Write-through – the information is written directly into main memory.
•Write-back – the block containing the word is brought into the cache, and then the desired word in the cache is overwritten with the new information.
Mapping Function
•Direct Mapping
•Associative Mapping
•Set-Associative Mapping
Direct Mapping
•Block j of the main memory maps to block j modulo 128 of the cache (assuming a cache of 128 blocks of 16 words each).
•More than one memory block is mapped onto the same position in the cache.
•Contention is resolved by allowing the new block to replace the old block.
•The memory address is divided into three fields:
- The low-order 4 bits determine one of the 16 words in a block.
- When a new block is brought into the cache, the next 7 bits determine which cache block this new block is placed in.
- The high-order 5 bits are the tag bits; they identify which of the 32 memory blocks that map onto this cache position is currently present.
•Simple to implement, but not very flexible.
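A small C sketch of this 5/7/4-bit address split for the 16-bit example above (the helper names are illustrative):

    #include <stdint.h>
    #include <stdio.h>

    /* 16-bit address = | 5-bit tag | 7-bit cache block | 4-bit word | */
    static unsigned word_field(uint16_t addr)  { return addr & 0xF; }
    static unsigned block_field(uint16_t addr) { return (addr >> 4) & 0x7F; }
    static unsigned tag_field(uint16_t addr)   { return (addr >> 11) & 0x1F; }

    int main(void) {
        uint16_t addr = 0xABCD;
        printf("tag=%u block=%u word=%u\n",
               tag_field(addr), block_field(addr), word_field(addr));
        /* Direct mapping: memory block j goes to cache block j mod 128,
           which is exactly what the 7-bit block field selects. */
        return 0;
    }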
Associative Mapping
•A main memory block can be placed into any cache position.
•The memory address is divided into two fields:
- The low-order 4 bits identify the word within a block.
- The high-order 12 bits, the tag bits, identify a memory block when it is resident in the cache.
•Flexible, and uses cache space efficiently.
•Replacement algorithms can be used to replace an existing block in the cache when the cache is full.
•Cost is higher than for a direct-mapped cache.
Set-Associative Mapping
•Blocks of the cache are grouped into sets.
•The mapping function allows a block of the main memory to reside in any block of a specific set.
•The memory address is divided into three fields:
- A 6-bit field determines the set number.
- The high-order 6 bits are compared to the tag fields of the two blocks in a set.
•Set-associative mapping is a combination of direct and associative mapping.
•The number of blocks per set is a design parameter.
- One extreme is to have all the blocks in one set, requiring no set bits (fully associative mapping).
- The other extreme, one block per set, is the same as direct mapping.
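A sketch of the two-way lookup implied above, with a 4-bit word field, 6-bit set field, and 6-bit tag field (the structure and names are illustrative):

    #include <stdint.h>

    #define NUM_SETS 64
    #define WAYS      2

    typedef struct {
        unsigned tag;
        int      valid;
        uint32_t data[16];        /* 16 words per block */
    } Line;

    static Line cache[NUM_SETS][WAYS];

    /* Returns 1 on hit: the 6-bit set field picks one set, and the
       6-bit tag is compared against both blocks in that set. */
    int lookup(uint16_t addr, uint32_t *out) {
        unsigned word = addr & 0xF;
        unsigned set  = (addr >> 4) & 0x3F;
        unsigned tag  = (addr >> 10) & 0x3F;
        for (int way = 0; way < WAYS; way++) {
            Line *l = &cache[set][way];
            if (l->valid && l->tag == tag) {
                *out = l->data[word];
                return 1;          /* hit */
            }
        }
        return 0;                  /* miss: block must be fetched */
    }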
Replacement Algorithms
•It is difficult to determine which blocks to evict.
•Least Recently Used (LRU) block
•The cache controller tracks references to all blocks as the computation proceeds.
•Counters are incremented or cleared when a hit or miss occurs.
•For associative and set-associative caches:
Which location should be emptied when the cache is full and a miss occurs?
– First In First Out (FIFO)
– Least Recently Used (LRU)
•Distinguish an empty location from a full one
– Valid bit
FIFO Replacement Algorithm
LRU Replacement Algorithm
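A counter-based LRU sketch for a single four-block set, in the spirit of the counter scheme above (the names and the four-way size are illustrative; with 4 ways a 2-bit counter per block suffices):

    #define WAYS 4

    typedef struct {
        unsigned tag;
        int      valid;
        unsigned age;   /* 0 = most recently used, WAYS-1 = least */
    } Line;

    /* On a hit at index `hit`: blocks younger than the hit block age
       by one, and the hit block becomes the most recently used. */
    void touch(Line set[WAYS], int hit) {
        for (int i = 0; i < WAYS; i++)
            if (set[i].valid && set[i].age < set[hit].age)
                set[i].age++;
        set[hit].age = 0;
    }

    /* On a miss: use an invalid (empty) block if one exists, otherwise
       evict the oldest; age the others and install the new tag. */
    int replace(Line set[WAYS], unsigned tag) {
        int victim = 0;
        for (int i = 0; i < WAYS; i++) {
            if (!set[i].valid) { victim = i; break; }
            if (set[i].age > set[victim].age) victim = i;
        }
        for (int i = 0; i < WAYS; i++)
            if (set[i].valid) set[i].age++;
        set[victim] = (Line){ .tag = tag, .valid = 1, .age = 0 };
        return victim;
    }

Note how the valid bit does double duty here: it distinguishes an empty location from a full one, exactly as the slide suggests.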
Consecutive Words in a Module
Figure: Consecutive words in a module
Consecutive words are placed in a module.
The high-order k bits of a memory address determine the module.
The low-order m bits of a memory address determine the word within a module.
When a block of words is transferred from main memory to cache, only one module is busy at a time.
Consecutive Words in Consecutive Modules
Figure: Consecutive words in consecutive modules
•Consecutive words are located in consecutive modules.
•Consecutive addresses can be located in consecutive modules.
•While transferring a block of data, several memory modules can be kept busy at the same time (memory interleaving).
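A brief C sketch contrasting the two module-assignment schemes, with k module bits and m word bits (the parameter values and helper names are illustrative):

    #include <stdint.h>

    #define K 2   /* module-number bits: 4 modules */
    #define M 10  /* word-address bits: 1024 words per module */

    /* Scheme 1: consecutive words in a module.
       The high-order k bits select the module. */
    unsigned module_high(uint32_t addr) { return (addr >> M) & ((1u << K) - 1); }

    /* Scheme 2: consecutive words in consecutive modules (interleaving).
       The low-order k bits select the module, so addresses n, n+1, n+2, ...
       fall in different modules and can be accessed in parallel. */
    unsigned module_low(uint32_t addr)  { return addr & ((1u << K) - 1); }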
Virtual Memory
Techniques that automatically move program and data blocks into the physical main memory when they are required for execution are called virtual memory.
The binary addresses that the processor issues, either for instructions or data, are called virtual or logical addresses.
When the desired data are in the main memory, they are fetched / accessed immediately.
If the data are not in the main memory, the MMU causes the operating system to bring the data into memory from the disk.
Assume that program and data are composed of fixed-length units called pages.
A page consists of a block of words that occupy contiguous locations in the main memory.
A page is the basic unit of information that is transferred between secondary storage and main memory.
Size of a page commonly ranges from 2K to 16K bytes.
Pages should not be too small, because the access time of a secondary storage device is much longer than that of the main memory.
Pages should not be too large, else a large portion of the page may not be used, and it will occupy valuable space in the main memory.
Each virtual or logical address generated by a processor is interpreted as a virtual page number (high-order bits) plus an offset (low-order bits) that specifies the location of a particular byte within that page.
Information about the main memory location of each page is kept in the page table.
-Main memory address where the page is stored.
-Current status of the page.
An area of the main memory that can hold a page is called a page frame.
Starting address of the page table is kept in a page table base register.
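For a 4K-byte page, for instance, the virtual-address split and page-table lookup described above look roughly like this in C (the page size and names are illustrative):

    #include <stdint.h>

    #define PAGE_BITS 12   /* 4K-byte pages: offset is the low 12 bits */

    /* Virtual address = | virtual page number | offset within page | */
    uint32_t vpn(uint32_t vaddr)    { return vaddr >> PAGE_BITS; }
    uint32_t offset(uint32_t vaddr) { return vaddr & ((1u << PAGE_BITS) - 1); }

    /* Translation: look up the page frame for the VPN in the page table,
       then splice the offset back on. page_table[] is a toy stand-in for
       the table located via the page table base register. */
    uint32_t translate(uint32_t vaddr, const uint32_t page_table[]) {
        uint32_t frame = page_table[vpn(vaddr)];
        return (frame << PAGE_BITS) | offset(vaddr);
    }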
Virtual address from processor
•The high-order bits of the virtual address generated by the processor select the virtual page.
•These bits are compared to the virtual page numbers in the TLB.
•If there is a match, a hit occurs and the corresponding address of the page frame is read.
•If there is no match, a miss occurs and the page table within the main memory must be consulted.
•Set-associative mapped TLBs are found in commercial processors.
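A sketch of a TLB lookup; for simplicity this version is fully associative, whereas, as noted above, commercial processors typically use set-associative TLBs (the names are illustrative):

    #include <stdint.h>

    #define TLB_ENTRIES 16
    #define PAGE_BITS   12

    typedef struct {
        uint32_t vpn;     /* virtual page number */
        uint32_t frame;   /* corresponding page frame number */
        int      valid;
    } TlbEntry;

    static TlbEntry tlb[TLB_ENTRIES];

    /* Compare the high-order address bits against every valid entry.
       Returns 1 on a TLB hit; on a miss the page table in main memory
       would have to be consulted (and the TLB updated). */
    int tlb_lookup(uint32_t vaddr, uint32_t *paddr) {
        uint32_t page = vaddr >> PAGE_BITS;
        for (int i = 0; i < TLB_ENTRIES; i++) {
            if (tlb[i].valid && tlb[i].vpn == page) {
                *paddr = (tlb[i].frame << PAGE_BITS)
                       | (vaddr & ((1u << PAGE_BITS) - 1));
                return 1;
            }
        }
        return 0;
    }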