Memory read and write of the O3 CPU
Memory read
| 621 LSQUnit<Impl>::read(LSQRequest *req, int load_idx)
622 {
623 LQEntry& load_req = loadQueue[load_idx];
624 const DynInstPtr& load_inst = load_req.instruction();
625
626 load_req.setRequest(req);
627 assert(load_inst);
628
629 assert(!load_inst->isExecuted());
630
631 // Make sure this isn't a strictly ordered load
632 // A bit of a hackish way to get strictly ordered accesses to work
633 // only if they're at the head of the LSQ and are ready to commit
634 // (at the head of the ROB too).
635
636 if (req->mainRequest()->isStrictlyOrdered() &&
637 (load_idx != loadQueue.head() || !load_inst->isAtCommit())) {
638 // Tell IQ/mem dep unit that this instruction will need to be
639 // rescheduled eventually
640 iewStage->rescheduleMemInst(load_inst);
641 load_inst->clearIssued();
642 load_inst->effAddrValid(false);
643 ++lsqRescheduledLoads;
644 DPRINTF(LSQUnit, "Strictly ordered load [sn:%lli] PC %s\n",
645 load_inst->seqNum, load_inst->pcState());
646
647 // Must delete request now that it wasn't handed off to
648 // memory. This is quite ugly. @todo: Figure out the proper
649 // place to really handle request deletes.
650 load_req.setRequest(nullptr);
651 req->discard();
652 return std::make_shared<GenericISA::M5PanicFault>(
653 "Strictly ordered load [sn:%llx] PC %s\n",
654 load_inst->seqNum, load_inst->pcState());
655 }
656
657 DPRINTF(LSQUnit, "Read called, load idx: %i, store idx: %i, "
658 "storeHead: %i addr: %#x%s\n",
659 load_idx - 1, load_inst->sqIt._idx, storeQueue.head() - 1,
660 req->mainRequest()->getPaddr(), req->isSplit() ? " split" : "");
661
662 if (req->mainRequest()->isLLSC()) {
663 // Disable recording the result temporarily. Writing to misc
664 // regs normally updates the result, but this is not the
665 // desired behavior when handling store conditionals.
666 load_inst->recordResult(false);
667 TheISA::handleLockedRead(load_inst.get(), req->mainRequest());
668 load_inst->recordResult(true);
669 }
670
671 if (req->mainRequest()->isMmappedIpr()) {
672 assert(!load_inst->memData);
673 load_inst->memData = new uint8_t[MaxDataBytes];
674
675 ThreadContext *thread = cpu->tcBase(lsqID);
676 PacketPtr main_pkt = new Packet(req->mainRequest(), MemCmd::ReadReq);
677
678 main_pkt->dataStatic(load_inst->memData);
679
680 Cycles delay = req->handleIprRead(thread, main_pkt);
681
682 WritebackEvent *wb = new WritebackEvent(load_inst, main_pkt, this);
683 cpu->schedule(wb, cpu->clockEdge(delay));
684 return NoFault;
685 }
686
687 // Check the SQ for any previous stores that might lead to forwarding
......
840 // If there's no forwarding case, then go access memory
841 DPRINTF(LSQUnit, "Doing memory access for inst [sn:%lli] PC %s\n",
842 load_inst->seqNum, load_inst->pcState());
843
844 // Allocate memory if this is the first time a load is issued.
845 if (!load_inst->memData) {
846 load_inst->memData = new uint8_t[req->mainRequest()->getSize()];
847 }
848
849 // For now, load throughput is constrained by the number of
850 // load FUs only, and loads do not consume a cache port (only
851 // stores do).
852 // @todo We should account for cache port contention
853 // and arbitrate between loads and stores.
854
855 // if we the cache is not blocked, do cache access
856 if (req->senderState() == nullptr) {
857 LQSenderState *state = new LQSenderState(
858 loadQueue.getIterator(load_idx));
859 state->isLoad = true;
860 state->inst = load_inst;
861 state->isSplit = req->isSplit();
862 req->senderState(state);
863 }
864 req->buildPackets();
865 req->sendPacketToCache();
866 if (!req->isSent())
867 iewStage->blockMemInst(load_inst);
868
869 return NoFault;
870 }
|
If the current instruction has not initiated a memory load before,
read allocates a buffer and makes the instruction's memData field point to it,
so that the actual data read from the cache or memory can be stored there.
After that, it creates a senderState object if one does not exist yet.
The state object contains information such as
whether this request is a load or a store,
the instruction that initiated the memory operation, and
whether the request is a split or a single access.
Once created, the senderState is stored in the request object.
Note that the req here is an LSQRequest object,
the same object that was used earlier for TLB resolution.
Because this object carries all the information required to resolve one memory operation
(TLB, cache ports, etc.),
the CPU can handle read and write operations by invoking the proper functions on it.
Build packet
| 1032 template<class Impl>
1033 void
1034 LSQ<Impl>::SingleDataRequest::buildPackets()
1035 {
1036 assert(_senderState);
1037 /* Retries do not create new packets. */
1038 if (_packets.size() == 0) {
1039 _packets.push_back(
1040 isLoad()
1041 ? Packet::createRead(request())
1042 : Packet::createWrite(request()));
1043 _packets.back()->dataStatic(_inst->memData);
1044 _packets.back()->senderState = _senderState;
1045 }
1046 assert(_packets.size() == 1);
1047 }
|
| 276 /**
277 * A Packet is used to encapsulate a transfer between two objects in
278 * the memory system (e.g., the L1 and L2 cache). (In contrast, a
279 * single Request travels all the way from the requestor to the
280 * ultimate destination and back, possibly being conveyed by several
281 * different Packets along the way.)
282 */
283 class Packet : public Printable
284 {
285 public:
286 typedef uint32_t FlagsType;
287 typedef gem5::Flags<FlagsType> Flags;
......
368 private:
369 /**
370 * A pointer to the data being transferred. It can be different
371 * sizes at each level of the hierarchy so it belongs to the
372 * packet, not request. This may or may not be populated when a
373 * responder receives the packet. If not populated memory should
374 * be allocated.
375 */
376 PacketDataPtr data;
......
846 /**
847 * Constructor. Note that a Request object must be constructed
848 * first, but the Requests's physical address and size fields need
849 * not be valid. The command must be supplied.
850 */
851 Packet(const RequestPtr &_req, MemCmd _cmd)
852 : cmd(_cmd), id((PacketId)_req.get()), req(_req),
853 data(nullptr), addr(0), _isSecure(false), size(0),
854 _qosValue(0),
855 htmReturnReason(HtmCacheFailure::NO_FAIL),
856 htmTransactionUid(0),
857 headerDelay(0), snoopDelay(0),
858 payloadDelay(0), senderState(NULL)
859 {
860 flags.clear();
861 if (req->hasPaddr()) {
862 addr = req->getPaddr();
863 flags.set(VALID_ADDR);
864 _isSecure = req->isSecure();
865 }
866
867 /**
868 * hardware transactional memory
869 *
870 * This is a bit of a hack!
871 * Technically the address of a HTM command is set to zero
872 * but is not valid. The reason that we pretend it's valid is
873 * to void the getAddr() function from failing. It would be
874 * cumbersome to add control flow in many places to check if the
875 * packet represents a HTM command before calling getAddr().
876 */
877 if (req->isHTMCmd()) {
878 flags.set(VALID_ADDR);
879 assert(addr == 0x0);
880 }
881 if (req->hasSize()) {
882 size = req->getSize();
883 flags.set(VALID_SIZE);
884 }
885 }
......
1002 /**
1003 * Constructor-like methods that return Packets based on Request objects.
1004 * Fine-tune the MemCmd type if it's not a vanilla read or write.
1005 */
1006 static PacketPtr
1007 createRead(const RequestPtr &req)
1008 {
1009 return new Packet(req, makeReadCmd(req));
1010 }
1011
1012 static PacketPtr
1013 createWrite(const RequestPtr &req)
1014 {
1015 return new Packet(req, makeWriteCmd(req));
1016 }
|
The buildPackets function generates the new packet that will be sent to the cache.
The generated packet is kept in an internal vector called _packets.
It also sets the buffer allocated for the data, _inst->memData,
as the packet's static data pointer, and stores the senderState in the packet.
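Note that packet creation is idempotent: a retry re-enters buildPackets, and the emptiness check guarantees that no duplicate packet is created. A minimal standalone sketch of this pattern (all types below are illustrative stand-ins, not the gem5 classes):

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct MiniSenderState { };

struct MiniPacket {
    uint8_t *data = nullptr;            // borrowed, instruction-owned buffer
    MiniSenderState *senderState = nullptr;
};

struct MiniLSQRequest {
    MiniSenderState *state = nullptr;
    uint8_t *memData = nullptr;         // stands in for _inst->memData
    std::vector<MiniPacket *> packets;

    void buildPackets()
    {
        assert(state);
        if (packets.empty()) {          // retries must not create new packets
            auto *pkt = new MiniPacket;
            pkt->data = memData;        // point at the pre-allocated buffer
            pkt->senderState = state;   // bookkeeping rides with the packet
            packets.push_back(pkt);
        }
        assert(packets.size() == 1);    // single (non-split) access
    }
};
```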
| 386 /**
387 * A virtual base opaque structure used to hold state associated
388 * with the packet (e.g., an MSHR), specific to a SimObject that
389 * sees the packet. A pointer to this state is returned in the
390 * packet's response so that the SimObject in question can quickly
391 * look up the state needed to process it. A specific subclass
392 * would be derived from this to carry state specific to a
393 * particular sending device.
394 *
395 * As multiple SimObjects may add their SenderState throughout the
396 * memory system, the SenderStates create a stack, where a
397 * SimObject can add a new Senderstate, as long as the
398 * predecessing SenderState is restored when the response comes
399 * back. For this reason, the predecessor should always be
400 * populated with the current SenderState of a packet before
401 * modifying the senderState field in the request packet.
402 */
403 struct SenderState
404 {
405 SenderState* predecessor;
406 SenderState() : predecessor(NULL) {}
407 virtual ~SenderState() {}
408 };
|
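The predecessor chain described in this comment effectively turns senderState into a stack. A standalone sketch of the push/pop discipline (all names are illustrative):

```cpp
struct MiniSenderState {
    MiniSenderState *predecessor = nullptr;
    virtual ~MiniSenderState() = default;
};

struct MiniPacket { MiniSenderState *senderState = nullptr; };

struct MyState : MiniSenderState { /* per-object bookkeeping */ };

// Called when a SimObject forwards the packet and wants its state back
// with the response.
void pushState(MiniPacket &pkt)
{
    auto *s = new MyState;
    s->predecessor = pkt.senderState;   // remember whoever came before us
    pkt.senderState = s;                // our state rides with the packet
}

// Called when the response comes back through the same SimObject.
void popState(MiniPacket &pkt)
{
    auto *s = static_cast<MyState *>(pkt.senderState);
    pkt.senderState = s->predecessor;   // restore the previous owner's state
    delete s;
}
```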
attributes of the packet
mem/packet.hh
| 209 bool
210 testCmdAttrib(MemCmd::Attribute attrib) const
211 {
212 return commandInfo[cmd].attributes[attrib] != 0;
213 }
214
215 public:
216
217 bool isRead() const { return testCmdAttrib(IsRead); }
218 bool isWrite() const { return testCmdAttrib(IsWrite); }
219 bool isUpgrade() const { return testCmdAttrib(IsUpgrade); }
220 bool isRequest() const { return testCmdAttrib(IsRequest); }
221 bool isResponse() const { return testCmdAttrib(IsResponse); }
222 bool needsWritable() const { return testCmdAttrib(NeedsWritable); }
223 bool needsResponse() const { return testCmdAttrib(NeedsResponse); }
224 bool isInvalidate() const { return testCmdAttrib(IsInvalidate); }
225 bool isEviction() const { return testCmdAttrib(IsEviction); }
226 bool isClean() const { return testCmdAttrib(IsClean); }
227 bool fromCache() const { return testCmdAttrib(FromCache); }
|
mem/packet.cc
| 64 const MemCmd::CommandInfo
65 MemCmd::commandInfo[] =
66 {
67 /* InvalidCmd */
68 { {}, InvalidCmd, "InvalidCmd" },
69 /* ReadReq - Read issued by a non-caching agent such as a CPU or
70 * device, with no restrictions on alignment. */
71 { {IsRead, IsRequest, NeedsResponse}, ReadResp, "ReadReq" },
72 /* ReadResp */
73 { {IsRead, IsResponse, HasData}, InvalidCmd, "ReadResp" },
74 /* ReadRespWithInvalidate */
75 { {IsRead, IsResponse, HasData, IsInvalidate},
76 InvalidCmd, "ReadRespWithInvalidate" },
77 /* WriteReq */
78 { {IsWrite, NeedsWritable, IsRequest, NeedsResponse, HasData},
79 WriteResp, "WriteReq" },
80 /* WriteResp */
81 { {IsWrite, IsResponse}, InvalidCmd, "WriteResp" },
82 /* WriteCompleteResp - The WriteCompleteResp command is needed
83 * because in the GPU memory model we use a WriteResp to indicate
84 * that a write has reached the cache controller so we can free
85 * resources at the coalescer. Later, when the write succesfully
86 * completes we send a WriteCompleteResp to the CU so its wait
87 * counters can be updated. Wait counters in the CU is how memory
88 * dependences are handled in the GPU ISA. */
89 { {IsWrite, IsResponse}, InvalidCmd, "WriteCompleteResp" },
|
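Each commandInfo row pairs a set of attribute bits with a response command and a name, and testCmdAttrib simply indexes this table. The mechanism in miniature (an illustrative sketch, not the real table):

```cpp
#include <bitset>
#include <cstddef>

enum Attribute { IsRead, IsWrite, IsRequest, IsResponse,
                 NeedsResponse, HasData, NUM_ATTRS };

struct MiniCommandInfo { std::bitset<NUM_ATTRS> attributes; };

constexpr unsigned long bit(Attribute a) { return 1ul << a; }

// One row per command, indexed by a command id.
const MiniCommandInfo miniCommandInfo[] = {
    /* ReadReq  */ { bit(IsRead) | bit(IsRequest) | bit(NeedsResponse) },
    /* ReadResp */ { bit(IsRead) | bit(IsResponse) | bit(HasData) },
};

bool isReadSketch(std::size_t cmd)
{
    return miniCommandInfo[cmd].attributes[IsRead];
}
```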
send packet to the cache
| 1083 template<class Impl>
1084 void
1085 LSQ<Impl>::SingleDataRequest::sendPacketToCache()
1086 {
1087 assert(_numOutstandingPackets == 0);
1088 if (lsqUnit()->trySendPacket(isLoad(), _packets.at(0)))
1089 _numOutstandingPackets = 1;
1090 }
|
| 1083 template <class Impl>
1084 bool
1085 LSQUnit<Impl>::trySendPacket(bool isLoad, PacketPtr data_pkt)
1086 {
1087 bool ret = true;
1088 bool cache_got_blocked = false;
1089
1090 auto state = dynamic_cast<LSQSenderState*>(data_pkt->senderState);
1091
1092 if (!lsq->cacheBlocked() &&
1093 lsq->cachePortAvailable(isLoad)) {
1094 if (!dcachePort->sendTimingReq(data_pkt)) {
1095 ret = false;
1096 cache_got_blocked = true;
1097 }
1098 } else {
1099 ret = false;
1100 }
1101
1102 if (ret) {
1103 if (!isLoad) {
1104 isStoreBlocked = false;
1105 }
1106 lsq->cachePortBusy(isLoad);
1107 state->outstanding++;
1108 state->request()->packetSent();
1109 } else {
1110 if (cache_got_blocked) {
1111 lsq->cacheBlocked(true);
1112 ++lsqCacheBlocked;
1113 }
1114 if (!isLoad) {
1115 assert(state->request() == storeWBIt->request());
1116 isStoreBlocked = true;
1117 }
1118 state->request()->packetNotSent();
1119 }
1120 return ret;
1121 }
|
This packet is sent to the cache through the cache port
connected to the LSQ.
trySendPacket first checks whether the cache is currently blocked.
If it is not blocked and a cache port is available for this type of access,
it sends the request packet through the dcachePort.
The memory access is initiated by sending the request packet
with the sendTimingReq method.
Because the CPU goes through the data cache
before touching physical memory,
sendTimingReq is invoked on the DcachePort.
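The gating logic of trySendPacket can be condensed to a few lines. A standalone sketch assuming fixed per-cycle load/store port counts (the types are stand-ins, not gem5 classes):

```cpp
struct MiniLSQ {
    bool cacheBlocked = false;
    int freeLoadPorts = 2, freeStorePorts = 1;

    bool portAvailable(bool isLoad) const
    { return isLoad ? freeLoadPorts > 0 : freeStorePorts > 0; }

    void portBusy(bool isLoad)
    { (isLoad ? freeLoadPorts : freeStorePorts)--; }
};

// peerAccepts stands in for dcachePort->sendTimingReq(pkt).
bool trySendSketch(MiniLSQ &lsq, bool isLoad, bool peerAccepts)
{
    if (lsq.cacheBlocked || !lsq.portAvailable(isLoad))
        return false;                // stall locally; retry next cycle
    if (!peerAccepts) {
        lsq.cacheBlocked = true;     // block until the cache sends a retry
        return false;
    }
    lsq.portBusy(isLoad);            // consume one cache port this cycle
    return true;
}
```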
gem5/src/mem/port.hh
| 444 inline bool
445 MasterPort::sendTimingReq(PacketPtr pkt)
446 {
447 return TimingRequestProtocol::sendReq(_slavePort, pkt);
448 }
|
mem/protocol/timing.cc
| 47 /* The request protocol. */
48
49 bool
50 TimingRequestProtocol::sendReq(TimingResponseProtocol *peer, PacketPtr pkt)
51 {
52 assert(pkt->isRequest());
53 return peer->recvTimingReq(pkt);
54 }
|
The sendTimingReq function is very simple:
it just invokes the recvTimingReq function of the peer connected to the dcachePort
as a slave.
Because the cache unit is connected to the dcachePort on the other side of the CPU,
we will take a look at the recvTimingReq implementation of the cache unit.
Cache, Cache, Cache!
recvTimingReq of the BaseCache: how to process the cache access?
| 2448 bool
2449 BaseCache::CpuSidePort::recvTimingReq(PacketPtr pkt)
2450 {
2451 assert(pkt->isRequest());
2452
2453 if (cache->system->bypassCaches()) {
2454 // Just forward the packet if caches are disabled.
2455 // @todo This should really enqueue the packet rather
2456 GEM5_VAR_USED bool success = cache->memSidePort.sendTimingReq(pkt);
2457 assert(success);
2458 return true;
2459 } else if (tryTiming(pkt)) {
2460 cache->recvTimingReq(pkt);
2461 return true;
2462 }
2463 return false;
2464 }
|
First of all, the cache port connected to the CPU side
is in charge of handling timing requests generated by the CPU side.
Because the BaseCache contains a dedicated port for communicating with the CPU side,
called CpuSidePort, its recvTimingReq function is invoked.
However, the main cache operations are done by the BaseCache's recvTimingReq.
| 349 void
350 BaseCache::recvTimingReq(PacketPtr pkt)
351 {
352 // anything that is merely forwarded pays for the forward latency and
353 // the delay provided by the crossbar
354 Tick forward_time = clockEdge(forwardLatency) + pkt->headerDelay;
355
356 Cycles lat;
357 CacheBlk *blk = nullptr;
358 bool satisfied = false;
359 {
360 PacketList writebacks;
361 // Note that lat is passed by reference here. The function
362 // access() will set the lat value.
363 satisfied = access(pkt, blk, lat, writebacks);
364
365 // After the evicted blocks are selected, they must be forwarded
366 // to the write buffer to ensure they logically precede anything
367 // happening below
368 doWritebacks(writebacks, clockEdge(lat + forwardLatency));
369 }
370
|
Because recvTimingReq is pretty complex and long,
I will explain the important parts one by one.
First of all, it invokes the access function
to access the cache entry, checking whether the data mapped to the
request address exists in the cache.
After that, it invokes the doWritebacks function to
write back evicted entries, if any exist.
By the way, why does access generate victim entries that require a write-back?
I will show you the answer soon.
access function, another long journey in the midst of recvTimingReq
Unfortunately, the access function is even more complex
than recvTimingReq because it emulates
the actual cache access in the GEM5 cache model.
Let's take a look at its implementation one piece at a time.
access1: check if the cache block exists in the current cache
| 1152 bool
1153 BaseCache::access(PacketPtr pkt, CacheBlk *&blk, Cycles &lat,
1154 PacketList &writebacks)
1155 {
1156 // sanity check
1157 assert(pkt->isRequest());
1158
1159 chatty_assert(!(isReadOnly && pkt->isWrite()),
1160 "Should never see a write in a read-only cache %s\n",
1161 name());
1162
1163 // Access block in the tags
1164 Cycles tag_latency(0);
1165 blk = tags->accessBlock(pkt, tag_latency);
1166
1167 DPRINTF(Cache, "%s for %s %s\n", __func__, pkt->print(),
1168 blk ? "hit " + blk->print() : "miss");
1169
|
The first job done by the access function is retrieving the CacheBlk
associated with the current request's address.
Because the tags member field manages all CacheBlks of the cache,
it invokes the accessBlock function of tags.
| 117 /**
118 * Access block and update replacement data. May not succeed, in which case
119 * nullptr is returned. This has all the implications of a cache access and
120 * should only be used as such. Returns the tag lookup latency as a side
121 * effect.
122 *
123 * @param pkt The packet holding the address to find.
124 * @param lat The latency of the tag lookup.
125 * @return Pointer to the cache block if found.
126 */
127 CacheBlk* accessBlock(const PacketPtr pkt, Cycles &lat) override
128 {
129 CacheBlk *blk = findBlock(pkt->getAddr(), pkt->isSecure());
130
131 // Access all tags in parallel, hence one in each way. The data side
132 // either accesses all blocks in parallel, or one block sequentially on
133 // a hit. Sequential access with a miss doesn't access data.
134 stats.tagAccesses += allocAssoc;
135 if (sequentialAccess) {
136 if (blk != nullptr) {
137 stats.dataAccesses += 1;
138 }
139 } else {
140 stats.dataAccesses += allocAssoc;
141 }
142
143 // If a cache hit
144 if (blk != nullptr) {
145 // Update number of references to accessed block
146 blk->increaseRefCount();
147
148 // Update replacement data of accessed block
149 replacementPolicy->touch(blk->replacementData, pkt);
150 }
151
152 // The tag lookup latency is the same for a hit or a miss
153 lat = lookupLatency;
154
155 return blk;
156 }
|
| 79 CacheBlk*
80 BaseTags::findBlock(Addr addr, bool is_secure) const
81 {
82 // Extract block tag
83 Addr tag = extractTag(addr);
84
85 // Find possible entries that may contain the given address
86 const std::vector<ReplaceableEntry*> entries =
87 indexingPolicy->getPossibleEntries(addr);
88
89 // Search for block
90 for (const auto& location : entries) {
91 CacheBlk* blk = static_cast<CacheBlk*>(location);
92 if (blk->matchTag(tag, is_secure)) {
93 return blk;
94 }
95 }
96
97 // Did not find block
98 return nullptr;
99 }
|
Because each CacheBlk is associated with one address
through its tag value, findBlock can determine whether the cache
already contains the block mapped to the current request's address
by checking the tags of the way entries in the set that the address maps to.
Also, note that it returns nullptr
when there is no matching block.
Therefore, by checking the CacheBlk
returned by the findBlock function,
accessBlock can distinguish a cache hit from a miss.
When a cache hit happens,
it invokes the touch function of the replacementPolicy
to update the replacement data
associated with the accessed CacheBlk.
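The set-and-tag lookup is easy to see in a standalone miniature. A sketch assuming 64-byte blocks and a simple modulo indexing policy (not gem5's actual indexing-policy classes):

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

struct MiniBlk { uint64_t tag = 0; bool valid = false; };

struct MiniTags {
    static constexpr unsigned blkBits = 6;   // 64-byte blocks
    std::size_t numSets, assoc;
    std::vector<MiniBlk> blks;               // numSets * assoc entries

    MiniTags(std::size_t sets, std::size_t ways)
        : numSets(sets), assoc(ways), blks(sets * ways) {}

    MiniBlk *findBlock(uint64_t addr)
    {
        uint64_t set = (addr >> blkBits) % numSets;
        uint64_t tag = addr >> blkBits;      // keep the full tag for simplicity
        for (std::size_t way = 0; way < assoc; way++) {
            MiniBlk &b = blks[set * assoc + way];
            if (b.valid && b.tag == tag)
                return &b;                   // hit
        }
        return nullptr;                      // miss
    }
};
```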
access2: handling cache maintenance packets
Let's go back to the access function.
After the accessBlock function returns, it checks
the type of the packet.
| 1170 if (pkt->req->isCacheMaintenance()) {
1171 // A cache maintenance operation is always forwarded to the
1172 // memory below even if the block is found in dirty state.
1173
1174 // We defer any changes to the state of the block until we
1175 // create and mark as in service the mshr for the downstream
1176 // packet.
1177
1178 // Calculate access latency on top of when the packet arrives. This
1179 // takes into account the bus delay.
1180 lat = calculateTagOnlyLatency(pkt->headerDelay, tag_latency);
1181
1182 return false;
1183 }
|
The cache maintenance flags are defined in the Request class (mem/request.hh):
| 1001 /**
1002 * Accessor functions to determine whether this request is part of
1003 * a cache maintenance operation. At the moment three operations
1004 * are supported:
1005
1006 * 1) A cache clean operation updates all copies of a memory
1007 * location to the point of reference,
1008 * 2) A cache invalidate operation invalidates all copies of the
1009 * specified block in the memory above the point of reference,
1010 * 3) A clean and invalidate operation is a combination of the two
1011 * operations.
1012 * @{ */
1013 bool isCacheClean() const { return _flags.isSet(CLEAN); }
1014 bool isCacheInvalidate() const { return _flags.isSet(INVALIDATE); }
1015 bool isCacheMaintenance() const { return _flags.isSet(CLEAN|INVALIDATE); }
1016 /** @} */
|
Currently, GEM5 provides three different requests for cache maintenance:
cache clean, cache invalidate, and clean and invalidate.
Here is a good general definition of the invalidate and clean events:
Invalidate simply marks a cache line as "invalid", meaning you won't hit on it.
Clean causes the contents of the cache line to be written back to memory (or the next level of cache),
but only if the cache line is "dirty".
That is, the cache line holds the latest copy of that memory.
Clean & Invalidate, as the name suggests, does both.
Dirty lines normally get back to memory through evictions.
When a line is selected to be evicted,
there is a check to see if it is dirty.
If yes, it gets written back to memory.
Cleaning is a way to force this to happen at a particular time,
for example because something else is going to read the buffer.
In theory, if you invalidated a dirty line you could lose data,
as an invalid line won't get written back to memory automatically through eviction.
In practice many cores will treat Invalidate as Clean & Invalidate,
but you shouldn't rely on that.
If the line is potentially dirty, and you care about the data,
you should use Clean & Invalidate rather than Invalidate.
Because cache maintenance requests are related to cache flushing
and coherence, they must be handled specially by the cache unit.
When a maintenance packet reaches the cache,
access returns false immediately, so the
satisfied variable is set to false, which means the request is treated like a miss.
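The three operations are easy to express as a tiny state machine over a single line. An illustrative sketch (not gem5 code):

```cpp
#include <functional>

struct MiniLine { bool valid = true, dirty = true; };

void clean(MiniLine &l, const std::function<void(MiniLine&)> &writeback)
{
    if (l.valid && l.dirty) {
        writeback(l);      // force the dirty copy back to memory now
        l.dirty = false;
    }
}

void invalidate(MiniLine &l)
{
    l.valid = false;       // a dirty line invalidated this way loses its data
}

void cleanAndInvalidate(MiniLine &l,
                        const std::function<void(MiniLine&)> &writeback)
{
    clean(l, writeback);   // write back first...
    invalidate(l);         // ...then drop the line
}
```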
access3: handling eviction request packet
| 1185 if (pkt->isEviction()) {
1186 // We check for presence of block in above caches before issuing
1187 // Writeback or CleanEvict to write buffer. Therefore the only
1188 // possible cases can be of a CleanEvict packet coming from above
1189 // encountering a Writeback generated in this cache peer cache and
1190 // waiting in the write buffer. Cases of upper level peer caches
1191 // generating CleanEvict and Writeback or simply CleanEvict and
1192 // CleanEvict almost simultaneously will be caught by snoops sent out
1193 // by crossbar.
1194 WriteQueueEntry *wb_entry = writeBuffer.findMatch(pkt->getAddr(),
1195 pkt->isSecure());
1196 if (wb_entry) {
1197 assert(wb_entry->getNumTargets() == 1);
1198 PacketPtr wbPkt = wb_entry->getTarget()->pkt;
1199 assert(wbPkt->isWriteback());
1200
1201 if (pkt->isCleanEviction()) {
1202 // The CleanEvict and WritebackClean snoops into other
1203 // peer caches of the same level while traversing the
1204 // crossbar. If a copy of the block is found, the
1205 // packet is deleted in the crossbar. Hence, none of
1206 // the other upper level caches connected to this
1207 // cache have the block, so we can clear the
1208 // BLOCK_CACHED flag in the Writeback if set and
1209 // discard the CleanEvict by returning true.
1210 wbPkt->clearBlockCached();
1211
1212 // A clean evict does not need to access the data array
1213 lat = calculateTagOnlyLatency(pkt->headerDelay, tag_latency);
1214
1215 return true;
1216 } else {
1217 assert(pkt->cmd == MemCmd::WritebackDirty);
1218 // Dirty writeback from above trumps our clean
1219 // writeback... discard here
1220 // Note: markInService will remove entry from writeback buffer.
1221 markInService(wb_entry);
1222 delete wbPkt;
1223 }
1224 }
1225 }
|
| 91 { {IsWrite, IsRequest, IsEviction, HasData, FromCache},
92 InvalidCmd, "WritebackDirty" },
93 /* WritebackClean - This allows the upstream cache to writeback a
94 * line to the downstream cache without it being considered
95 * dirty. */
96 { {IsWrite, IsRequest, IsEviction, HasData, FromCache},
97 InvalidCmd, "WritebackClean" },
101 /* CleanEvict */
102 { {IsRequest, IsEviction, FromCache}, InvalidCmd, "CleanEvict" },
|
access4: handle writeback packets
| 1227 // The critical latency part of a write depends only on the tag access
1228 if (pkt->isWrite()) {
1229 lat = calculateTagOnlyLatency(pkt->headerDelay, tag_latency);
1230 }
1231
1232 // Writeback handling is special case. We can write the block into
1233 // the cache without having a writeable copy (or any copy at all).
1234 if (pkt->isWriteback()) {
1235 assert(blkSize == pkt->getSize());
1236
1237 // we could get a clean writeback while we are having
1238 // outstanding accesses to a block, do the simple thing for
1239 // now and drop the clean writeback so that we do not upset
1240 // any ordering/decisions about ownership already taken
1241 if (pkt->cmd == MemCmd::WritebackClean &&
1242 mshrQueue.findMatch(pkt->getAddr(), pkt->isSecure())) {
1243 DPRINTF(Cache, "Clean writeback %#llx to block with MSHR, "
1244 "dropping\n", pkt->getAddr());
1245
1246 // A writeback searches for the block, then writes the data.
1247 // As the writeback is being dropped, the data is not touched,
1248 // and we just had to wait for the time to find a match in the
1249 // MSHR. As of now assume a mshr queue search takes as long as
1250 // a tag lookup for simplicity.
1251 return true;
1252 }
1253
1254 const bool has_old_data = blk && blk->isValid();
1255 if (!blk) {
1256 // need to do a replacement
1257 blk = allocateBlock(pkt, writebacks);
1258 if (!blk) {
1259 // no replaceable block available: give up, fwd to next level.
1260 incMissCount(pkt);
1261 return false;
1262 }
1263
1264 blk->setCoherenceBits(CacheBlk::ReadableBit);
1265 } else if (compressor) {
1266 // This is an overwrite to an existing block, therefore we need
1267 // to check for data expansion (i.e., block was compressed with
1268 // a smaller size, and now it doesn't fit the entry anymore).
1269 // If that is the case we might need to evict blocks.
1270 if (!updateCompressionData(blk, pkt->getConstPtr<uint64_t>(),
1271 writebacks)) {
1272 invalidateBlock(blk);
1273 return false;
1274 }
1275 }
1276
1277 // only mark the block dirty if we got a writeback command,
1278 // and leave it as is for a clean writeback
1279 if (pkt->cmd == MemCmd::WritebackDirty) {
1280 // TODO: the coherent cache can assert that the dirty bit is set
1281 blk->setCoherenceBits(CacheBlk::DirtyBit);
1282 }
1283 // if the packet does not have sharers, it is passing
1284 // writable, and we got the writeback in Modified or Exclusive
1285 // state, if not we are in the Owned or Shared state
1286 if (!pkt->hasSharers()) {
1287 blk->setCoherenceBits(CacheBlk::WritableBit);
1288 }
1289 // nothing else to do; writeback doesn't expect response
1290 assert(!pkt->needsResponse());
1291
1292 updateBlockData(blk, pkt, has_old_data);
1293 DPRINTF(Cache, "%s new state is %s\n", __func__, blk->print());
1294 incHitCount(pkt);
1295
1296 // When the packet metadata arrives, the tag lookup will be done while
1297 // the payload is arriving. Then the block will be ready to access as
1298 // soon as the fill is done
1299 blk->setWhenReady(clockEdge(fillLatency) + pkt->headerDelay +
1300 std::max(cyclesToTicks(tag_latency), (uint64_t)pkt->payloadDelay));
1301
1302 return true;
1303 } else if (pkt->cmd == MemCmd::CleanEvict) {
|
GEM5 defines the condition for a writeback as below.
| 229 /**
230 * A writeback is an eviction that carries data.
231 */
232 bool isWriteback() const { return testCmdAttrib(IsEviction) &&
233 testCmdAttrib(HasData); }
|
When a request packet has both IsEviction and HasData set,
the packet that invoked the access function
is a writeback request packet.
The code below specifies the commands that satisfy this condition.
| 91 { {IsWrite, IsRequest, IsEviction, HasData, FromCache},
92 InvalidCmd, "WritebackDirty" },
93 /* WritebackClean - This allows the upstream cache to writeback a
94 * line to the downstream cache without it being considered
95 * dirty. */
96 { {IsWrite, IsRequest, IsEviction, HasData, FromCache},
97 InvalidCmd, "WritebackClean" },
|
When those conditions are met, the access function handles the writeback packet.
Summarizing the code above: a clean writeback is simply dropped if an MSHR for the
block is already outstanding; if the block is not present, a victim is allocated
(possibly pushing evictions into writebacks), and an allocation failure is forwarded
to the next level; the block is marked dirty only for WritebackDirty, marked writable
if the packet has no sharers, and finally the packet data is copied into the block.
access5: handle CleanEvict and WriteClean packets
| 1303 } else if (pkt->cmd == MemCmd::CleanEvict) {
1304 // A CleanEvict does not need to access the data array
1305 lat = calculateTagOnlyLatency(pkt->headerDelay, tag_latency);
1306
1307 if (blk) {
1308 // Found the block in the tags, need to stop CleanEvict from
1309 // propagating further down the hierarchy. Returning true will
1310 // treat the CleanEvict like a satisfied write request and delete
1311 // it.
1312 return true;
1313 }
1314 // We didn't find the block here, propagate the CleanEvict further
1315 // down the memory hierarchy. Returning false will treat the CleanEvict
1316 // like a Writeback which could not find a replaceable block so has to
1317 // go to next level.
1318 return false;
1319 } else if (pkt->cmd == MemCmd::WriteClean) {
1320 // WriteClean handling is a special case. We can allocate a
1321 // block directly if it doesn't exist and we can update the
1322 // block immediately. The WriteClean transfers the ownership
1323 // of the block as well.
1324 assert(blkSize == pkt->getSize());
1325
1326 const bool has_old_data = blk && blk->isValid();
1327 if (!blk) {
1328 if (pkt->writeThrough()) {
1329 // if this is a write through packet, we don't try to
1330 // allocate if the block is not present
1331 return false;
1332 } else {
1333 // a writeback that misses needs to allocate a new block
1334 blk = allocateBlock(pkt, writebacks);
1335 if (!blk) {
1336 // no replaceable block available: give up, fwd to
1337 // next level.
1338 incMissCount(pkt);
1339 return false;
1340 }
1341
1342 blk->setCoherenceBits(CacheBlk::ReadableBit);
1343 }
1344 } else if (compressor) {
1345 // This is an overwrite to an existing block, therefore we need
1346 // to check for data expansion (i.e., block was compressed with
1347 // a smaller size, and now it doesn't fit the entry anymore).
1348 // If that is the case we might need to evict blocks.
1349 if (!updateCompressionData(blk, pkt->getConstPtr<uint64_t>(),
1350 writebacks)) {
1351 invalidateBlock(blk);
1352 return false;
1353 }
1354 }
1355
1356 // at this point either this is a writeback or a write-through
1357 // write clean operation and the block is already in this
1358 // cache, we need to update the data and the block flags
1359 assert(blk);
1360 // TODO: the coherent cache can assert that the dirty bit is set
1361 if (!pkt->writeThrough()) {
1362 blk->setCoherenceBits(CacheBlk::DirtyBit);
1363 }
1364 // nothing else to do; writeback doesn't expect response
1365 assert(!pkt->needsResponse());
1366
1367 updateBlockData(blk, pkt, has_old_data);
1368 DPRINTF(Cache, "%s new state is %s\n", __func__, blk->print());
1369
1370 incHitCount(pkt);
1371
1372 // When the packet metadata arrives, the tag lookup will be done while
1373 // the payload is arriving. Then the block will be ready to access as
1374 // soon as the fill is done
1375 blk->setWhenReady(clockEdge(fillLatency) + pkt->headerDelay +
1376 std::max(cyclesToTicks(tag_latency), (uint64_t)pkt->payloadDelay));
1377
1378 // If this a write-through packet it will be sent to cache below
1379 return !pkt->writeThrough();
|
access6: handle normal read or write request to existing block with adequate properties
| 1380 } else if (blk && (pkt->needsWritable() ?
1381 blk->isSet(CacheBlk::WritableBit) :
1382 blk->isSet(CacheBlk::ReadableBit))) {
1383 // OK to satisfy access
1384 incHitCount(pkt);
1385
1386 // Calculate access latency based on the need to access the data array
1387 if (pkt->isRead()) {
1388 lat = calculateAccessLatency(blk, pkt->headerDelay, tag_latency);
1389
1390 // When a block is compressed, it must first be decompressed
1391 // before being read. This adds to the access latency.
1392 if (compressor) {
1393 lat += compressor->getDecompressionLatency(blk);
1394 }
1395 } else {
1396 lat = calculateTagOnlyLatency(pkt->headerDelay, tag_latency);
1397 }
1398
1399 satisfyRequest(pkt, blk);
1400 maintainClusivity(pkt->fromCache(), blk);
1401
1402 return true;
1403 }
1404
|
To handle a read or write access to an existing cache block,
access first checks the properties of the
block, such as the writable and readable bits.
If the cached block meets the requirement of the current request's type
(a writable copy for requests that need one, a readable copy otherwise),
the request is handled in this branch.
Note that it returns true at the end because
the request can be satisfied by the cached block, which means a cache hit.
It also invokes the satisfyRequest function, which performs the actual data
transfer between the packet and the block: a read is filled from the block's
data, and a write is applied to it.
satisfyRequest is a virtual function of BaseCache
and is also overridden by its child class, Cache.
There are two main places where satisfyRequest is invoked:
the access function and serviceMSHRTargets.
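What "satisfying" a hit amounts to can be sketched in a few lines. This is an illustrative stand-in for satisfyRequest, which in reality also handles LLSC, atomic swaps, and coherence responses:

```cpp
#include <cstdint>
#include <cstring>

struct MiniDataBlk { uint8_t data[64]; bool dirty = false; };

// Reads copy block data into the response; writes copy packet data
// into the block and mark it dirty.
void satisfySketch(bool isRead, uint8_t *pktData, std::size_t size,
                   std::size_t offset, MiniDataBlk &blk)
{
    if (isRead) {
        std::memcpy(pktData, blk.data + offset, size);   // fill the response
    } else {
        std::memcpy(blk.data + offset, pktData, size);   // apply the write
        blk.dirty = true;                                // needs writeback later
    }
}
```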
access7: other cases, mainly cache misses
| 1405 // Can't satisfy access normally... either no block (blk == nullptr)
1406 // or have block but need writable
1407
1408 incMissCount(pkt);
1409
1410 lat = calculateAccessLatency(blk, pkt->headerDelay, tag_latency);
1411
1412 if (!blk && pkt->isLLSC() && pkt->isWrite()) {
1413 // complete miss on store conditional... just give up now
1414 pkt->req->setExtraData(0);
1415 return true;
1416 }
1417
1418 return false;
1419 }
|
These cases include a cache miss, a write request to a non-writable block,
a read request to a non-readable block, and so on.
When none of the previous conditions matched the current request,
it must be handled by the
rest of the recvTimingReq function, in particular the cache miss handling.
Note that access returns false here.
Revisiting recvTimingReq of the BaseCache to handle cache hit and miss
| 349 void
350 BaseCache::recvTimingReq(PacketPtr pkt)
......
371 // Here we charge the headerDelay that takes into account the latencies
372 // of the bus, if the packet comes from it.
373 // The latency charged is just the value set by the access() function.
374 // In case of a hit we are neglecting response latency.
375 // In case of a miss we are neglecting forward latency.
376 Tick request_time = clockEdge(lat);
377 // Here we reset the timing of the packet.
378 pkt->headerDelay = pkt->payloadDelay = 0;
379
380 if (satisfied) {
381 // notify before anything else as later handleTimingReqHit might turn
382 // the packet in a response
383 ppHit->notify(pkt);
384
385 if (prefetcher && blk && blk->wasPrefetched()) {
386 DPRINTF(Cache, "Hit on prefetch for addr %#x (%s)\n",
387 pkt->getAddr(), pkt->isSecure() ? "s" : "ns");
388 blk->clearPrefetched();
389 }
390
391 handleTimingReqHit(pkt, blk, request_time);
392 } else {
393 handleTimingReqMiss(pkt, blk, forward_time, request_time);
394
395 ppMiss->notify(pkt);
396 }
397
398 if (prefetcher) {
399 // track time of availability of next prefetch, if any
400 Tick next_pf_time = prefetcher->nextPrefetchReadyTime();
401 if (next_pf_time != MaxTick) {
402 schedMemSideSendEvent(next_pf_time);
403 }
404 }
405 }
|
The access function asks the cache
whether the requested data exists there,
and its return value indicates whether it did.
The satisfied variable holds this return value.
Therefore, based on the satisfied condition,
recvTimingReq handles the cache hit and miss events differently.
When the cache hit happens
| 223 void
224 BaseCache::handleTimingReqHit(PacketPtr pkt, CacheBlk *blk, Tick request_time)
225 {
226 if (pkt->needsResponse()) {
227 // These delays should have been consumed by now
228 assert(pkt->headerDelay == 0);
229 assert(pkt->payloadDelay == 0);
230
231 pkt->makeTimingResponse();
232
233 // In this case we are considering request_time that takes
234 // into account the delay of the xbar, if any, and just
235 // lat, neglecting responseLatency, modelling hit latency
236 // just as the value of lat overriden by access(), which calls
237 // the calculateAccessLatency() function.
238 cpuSidePort.schedTimingResp(pkt, request_time);
239 } else {
240 DPRINTF(Cache, "%s satisfied %s, no response needed\n", __func__,
241 pkt->print());
242
243 // queue the packet for deletion, as the sending cache is
244 // still relying on it; if the block is found in access(),
245 // CleanEvict and Writeback messages will be deleted
246 // here as well
247 pendingDelete.reset(pkt);
248 }
249 }
|
Depending on the request type of the memory operation, it may or may not require a response.
Therefore, handleTimingReqHit first checks whether the packet requires a response
with the needsResponse method.
When a response is required, it invokes schedTimingResp of the cpuSidePort.
| 93 void schedTimingResp(PacketPtr pkt, Tick when)
94 { respQueue.schedSendTiming(pkt, when); }
|
The schedTimingResp function is defined in the QueuedResponsePort class,
which is one of the ancestor classes of CpuSidePort.
schedSendTiming, in turn, is a member function of the RespPacketQueue,
which is the type of respQueue.
The PacketQueue class defines the schedSendTiming method, and
RespPacketQueue inherits from PacketQueue.
| 106 void
107 PacketQueue::schedSendTiming(PacketPtr pkt, Tick when)
108 {
109 DPRINTF(PacketQueue, "%s for %s address %x size %d when %lu ord: %i\n",
110 __func__, pkt->cmdString(), pkt->getAddr(), pkt->getSize(), when,
111 forceOrder);
112
113 // we can still send a packet before the end of this tick
114 assert(when >= curTick());
115
116 // express snoops should never be queued
117 assert(!pkt->isExpressSnoop());
118
119 // add a very basic sanity check on the port to ensure the
120 // invisible buffer is not growing beyond reasonable limits
121 if (!_disableSanityCheck && transmitList.size() > 128) {
122 panic("Packet queue %s has grown beyond 128 packets\n",
123 name());
124 }
125
126 // we should either have an outstanding retry, or a send event
127 // scheduled, but there is an unfortunate corner case where the
128 // x86 page-table walker and timing CPU send out a new request as
129 // part of the receiving of a response (called by
130 // PacketQueue::sendDeferredPacket), in which we end up calling
131 // ourselves again before we had a chance to update waitingOnRetry
132 // assert(waitingOnRetry || sendEvent.scheduled());
133
134 // this belongs in the middle somewhere, so search from the end to
135 // order by tick; however, if forceOrder is set, also make sure
136 // not to re-order in front of some existing packet with the same
137 // address
138 auto it = transmitList.end();
139 while (it != transmitList.begin()) {
140 --it;
141 if ((forceOrder && it->pkt->matchAddr(pkt)) || it->tick <= when) {
142 // emplace inserts the element before the position pointed to by
143 // the iterator, so advance it one step
144 transmitList.emplace(++it, when, pkt);
145 return;
146 }
147 }
148 // either the packet list is empty or this has to be inserted
149 // before every other packet
150 transmitList.emplace_front(when, pkt);
151 schedSendEvent(when);
152 }
|
transmitList maintains all the packets that need to be sent to the other end of the port
| 68 /** A deferred packet, buffered to transmit later. */
69 class DeferredPacket
70 {
71 public:
72 Tick tick; ///< The tick when the packet is ready to transmit
73 PacketPtr pkt; ///< Pointer to the packet to transmit
74 DeferredPacket(Tick t, PacketPtr p)
75 : tick(t), pkt(p)
76 {}
77 };
78
79 typedef std::list<DeferredPacket> DeferredPacketList;
80
81 /** A list of outgoing packets. */
82 DeferredPacketList transmitList;
83
|
The transmitList contains all the DeferredPackets that are waiting to be sent.
Each entry therefore holds the packet itself and the tick at which it should be sent.
The explicit tick is required because GEM5 is an event-driven simulator, not real hardware.
The queued packets are sent when the sendEvent fires,
and that event is scheduled to fire at the stored tick
through the schedSendEvent function.
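The ordered insertion performed by schedSendTiming is worth seeing in isolation. A standalone sketch that keeps the list sorted by tick, scanning from the back just as the real code does (types are stand-ins):

```cpp
#include <cstdint>
#include <list>

struct MiniDeferred { uint64_t tick; int pkt; };

std::list<MiniDeferred> transmitList;

void schedSendTimingSketch(int pkt, uint64_t when)
{
    // Search from the end so the common case (append) stays cheap.
    auto it = transmitList.end();
    while (it != transmitList.begin()) {
        --it;
        if (it->tick <= when) {
            // Insert after the first entry that is not later than us.
            transmitList.insert(++it, MiniDeferred{when, pkt});
            return;
        }
    }
    // The list is empty, or every entry is later: go to the front.
    transmitList.push_front(MiniDeferred{when, pkt});
}
```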
schedSendEvent function schedules an event to handle the deferred packet
| 154 void
155 PacketQueue::schedSendEvent(Tick when)
156 {
157 // if we are waiting on a retry just hold off
158 if (waitingOnRetry) {
159 DPRINTF(PacketQueue, "Not scheduling send as waiting for retry\n");
160 assert(!sendEvent.scheduled());
161 return;
162 }
163
164 if (when != MaxTick) {
165 // we cannot go back in time, and to be consistent we stick to
166 // one tick in the future
167 when = std::max(when, curTick() + 1);
168 // @todo Revisit the +1
169
170 if (!sendEvent.scheduled()) {
171 em.schedule(&sendEvent, when);
172 } else if (when < sendEvent.when()) {
173 // if the new time is earlier than when the event
174 // currently is scheduled, move it forward
175 em.reschedule(&sendEvent, when);
176 }
177 } else {
178 // we get a MaxTick when there is no more to send, so if we're
179 // draining, we may be done at this point
180 if (drainState() == DrainState::Draining &&
181 transmitList.empty() && !sendEvent.scheduled()) {
182
183 DPRINTF(Drain, "PacketQueue done draining,"
184 "processing drain event\n");
185 signalDrainDone();
186 }
187 }
188 }
|
The most important thing done by schedSendEvent is scheduling the event
so that it fires at the exact time specified by the GEM5 emulator.
As shown in Line 170-176,
it first checks whether the sendEvent is already scheduled.
If no event is scheduled, it schedules one with the schedule function.
Note that the em member field points to the BaseCache.
Also, if an event is already scheduled for the sendEvent and
the new event should be raised before the pre-scheduled one,
it reschedules the event.
By the way, if there are packets that should be handled later than the newly scheduled event,
how do those get processed!?
To understand how the deferred packets are processed
and to resolve this question, let's take a look at the function invoked
when the scheduled event fires.
processSendEvent: event to handle deferred packet processing
| 50 PacketQueue::PacketQueue(EventManager& _em, const std::string& _label,
51 const std::string& _sendEventName,
52 bool force_order,
53 bool disable_sanity_check)
54 : em(_em), sendEvent([this]{ processSendEvent(); }, _sendEventName),
55 _disableSanityCheck(disable_sanity_check),
56 forceOrder(force_order),
57 label(_label), waitingOnRetry(false)
58 {
59 }
......
220 void
221 PacketQueue::processSendEvent()
222 {
223 assert(!waitingOnRetry);
224 sendDeferredPacket();
225 }
|
We can easily see that the sendEvent is initialized with processSendEvent
in the constructor of the PacketQueue.
Therefore, when the sendEvent fires, it invokes the processSendEvent function,
which in turn invokes the sendDeferredPacket function of the PacketQueue.
sendDeferredPacket handles deferred packet processing at the right time
| 190 void
191 PacketQueue::sendDeferredPacket()
192 {
193 // sanity checks
194 assert(!waitingOnRetry);
195 assert(deferredPacketReady());
196
197 DeferredPacket dp = transmitList.front();
198
199 // take the packet of the list before sending it, as sending of
200 // the packet in some cases causes a new packet to be enqueued
201 // (most notaly when responding to the timing CPU, leading to a
202 // new request hitting in the L1 icache, leading to a new
203 // response)
204 transmitList.pop_front();
205
206 // use the appropriate implementation of sendTiming based on the
207 // type of queue
208 waitingOnRetry = !sendTiming(dp.pkt);
209
210 // if we succeeded and are not waiting for a retry, schedule the
211 // next send
212 if (!waitingOnRetry) {
213 schedSendEvent(deferredPacketReadyTime());
214 } else {
215 // put the packet back at the front of the list
216 transmitList.emplace_front(dp);
217 }
218 }
|
You might remember that the transmitList contains each packet together with the tick
at which it should be sent, and sendDeferredPacket is the function that processes
a packet from the transmitList at that specified time.
It therefore extracts the packet from the transmitList (line 197-204).
After getting the packet to send, it invokes the sendTiming function to actually send the
packet to the unit waiting for the response.
However, sendTiming is not implemented in PacketQueue itself;
it is declared as a virtual function, which means
the child's sendTiming is invoked.
Remember that the schedTimingResp of the cpuSidePort brought us all the way down here,
and the respQueue used to schedule the send was a RespPacketQueue object.
RespPacketQueue inherits from PacketQueue, and it implements the sendTiming function.
| 275 bool
276 RespPacketQueue::sendTiming(PacketPtr pkt)
277 {
278 return cpuSidePort.sendTimingResp(pkt);
279 }
|
Finally, it invokes the sendTimingResp function of the cpuSidePort to send the packet to the CPU.
Yeah… it is kind of a long detour to get to sendTimingResp.
The important reason for this complicated packet handling process is
to decouple the CpuSidePort from managing response packets.
After the cache generates the response packet,
instead of directly invoking the sendTimingResp function of the cpuSidePort,
it lets the PacketQueue handle all the operations relevant to managing response packets.
After sendTimingResp is invoked,
its result determines waitingOnRetry, which indicates that
the peer (here, the CPU side) is currently unable to receive the response packet from the cache.
In that case, the waitingOnRetry field is set, and the packet must be sent once again
when the CPU sends a retry message to the cache at some later point.
| 169 /**
170 * Get the next packet ready time.
171 */
172 Tick deferredPacketReadyTime() const
173 { return transmitList.empty() ? MaxTick : transmitList.front().tick; }
|
Now it is time to answer the previous question: after one packet is processed,
if there are remaining packets that need to be sent at some later point, what should we do?
The deferredPacketReadyTime function checks the transmitList and returns the tick
of the next deferred packet if one still remains.
This tick is passed to the schedSendEvent function,
which schedules the sendEvent again.
That's it!
waitingOnRetry
Looking back at the code above: waitingOnRetry is set when sendTiming fails
because the peer rejected the packet. In that case the packet is put back at the
front of the transmitList, and schedSendEvent refuses to schedule any new send
event while the flag is set. When the peer becomes ready again, it sends a retry
to the port; the queue then clears waitingOnRetry and retransmits the deferred packet.
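A standalone sketch of this retry handshake, assuming a simple peer port that can reject packets (recvRetry and the other names below are illustrative, not the exact gem5 interface):

```cpp
#include <cstdint>
#include <list>

struct MiniPkt { int id; };
struct MiniDeferredPkt { uint64_t tick; MiniPkt pkt; };

struct MiniPacketQueue {
    std::list<MiniDeferredPkt> transmitList;
    bool waitingOnRetry = false;
    bool (*sendTiming)(const MiniPkt &);   // stand-in for the peer port

    void sendDeferredPacket()
    {
        MiniDeferredPkt dp = transmitList.front();
        transmitList.pop_front();
        waitingOnRetry = !sendTiming(dp.pkt);
        if (waitingOnRetry)
            transmitList.push_front(dp);   // keep it; no sends until retry
    }

    void recvRetry()                       // the peer is ready again
    {
        waitingOnRetry = false;
        if (!transmitList.empty())
            sendDeferredPacket();
    }
};
```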
When the cache miss happens
When the access function cannot return a cache block associated with
the current request, the satisfied condition is set to false.
Therefore, the handleTimingReqMiss function is executed to fetch the
cache block from the next-level cache or memory.
| 323 void
324 Cache::handleTimingReqMiss(PacketPtr pkt, CacheBlk *blk, Tick forward_time,
325 Tick request_time)
326 {
327 if (pkt->req->isUncacheable()) {
328 // ignore any existing MSHR if we are dealing with an
329 // uncacheable request
330
331 // should have flushed and have no valid block
332 assert(!blk || !blk->isValid());
333
334 stats.cmdStats(pkt).mshrUncacheable[pkt->req->requestorId()]++;
335
336 if (pkt->isWrite()) {
337 allocateWriteBuffer(pkt, forward_time);
338 } else {
339 assert(pkt->isRead());
340
341 // uncacheable accesses always allocate a new MSHR
342
343 // Here we are using forward_time, modelling the latency of
344 // a miss (outbound) just as forwardLatency, neglecting the
345 // lookupLatency component.
346 allocateMissBuffer(pkt, forward_time);
347 }
348
349 return;
350 }
351
352 Addr blk_addr = pkt->getBlockAddr(blkSize);
353
354 MSHR *mshr = mshrQueue.findMatch(blk_addr, pkt->isSecure());
355
356 // Software prefetch handling:
357 // To keep the core from waiting on data it won't look at
358 // anyway, send back a response with dummy data. Miss handling
359 // will continue asynchronously. Unfortunately, the core will
360 // insist upon freeing original Packet/Request, so we have to
361 // create a new pair with a different lifecycle. Note that this
362 // processing happens before any MSHR munging on the behalf of
363 // this request because this new Request will be the one stored
364 // into the MSHRs, not the original.
365 if (pkt->cmd.isSWPrefetch()) {
366 assert(pkt->needsResponse());
367 assert(pkt->req->hasPaddr());
368 assert(!pkt->req->isUncacheable());
369
370 // There's no reason to add a prefetch as an additional target
371 // to an existing MSHR. If an outstanding request is already
372 // in progress, there is nothing for the prefetch to do.
373 // If this is the case, we don't even create a request at all.
374 PacketPtr pf = nullptr;
375
376 if (!mshr) {
377 // copy the request and create a new SoftPFReq packet
378 RequestPtr req = std::make_shared<Request>(pkt->req->getPaddr(),
379 pkt->req->getSize(),
380 pkt->req->getFlags(),
381 pkt->req->requestorId());
382 pf = new Packet(req, pkt->cmd);
383 pf->allocate();
384 assert(pf->matchAddr(pkt));
385 assert(pf->getSize() == pkt->getSize());
386 }
387
388 pkt->makeTimingResponse();
389
390 // request_time is used here, taking into account lat and the delay
391 // charged if the packet comes from the xbar.
392 cpuSidePort.schedTimingResp(pkt, request_time);
393
394 // If an outstanding request is in progress (we found an
395 // MSHR) this is set to null
396 pkt = pf;
397 }
398
399 BaseCache::handleTimingReqMiss(pkt, mshr, blk, forward_time, request_time);
400 }
|
When a cache miss happens, the first thing to do is
search for an MSHR entry.
The findMatch function of the mshrQueue,
which contains all the previously allocated MSHR entries,
is invoked to search for
an MSHR entry associated with the current request.
Whether or not a matching MSHR entry is found,
it then invokes the handleTimingReqMiss of the BaseCache
to further handle the cache miss.
Briefly speaking,
this function handles the cache miss differently
based on whether the MSHR entry exists or not.
Because this function is quite long, I will split it in two parts:
when an MSHR exists and when it doesn't.
When an MSHR does exist
| 251 void
252 BaseCache::handleTimingReqMiss(PacketPtr pkt, MSHR *mshr, CacheBlk *blk,
253 Tick forward_time, Tick request_time)
254 {
255 if (writeAllocator &&
256 pkt && pkt->isWrite() && !pkt->req->isUncacheable()) {
257 writeAllocator->updateMode(pkt->getAddr(), pkt->getSize(),
258 pkt->getBlockAddr(blkSize));
259 }
260
261 if (mshr) {
262 /// MSHR hit
263 /// @note writebacks will be checked in getNextMSHR()
264 /// for any conflicting requests to the same block
265
266 //@todo remove hw_pf here
267
268 // Coalesce unless it was a software prefetch (see above).
269 if (pkt) {
270 assert(!pkt->isWriteback());
271 // CleanEvicts corresponding to blocks which have
272 // outstanding requests in MSHRs are simply sunk here
273 if (pkt->cmd == MemCmd::CleanEvict) {
274 pendingDelete.reset(pkt);
275 } else if (pkt->cmd == MemCmd::WriteClean) {
276 // A WriteClean should never coalesce with any
277 // outstanding cache maintenance requests.
278
279 // We use forward_time here because there is an
280 // uncached memory write, forwarded to WriteBuffer.
281 allocateWriteBuffer(pkt, forward_time);
282 } else {
283 DPRINTF(Cache, "%s coalescing MSHR for %s\n", __func__,
284 pkt->print());
285
286 assert(pkt->req->requestorId() < system->maxRequestors());
287 stats.cmdStats(pkt).mshrHits[pkt->req->requestorId()]++;
288
289 // We use forward_time here because it is the same
290 // considering new targets. We have multiple
291 // requests for the same address here. It
292 // specifies the latency to allocate an internal
293 // buffer and to schedule an event to the queued
294 // port and also takes into account the additional
295 // delay of the xbar.
296 mshr->allocateTarget(pkt, forward_time, order++,
297 allocOnFill(pkt->cmd));
298 if (mshr->getNumTargets() == numTarget) {
299 noTargetMSHR = mshr;
300 setBlocked(Blocked_NoTargets);
301 // need to be careful with this... if this mshr isn't
302 // ready yet (i.e. time > curTick()), we don't want to
303 // move it ahead of mshrs that are ready
304 // mshrQueue.moveToFront(mshr);
305 }
306 }
307 }
|
You have to understand that one MSHR entry can track
multiple memory requests to the address
handled by that particular entry.
Therefore, the first job is
registering the missed request
with the MSHR entry as one of its targets.
Depending on the type of the memory request,
it might not be added as a target of the MSHR entry.
However, in most cases, when an L1 cache miss happens,
the request is added to the found MSHR entry by invoking
the allocateTarget function of that entry.
allocateTarget associates the missed request with the found MSHR entry
| 372 /*
373 * Adds a target to an MSHR
374 */
375 void
376 MSHR::allocateTarget(PacketPtr pkt, Tick whenReady, Counter _order,
377 bool alloc_on_fill)
378 {
379 // assume we'd never issue a prefetch when we've got an
380 // outstanding miss
381 assert(pkt->cmd != MemCmd::HardPFReq);
382
383 // if there's a request already in service for this MSHR, we will
384 // have to defer the new target until after the response if any of
385 // the following are true:
386 // - there are other targets already deferred
387 // - there's a pending invalidate to be applied after the response
388 // comes back (but before this target is processed)
389 // - the MSHR's first (and only) non-deferred target is a cache
390 // maintenance packet
391 // - the new target is a cache maintenance packet (this is probably
392 // overly conservative but certainly safe)
393 // - this target requires a writable block and either we're not
394 // getting a writable block back or we have already snooped
395 // another read request that will downgrade our writable block
396 // to non-writable (Shared or Owned)
397 PacketPtr tgt_pkt = targets.front().pkt;
398 if (pkt->req->isCacheMaintenance() ||
399 tgt_pkt->req->isCacheMaintenance() ||
400 !deferredTargets.empty() ||
401 (inService &&
402 (hasPostInvalidate() ||
403 (pkt->needsWritable() &&
404 (!isPendingModified() || hasPostDowngrade() || isForward))))) {
405 // need to put on deferred list
406 if (inService && hasPostInvalidate())
407 replaceUpgrade(pkt);
408 deferredTargets.add(pkt, whenReady, _order, Target::FromCPU, true,
409 alloc_on_fill);
410 } else {
411 // No request outstanding, or still OK to append to
412 // outstanding request: append to regular target list. Only
413 // mark pending if current request hasn't been issued yet
414 // (isn't in service).
415 targets.add(pkt, whenReady, _order, Target::FromCPU, !inService,
416 alloc_on_fill);
417 }
418
419 DPRINTF(MSHR, "After target allocation: %s", print());
420 }
The basic job of allocateTarget is to add the missed memory request to one particular MSHR entry's target list. Because an MSHR collects every memory access targeting a specific block address and maintains them as its targets, this function must associate the missed packet with the proper MSHR entry. Also, depending on the current state of the MSHR and the pending requests already associated with it, the new packet is added to either deferredTargets or targets. Because both are TargetList objects, let's take a look at that type first.
Target and TargetList
TargetList is a container class derived from std::list<Target>. Because one MSHR must record all the memory requests associated with its block, the TargetList stores each missed request together with its associated information, represented as a Target object.
129 class Target : public QueueEntry::Target
130 {
131 public:
132
133 enum Source
134 {
135 FromCPU,
136 FromSnoop,
137 FromPrefetcher
138 };
139
140 const Source source; //!< Request from cpu, memory, or prefetcher?
141
142 /**
143 * We use this flag to track whether we have cleared the
144 * downstreamPending flag for the MSHR of the cache above
145 * where this packet originates from and guard noninitial
146 * attempts to clear it.
147 *
148 * The flag markedPending needs to be updated when the
149 * TargetList is in service which can be:
150 * 1) during the Target instantiation if the MSHR is in
151 * service and the target is not deferred,
152 * 2) when the MSHR becomes in service if the target is not
153 * deferred,
154 * 3) or when the TargetList is promoted (deferredTargets ->
155 * targets).
156 */
157 bool markedPending;
158
159 const bool allocOnFill; //!< Should the response servicing this
160 //!< target list allocate in the cache?
161
162 Target(PacketPtr _pkt, Tick _readyTime, Counter _order,
163 Source _source, bool _markedPending, bool alloc_on_fill)
164 : QueueEntry::Target(_pkt, _readyTime, _order), source(_source),
165 markedPending(_markedPending), allocOnFill(alloc_on_fill)
166 {}
167 };
168
169 class TargetList : public std::list<Target>, public Named
170 {
When no MSHR is present
308 } else {
309 // no MSHR
310 assert(pkt->req->requestorId() < system->maxRequestors());
311 stats.cmdStats(pkt).mshrMisses[pkt->req->requestorId()]++;
312 if (prefetcher && pkt->isDemand())
313 prefetcher->incrDemandMhsrMisses();
314
315 if (pkt->isEviction() || pkt->cmd == MemCmd::WriteClean) {
316 // We use forward_time here because there is an
317 // writeback or writeclean, forwarded to WriteBuffer.
318 allocateWriteBuffer(pkt, forward_time);
319 } else {
320 if (blk && blk->isValid()) {
321 // If we have a write miss to a valid block, we
322 // need to mark the block non-readable. Otherwise
323 // if we allow reads while there's an outstanding
324 // write miss, the read could return stale data
325 // out of the cache block... a more aggressive
326 // system could detect the overlap (if any) and
327 // forward data out of the MSHRs, but we don't do
328 // that yet. Note that we do need to leave the
329 // block valid so that it stays in the cache, in
330 // case we get an upgrade response (and hence no
331 // new data) when the write miss completes.
332 // As long as CPUs do proper store/load forwarding
333 // internally, and have a sufficiently weak memory
334 // model, this is probably unnecessary, but at some
335 // point it must have seemed like we needed it...
336 assert((pkt->needsWritable() &&
337 !blk->isSet(CacheBlk::WritableBit)) ||
338 pkt->req->isCacheMaintenance());
339 blk->clearCoherenceBits(CacheBlk::ReadableBit);
340 }
341 // Here we are using forward_time, modelling the latency of
342 // a miss (outbound) just as forwardLatency, neglecting the
343 // lookupLatency component.
344 allocateMissBuffer(pkt, forward_time);
345 }
346 }
347 }
It first checks whether the current memory request is an eviction (or WriteClean) request, which goes to the write buffer instead. Note that a cache miss can happen because of either a read or a write operation. When the cache already has a valid block but the access still misses, the block exists but is not writable. In that case, it first marks the selected block as non-readable (line 339), because the data must not be read until the write miss is resolved through the XBar. To handle the write miss, it invokes the allocateMissBuffer function.
allocateMissBuffer: allocate an MSHR entry for the write miss event
1164 MSHR *allocateMissBuffer(PacketPtr pkt, Tick time, bool sched_send = true)
1165 {
1166 MSHR *mshr = mshrQueue.allocate(pkt->getBlockAddr(blkSize), blkSize,
1167 pkt, time, order++,
1168 allocOnFill(pkt->cmd));
1169
1170 if (mshrQueue.isFull()) {
1171 setBlocked((BlockedCause)MSHRQueue_MSHRs);
1172 }
1173
1174 if (sched_send) {
1175 // schedule the send
1176 schedMemSideSendEvent(time);
1177 }
1178
1179 return mshr;
1180 }
When no MSHR entry is associated with the current request, the first priority is allocating a new MSHR entry for this memory request and any further requests to the same block. mshrQueue maintains all MSHR entries and provides an allocate interface that adds a new entry to the queue. After that, because allocateMissBuffer sets the sched_send parameter by default, it invokes schedMemSideSendEvent so that the lower-level cache or memory fetches the data. Let's take a look at how the MSHR entry is allocated, and later at how it is processed by schedMemSideSendEvent.
62 MSHR *
63 MSHRQueue::allocate(Addr blk_addr, unsigned blk_size, PacketPtr pkt,
64 Tick when_ready, Counter order, bool alloc_on_fill)
65 {
66 assert(!freeList.empty());
67 MSHR *mshr = freeList.front();
68 assert(mshr->getNumTargets() == 0);
69 freeList.pop_front();
70
71 DPRINTF(MSHR, "Allocating new MSHR. Number in use will be %lu/%lu\n",
72 allocatedList.size() + 1, numEntries);
73
74 mshr->allocate(blk_addr, blk_size, pkt, when_ready, order, alloc_on_fill);
75 mshr->allocIter = allocatedList.insert(allocatedList.end(), mshr);
76 mshr->readyIter = addToReadyList(mshr);
77
78 allocated += 1;
79 return mshr;
80 }
The MSHRQueue manages all the MSHR entries of the cache. Also, MSHRQueue is a child class of the Queue class. Therefore, to understand how each MSHR entry is allocated, we should take a look at the methods and fields implemented in the Queue class. Note that Queue is a template class, so it can manage any type of queue entry. Each Queue has a list called freeList, which holds free entries of the type passed at template instantiation.
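As an illustration, here is a minimal free-list pool in the spirit of the Queue template (a simplified sketch with invented names, not the actual gem5 Queue):

#include <cassert>
#include <cstddef>
#include <list>
#include <vector>

template <class Entry>
class MiniQueue {
    std::vector<Entry> storage;       // fixed pool of entries
    std::list<Entry*> freeList;       // entries available for allocation
    std::list<Entry*> allocatedList;  // entries currently in use
  public:
    explicit MiniQueue(std::size_t n) : storage(n) {
        for (auto &e : storage)
            freeList.push_back(&e);
    }
    bool isFull() const { return freeList.empty(); }
    Entry *allocate() {
        assert(!freeList.empty());    // caller must check isFull() first
        Entry *e = freeList.front();
        freeList.pop_front();
        allocatedList.push_back(e);
        return e;
    }
    void deallocate(Entry *e) {
        allocatedList.remove(e);
        freeList.push_back(e);
    }
};

Preallocating the entries and recycling them through a free list is what bounds the number of outstanding misses: when the free list runs dry, the cache blocks.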
302 void
303 MSHR::allocate(Addr blk_addr, unsigned blk_size, PacketPtr target,
304 Tick when_ready, Counter _order, bool alloc_on_fill)
305 {
306 blkAddr = blk_addr;
307 blkSize = blk_size;
308 isSecure = target->isSecure();
309 readyTime = when_ready;
310 order = _order;
311 assert(target);
312 isForward = false;
313 wasWholeLineWrite = false;
314 _isUncacheable = target->req->isUncacheable();
315 inService = false;
316 downstreamPending = false;
317
318 targets.init(blkAddr, blkSize);
319 deferredTargets.init(blkAddr, blkSize);
320
321 // Don't know of a case where we would allocate a new MSHR for a
322 // snoop (mem-side request), so set source according to request here
323 Target::Source source = (target->cmd == MemCmd::HardPFReq) ?
324 Target::FromPrefetcher : Target::FromCPU;
325 targets.add(target, when_ready, _order, source, true, alloc_on_fill);
326
327 // All targets must refer to the same block
328 assert(target->matchBlockAddr(targets.front().pkt, blkSize));
329 }
First of all, the retrieved MSHR entry must be initialized. The allocate function of the MSHR object first initializes its bookkeeping fields and target lists. Remember that one MSHR entry can have multiple targets, and those targets are maintained by the targets and deferredTargets TargetLists, so both lists are initialized first. After the initialization, it adds the current request to the targets list.
104 typename Entry::Iterator addToReadyList(Entry* entry)
105 {
106 if (readyList.empty() ||
107 readyList.back()->readyTime <= entry->readyTime) {
108 return readyList.insert(readyList.end(), entry);
109 }
110
111 for (auto i = readyList.begin(); i != readyList.end(); ++i) {
112 if ((*i)->readyTime > entry->readyTime) {
113 return readyList.insert(i, entry);
114 }
115 }
116 panic("Failed to add to ready list.");
117 }
After the MSHR entry is initialized, the entry must also be registered in the readyList of the MSHRQueue. The readyList keeps all MSHR entries in ascending order of the readyTime of the initial packet that populated each entry. Because MSHR entries should be processed in readyTime order, a waiting MSHR is serviced once the time specified by its readyTime is reached. You can think of the readyList as a priority queue that determines which entry should be processed first among all MSHR entries.
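The ordered insertion can be illustrated with a small standalone sketch that mirrors the addToReadyList loop above (the MiniEntry type is invented for this example):

#include <cstdint>
#include <iostream>
#include <list>

struct MiniEntry { uint64_t readyTime; };

void addToReadyList(std::list<MiniEntry*> &readyList, MiniEntry *e) {
    // common case: the new entry is the latest, append at the back
    if (readyList.empty() || readyList.back()->readyTime <= e->readyTime) {
        readyList.push_back(e);
        return;
    }
    // otherwise insert before the first entry with a later ready time
    for (auto i = readyList.begin(); i != readyList.end(); ++i) {
        if ((*i)->readyTime > e->readyTime) {
            readyList.insert(i, e);
            return;
        }
    }
}

int main() {
    std::list<MiniEntry*> readyList;
    MiniEntry a{30}, b{10}, c{20};
    addToReadyList(readyList, &a);
    addToReadyList(readyList, &b);
    addToReadyList(readyList, &c);
    for (auto *e : readyList)
        std::cout << e->readyTime << " ";  // prints: 10 20 30
}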
schedMemSideSendEvent: schedule sending deferred packet
After allocating the MSHR entry for the missed packet, the missed request must be forwarded to the next cache level or to memory, depending on where the current cache sits in the hierarchy. However, real hardware cannot handle the miss and forward the request in the same clock cycle. Therefore, the send of the missed request packet is scheduled to happen after a few clock cycles have elapsed. For that purpose, the schedMemSideSendEvent function is invoked.
1257 /**
1258 * Schedule a send event for the memory-side port. If already
1259 * scheduled, this may reschedule the event at an earlier
1260 * time. When the specified time is reached, the port is free to
1261 * send either a response, a request, or a prefetch request.
1262 *
1263 * @param time The time when to attempt sending a packet.
1264 */
1265 void schedMemSideSendEvent(Tick time)
1266 {
1267 memSidePort.schedSendEvent(time);
1268 }
We have already looked at the schedSendEvent function provided by the PacketQueue. Its major job was registering an event that processes a deferred packet and sends the response through the CpuSidePort. However, note that we are now looking at the memSidePort's schedSendEvent.
234 /**
235 * The memory-side port extends the base cache request port with
236 * access functions for functional, atomic and timing snoops.
237 */
238 class MemSidePort : public CacheRequestPort
239 {
240 private:
241
242 /** The cache-specific queue. */
243 CacheReqPacketQueue _reqQueue;
244
245 SnoopRespPacketQueue _snoopRespQueue;
246
247 // a pointer to our specific cache implementation
248 BaseCache *cache;
249
250 protected:
251
252 virtual void recvTimingSnoopReq(PacketPtr pkt);
253
254 virtual bool recvTimingResp(PacketPtr pkt);
255
256 virtual Tick recvAtomicSnoop(PacketPtr pkt);
257
258 virtual void recvFunctionalSnoop(PacketPtr pkt);
259
260 public:
261
262 MemSidePort(const std::string &_name, BaseCache *_cache,
263 const std::string &_label);
264 };
Because MemSidePort does not itself provide schedSendEvent, we have to look at its parent class, CacheRequestPort.
143 /**
144 * A cache request port is used for the memory-side port of the
145 * cache, and in addition to the basic timing port that only sends
146 * response packets through a transmit list, it also offers the
147 * ability to schedule and send request packets (requests &
148 * writebacks). The send event is scheduled through schedSendEvent,
149 * and the sendDeferredPacket of the timing port is modified to
150 * consider both the transmit list and the requests from the MSHR.
151 */
152 class CacheRequestPort : public QueuedRequestPort
153 {
154
155 public:
156
157 /**
158 * Schedule a send of a request packet (from the MSHR). Note
159 * that we could already have a retry outstanding.
160 */
161 void schedSendEvent(Tick time)
162 {
163 DPRINTF(CachePort, "Scheduling send event at %llu\n", time);
164 reqQueue.schedSendEvent(time);
165 }
166
167 protected:
168
169 CacheRequestPort(const std::string &_name, BaseCache *_cache,
170 ReqPacketQueue &_reqQueue,
171 SnoopRespPacketQueue &_snoopRespQueue) :
172 QueuedRequestPort(_name, _cache, _reqQueue, _snoopRespQueue)
173 { }
174
175 /**
176 * Memory-side port always snoops.
177 *
178 * @return always true
179 */
180 virtual bool isSnooping() const { return true; }
181 };
This class has interfaces very similar to those of the CpuSidePort. However, its schedSendEvent invokes the schedSendEvent of the reqQueue instead of the respQueue.
154 void
155 PacketQueue::schedSendEvent(Tick when)
156 {
157 // if we are waiting on a retry just hold off
158 if (waitingOnRetry) {
159 DPRINTF(PacketQueue, "Not scheduling send as waiting for retry\n");
160 assert(!sendEvent.scheduled());
161 return;
162 }
163
164 if (when != MaxTick) {
165 // we cannot go back in time, and to be consistent we stick to
166 // one tick in the future
167 when = std::max(when, curTick() + 1);
168 // @todo Revisit the +1
169
170 if (!sendEvent.scheduled()) {
171 em.schedule(&sendEvent, when);
172 } else if (when < sendEvent.when()) {
173 // if the new time is earlier than when the event
174 // currently is scheduled, move it forward
175 em.reschedule(&sendEvent, when);
176 }
177 } else {
178 // we get a MaxTick when there is no more to send, so if we're
179 // draining, we may be done at this point
180 if (drainState() == DrainState::Draining &&
181 transmitList.empty() && !sendEvent.scheduled()) {
182
183 DPRINTF(Drain, "PacketQueue done draining,"
184 "processing drain event\n");
185 signalDrainDone();
186 }
187 }
188 }
Although the type of the reqQueue differs from that of the respQueue, note that the same method is invoked here, because both inherit from the PacketQueue class.
50 PacketQueue::PacketQueue(EventManager& _em, const std::string& _label,
51 const std::string& _sendEventName,
52 bool force_order,
53 bool disable_sanity_check)
54 : em(_em), sendEvent([this]{ processSendEvent(); }, _sendEventName),
55 _disableSanityCheck(disable_sanity_check),
56 forceOrder(force_order),
57 label(_label), waitingOnRetry(false)
58 {
59 }
......
220 void
221 PacketQueue::processSendEvent()
222 {
223 assert(!waitingOnRetry);
224 sendDeferredPacket();
225 }
schedSendEvent schedules sendEvent, and processSendEvent is invoked when the event fires. However, when the sendEvent fires here, processSendEvent ends up calling a different sendDeferredPacket function. Note that the reqQueue is a CacheReqPacketQueue, which inherits from ReqPacketQueue, and the CacheReqPacketQueue overrides the sendDeferredPacket implemented in the PacketQueue class. Therefore, even though the call site lives in PacketQueue, the overridden implementation of sendDeferredPacket is invoked instead.
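In other words, the base class schedules the event and makes a virtual call, and the derived class supplies the behavior. A minimal sketch of this dispatch (the Mini* names are invented; this is not gem5 code):

#include <iostream>

struct MiniPacketQueue {
    virtual ~MiniPacketQueue() = default;
    // base behaviour: drain the internal transmitList
    virtual void sendDeferredPacket() {
        std::cout << "PacketQueue: send head of transmitList\n";
    }
    void processSendEvent() { sendDeferredPacket(); }  // virtual call
};

struct MiniCacheReqPacketQueue : MiniPacketQueue {
    // override: ask the cache (MSHR queue / write buffer) instead
    void sendDeferredPacket() override {
        std::cout << "CacheReqPacketQueue: ask cache.getNextQueueEntry()\n";
    }
};

int main() {
    MiniCacheReqPacketQueue reqQueue;
    MiniPacketQueue &q = reqQueue;   // handled through the base type
    q.processSendEvent();            // prints the overridden message
}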
2549 void
2550 BaseCache::CacheReqPacketQueue::sendDeferredPacket()
2551 {
2552 // sanity check
2553 assert(!waitingOnRetry);
2554
2555 // there should never be any deferred request packets in the
2556 // queue, instead we rely on the cache to provide the packets
2557 // from the MSHR queue or write queue
2558 assert(deferredPacketReadyTime() == MaxTick);
2559
2560 // check for request packets (requests & writebacks)
2561 QueueEntry* entry = cache.getNextQueueEntry();
2562
2563 if (!entry) {
2564 // can happen if e.g. we attempt a writeback and fail, but
2565 // before the retry, the writeback is eliminated because
2566 // we snoop another cache's ReadEx.
2567 } else {
2568 // let our snoop responses go first if there are responses to
2569 // the same addresses
2570 if (checkConflictingSnoop(entry->getTarget()->pkt)) {
2571 return;
2572 }
2573 waitingOnRetry = entry->sendPacket(cache);
2574 }
2575
2576 // if we succeeded and are not waiting for a retry, schedule the
2577 // next send considering when the next queue is ready, note that
2578 // snoop responses have their own packet queue and thus schedule
2579 // their own events
2580 if (!waitingOnRetry) {
2581 schedSendEvent(cache.nextQueueReadyTime());
2582 }
2583 }
You might remember that the sendDeferredPacket of the PacketQueue dequeues packets from the transmitList and sends them to the CPU in the cache hit case we covered earlier (sending the response to the CPU). When a cache miss happens, however, it needs help from the more complicated cache units, the MSHR queue and the writeBuffer. Also, you might have noticed that the packet was not pushed onto the transmitList, but into the MSHR queue or the writeBuffer. Instead of searching the transmitList, this version invokes the getNextQueueEntry function to find the next entry to process.
getNextQueueEntry: select entry to send to the memory either from MSHR or writeBuffer
773 QueueEntry*
774 BaseCache::getNextQueueEntry()
775 {
776 // Check both MSHR queue and write buffer for potential requests,
777 // note that null does not mean there is no request, it could
778 // simply be that it is not ready
779 MSHR *miss_mshr = mshrQueue.getNext();
780 WriteQueueEntry *wq_entry = writeBuffer.getNext();
When a cache miss happens, the missed request packet can be stored in either the MSHR queue or the WriteBuffer. This is because outgoing memory requests are issued from two different units depending on the type of the memory request, whereas sending a response to the upper cache or processor is handled in a unified way regardless of the request type.
getNext returns the entry that is ready to be processed
When an entry is retrieved with the getNext method in the getNextQueueEntry function, it is the MSHR or write buffer entry that has been waiting the longest. Note that the getNext function is defined in the Queue class, and both the WriteBuffer and the MSHRQueue inherit from the Queue class.
217 /**
218 * Returns the WriteQueueEntry at the head of the readyList.
219 * @return The next request to service.
220 */
221 Entry* getNext() const
222 {
223 if (readyList.empty() || readyList.front()->readyTime > curTick()) {
224 return nullptr;
225 }
226 return readyList.front();
227 }
The getNext function returns the first entry of the readyList, provided its readyTime has already passed. Note that the front entry of the readyList has the highest priority based on readyTime, so the entry that needs handling soonest is processed first.
782 // If we got a write buffer request ready, first priority is a
783 // full write buffer, otherwise we favour the miss requests
784 if (wq_entry && (writeBuffer.isFull() || !miss_mshr)) {
785 // need to search MSHR queue for conflicting earlier miss.
786 MSHR *conflict_mshr = mshrQueue.findPending(wq_entry);
787
788 if (conflict_mshr && conflict_mshr->order < wq_entry->order) {
789 // Service misses in order until conflict is cleared.
790 return conflict_mshr;
791
792 // @todo Note that we ignore the ready time of the conflict here
793 }
794
795 // No conflicts; issue write
796 return wq_entry;
797 } else if (miss_mshr) {
798 // need to check for conflicting earlier writeback
799 WriteQueueEntry *conflict_mshr = writeBuffer.findPending(miss_mshr);
800 if (conflict_mshr) {
801 // not sure why we don't check order here... it was in the
802 // original code but commented out.
803
804 // The only way this happens is if we are
805 // doing a write and we didn't have permissions
806 // then subsequently saw a writeback (owned got evicted)
807 // We need to make sure to perform the writeback first
808 // To preserve the dirty data, then we can issue the write
809
810 // should we return wq_entry here instead? I.e. do we
811 // have to flush writes in order? I don't think so... not
812 // for Alpha anyway. Maybe for x86?
813 return conflict_mshr;
814
815 // @todo Note that we ignore the ready time of the conflict here
816 }
817
818 // No conflicts; issue read
819 return miss_mshr;
820 }
After the two candidate entries are retrieved from the MSHR queue and the write buffer, their conditions must be compared to determine which one is processed first. It is important to note that the port from the cache to the memory is a limited resource: with two input sources to choose from, we must decide which packet is sent to the memory. Here, the logic gives priority to draining a full writeBuffer; when the writeBuffer is not full, the MSHR queue is serviced instead. Also, even when the writeBuffer is full, if there is a conflicting and earlier entry in the MSHR queue, the selected entry is replaced with that conflicting MSHR entry; otherwise, the entry selected from the writeBuffer is returned. Judging from the comments inside getNextQueueEntry, the exact selection order is somewhat controversial, so I will not go into it further.
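Below is a simplified sketch of that arbitration (the arbitrate function and MiniEntry type are invented for illustration, and the real code also considers ready times and per-queue conflict lookups):

struct MiniEntry { long order; };

MiniEntry *arbitrate(MiniEntry *wqEntry, bool writeBufferFull,
                     MiniEntry *missMshr,
                     MiniEntry *conflictOnOtherSide) {
    if (wqEntry && (writeBufferFull || !missMshr)) {
        // an older conflicting miss must be serviced first
        if (conflictOnOtherSide &&
            conflictOnOtherSide->order < wqEntry->order)
            return conflictOnOtherSide;
        return wqEntry;          // no conflict: issue the write
    } else if (missMshr) {
        if (conflictOnOtherSide)
            return conflictOnOtherSide;  // writeback goes first
        return missMshr;         // no conflict: issue the miss
    }
    return nullptr;              // nothing ready
}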
Generate a prefetch request when there are no entries to process
822 // fall through... no pending requests. Try a prefetch.
823 assert(!miss_mshr && !wq_entry);
824 if (prefetcher && mshrQueue.canPrefetch() && !isBlocked()) {
825 // If we have a miss queue slot, we can try a prefetch
826 PacketPtr pkt = prefetcher->getPacket();
827 if (pkt) {
828 Addr pf_addr = pkt->getBlockAddr(blkSize);
829 if (tags->findBlock(pf_addr, pkt->isSecure())) {
830 DPRINTF(HWPrefetch, "Prefetch %#x has hit in cache, "
831 "dropped.\n", pf_addr);
832 prefetcher->pfHitInCache();
833 // free the request and packet
834 delete pkt;
835 } else if (mshrQueue.findMatch(pf_addr, pkt->isSecure())) {
836 DPRINTF(HWPrefetch, "Prefetch %#x has hit in a MSHR, "
837 "dropped.\n", pf_addr);
838 prefetcher->pfHitInMSHR();
839 // free the request and packet
840 delete pkt;
841 } else if (writeBuffer.findMatch(pf_addr, pkt->isSecure())) {
842 DPRINTF(HWPrefetch, "Prefetch %#x has hit in the "
843 "Write Buffer, dropped.\n", pf_addr);
844 prefetcher->pfHitInWB();
845 // free the request and packet
846 delete pkt;
847 } else {
848 // Update statistic on number of prefetches issued
849 // (hwpf_mshr_misses)
850 assert(pkt->req->requestorId() < system->maxRequestors());
851 stats.cmdStats(pkt).mshrMisses[pkt->req->requestorId()]++;
852
853 // allocate an MSHR and return it, note
854 // that we send the packet straight away, so do not
855 // schedule the send
856 return allocateMissBuffer(pkt, curTick(), false);
857 }
858 }
859 }
860
861 return nullptr;
862 }
The fall-through path is only reachable when no suitable request is waiting in either the writeBuffer or the mshrQueue. In that case, the cache tries to issue a prefetch. Note that this is not software prefetching: the addresses are generated by a hardware prefetcher. Because the hardware prefetcher does not know whether the cache or the waiting queues already cover the prefetched cache line, the code checks all of them to confirm that the prefetch request is fresh. If it is, the request is added to the MSHR queue. Because the added request will be handled later when the next event fires, the function returns nullptr to report that there is no packet to send to the memory in this cycle.
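The filtering logic can be summarized in a small sketch, with the three membership checks abstracted as predicates standing in for tags->findBlock, mshrQueue.findMatch, and writeBuffer.findMatch (all names here are invented for illustration):

#include <cstdint>
#include <functional>

using BlockPredicate = std::function<bool(uint64_t)>;

bool shouldIssuePrefetch(uint64_t pfAddr,
                         const BlockPredicate &inCache,
                         const BlockPredicate &inMshr,
                         const BlockPredicate &inWriteBuffer) {
    if (inCache(pfAddr))       return false;  // already resident
    if (inMshr(pfAddr))        return false;  // already being fetched
    if (inWriteBuffer(pfAddr)) return false;  // being written back
    return true;               // fresh: allocate an MSHR for it
}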
checkConflictingSnoop
2563 if (!entry) {
2564 // can happen if e.g. we attempt a writeback and fail, but
2565 // before the retry, the writeback is eliminated because
2566 // we snoop another cache's ReadEx.
2567 } else {
2568 // let our snoop responses go first if there are responses to
2569 // the same addresses
2570 if (checkConflictingSnoop(entry->getTarget()->pkt)) {
2571 return;
2572 }
2573 waitingOnRetry = entry->sendPacket(cache);
2574 }
After the entry is found, the queue checks whether there is a conflicting snoop response for the same block.
212 /**
213 * Check if there is a conflicting snoop response about to be
214 * send out, and if so simply stall any requests, and schedule
215 * a send event at the same time as the next snoop response is
216 * being sent out.
217 *
218 * @param pkt The packet to check for conflicts against.
219 */
220 bool checkConflictingSnoop(const PacketPtr pkt)
221 {
222 if (snoopRespQueue.checkConflict(pkt, cache.blkSize)) {
223 DPRINTF(CachePort, "Waiting for snoop response to be "
224 "sent\n");
225 Tick when = snoopRespQueue.deferredPacketReadyTime();
226 schedSendEvent(when);
227 return true;
228 }
229 return false;
230 }
In other words, if there is a snoop response waiting for the same address, the currently selected entry must be deferred until that snoop response has been handled. The deferredPacketReadyTime function returns the time at which the snoop response will be sent, and schedSendEvent reschedules the miss handling for that time.
74 bool
75 PacketQueue::checkConflict(const PacketPtr pkt, const int blk_size) const
76 {
77 // caller is responsible for ensuring that all packets have the
78 // same alignment
79 for (const auto& p : transmitList) {
80 if (p.pkt->matchBlockAddr(pkt, blk_size))
81 return true;
82 }
83 return false;
84 }
Because the SnoopRespPacketQueue is a child of PacketQueue, it invokes the above checkConflict function to figure out whether a snoop response packet for the same address as the selected entry is waiting.
Finally, sendPacket
When there is no conflict between the selected entry and any pending snoop response, the request stored in the selected entry is sent.
2549 void
2550 BaseCache::CacheReqPacketQueue::sendDeferredPacket()
......
2561 QueueEntry* entry = cache.getNextQueueEntry();
2562
2563 if (!entry) {
2564 // can happen if e.g. we attempt a writeback and fail, but
2565 // before the retry, the writeback is eliminated because
2566 // we snoop another cache's ReadEx.
2567 } else {
2568 // let our snoop responses go first if there are responses to
2569 // the same addresses
2570 if (checkConflictingSnoop(entry->getTarget()->pkt)) {
2571 return;
2572 }
2573 waitingOnRetry = entry->sendPacket(cache);
2574 }
2575
2576 // if we succeeded and are not waiting for a retry, schedule the
2577 // next send considering when the next queue is ready, note that
2578 // snoop responses have their own packet queue and thus schedule
2579 // their own events
2580 if (!waitingOnRetry) {
2581 schedSendEvent(cache.nextQueueReadyTime());
2582 }
2583 }
The sendPacket function is declared as a virtual function in the QueueEntry class, so concrete implementations are provided by the MSHR and WriteQueueEntry classes. Depending on which type of entry was selected, one of the sendPacket implementations below is invoked. Also note that the CacheReqPacketQueue has a member field cache, a reference to the BaseCache, and that this field is initialized with the cache object that owns the CacheReqPacketQueue; in our case it is the Cache object.
705 bool
706 MSHR::sendPacket(BaseCache &cache)
707 {
708 return cache.sendMSHRQueuePacket(this);
709 }
140 bool
141 WriteQueueEntry::sendPacket(BaseCache &cache)
142 {
143 return cache.sendWriteQueuePacket(this);
144 }
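This is a classic double dispatch: the queue holds only a QueueEntry*, the entry's virtual sendPacket call selects the concrete type, and the entry immediately calls back into the cache method that matches that type. A minimal sketch (the Mini* names are invented for illustration):

#include <iostream>

struct MiniCache;  // forward declaration

struct MiniQueueEntry {
    virtual ~MiniQueueEntry() = default;
    virtual bool sendPacket(MiniCache &cache) = 0;
};

struct MiniMshr;             // concrete entry types
struct MiniWriteQueueEntry;

struct MiniCache {
    bool sendMSHRQueuePacket(MiniMshr *) {
        std::cout << "servicing an MSHR entry\n"; return false;
    }
    bool sendWriteQueuePacket(MiniWriteQueueEntry *) {
        std::cout << "servicing a write queue entry\n"; return false;
    }
};

struct MiniMshr : MiniQueueEntry {
    bool sendPacket(MiniCache &c) override {
        return c.sendMSHRQueuePacket(this);
    }
};

struct MiniWriteQueueEntry : MiniQueueEntry {
    bool sendPacket(MiniCache &c) override {
        return c.sendWriteQueuePacket(this);
    }
};

int main() {
    MiniCache cache;
    MiniMshr mshr;
    MiniQueueEntry *entry = &mshr;  // as returned by getNextQueueEntry
    entry->sendPacket(cache);       // dispatches to sendMSHRQueuePacket
}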
Processing selected MSHR entry
Cache::sendMSHRQueuePacket
1358 bool
1359 Cache::sendMSHRQueuePacket(MSHR* mshr)
1360 {
1361 assert(mshr);
1362
1363 // use request from 1st target
1364 PacketPtr tgt_pkt = mshr->getTarget()->pkt;
1365
1366 if (tgt_pkt->cmd == MemCmd::HardPFReq && forwardSnoops) {
1367 DPRINTF(Cache, "%s: MSHR %s\n", __func__, tgt_pkt->print());
1368
1369 // we should never have hardware prefetches to allocated
1370 // blocks
1371 assert(!tags->findBlock(mshr->blkAddr, mshr->isSecure));
1372
1373 // We need to check the caches above us to verify that
1374 // they don't have a copy of this block in the dirty state
1375 // at the moment. Without this check we could get a stale
1376 // copy from memory that might get used in place of the
1377 // dirty one.
1378 Packet snoop_pkt(tgt_pkt, true, false);
1379 snoop_pkt.setExpressSnoop();
1380 // We are sending this packet upwards, but if it hits we will
1381 // get a snoop response that we end up treating just like a
1382 // normal response, hence it needs the MSHR as its sender
1383 // state
1384 snoop_pkt.senderState = mshr;
1385 cpuSidePort.sendTimingSnoopReq(&snoop_pkt);
1386
1387 // Check to see if the prefetch was squashed by an upper cache (to
1388 // prevent us from grabbing the line) or if a Check to see if a
1389 // writeback arrived between the time the prefetch was placed in
1390 // the MSHRs and when it was selected to be sent or if the
1391 // prefetch was squashed by an upper cache.
1392
1393 // It is important to check cacheResponding before
1394 // prefetchSquashed. If another cache has committed to
1395 // responding, it will be sending a dirty response which will
1396 // arrive at the MSHR allocated for this request. Checking the
1397 // prefetchSquash first may result in the MSHR being
1398 // prematurely deallocated.
1399 if (snoop_pkt.cacheResponding()) {
1400 GEM5_VAR_USED auto r = outstandingSnoop.insert(snoop_pkt.req);
1401 assert(r.second);
1402
1403 // if we are getting a snoop response with no sharers it
1404 // will be allocated as Modified
1405 bool pending_modified_resp = !snoop_pkt.hasSharers();
1406 markInService(mshr, pending_modified_resp);
1407
1408 DPRINTF(Cache, "Upward snoop of prefetch for addr"
1409 " %#x (%s) hit\n",
1410 tgt_pkt->getAddr(), tgt_pkt->isSecure()? "s": "ns");
1411 return false;
1412 }
1413
1414 if (snoop_pkt.isBlockCached()) {
1415 DPRINTF(Cache, "Block present, prefetch squashed by cache. "
1416 "Deallocating mshr target %#x.\n",
1417 mshr->blkAddr);
1418
1419 // Deallocate the mshr target
1420 if (mshrQueue.forceDeallocateTarget(mshr)) {
1421 // Clear block if this deallocation resulted freed an
1422 // mshr when all had previously been utilized
1423 clearBlocked(Blocked_NoMSHRs);
1424 }
1425
1426 // given that no response is expected, delete Request and Packet
1427 delete tgt_pkt;
1428
1429 return false;
1430 }
1431 }
1432
1433 return BaseCache::sendMSHRQueuePacket(mshr);
1434 }
Because we are currently dealing with a Cache, not the BaseCache, the sendMSHRQueuePacket of the Cache class is invoked first. Although the code looks fairly complicated, most of it handles the special case of hardware prefetches and is not relevant to general MSHR packet handling. At the end, the function invokes the sendMSHRQueuePacket of the BaseCache to handle packets in the common scenario.
BaseCache::sendMSHRQueuePacket
1789 bool
1790 BaseCache::sendMSHRQueuePacket(MSHR* mshr)
1791 {
1792 assert(mshr);
1793
1794 // use request from 1st target
1795 PacketPtr tgt_pkt = mshr->getTarget()->pkt;
1796
1797 DPRINTF(Cache, "%s: MSHR %s\n", __func__, tgt_pkt->print());
1798
1799 // if the cache is in write coalescing mode or (additionally) in
1800 // no allocation mode, and we have a write packet with an MSHR
1801 // that is not a whole-line write (due to incompatible flags etc),
1802 // then reset the write mode
1803 if (writeAllocator && writeAllocator->coalesce() && tgt_pkt->isWrite()) {
1804 if (!mshr->isWholeLineWrite()) {
1805 // if we are currently write coalescing, hold on the
1806 // MSHR as many cycles extra as we need to completely
1807 // write a cache line
1808 if (writeAllocator->delay(mshr->blkAddr)) {
1809 Tick delay = blkSize / tgt_pkt->getSize() * clockPeriod();
1810 DPRINTF(CacheVerbose, "Delaying pkt %s %llu ticks to allow "
1811 "for write coalescing\n", tgt_pkt->print(), delay);
1812 mshrQueue.delay(mshr, delay);
1813 return false;
1814 } else {
1815 writeAllocator->reset();
1816 }
1817 } else {
1818 writeAllocator->resetDelay(mshr->blkAddr);
1819 }
1820 }
1821
1822 CacheBlk *blk = tags->findBlock(mshr->blkAddr, mshr->isSecure);
1823
1824 // either a prefetch that is not present upstream, or a normal
1825 // MSHR request, proceed to get the packet to send downstream
1826 PacketPtr pkt = createMissPacket(tgt_pkt, blk, mshr->needsWritable(),
1827 mshr->isWholeLineWrite());
Note that at this point we hold the MSHR entry selected based on priority and timing. Therefore, the first job is to find the associated cache block, if one exists, and to generate a miss packet to send to the next-level cache or memory.
createMissPacket
Remember that we are here because of a cache miss. Depending on the kind of miss, however, the request might already be associated with a specific cache block. For example, when the block is allocated but in a non-writable state, a write to it misses, and the miss must upgrade the allocated block to an exclusively writable state. For that purpose, the cache must generate the proper packet and send it through the XBar to the other components that might share the cache block. Let's take a look at the details.
476 PacketPtr
477 Cache::createMissPacket(PacketPtr cpu_pkt, CacheBlk *blk,
478 bool needsWritable,
479 bool is_whole_line_write) const
480 {
481 // should never see evictions here
482 assert(!cpu_pkt->isEviction());
483
484 bool blkValid = blk && blk->isValid();
485
486 if (cpu_pkt->req->isUncacheable() ||
487 (!blkValid && cpu_pkt->isUpgrade()) ||
488 cpu_pkt->cmd == MemCmd::InvalidateReq || cpu_pkt->isClean()) {
489 // uncacheable requests and upgrades from upper-level caches
490 // that missed completely just go through as is
491 return nullptr;
492 }
493
494 assert(cpu_pkt->needsResponse());
495
496 MemCmd cmd;
497 // @TODO make useUpgrades a parameter.
498 // Note that ownership protocols require upgrade, otherwise a
499 // write miss on a shared owned block will generate a ReadExcl,
500 // which will clobber the owned copy.
501 const bool useUpgrades = true;
502 assert(cpu_pkt->cmd != MemCmd::WriteLineReq || is_whole_line_write);
503 if (is_whole_line_write) {
504 assert(!blkValid || !blk->isSet(CacheBlk::WritableBit));
505 // forward as invalidate to all other caches, this gives us
506 // the line in Exclusive state, and invalidates all other
507 // copies
508 cmd = MemCmd::InvalidateReq;
509 } else if (blkValid && useUpgrades) {
510 // only reason to be here is that blk is read only and we need
511 // it to be writable
512 assert(needsWritable);
513 assert(!blk->isSet(CacheBlk::WritableBit));
514 cmd = cpu_pkt->isLLSC() ? MemCmd::SCUpgradeReq : MemCmd::UpgradeReq;
515 } else if (cpu_pkt->cmd == MemCmd::SCUpgradeFailReq ||
516 cpu_pkt->cmd == MemCmd::StoreCondFailReq) {
517 // Even though this SC will fail, we still need to send out the
518 // request and get the data to supply it to other snoopers in the case
519 // where the determination the StoreCond fails is delayed due to
520 // all caches not being on the same local bus.
521 cmd = MemCmd::SCUpgradeFailReq;
522 } else {
523 // block is invalid
524
525 // If the request does not need a writable there are two cases
526 // where we need to ensure the response will not fetch the
527 // block in dirty state:
528 // * this cache is read only and it does not perform
529 // writebacks,
530 // * this cache is mostly exclusive and will not fill (since
531 // it does not fill it will have to writeback the dirty data
532 // immediately which generates uneccesary writebacks).
533 bool force_clean_rsp = isReadOnly || clusivity == enums::mostly_excl;
534 cmd = needsWritable ? MemCmd::ReadExReq :
535 (force_clean_rsp ? MemCmd::ReadCleanReq : MemCmd::ReadSharedReq);
536 }
537 PacketPtr pkt = new Packet(cpu_pkt->req, cmd, blkSize);
538
539 // if there are upstream caches that have already marked the
540 // packet as having sharers (not passing writable), pass that info
541 // downstream
542 if (cpu_pkt->hasSharers() && !needsWritable) {
543 // note that cpu_pkt may have spent a considerable time in the
544 // MSHR queue and that the information could possibly be out
545 // of date, however, there is no harm in conservatively
546 // assuming the block has sharers
547 pkt->setHasSharers();
548 DPRINTF(Cache, "%s: passing hasSharers from %s to %s\n",
549 __func__, cpu_pkt->print(), pkt->print());
550 }
551
552 // the packet should be block aligned
553 assert(pkt->getAddr() == pkt->getBlockAddr(blkSize));
554
555 pkt->allocate();
556 DPRINTF(Cache, "%s: created %s from %s\n", __func__, pkt->print(),
557 cpu_pkt->print());
558 return pkt;
559 }
Most of the time the final else branch is executed: it generates a ReadExReq packet for a write miss that needs a writable copy, or a ReadSharedReq (or ReadCleanReq) packet for a miss caused by a read operation.
Sending the miss packet!
1789 bool
1790 BaseCache::sendMSHRQueuePacket(MSHR* mshr)
1791 {
......
1829 mshr->isForward = (pkt == nullptr);
1830
1831 if (mshr->isForward) {
1832 // not a cache block request, but a response is expected
1833 // make copy of current packet to forward, keep current
1834 // copy for response handling
1835 pkt = new Packet(tgt_pkt, false, true);
1836 assert(!pkt->isWrite());
1837 }
1838
1839 // play it safe and append (rather than set) the sender state,
1840 // as forwarded packets may already have existing state
1841 pkt->pushSenderState(mshr);
1842
1843 if (pkt->isClean() && blk && blk->isSet(CacheBlk::DirtyBit)) {
1844 // A cache clean opearation is looking for a dirty block. Mark
1845 // the packet so that the destination xbar can determine that
1846 // there will be a follow-up write packet as well.
1847 pkt->setSatisfied();
1848 }
1849
1850 if (!memSidePort.sendTimingReq(pkt)) {
1851 // we are awaiting a retry, but we
1852 // delete the packet and will be creating a new packet
1853 // when we get the opportunity
1854 delete pkt;
1855
1856 // note that we have now masked any requestBus and
1857 // schedSendEvent (we will wait for a retry before
1858 // doing anything), and this is so even if we do not
1859 // care about this packet and might override it before
1860 // it gets retried
1861 return true;
1862 } else {
1863 // As part of the call to sendTimingReq the packet is
1864 // forwarded to all neighbouring caches (and any caches
1865 // above them) as a snoop. Thus at this point we know if
1866 // any of the neighbouring caches are responding, and if
1867 // so, we know it is dirty, and we can determine if it is
1868 // being passed as Modified, making our MSHR the ordering
1869 // point
1870 bool pending_modified_resp = !pkt->hasSharers() &&
1871 pkt->cacheResponding();
1872 markInService(mshr, pending_modified_resp);
1873
1874 if (pkt->isClean() && blk && blk->isSet(CacheBlk::DirtyBit)) {
1875 // A cache clean opearation is looking for a dirty
1876 // block. If a dirty block is encountered a WriteClean
1877 // will update any copies to the path to the memory
1878 // until the point of reference.
1879 DPRINTF(CacheVerbose, "%s: packet %s found block: %s\n",
1880 __func__, pkt->print(), blk->print());
1881 PacketPtr wb_pkt = writecleanBlk(blk, pkt->req->getDest(),
1882 pkt->id);
1883 PacketList writebacks;
1884 writebacks.push_back(wb_pkt);
1885 doWritebacks(writebacks, 0);
1886 }
1887
1888 return false;
1889 }
1890 }
This is the end of the recvTimingReq path of the cache.
Two ports in the cache
92 /**
93 * A basic cache interface. Implements some common functions for speed.
94 */
95 class BaseCache : public ClockedObject
96 {
......
338 CpuSidePort cpuSidePort;
339 MemSidePort memSidePort;
CpuSidePort: receive request from the processor and send response
307 /**
308 * The CPU-side port extends the base cache response port with access
309 * functions for functional, atomic and timing requests.
310 */
311 class CpuSidePort : public CacheResponsePort
312 {
313 private:
314
315 // a pointer to our specific cache implementation
316 BaseCache *cache;
317
318 protected:
319 virtual bool recvTimingSnoopResp(PacketPtr pkt) override;
320
321 virtual bool tryTiming(PacketPtr pkt) override;
322
323 virtual bool recvTimingReq(PacketPtr pkt) override;
324
325 virtual Tick recvAtomic(PacketPtr pkt) override;
326
327 virtual void recvFunctional(PacketPtr pkt) override;
328
329 virtual AddrRangeList getAddrRanges() const override;
330
331 public:
332
333 CpuSidePort(const std::string &_name, BaseCache *_cache,
334 const std::string &_label);
335
336 };
337
79 BaseCache::BaseCache(const BaseCacheParams &p, unsigned blk_size)
80 : ClockedObject(p),
81 cpuSidePort (p.name + ".cpu_side_port", this, "CpuSidePort"),
82 memSidePort(p.name + ".mem_side_port", this, "MemSidePort"),
83 mshrQueue("MSHRs", p.mshrs, 0, p.demand_mshr_reserve, p.name),
84 writeBuffer("write buffer", p.write_buffers, p.mshrs, p.name),
cpuSidePort is a member field of the BaseCache, and it in turn has a cache member field, a pointer back to the BaseCache. Note that this field is initialized to point at the very BaseCache that embeds the cpuSidePort. Also, the port has a recvTimingReq function that is invoked when the processor tries to send a request to the cache.
CacheResponsePort
266 /**
267 * A cache response port is used for the CPU-side port of the cache,
268 * and it is basically a simple timing port that uses a transmit
269 * list for responses to the CPU (or connected requestor). In
270 * addition, it has the functionality to block the port for
271 * incoming requests. If blocked, the port will issue a retry once
272 * unblocked.
273 */
274 class CacheResponsePort : public QueuedResponsePort
275 {
276
277 public:
278
279 /** Do not accept any new requests. */
280 void setBlocked();
281
282 /** Return to normal operation and accept new requests. */
283 void clearBlocked();
284
285 bool isBlocked() const { return blocked; }
286
287 protected:
288
289 CacheResponsePort(const std::string &_name, BaseCache *_cache,
290 const std::string &_label);
291
292 /** A normal packet queue used to store responses. */
293 RespPacketQueue queue;
294
295 bool blocked;
296
297 bool mustSendRetry;
298
299 private:
300
301 void processSendRetry();
302
303 EventFunctionWrapper sendRetryEvent;
304
305 };
69 BaseCache::CacheResponsePort::CacheResponsePort(const std::string &_name,
70 BaseCache *_cache,
71 const std::string &_label)
72 : QueuedResponsePort(_name, _cache, queue),
73 queue(*_cache, *this, true, _label),
74 blocked(false), mustSendRetry(false),
75 sendRetryEvent([this]{ processSendRetry(); }, _name)
76 {
77 }
The CpuSidePort class inherits from CacheResponsePort. The main functionality of the CacheResponsePort is allowing the port to be blocked while it is busy processing previous packets.
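A minimal sketch of that blocking and retry handshake (hypothetical MiniResponsePort; the real implementation schedules sendRetryEvent instead of sending the retry inline):

#include <iostream>

struct MiniResponsePort {
    bool blocked = false;
    bool mustSendRetry = false;

    bool recvTimingReq() {
        if (blocked) {
            mustSendRetry = true;  // owe the requestor a retry
            return false;          // nack the request
        }
        return true;               // accept and process
    }
    void setBlocked() { blocked = true; }
    void clearBlocked() {
        blocked = false;
        if (mustSendRetry) {
            mustSendRetry = false;
            std::cout << "sendRetryReq()\n";  // requestor resends now
        }
    }
};

int main() {
    MiniResponsePort port;
    port.setBlocked();
    std::cout << port.recvTimingReq() << "\n";  // 0: rejected
    port.clearBlocked();                        // prints sendRetryReq()
    std::cout << port.recvTimingReq() << "\n";  // 1: accepted
}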
QueuedResponsePort
53 /**
54 * A queued port is a port that has an infinite queue for outgoing
55 * packets and thus decouples the module that wants to send
56 * request/responses from the flow control (retry mechanism) of the
57 * port. A queued port can be used by both a requestor and a responder. The
58 * queue is a parameter to allow tailoring of the queue implementation
59 * (used in the cache).
60 */
61 class QueuedResponsePort : public ResponsePort
62 {
63
64 protected:
65
66 /** Packet queue used to store outgoing responses. */
67 RespPacketQueue &respQueue;
68
69 void recvRespRetry() { respQueue.retry(); }
70
71 public:
72
73 /**
74 * Create a QueuedPort with a given name, owner, and a supplied
75 * implementation of a packet queue. The external definition of
76 * the queue enables e.g. the cache to implement a specific queue
77 * behaviuor in a subclass, and provide the latter to the
78 * QueuePort constructor.
79 */
80 QueuedResponsePort(const std::string& name, SimObject* owner,
81 RespPacketQueue &resp_queue, PortID id = InvalidPortID) :
82 ResponsePort(name, owner, id), respQueue(resp_queue)
83 { }
84
85 virtual ~QueuedResponsePort() { }
86
87 /**
88 * Schedule the sending of a timing response.
89 *
90 * @param pkt Packet to send
91 * @param when Absolute time (in ticks) to send packet
92 */
93 void schedTimingResp(PacketPtr pkt, Tick when)
94 { respQueue.schedSendTiming(pkt, when); }
95
96 /** Check the list of buffered packets against the supplied
97 * functional request. */
98 bool trySatisfyFunctional(PacketPtr pkt)
99 { return respQueue.trySatisfyFunctional(pkt); }
100 };
ResponsePort
259 /**
260 * A ResponsePort is a specialization of a port. In addition to the
261 * basic functionality of sending packets to its requestor peer, it also
262 * has functions specific to a responder, e.g. to send range changes
263 * and get the address ranges that the port responds to.
264 *
265 * The three protocols are atomic, timing, and functional, each with its own
266 * header file.
267 */
268 class ResponsePort : public Port, public AtomicResponseProtocol,
269 public TimingResponseProtocol, public FunctionalResponseProtocol
270 {
271 friend class RequestPort;
272
273 private:
274 RequestPort* _requestPort;
275
276 bool defaultBackdoorWarned;
277
278 protected:
279 SimObject& owner;
280
281 public:
282 ResponsePort(const std::string& name, SimObject* _owner,
283 PortID id=InvalidPortID);
284 virtual ~ResponsePort();
285
286 /**
287 * Find out if the peer request port is snooping or not.
288 *
289 * @return true if the peer request port is snooping
290 */
291 bool isSnooping() const { return _requestPort->isSnooping(); }
292
293 /**
294 * Called by the owner to send a range change
295 */
296 void sendRangeChange() const { _requestPort->recvRangeChange(); }
297
298 /**
299 * Get a list of the non-overlapping address ranges the owner is
300 * responsible for. All response ports must override this function
301 * and return a populated list with at least one item.
302 *
303 * @return a list of ranges responded to
304 */
305 virtual AddrRangeList getAddrRanges() const = 0;
306
307 /**
308 * We let the request port do the work, so these don't do anything.
309 */
310 void unbind() override {}
311 void bind(Port &peer) override {}
312
313 public:
314 /* The atomic protocol. */
315
316 /**
317 * Send an atomic snoop request packet, where the data is moved
318 * and the state is updated in zero time, without interleaving
319 * with other memory accesses.
320 *
321 * @param pkt Snoop packet to send.
322 *
323 * @return Estimated latency of access.
324 */
325 Tick
326 sendAtomicSnoop(PacketPtr pkt)
327 {
328 try {
329 return AtomicResponseProtocol::sendSnoop(_requestPort, pkt);
330 } catch (UnboundPortException) {
331 reportUnbound();
332 }
333 }
334
335 public:
336 /* The functional protocol. */
337
338 /**
339 * Send a functional snoop request packet, where the data is
340 * instantly updated everywhere in the memory system, without
341 * affecting the current state of any block or moving the block.
342 *
343 * @param pkt Snoop packet to send.
344 */
345 void
346 sendFunctionalSnoop(PacketPtr pkt) const
347 {
348 try {
349 FunctionalResponseProtocol::sendSnoop(_requestPort, pkt);
350 } catch (UnboundPortException) {
351 reportUnbound();
352 }
353 }
354
355 public:
356 /* The timing protocol. */
357
358 /**
359 * Attempt to send a timing response to the request port by calling
360 * its corresponding receive function. If the send does not
361 * succeed, as indicated by the return value, then the sender must
362 * wait for a recvRespRetry at which point it can re-issue a
363 * sendTimingResp.
364 *
365 * @param pkt Packet to send.
366 *
367 * @return If the send was successful or not.
368 */
369 bool
370 sendTimingResp(PacketPtr pkt)
371 {
372 try {
373 return TimingResponseProtocol::sendResp(_requestPort, pkt);
374 } catch (UnboundPortException) {
375 reportUnbound();
376 }
377 }
378
379 /**
380 * Attempt to send a timing snoop request packet to the request port
381 * by calling its corresponding receive function. Snoop requests
382 * always succeed and hence no return value is needed.
383 *
384 * @param pkt Packet to send.
385 */
386 void
387 sendTimingSnoopReq(PacketPtr pkt)
388 {
389 try {
390 TimingResponseProtocol::sendSnoopReq(_requestPort, pkt);
391 } catch (UnboundPortException) {
392 reportUnbound();
393 }
394 }
395
396 /**
397 * Send a retry to the request port that previously attempted a
398 * sendTimingReq to this response port and failed.
399 */
400 void
401 sendRetryReq()
402 {
403 try {
404 TimingResponseProtocol::sendRetryReq(_requestPort);
405 } catch (UnboundPortException) {
406 reportUnbound();
407 }
408 }
409
410 /**
411 * Send a retry to the request port that previously attempted a
412 * sendTimingSnoopResp to this response port and failed.
413 */
414 void
415 sendRetrySnoopResp()
416 {
417 try {
418 TimingResponseProtocol::sendRetrySnoopResp(_requestPort);
419 } catch (UnboundPortException) {
420 reportUnbound();
421 }
422 }
423
424 protected:
425 /**
426 * Called by the request port to unbind. Should never be called
427 * directly.
428 */
429 void responderUnbind();
430
431 /**
432 * Called by the request port to bind. Should never be called
433 * directly.
434 */
435 void responderBind(RequestPort& request_port);
436
437 /**
438 * Default implementations.
439 */
440 Tick recvAtomicBackdoor(PacketPtr pkt, MemBackdoorPtr &backdoor) override;
441
442 bool
443 tryTiming(PacketPtr pkt) override
444 {
445 panic("%s was not expecting a %s\n", name(), __func__);
446 }
447
448 bool
449 recvTimingSnoopResp(PacketPtr pkt) override
450 {
451 panic("%s was not expecting a timing snoop response\n", name());
452 }
453 };
This is the base class that provides most of the interfaces required for handling receive operations. The operations that ResponsePort does not implement itself are provided by the TimingResponseProtocol (and its atomic and functional counterparts) that ResponsePort inherits.
169 /**
170 * Response port
171 */
172 ResponsePort::ResponsePort(const std::string& name, SimObject* _owner,
173 PortID id) : Port(name, id), _requestPort(&defaultRequestPort),
174 defaultBackdoorWarned(false), owner(*_owner)
175 {
176 }
177
178 ResponsePort::~ResponsePort()
179 {
180 }
181
182 void
183 ResponsePort::responderUnbind()
184 {
185 _requestPort = &defaultRequestPort;
186 Port::unbind();
187 }
188
189 void
190 ResponsePort::responderBind(RequestPort& request_port)
191 {
192 _requestPort = &request_port;
193 Port::bind(request_port);
194 }
A ResponsePort is initialized with the defaultRequestPort. Because the ResponsePort needs to know who sends it requests (_requestPort), a reference to the peer RequestPort must be established; this is done by binding to a RequestPort through the responderBind method. When a proper RequestPort has not been bound to the ResponsePort, gem5 reports errors during execution.
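A minimal sketch of the binding idea (invented Mini* types; the real code falls back to a sentinel defaultRequestPort and panics through reportUnbound rather than throwing):

#include <stdexcept>

struct MiniRequestPort { /* peer side */ };

struct MiniResponsePort {
    MiniRequestPort *peer = nullptr;  // unbound by default

    void responderBind(MiniRequestPort &rp) { peer = &rp; }
    void responderUnbind() { peer = nullptr; }

    void sendTimingResp() {
        if (!peer)
            throw std::runtime_error("port is not bound");
        // ... forward the response to the peer ...
    }
};

int main() {
    MiniResponsePort resp;
    MiniRequestPort req;
    resp.responderBind(req);  // normally done during system wiring
    resp.sendTimingResp();    // safe: the peer is set
}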
RespPacketQueue
One thing the QueuedResponsePort must maintain is the response packets. When a cache access finishes, the port should pass the response packet to the processor. However, when the processor is too busy to accept the response from the cache, the port must retry later. For that purpose, the QueuedResponsePort contains a RespPacketQueue that holds all the unhandled response packets.
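A minimal sketch of such a deferred-response queue with retry (invented types; the real RespPacketQueue additionally orders packets by their ready tick):

#include <deque>
#include <functional>
#include <string>

struct MiniRespQueue {
    std::deque<std::string> transmitList;  // pending responses
    bool waitingOnRetry = false;
    std::function<bool(const std::string&)> sendTiming;  // to the CPU

    void schedSendTiming(const std::string &pkt) {
        transmitList.push_back(pkt);
        trySend();
    }
    void trySend() {
        while (!waitingOnRetry && !transmitList.empty()) {
            if (sendTiming(transmitList.front()))
                transmitList.pop_front();  // accepted by the CPU
            else
                waitingOnRetry = true;     // CPU busy, hold the packet
        }
    }
    void retry() {                         // CPU signals it is ready
        waitingOnRetry = false;
        trySend();
    }
};

int main() {
    MiniRespQueue q;
    bool cpuBusy = true;
    q.sendTiming = [&](const std::string &) { return !cpuBusy; };
    q.schedSendTiming("resp A");  // rejected: CPU busy, stays queued
    cpuBusy = false;
    q.retry();                    // now "resp A" is delivered
    return q.transmitList.empty() ? 0 : 1;
}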
300 class RespPacketQueue : public PacketQueue
301 {
302
303 protected:
304
305 ResponsePort& cpuSidePort;
306
307 // Static definition so it can be called when constructing the parent
308 // without us being completely initialized.
309 static const std::string name(const ResponsePort& cpuSidePort,
310 const std::string& label)
311 { return cpuSidePort.name() + "-" + label; }
312
313 public:
314
315 /**
316 * Create a response packet queue, linked to an event manager, a
317 * CPU-side port, and a label that will be used for functional print
318 * request packets.
319 *
320 * @param _em Event manager used for scheduling this queue
321 * @param _cpu_side_port Cpu_side port used to send the packets
322 * @param force_order Force insertion order for packets with same address
323 * @param _label Label to push on the label stack for print request packets
324 */
325 RespPacketQueue(EventManager& _em, ResponsePort& _cpu_side_port,
326 bool force_order = false,
327 const std::string _label = "RespPacketQueue");
328
329 virtual ~RespPacketQueue() { }
330
331 const std::string name() const
332 { return name(cpuSidePort, label); }
333
334 bool sendTiming(PacketPtr pkt);
335
336 };
266 RespPacketQueue::RespPacketQueue(EventManager& _em,
267 ResponsePort& _cpu_side_port,
268 bool force_order,
269 const std::string _label)
270 : PacketQueue(_em, _label, name(_cpu_side_port, _label), force_order),
271 cpuSidePort(_cpu_side_port)
272 {
273 }
274
275 bool
276 RespPacketQueue::sendTiming(PacketPtr pkt)
277 {
278 return cpuSidePort.sendTimingResp(pkt);
279 }
RespPacketQueue has the cpuSidePort as a member, initialized by its constructor. When the sendTiming function of the RespPacketQueue is invoked, it sends the packet through the cpuSidePort using sendTimingResp. Also, note that the RespPacketQueue is constructed with a reference to an EventManager object. However, when you look at its initialization in BaseCache::CacheResponsePort::CacheResponsePort, the queue (the RespPacketQueue object) is initialized with _cache as its first argument. It is not an EventManager but a BaseCache! Because BaseCache is a SimObject, and SimObject inherits from EventManager, the cache object itself can be treated as an EventManager. Let's take a look at the PacketQueue, the parent class of RespPacketQueue, because the RespPacketQueue itself is not capable of scheduling events: it has no member function or field that makes use of the passed EventManager (the BaseCache).
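A minimal sketch of why passing the cache where an EventManager& is expected compiles and works (invented Mini* classes):

#include <iostream>

struct MiniEventManager {
    void schedule(const char *what) {
        std::cout << "scheduling " << what << "\n";
    }
};
struct MiniSimObject : MiniEventManager {};
struct MiniBaseCache : MiniSimObject {};

struct MiniPacketQueue {
    MiniEventManager &em;  // stored as the base type
    explicit MiniPacketQueue(MiniEventManager &m) : em(m) {}
    void schedSendEvent() { em.schedule("sendEvent"); }
};

int main() {
    MiniBaseCache cache;
    MiniPacketQueue q(cache);  // a cache *is an* event manager
    q.schedSendEvent();
}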
PacketQueue
Instead, its parent class PacketQueue makes use of the EventManager, scheduling events through the schedule method and an EventFunctionWrapper.
61 /**
62 * A packet queue is a class that holds deferred packets and later
63 * sends them using the associated CPU-side port or memory-side port.
64 */
65 class PacketQueue : public Drainable
66 {
67 private:
68 /** A deferred packet, buffered to transmit later. */
69 class DeferredPacket
70 {
71 public:
72 Tick tick; ///< The tick when the packet is ready to transmit
73 PacketPtr pkt; ///< Pointer to the packet to transmit
74 DeferredPacket(Tick t, PacketPtr p)
75 : tick(t), pkt(p)
76 {}
77 };
78
79 typedef std::list<DeferredPacket> DeferredPacketList;
80
81 /** A list of outgoing packets. */
82 DeferredPacketList transmitList;
83
84 /** The manager which is used for the event queue */
85 EventManager& em;
86
87 /** Used to schedule sending of deferred packets. */
88 void processSendEvent();
89
90 /** Event used to call processSendEvent. */
91 EventFunctionWrapper sendEvent;
92
93 /*
94 * Optionally disable the sanity check
95 * on the size of the transmitList. The
96 * sanity check will be enabled by default.
97 */
98 bool _disableSanityCheck;
99
100 /**
101 * if true, inserted packets have to be unconditionally scheduled
102 * after the last packet in the queue that references the same
103 * address
104 */
105 bool forceOrder;
106
107 protected:
108
109 /** Label to use for print request packets label stack. */
110 const std::string label;
111
112 /** Remember whether we're awaiting a retry. */
113 bool waitingOnRetry;
114
115 /** Check whether we have a packet ready to go on the transmit list. */
116 bool deferredPacketReady() const
117 { return !transmitList.empty() && transmitList.front().tick <= curTick(); }
118
119 /**
120 * Attempt to send a packet. Note that a subclass of the
121 * PacketQueue can override this method and thus change the
122 * behaviour (as done by the cache for the request queue). The
123 * default implementation sends the head of the transmit list. The
124 * caller must guarantee that the list is non-empty and that the
125 * head packet is scheduled for curTick() (or earlier).
126 */
127 virtual void sendDeferredPacket();
128
129 /**
130 * Send a packet using the appropriate method for the specific
131 * subclass (request, response or snoop response).
132 */
133 virtual bool sendTiming(PacketPtr pkt) = 0;
134
135 /**
136 * Create a packet queue, linked to an event manager, and a label
137 * that will be used for functional print request packets.
138 *
139 * @param _em Event manager used for scheduling this queue
140 * @param _label Label to push on the label stack for print request packets
141 * @param force_order Force insertion order for packets with same address
142 * @param disable_sanity_check Flag used to disable the sanity check
143 * on the size of the transmitList. The check is enabled by default.
144 */
145 PacketQueue(EventManager& _em, const std::string& _label,
146 const std::string& _sendEventName,
147 bool force_order = false,
148 bool disable_sanity_check = false);
149
150 /**
151      * Virtual destructor since the class may be used as a base class.
152 */
153 virtual ~PacketQueue();
154
155 public:
156
157 /**
158 * Provide a name to simplify debugging.
159 *
160 * @return A complete name, appended to module and port
161 */
162 virtual const std::string name() const = 0;
163
164 /**
165 * Get the size of the queue.
166 */
167 size_t size() const { return transmitList.size(); }
168
169 /**
170 * Get the next packet ready time.
171 */
172 Tick deferredPacketReadyTime() const
173 { return transmitList.empty() ? MaxTick : transmitList.front().tick; }
174
175 /**
176 * Check if a packet corresponding to the same address exists in the
177 * queue.
178 *
179 * @param pkt The packet to compare against.
180 * @param blk_size Block size in bytes.
181 * @return Whether a corresponding packet is found.
182 */
183 bool checkConflict(const PacketPtr pkt, const int blk_size) const;
184
185 /** Check the list of buffered packets against the supplied
186 * functional request. */
187 bool trySatisfyFunctional(PacketPtr pkt);
188
189 /**
190 * Schedule a send event if we are not already waiting for a
191 * retry. If the requested time is before an already scheduled
192 * send event, the event will be rescheduled. If MaxTick is
193 * passed, no event is scheduled. Instead, if we are idle and
194 * asked to drain then check and signal drained.
195 *
196 * @param when time to schedule an event
197 */
198 void schedSendEvent(Tick when);
199
200 /**
201 * Add a packet to the transmit list, and schedule a send event.
202 *
203 * @param pkt Packet to send
204 * @param when Absolute time (in ticks) to send packet
205 */
206 void schedSendTiming(PacketPtr pkt, Tick when);
207
208 /**
209 * Retry sending a packet from the queue. Note that this is not
210 * necessarily the same packet if something has been added with an
211 * earlier time stamp.
212 */
213 void retry();
214
215 /**
216 * This allows a user to explicitly disable the sanity check
217 * on the size of the transmitList, which is enabled by default.
218 * Users must use this function to explicitly disable the sanity
219 * check.
220 */
221 void disableSanityCheck() { _disableSanityCheck = true; }
222
223 DrainState drain() override;
224 };
Port binding
73 class BaseCache(ClockedObject):
74 type = 'BaseCache'
......
121 cpu_side = ResponsePort("Upstream port closer to the CPU and/or device")
122 mem_side = RequestPort("Downstream port closer to memory")
gem5/src/python/m5/params.py
2123 # Port description object. Like a ParamDesc object, this represents a
2124 # logical port in the SimObject class, not a particular port on a
2125 # SimObject instance. The latter are represented by PortRef objects.
2126 class Port(object):
2127 # Port("role", "description")
2128
2129 _compat_dict = { }
2130
2131 @classmethod
2132 def compat(cls, role, peer):
2133 cls._compat_dict.setdefault(role, set()).add(peer)
2134 cls._compat_dict.setdefault(peer, set()).add(role)
2135
2136 @classmethod
2137 def is_compat(cls, one, two):
2138 for port in one, two:
2139 if not port.role in Port._compat_dict:
2140 fatal("Unrecognized role '%s' for port %s\n", port.role, port)
2141 return one.role in Port._compat_dict[two.role]
2142
2143 def __init__(self, role, desc, is_source=False):
2144 self.desc = desc
2145 self.role = role
2146 self.is_source = is_source
2147
2148 # Generate a PortRef for this port on the given SimObject with the
2149 # given name
2150 def makeRef(self, simobj):
2151 return PortRef(simobj, self.name, self.role, self.is_source)
2152
2153 # Connect an instance of this port (on the given SimObject with
2154 # the given name) with the port described by the supplied PortRef
2155 def connect(self, simobj, ref):
2156 self.makeRef(simobj).connect(ref)
2157
2158 # No need for any pre-declarations at the moment as we merely rely
2159 # on an unsigned int.
2160 def cxx_predecls(self, code):
2161 pass
2162
2163 def pybind_predecls(self, code):
2164 cls.cxx_predecls(self, code)
2165
2166 # Declare an unsigned int with the same name as the port, that
2167 # will eventually hold the number of connected ports (and thus the
2168 # number of elements for a VectorPort).
2169 def cxx_decl(self, code):
2170 code('unsigned int port_$_connection_count;')
2171
2172 Port.compat('GEM5 REQUESTOR', 'GEM5 RESPONDER')
2173
2174 class RequestPort(Port):
2175 # RequestPort("description")
2176 def __init__(self, desc):
2177 super(RequestPort, self).__init__(
2178 'GEM5 REQUESTOR', desc, is_source=True)
2179
2180 class ResponsePort(Port):
2181 # ResponsePort("description")
2182 def __init__(self, desc):
2183 super(ResponsePort, self).__init__('GEM5 RESPONDER', desc)
2184
1896 #####################################################################
1897 #
1898 # Port objects
1899 #
1900 # Ports are used to interconnect objects in the memory system.
1901 #
1902 #####################################################################
1903
1904 # Port reference: encapsulates a reference to a particular port on a
1905 # particular SimObject.
1906 class PortRef(object):
......
1941 # Full connection is symmetric (both ways). Called via
1942 # SimObject.__setattr__ as a result of a port assignment, e.g.,
1943 # "obj1.portA = obj2.portB", or via VectorPortElementRef.__setitem__,
1944 # e.g., "obj1.portA[3] = obj2.portB".
1945 def connect(self, other):
1946 if isinstance(other, VectorPortRef):
1947 # reference to plain VectorPort is implicit append
1948 other = other._get_next()
1949 if self.peer and not proxy.isproxy(self.peer):
1950 fatal("Port %s is already connected to %s, cannot connect %s\n",
1951 self, self.peer, other);
1952 self.peer = other
1953
1954 if proxy.isproxy(other):
1955 other.set_param_desc(PortParamDesc())
1956 return
1957 elif not isinstance(other, PortRef):
1958 raise TypeError("assigning non-port reference '%s' to port '%s'" \
1959 % (other, self))
1960
1961 if not Port.is_compat(self, other):
1962 fatal("Ports %s and %s with roles '%s' and '%s' "
1963 "are not compatible", self, other, self.role, other.role)
1964
1965 if other.peer is not self:
1966 other.connect(self)
......
2023 # Call C++ to create corresponding port connection between C++ objects
2024 def ccConnect(self):
2025 if self.ccConnected: # already done this
2026 return
2027
2028 peer = self.peer
2029 if not self.peer: # nothing to connect to
2030 return
2031
2032 port = self.simobj.getPort(self.name, self.index)
2033 peer_port = peer.simobj.getPort(peer.name, peer.index)
2034 port.bind(peer_port)
2035
2036 self.ccConnected = True
127 void
128 RequestPort::bind(Port &peer)
129 {
130 auto *response_port = dynamic_cast<ResponsePort *>(&peer);
131 fatal_if(!response_port, "Can't bind port %s to non-response port %s.",
132 name(), peer.name());
133 // request port keeps track of the response port
134 _responsePort = response_port;
135 Port::bind(peer);
136 // response port also keeps track of request port
137 _responsePort->responderBind(*this);
138 }
189 void
190 ResponsePort::responderBind(RequestPort& request_port)
191 {
192 _requestPort = &request_port;
193 Port::bind(request_port);
194 }
58 /**
59 * Ports are used to interface objects to each other.
60 */
61 class Port
62 {
116 /** Attach to a peer port. */
117 virtual void
118 bind(Port &peer)
119 {
120 _peer = &peer;
121 _connected = true;
122 }
200 Port &
201 BaseCache::getPort(const std::string &if_name, PortID idx)
202 {
203 if (if_name == "mem_side") {
204 return memSidePort;
205 } else if (if_name == "cpu_side") {
206 return cpuSidePort;
207 } else {
208 return ClockedObject::getPort(if_name, idx);
209 }
210 }
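Putting the chain together, here is a compilable toy mirror of the handshake above (not gem5 source; signatures are simplified): RequestPort::bind checks the peer's type, records it, and calls responderBind so that both sides end up pointing at each other.

#include <cassert>
#include <iostream>
#include <string>
#include <utility>

// Toy base port: knows its peer and whether it is connected.
struct Port {
    std::string portName;
    Port *peer = nullptr;
    bool connected = false;
    explicit Port(std::string n) : portName(std::move(n)) {}
    virtual ~Port() = default;
    virtual void bind(Port &p) { peer = &p; connected = true; }
    const std::string &name() const { return portName; }
};

struct RequestPort;

struct ResponsePort : Port {
    RequestPort *requestPort = nullptr;
    using Port::Port;
    void responderBind(RequestPort &req);    // called back by the requestor
};

struct RequestPort : Port {
    ResponsePort *responsePort = nullptr;
    using Port::Port;
    void bind(Port &p) override {
        auto *resp = dynamic_cast<ResponsePort *>(&p);
        assert(resp && "can only bind to a response port");
        responsePort = resp;                 // requestor tracks the responder
        Port::bind(p);
        resp->responderBind(*this);          // responder learns about us too
    }
};

void ResponsePort::responderBind(RequestPort &req) {
    requestPort = &req;                      // responder tracks the requestor
    Port::bind(req);
}

int main() {
    RequestPort memSide("cache.mem_side");
    ResponsePort cpuSide("memory.cpu_side");
    memSide.bind(cpuSide);                   // one call wires up both sides
    std::cout << memSide.name() << " <-> " << cpuSide.name() << "\n";
    assert(memSide.connected && cpuSide.connected);
}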
allocateBlock
1529 CacheBlk*
1530 BaseCache::allocateBlock(const PacketPtr pkt, PacketList &writebacks)
1531 {
1532 // Get address
1533 const Addr addr = pkt->getAddr();
1534
1535 // Get secure bit
1536 const bool is_secure = pkt->isSecure();
1537
1538 // Block size and compression related access latency. Only relevant if
1539 // using a compressor, otherwise there is no extra delay, and the block
1540 // is fully sized
1541 std::size_t blk_size_bits = blkSize*8;
1542 Cycles compression_lat = Cycles(0);
1543 Cycles decompression_lat = Cycles(0);
1544
1545 // If a compressor is being used, it is called to compress data before
1546 // insertion. Although in Gem5 the data is stored uncompressed, even if a
1547 // compressor is used, the compression/decompression methods are called to
1548 // calculate the amount of extra cycles needed to read or write compressed
1549 // blocks.
1550 if (compressor && pkt->hasData()) {
1551 const auto comp_data = compressor->compress(
1552 pkt->getConstPtr<uint64_t>(), compression_lat, decompression_lat);
1553 blk_size_bits = comp_data->getSizeBits();
1554 }
1555
1556 // Find replacement victim
1557 std::vector<CacheBlk*> evict_blks;
1558 CacheBlk *victim = tags->findVictim(addr, is_secure, blk_size_bits,
1559 evict_blks);
1560
1561 // It is valid to return nullptr if there is no victim
1562 if (!victim)
1563 return nullptr;
1564
1565 // Print victim block's information
1566 DPRINTF(CacheRepl, "Replacement victim: %s\n", victim->print());
1567
1568 // Try to evict blocks; if it fails, give up on allocation
1569 if (!handleEvictions(evict_blks, writebacks)) {
1570 return nullptr;
1571 }
1572
1573 // Insert new block at victimized entry
1574 tags->insertBlock(pkt, victim);
1575
1576 // If using a compressor, set compression data. This must be done after
1577 // insertion, as the compression bit may be set.
1578 if (compressor) {
1579 compressor->setSizeBits(victim, blk_size_bits);
1580 compressor->setDecompressionLatency(victim, decompression_lat);
1581 }
1582
1583 return victim;
1584 }
158 /**
159 * Find replacement victim based on address. The list of evicted blocks
160 * only contains the victim.
161 *
162 * @param addr Address to find a victim for.
163 * @param is_secure True if the target memory space is secure.
164 * @param size Size, in bits, of new block to allocate.
165 * @param evict_blks Cache blocks to be evicted.
166 * @return Cache block to be replaced.
167 */
168 CacheBlk* findVictim(Addr addr, const bool is_secure,
169 const std::size_t size,
170 std::vector<CacheBlk*>& evict_blks) override
171 {
172 // Get possible entries to be victimized
173 const std::vector<ReplaceableEntry*> entries =
174 indexingPolicy->getPossibleEntries(addr);
175
176 // Choose replacement victim from replacement candidates
177 CacheBlk* victim = static_cast<CacheBlk*>(replacementPolicy->getVictim(
178 entries));
179
180 // There is only one eviction for this replacement
181 evict_blks.push_back(victim);
182
183 return victim;
184 }
getPossibleEntries selects the entries of the set
associated with the address passed to the findVictim function.
Because it returns the N ways of entries mapped to that set,
the getVictim function has to search them for a proper entry to evict.
As a result, one entry is selected and pushed onto the eviction list,
and the chosen victim block is returned so it can be evicted and reused for the new allocation.
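For intuition, here is a small standalone sketch of that flow for one 4-way set under an LRU policy (toy code, not gem5's replacement-policy API):

#include <algorithm>
#include <cstdint>
#include <iostream>
#include <vector>

// Toy cache entry: tag plus a last-touch timestamp for LRU.
struct Entry { uint64_t tag; uint64_t lastTouch; bool valid; };

int main() {
    // the one set (4 ways) that getPossibleEntries() would have returned
    std::vector<Entry> set = {
        {0x1, 40, true}, {0x2, 10, true}, {0x3, 70, true}, {0x4, 25, true},
    };
    // getVictim(): under LRU, the smallest lastTouch is the victim
    auto victim = std::min_element(set.begin(), set.end(),
        [](const Entry &a, const Entry &b) { return a.lastTouch < b.lastTouch; });
    std::cout << "victim tag: 0x" << std::hex << victim->tag << "\n";  // 0x2
}

Back in allocateBlock, the chosen victim is then handed to handleEvictions: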
864 bool
865 BaseCache::handleEvictions(std::vector<CacheBlk*> &evict_blks,
866 PacketList &writebacks)
867 {
868 bool replacement = false;
869 for (const auto& blk : evict_blks) {
870 if (blk->isValid()) {
871 replacement = true;
872
873 const MSHR* mshr =
874 mshrQueue.findMatch(regenerateBlkAddr(blk), blk->isSecure());
875 if (mshr) {
876 // Must be an outstanding upgrade or clean request on a block
877 // we're about to replace
878 assert((!blk->isSet(CacheBlk::WritableBit) &&
879 mshr->needsWritable()) || mshr->isCleaning());
880 return false;
881 }
882 }
883 }
884
885 // The victim will be replaced by a new entry, so increase the replacement
886 // counter if a valid block is being replaced
887 if (replacement) {
888 stats.replacements++;
889
890 // Evict valid blocks associated to this victim block
891 for (auto& blk : evict_blks) {
892 if (blk->isValid()) {
893 evictBlock(blk, writebacks);
894 }
895 }
896 }
897
898 return true;
899 }
1606 void
1607 BaseCache::evictBlock(CacheBlk *blk, PacketList &writebacks)
1608 {
1609 PacketPtr pkt = evictBlock(blk);
1610 if (pkt) {
1611 writebacks.push_back(pkt);
1612 }
1613 }
899 PacketPtr
900 Cache::evictBlock(CacheBlk *blk)
901 {
902 PacketPtr pkt = (blk->isSet(CacheBlk::DirtyBit) || writebackClean) ?
903 writebackBlk(blk) : cleanEvictBlk(blk);
904
905 invalidateBlock(blk);
906
907 return pkt;
908 }
1586 void
1587 BaseCache::invalidateBlock(CacheBlk *blk)
1588 {
1589 // If block is still marked as prefetched, then it hasn't been used
1590 if (blk->wasPrefetched()) {
1591 prefetcher->prefetchUnused();
1592 }
1593
1594 // Notify that the data contents for this address are no longer present
1595 updateBlockData(blk, nullptr, blk->isValid());
1596
1597 // If handling a block present in the Tags, let it do its invalidation
1598 // process, which will update stats and invalidate the block itself
1599 if (blk != tempBlock) {
1600 tags->invalidate(blk);
1601 } else {
1602 tempBlock->invalidate();
1603 }
1604 }
gem5/src/mem/cache/tags/base_set_assoc.cc
88 void
89 BaseSetAssoc::invalidate(CacheBlk *blk)
90 {
91 BaseTags::invalidate(blk);
92
93 // Decrease the number of tags in use
94 stats.tagsInUse--;
95
96 // Invalidate replacement data
97 replacementPolicy->invalidate(blk->replacementData);
98 }
Because the invalidate function of the BaseTags class is a virtual function,
it has to be implemented by its child classes.
I use the base_set_assoc tags when generating the cache
in my system, so I will follow the implementation
in the BaseSetAssoc class.
Note that it invokes the invalidate function on the block first
and then invalidates the replacement data.
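The dispatch itself is ordinary C++ virtual-function resolution. A minimal sketch (toy classes, not gem5 code): the cache holds a BaseTags pointer, but the virtual call lands in BaseSetAssoc.

#include <iostream>
#include <memory>

struct CacheBlk { void invalidate() { std::cout << "block invalidated\n"; } };

struct BaseTags {
    virtual ~BaseTags() = default;
    virtual void invalidate(CacheBlk *blk) { blk->invalidate(); }
};

struct BaseSetAssoc : BaseTags {
    void invalidate(CacheBlk *blk) override {
        BaseTags::invalidate(blk);                      // invalidate the block first
        std::cout << "replacement data invalidated\n";  // then the repl. data
    }
};

int main() {
    // the cache is configured with set-associative tags
    std::unique_ptr<BaseTags> tags = std::make_unique<BaseSetAssoc>();
    CacheBlk blk;
    tags->invalidate(&blk);    // virtual dispatch to BaseSetAssoc::invalidate
}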
gem5/src/mem/cache/cache_blk.hh
70 class CacheBlk : public TaggedEntry
71 {
72 public:
......
197 /**
198 * Invalidate the block and clear all state.
199 */
200 virtual void invalidate() override
201 {
202 TaggedEntry::invalidate();
203
204 clearPrefetched();
205 clearCoherenceBits(AllBits);
206
207 setTaskId(context_switch_task_id::Unknown);
208 setWhenReady(MaxTick);
209 setRefCount(0);
210 setSrcRequestorId(Request::invldRequestorId);
211 lockList.clear();
212 }
Although the invalidate function of CacheBlk is defined
as a virtual function,
the system uses the CacheBlk class as it is,
instead of adopting another class that inherits from CacheBlk.
Therefore, the invalidate function of CacheBlk itself is called.
Most importantly, it invokes the invalidate function
of its parent class, TaggedEntry.
It also clears all the coherence bits and the prefetched bit
if they are set.
gem5/src/mem/cache/tags/tagged_entry.hh
46 class TaggedEntry : public ReplaceableEntry
47 {
......
102 /** Invalidate the block. Its contents are no longer valid. */
103 virtual void invalidate()
104 {
105 _valid = false;
106 setTag(MaxAddr);
107 clearSecure();
108 }
|
Finally, it sets the _valid member field
of the block to false, resets the tag to MaxAddr, and clears the secure flag.