## Cache receives the response
```cpp
bool
BaseCache::MemSidePort::recvTimingResp(PacketPtr pkt)
{
    cache->recvTimingResp(pkt);
    return true;
}
```
## BaseCache::recvTimingResp
```cpp
void
BaseCache::recvTimingResp(PacketPtr pkt)
{
    assert(pkt->isResponse());

    // all header delay should be paid for by the crossbar, unless
    // this is a prefetch response from above
    panic_if(pkt->headerDelay != 0 && pkt->cmd != MemCmd::HardPFResp,
             "%s saw a non-zero packet delay\n", name());

    const bool is_error = pkt->isError();

    if (is_error) {
        DPRINTF(Cache, "%s: Cache received %s with error\n", __func__,
                pkt->print());
    }

    DPRINTF(Cache, "%s: Handling response %s\n", __func__,
            pkt->print());

    // if this is a write, we should be looking at an uncacheable
    // write
    if (pkt->isWrite()) {
        assert(pkt->req->isUncacheable());
        handleUncacheableWriteResp(pkt);
        return;
    }

    // we have dealt with any (uncacheable) writes above, from here on
    // we know we are dealing with an MSHR due to a miss or a prefetch
    MSHR *mshr = dynamic_cast<MSHR*>(pkt->popSenderState());
    assert(mshr);

    if (mshr == noTargetMSHR) {
        // we always clear at least one target
        clearBlocked(Blocked_NoTargets);
        noTargetMSHR = nullptr;
    }

    // Initial target is used just for stats
    const QueueEntry::Target *initial_tgt = mshr->getTarget();
    const Tick miss_latency = curTick() - initial_tgt->recvTime;
    if (pkt->req->isUncacheable()) {
        assert(pkt->req->requestorId() < system->maxRequestors());
        stats.cmdStats(initial_tgt->pkt)
            .mshrUncacheableLatency[pkt->req->requestorId()] += miss_latency;
    } else {
        assert(pkt->req->requestorId() < system->maxRequestors());
        stats.cmdStats(initial_tgt->pkt)
            .mshrMissLatency[pkt->req->requestorId()] += miss_latency;
    }
```
## Filling the cache with the fetched data (recvTimingResp)
```cpp
    PacketList writebacks;

    bool is_fill = !mshr->isForward &&
        (pkt->isRead() || pkt->cmd == MemCmd::UpgradeResp ||
         mshr->wasWholeLineWrite);

    // make sure that if the mshr was due to a whole line write then
    // the response is an invalidation
    assert(!mshr->wasWholeLineWrite || pkt->isInvalidate());

    CacheBlk *blk = tags->findBlock(pkt->getAddr(), pkt->isSecure());

    if (is_fill && !is_error) {
        DPRINTF(Cache, "Block for addr %#llx being updated in Cache\n",
                pkt->getAddr());

        const bool allocate = (writeAllocator && mshr->wasWholeLineWrite) ?
            writeAllocator->allocate() : mshr->allocOnFill();
        blk = handleFill(pkt, blk, writebacks, allocate);
        assert(blk != nullptr);
        ppFill->notify(pkt);
    }
```
First of all, it searches the current cache (via tags->findBlock) to find
the block mapped to the address of the current response.
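To see what that lookup involves, here is a minimal, self-contained sketch of a set-associative lookup in the spirit of tags->findBlock; the type and field names (TagStoreSketch, Blk) are made up for illustration and are not gem5's actual implementation:

```cpp
#include <cstdint>
#include <vector>

using Addr = uint64_t;

// One cache block's metadata (simplified).
struct Blk { bool valid = false; bool secure = false; Addr tag = 0; };

// A toy set-associative tag store: derive the set index and tag from the
// address, then search the ways of that set for a matching, valid block.
struct TagStoreSketch {
    unsigned blkSize;          // bytes per line (power of two)
    unsigned numSets;          // number of sets
    unsigned assoc;            // ways per set
    std::vector<Blk> blks;     // numSets * assoc entries

    TagStoreSketch(unsigned blk_size, unsigned num_sets, unsigned ways)
        : blkSize(blk_size), numSets(num_sets), assoc(ways),
          blks(num_sets * ways) {}

    Blk *findBlock(Addr addr, bool is_secure) {
        const Addr set = (addr / blkSize) % numSets;
        const Addr tag = addr / blkSize / numSets;
        for (unsigned way = 0; way < assoc; ++way) {
            Blk &b = blks[set * assoc + way];
            if (b.valid && b.tag == tag && b.secure == is_secure)
                return &b;     // hit
        }
        return nullptr;        // miss: the caller gets blk == nullptr
    }
};
```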
## handleFill
```cpp
CacheBlk*
BaseCache::handleFill(PacketPtr pkt, CacheBlk *blk, PacketList &writebacks,
                      bool allocate)
{
    assert(pkt->isResponse());
    Addr addr = pkt->getAddr();
    bool is_secure = pkt->isSecure();
    const bool has_old_data = blk && blk->isValid();
    const std::string old_state = blk ? blk->print() : "";

    // When handling a fill, we should have no writes to this line.
    assert(addr == pkt->getBlockAddr(blkSize));
    assert(!writeBuffer.findMatch(addr, is_secure));

    if (!blk) {
        // better have read new data...
        assert(pkt->hasData() || pkt->cmd == MemCmd::InvalidateResp);

        // need to do a replacement if allocating, otherwise we stick
        // with the temporary storage
        blk = allocate ? allocateBlock(pkt, writebacks) : nullptr;

        if (!blk) {
            // No replaceable block or a mostly exclusive
            // cache... just use temporary storage to complete the
            // current request and then get rid of it
            blk = tempBlock;
            tempBlock->insert(addr, is_secure);
            DPRINTF(Cache, "using temp block for %#llx (%s)\n", addr,
                    is_secure ? "s" : "ns");
        }
    } else {
        // existing block... probably an upgrade
        // don't clear block status... if block is already dirty we
        // don't want to lose that
    }
```
When blk is nullptr, a new block may be allocated depending on the
allocate flag. When the flag is set, handleFill allocates a new block in
the cache by invoking the allocateBlock function. Note that the
writebacks list is passed along with the packet: when the cache has no
free slot for the new data, a victim block is evicted, and its
write-back is queued on that list. If no allocation is required,
tempBlock is assigned instead of allocating a new block; some block is
needed in any case to process the current response. The allocate flag is
determined mostly by the clusivity of the current cache level. When the
current cache level is (mostly) exclusive with respect to the level
above, it does not need to allocate a line for itself and simply
forwards the data to the upper level. Note that even for an exclusive
cache, responses coming from the level below (or from memory) still pass
through the current cache level on their way up.
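As a rough illustration of that policy, the sketch below expresses an allocate-on-fill decision in terms of clusivity; the enum and the exact conditions are simplifying assumptions, not the verbatim logic behind mshr->allocOnFill():

```cpp
// Hypothetical sketch: should a fill allocate a block at this cache level?
enum class Clusivity { MostlyIncl, MostlyExcl };

bool allocOnFillSketch(Clusivity clusivity, bool is_whole_line_write)
{
    // A mostly-inclusive cache keeps a copy of (almost) everything that
    // passes through it.
    if (clusivity == Clusivity::MostlyIncl)
        return true;
    // A mostly-exclusive cache normally just forwards the data upward
    // without allocating, unless the request itself demands a local copy
    // (e.g., a whole-line write) -- a simplifying assumption here.
    return is_whole_line_write;
}
```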
## setCoherenceBits
```cpp
    // Block is guaranteed to be valid at this point
    assert(blk->isValid());
    assert(blk->isSecure() == is_secure);
    assert(regenerateBlkAddr(blk) == addr);

    blk->setCoherenceBits(CacheBlk::ReadableBit);

    // sanity check for whole-line writes, which should always be
    // marked as writable as part of the fill, and then later marked
    // dirty as part of satisfyRequest
    if (pkt->cmd == MemCmd::InvalidateResp) {
        assert(!pkt->hasSharers());
    }

    // here we deal with setting the appropriate state of the line,
    // and we start by looking at the hasSharers flag, and ignore the
    // cacheResponding flag (normally signalling dirty data) if the
    // packet has sharers, thus the line is never allocated as Owned
    // (dirty but not writable), and always ends up being either
    // Shared, Exclusive or Modified, see Packet::setCacheResponding
    // for more details
    if (!pkt->hasSharers()) {
        // we could get a writable line from memory (rather than a
        // cache) even in a read-only cache, note that we set this bit
        // even for a read-only cache, possibly revisit this decision
        blk->setCoherenceBits(CacheBlk::WritableBit);

        // check if we got this via cache-to-cache transfer (i.e., from a
        // cache that had the block in Modified or Owned state)
        if (pkt->cacheResponding()) {
            // we got the block in Modified state, and invalidated the
            // owners copy
            blk->setCoherenceBits(CacheBlk::DirtyBit);

            chatty_assert(!isReadOnly, "Should never see dirty snoop response "
                          "in read-only cache %s\n", name());
        }
    }

    DPRINTF(Cache, "Block addr %#llx (%s) moving from %s to %s\n",
            addr, is_secure ? "s" : "ns", old_state, blk->print());
```
## Filling the block and returning the filled block
```cpp
    // if we got new data, copy it in (checking for a read response
    // and a response that has data is the same in the end)
    if (pkt->isRead()) {
        // sanity checks
        assert(pkt->hasData());
        assert(pkt->getSize() == blkSize);

        updateBlockData(blk, pkt, has_old_data);
    }
    // The block will be ready when the payload arrives and the fill is done
    blk->setWhenReady(clockEdge(fillLatency) + pkt->headerDelay +
                      pkt->payloadDelay);

    return blk;
}
```
```cpp
void
BaseCache::updateBlockData(CacheBlk *blk, const PacketPtr cpkt,
                           bool has_old_data)
{
    DataUpdate data_update(regenerateBlkAddr(blk), blk->isSecure());
    if (ppDataUpdate->hasListeners()) {
        if (has_old_data) {
            data_update.oldData = std::vector<uint64_t>(blk->data,
                blk->data + (blkSize / sizeof(uint64_t)));
        }
    }

    // Actually perform the data update
    if (cpkt) {
        cpkt->writeDataToBlock(blk->data, blkSize);
    }

    if (ppDataUpdate->hasListeners()) {
        if (cpkt) {
            data_update.newData = std::vector<uint64_t>(blk->data,
                blk->data + (blkSize / sizeof(uint64_t)));
        }
        ppDataUpdate->notify(data_update);
    }
}
```
The actual data write is done by the updateBlockData function. Because
the received response packet carries the data that should be filled into
the cache block, updateBlockData copies it from the packet into the
block, notifying any registered data-update listeners with the old and
new block contents.
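For intuition, here is a minimal, self-contained sketch of what a writeDataToBlock-style copy boils down to; the function name and simplified signature are assumptions for illustration (the real Packet::writeDataToBlock also deals with byte-enable masks):

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

using Addr = uint64_t;

// Copy the packet's payload into the block buffer at the packet's offset
// within the cache line (simplified: no byte-enable masks).
void writeDataToBlockSketch(const uint8_t *pkt_data, Addr pkt_addr,
                            unsigned pkt_size, uint8_t *blk_data,
                            unsigned blk_size)
{
    const unsigned offset = pkt_addr & (blk_size - 1); // offset in the line
    assert(offset + pkt_size <= blk_size);
    std::memcpy(blk_data + offset, pkt_data, pkt_size);
}
```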
```cpp
/**
 * Set tick at which block's data will be available for access. The new
 * tick must be chronologically sequential with respect to previous
 * accesses.
 *
 * @param tick New data ready tick.
 */
void setWhenReady(const Tick tick)
{
    assert(tick >= _tickInserted);
    whenReady = tick;
}
```
Also, handleFill sets the tick at which the block's data becomes
available by invoking the setWhenReady function: the clock edge after
the fill latency, plus any remaining header and payload delay of the
packet.
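A small standalone example with made-up numbers (a 1 GHz cache clock, 2-cycle fill latency) shows how that ready tick works out; none of these constants are gem5 defaults:

```cpp
#include <cassert>
#include <cstdint>

using Tick = uint64_t;

int main()
{
    const Tick clock_period = 1000;  // 1 GHz clock expressed in ticks
    const Tick cur_edge = 10000;     // current clock edge
    const Tick fill_latency = 2;     // fill latency in cycles
    const Tick header_delay = 0;     // already paid for by the crossbar
    const Tick payload_delay = 5000; // time for the payload to arrive

    // clockEdge(fillLatency) is the clock edge fill_latency cycles away.
    const Tick when_ready = cur_edge + fill_latency * clock_period
                          + header_delay + payload_delay;
    assert(when_ready == 17000);     // block accessible from tick 17000
    return 0;
}
```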
```cpp
    if (blk && blk->isValid() && pkt->isClean() && !pkt->isInvalidate()) {
        // The block was marked not readable while there was a pending
        // cache maintenance operation, restore its flag.
        blk->setCoherenceBits(CacheBlk::ReadableBit);

        // This was a cache clean operation (without invalidate)
        // and we have a copy of the block already. Since there
        // is no invalidation, we can promote targets that don't
        // require a writable copy
        mshr->promoteReadable();
    }

    if (blk && blk->isSet(CacheBlk::WritableBit) &&
        !pkt->req->isCacheInvalidate()) {
        // If at this point the referenced block is writable and the
        // response is not a cache invalidate, we promote targets that
        // were deferred as we couldn't guarantee a writable copy
        mshr->promoteWritable();
    }
```
## serviceMSHRTargets (recvTimingResp)
```cpp
    serviceMSHRTargets(mshr, pkt, blk);
```
Although the cache block has been updated, the targets of the MSHR entry
are still waiting for the data to arrive. The main job of the
serviceMSHRTargets function is to loop over the targets of the MSHR
entry associated with the received response packet. Because a target can
come from one of three sources (the CPU side, the prefetcher, or a
snoop), each source is handled differently.
```cpp
void
Cache::serviceMSHRTargets(MSHR *mshr, const PacketPtr pkt, CacheBlk *blk)
{
    QueueEntry::Target *initial_tgt = mshr->getTarget();
    // First offset for critical word first calculations
    const int initial_offset = initial_tgt->pkt->getOffset(blkSize);

    const bool is_error = pkt->isError();
    // allow invalidation responses originating from write-line
    // requests to be discarded
    bool is_invalidate = pkt->isInvalidate() &&
        !mshr->wasWholeLineWrite;

    MSHR::TargetList targets = mshr->extractServiceableTargets(pkt);
    for (auto &target: targets) {
        Packet *tgt_pkt = target.pkt;
        switch (target.source) {
          case MSHR::Target::FromCPU:
            Tick completion_time;
            // Here we charge on completion_time the delay of the xbar if the
            // packet comes from it, charged on headerDelay.
            completion_time = pkt->headerDelay;

            // Software prefetch handling for cache closest to core
            if (tgt_pkt->cmd.isSWPrefetch()) {
                if (tgt_pkt->needsWritable()) {
                    // All other copies of the block were invalidated and we
                    // have an exclusive copy.

                    // The coherence protocol assumes that if we fetched an
                    // exclusive copy of the block, we have the intention to
                    // modify it. Therefore the MSHR for the PrefetchExReq has
                    // been the point of ordering and this cache has committed
                    // to respond to snoops for the block.
                    //
                    // In most cases this is true anyway - a PrefetchExReq
                    // will be followed by a WriteReq. However, if that
                    // doesn't happen, the block is not marked as dirty and
                    // the cache doesn't respond to snoops that it has
                    // committed to do so.
                    //
                    // To avoid deadlocks in cases where there is a snoop
                    // between the PrefetchExReq and the expected WriteReq, we
                    // proactively mark the block as Dirty.
                    assert(blk);
                    blk->setCoherenceBits(CacheBlk::DirtyBit);

                    panic_if(isReadOnly, "Prefetch exclusive requests from "
                             "read-only cache %s\n", name());
                }

                // a software prefetch would have already been ack'd
                // immediately with dummy data so the core would be able to
                // retire it. This request completes right here, so we
                // deallocate it.
                delete tgt_pkt;
                break; // skip response
            }

            // unlike the other packet flows, where data is found in other
            // caches or memory and brought back, write-line requests always
            // have the data right away, so the above check for "is fill?"
            // cannot actually be determined until examining the stored MSHR
            // state. We "catch up" with that logic here, which is duplicated
            // from above.
            if (tgt_pkt->cmd == MemCmd::WriteLineReq) {
                assert(!is_error);
                assert(blk);
                assert(blk->isSet(CacheBlk::WritableBit));
            }

            // Here we decide whether we will satisfy the target using
            // data from the block or from the response. We use the
            // block data to satisfy the request when the block is
            // present and valid and in addition the response is not
            // forwarding data to the cache above (we didn't fill
            // either); otherwise we use the packet data.
            if (blk && blk->isValid() &&
                (!mshr->isForward || !pkt->hasData())) {
                satisfyRequest(tgt_pkt, blk, true, mshr->hasPostDowngrade());

                // How many bytes past the first request is this one
                int transfer_offset =
                    tgt_pkt->getOffset(blkSize) - initial_offset;
                if (transfer_offset < 0) {
                    transfer_offset += blkSize;
                }

                // If not critical word (offset) return payloadDelay.
                // responseLatency is the latency of the return path
                // from lower level caches/memory to an upper level cache or
                // the core.
                completion_time += clockEdge(responseLatency) +
                    (transfer_offset ? pkt->payloadDelay : 0);

                assert(!tgt_pkt->req->isUncacheable());

                assert(tgt_pkt->req->requestorId() < system->maxRequestors());
                stats.cmdStats(tgt_pkt)
                    .missLatency[tgt_pkt->req->requestorId()] +=
                    completion_time - target.recvTime;
            } else if (pkt->cmd == MemCmd::UpgradeFailResp) {
                // failed StoreCond upgrade
                assert(tgt_pkt->cmd == MemCmd::StoreCondReq ||
                       tgt_pkt->cmd == MemCmd::StoreCondFailReq ||
                       tgt_pkt->cmd == MemCmd::SCUpgradeFailReq);
                // responseLatency is the latency of the return path
                // from lower level caches/memory to an upper level cache or
                // the core.
                completion_time += clockEdge(responseLatency) +
                    pkt->payloadDelay;
                tgt_pkt->req->setExtraData(0);
            } else {
                if (is_invalidate && blk && blk->isValid()) {
                    // We are about to send a response to a cache above
                    // that asked for an invalidation; we need to
                    // invalidate our copy immediately as the most
                    // up-to-date copy of the block will now be in the
                    // cache above. It will also prevent this cache from
                    // responding (if the block was previously dirty) to
                    // snoops as they should snoop the caches above where
                    // they will get the response from.
                    invalidateBlock(blk);
                }
                // not a cache fill, just forwarding response
                // responseLatency is the latency of the return path
                // from lower level caches/memory to the core.
                completion_time += clockEdge(responseLatency) +
                    pkt->payloadDelay;
                if (!is_error) {
                    if (pkt->isRead()) {
                        // sanity check
                        assert(pkt->matchAddr(tgt_pkt));
                        assert(pkt->getSize() >= tgt_pkt->getSize());

                        tgt_pkt->setData(pkt->getConstPtr<uint8_t>());
                    } else {
                        // MSHR targets can read data either from the
                        // block or the response pkt. If we can't get data
                        // from the block (i.e., invalid or has old data)
                        // or the response (did not bring in any data)
                        // then make sure that the target didn't expect
                        // any.
                        assert(!tgt_pkt->hasRespData());
                    }
                }

                // this response did not allocate here and therefore
                // it was not consumed, make sure that any flags are
                // carried over to cache above
                tgt_pkt->copyResponderFlags(pkt);
            }
            tgt_pkt->makeTimingResponse();
            // if this packet is an error copy that to the new packet
            if (is_error)
                tgt_pkt->copyError(pkt);
            if (tgt_pkt->cmd == MemCmd::ReadResp &&
                (is_invalidate || mshr->hasPostInvalidate())) {
                // If intermediate cache got ReadRespWithInvalidate,
                // propagate that. Response should not have
                // isInvalidate() set otherwise.
                tgt_pkt->cmd = MemCmd::ReadRespWithInvalidate;
                DPRINTF(Cache, "%s: updated cmd to %s\n", __func__,
                        tgt_pkt->print());
            }
            // Reset the bus additional time as it is now accounted for
            tgt_pkt->headerDelay = tgt_pkt->payloadDelay = 0;
            cpuSidePort.schedTimingResp(tgt_pkt, completion_time);
            break;
```
For the FromCPU case, there are two main conditions to take care of.
When a valid blk is associated with the current response, the
satisfyRequest function is invoked to serve the target from the block's
data. However, when blk is nullptr (or the response is merely being
forwarded), the data is copied directly from the response packet into
the target packet. Regardless of whether the block is available,
schedTimingResp is invoked on the cpuSidePort to send the response
packet to the upper-level cache or the processor; each such response
serves one target of the resolved MSHR. By the time the loop exits, all
packets associated with the resolved MSHR entry have been handled.
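One detail worth unpacking is the critical-word-first accounting in the FromCPU path: only targets whose offset within the line differs from the first (critical) target's offset are charged the payloadDelay. A standalone illustration with hypothetical offsets:

```cpp
#include <cassert>

int main()
{
    const int blk_size = 64;       // cache line size in bytes
    const int initial_offset = 16; // offset of the first (critical) target

    // Mirror of the transfer_offset computation in serviceMSHRTargets.
    auto transfer_offset = [&](int tgt_offset) {
        int off = tgt_offset - initial_offset;
        if (off < 0)
            off += blk_size;       // wrap around within the line
        return off;
    };

    assert(transfer_offset(16) == 0);  // critical word: no payloadDelay
    assert(transfer_offset(32) == 16); // later word: pays payloadDelay
    assert(transfer_offset(0)  == 48); // wrapped word: pays payloadDelay
    return 0;
}
```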
```cpp
          case MSHR::Target::FromPrefetcher:
            assert(tgt_pkt->cmd == MemCmd::HardPFReq);
            if (blk)
                blk->setPrefetched();
            delete tgt_pkt;
            break;

          case MSHR::Target::FromSnoop:
            // I don't believe that a snoop can be in an error state
            assert(!is_error);
            // response to snoop request
            DPRINTF(Cache, "processing deferred snoop...\n");
            // If the response is invalidating, a snooping target can
            // be satisfied if it is also invalidating. If the response is
            // not only invalidating, but more specifically an InvalidateResp
            // and the MSHR was created due to an InvalidateReq then a cache
            // above is waiting to satisfy a WriteLineReq. In this case even
            // a non-invalidating snoop is added as a target here since this
            // is the ordering point. When the InvalidateResp reaches this
            // cache, the snooping target will snoop further the cache above
            // with the WriteLineReq.
            assert(!is_invalidate || pkt->cmd == MemCmd::InvalidateResp ||
                   pkt->req->isCacheMaintenance() ||
                   mshr->hasPostInvalidate());
            handleSnoop(tgt_pkt, blk, true, true, mshr->hasPostInvalidate());
            break;

          default:
            panic("Illegal target->source enum %d\n", target.source);
        }
    }

    maintainClusivity(targets.hasFromCache, blk);

    if (blk && blk->isValid()) {
        // an invalidate response stemming from a write line request
        // should not invalidate the block, so check if the
        // invalidation should be discarded
        if (is_invalidate || mshr->hasPostInvalidate()) {
            invalidateBlock(blk);
        } else if (mshr->hasPostDowngrade()) {
            blk->clearCoherenceBits(CacheBlk::WritableBit);
        }
    }
}
```
## Finishing MSHR resolution (recvTimingResp)
```cpp
    if (mshr->promoteDeferredTargets()) {
        // avoid later read getting stale data while write miss is
        // outstanding.. see comment in timingAccess()
        if (blk) {
            blk->clearCoherenceBits(CacheBlk::ReadableBit);
        }
        mshrQueue.markPending(mshr);
        schedMemSideSendEvent(clockEdge() + pkt->payloadDelay);
    } else {
        // while we deallocate an mshr from the queue we still have to
        // check the isFull condition before and after as we might
        // have been using the reserved entries already
        const bool was_full = mshrQueue.isFull();
        mshrQueue.deallocate(mshr);
        if (was_full && !mshrQueue.isFull()) {
            clearBlocked(Blocked_NoMSHRs);
        }

        // Request the bus for a prefetch if this deallocation freed enough
        // MSHRs for a prefetch to take place
        if (prefetcher && mshrQueue.canPrefetch() && !isBlocked()) {
            Tick next_pf_time = std::max(prefetcher->nextPrefetchReadyTime(),
                                         clockEdge());
            if (next_pf_time != MaxTick)
                schedMemSideSendEvent(next_pf_time);
        }
    }

    // if we used temp block, check to see if its valid and then clear it out
    if (blk == tempBlock && tempBlock->isValid()) {
        evictBlock(blk, writebacks);
    }

    const Tick forward_time = clockEdge(forwardLatency) + pkt->headerDelay;
    // copy writebacks to write buffer
    doWritebacks(writebacks, forward_time);

    DPRINTF(CacheVerbose, "%s: Leaving with %s\n", __func__, pkt->print());
    delete pkt;
}
```
After processing all targets of the currently selected MSHR entry, we
either promote its deferred targets or deallocate it. Even though the
serviceable targets have been processed, the entry may still hold
deferred targets; in that case, those targets are promoted to become the
entry's active targets, and the MSHR is not freed. If there are no
deferred targets, the MSHR can be deallocated. Also, if the cache was
blocked because the MSHR queue was full, the blocking is cleared.
Furthermore, if the freed entry makes a prefetch possible, a memory-side
send event is scheduled so that a prefetch request can be issued.
Finally, any evicted blocks collected in the writebacks list are written
back toward the next-level cache or memory; the doWritebacks function
handles these write-back operations. Note the special handling when the
current block is tempBlock: tempBlock is only transient storage outside
the tag array, used just to complete the current response, so its
(possibly dirty) contents must be evicted, and written back if needed,
before tempBlock can be reused for another fill.