
O3 CPU Decode

Sending fetched instructions to the decode stage

gem5/src/cpu/o3/fetch_impl.hh

 961 
 962     // Pick a random thread to start trying to grab instructions from
 963     auto tid_itr = activeThreads->begin();
 964     std::advance(tid_itr, random_mt.random<uint8_t>(0, activeThreads->size() - 1));
 965 
 966     while (available_insts != 0 && insts_to_decode < decodeWidth) {
 967         ThreadID tid = *tid_itr;
 968         if (!stalls[tid].decode && !fetchQueue[tid].empty()) {
 969             const auto& inst = fetchQueue[tid].front();
 970             toDecode->insts[toDecode->size++] = inst;
 971             DPRINTF(Fetch, "[tid:%i] [sn:%llu] Sending instruction to decode "
 972                     "from fetch queue. Fetch queue size: %i.\n",
 973                     tid, inst->seqNum, fetchQueue[tid].size());
 974 
 975             wroteToTimeBuffer = true;
 976             fetchQueue[tid].pop_front();
 977             insts_to_decode++;
 978             available_insts--;
 979         }
 980 
 981         tid_itr++;
 982         // Wrap around if at end of active threads list
 983         if (tid_itr == activeThreads->end())
 984             tid_itr = activeThreads->begin();
 985     }
 986 
 987     // If there was activity this cycle, inform the CPU of it.
 988     if (wroteToTimeBuffer) {
 989         DPRINTF(Activity, "Activity this cycle.\n");
 990         cpu->activityThisCycle();
 991     }
 992 
 993     // Reset the number of the instruction we've fetched.
 994     numInst = 0;
 995 }   //end of the fetch.tick

The last job of the fetch stage is passing the fetched instructions to the next stage, the decode stage. In the above code, the toDecode member field of the fetch stage is used as the storage located between the fetch and decode stages.

FetchStruct: passing the fetch stage’s information to the decode stage

gem5/src/cpu/o3/fetch.hh

431     //Might be annoying how this name is different than the queue.
432     /** Wire used to write any information heading to decode. */
433     typename TimeBuffer<FetchStruct>::wire toDecode;

toDecode is declared as a wire, a class defined inside TimeBuffer. Because TimeBuffer is a template class, it is instantiated here with FetchStruct, which contains all of the fetch stage’s information required by the decode stage. Let’s take a look at FetchStruct to understand exactly which information is passed to the decode stage.

gem5/src/cpu/o3/cpu_policy.hh

 60 template<class Impl>
 61 struct SimpleCPUPolicy
 62 {
 ......
 89     /** The struct for communication between fetch and decode. */
 90     typedef DefaultFetchDefaultDecode<Impl> FetchStruct;
 91 
 92     /** The struct for communication between decode and rename. */
 93     typedef DefaultDecodeDefaultRename<Impl> DecodeStruct;
 94 
 95     /** The struct for communication between rename and IEW. */
 96     typedef DefaultRenameDefaultIEW<Impl> RenameStruct;
 97 
 98     /** The struct for communication between IEW and commit. */
 99     typedef DefaultIEWDefaultCommit<Impl> IEWStruct;
100 
101     /** The struct for communication within the IEW stage. */
102     typedef ::IssueStruct<Impl> IssueStruct;
103 
104     /** The struct for all backwards communication. */
105     typedef TimeBufStruct<Impl> TimeStruct;

gem5/src/cpu/o3/comm.hh

 55 /** Struct that defines the information passed from fetch to decode. */
 56 template<class Impl>
 57 struct DefaultFetchDefaultDecode {
 58     typedef typename Impl::DynInstPtr DynInstPtr;
 59 
 60     int size;
 61 
 62     DynInstPtr insts[Impl::MaxWidth];
 63     Fault fetchFault;
 64     InstSeqNum fetchFaultSN;
 65     bool clearFetchFault;
 66 };

As shown in the above code, the struct carries the instructions fetched from the I-cache. Then how is this information actually handed over to the decode stage? The answer is the TimeBuffer!

TimeBuffer and wire: sending data between two stages

In an actual hardware implementation, a register must be placed between two pipeline stages so that the information produced by the previous stage can be handed to the next one. To model this, GEM5 utilizes the TimeBuffer and Wire classes.

TimeBuffer implementation and usage

TimeBuffer is implemented as a template class designed to pass arbitrary information between two different stages. It also emulates the actual behavior of pipeline registers: at every clock tick the TimeBuffer is advanced, so the stages end up pointing at different register contents.

Constructor and destructor of the TimeBuffer

 39 template <class T>
 40 class TimeBuffer
 41 {
 42   protected:
 43     int past;
 44     int future;
 45     unsigned size;
 46     int _id;
 47 
 48     char *data;
 49     std::vector<char *> index;
 50     unsigned base;
 51 
 52     void valid(int idx) const
 53     {
 54         assert (idx >= -past && idx <= future);
 55     }
......
139   public:
140     TimeBuffer(int p, int f)
141         : past(p), future(f), size(past + future + 1),
142           data(new char[size * sizeof(T)]), index(size), base(0)
143     {   
144         assert(past >= 0 && future >= 0);
145         char *ptr = data; 
146         for (unsigned i = 0; i < size; i++) {
147             index[i] = ptr;
148             std::memset(ptr, 0, sizeof(T));
149             new (ptr) T;
150             ptr += sizeof(T);
151         }
152         
153         _id = -1;
154     }
155 
156     TimeBuffer()
157         : data(NULL)
158     {
159     }
160 
161     ~TimeBuffer()
162     {
163         for (unsigned i = 0; i < size; ++i)
164             (reinterpret_cast<T *>(index[i]))->~T();
165         delete [] data;
166     }

Because the TimeBuffer needs to construct and destroy an object every clock cycle, its constructor is designed to work on a preallocated chunk of memory, the data member field. With the help of placement new, the constructor can initialize an object at a specific location inside that chunk. As shown above, it populates the data array with size objects of type T and makes each entry of the index vector point to one of them. The destructor explicitly calls the destructor of every object pointed to by the index vector and then deletes the data array.
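This construct/destroy-in-place pattern is plain C++ and can be reproduced outside GEM5. Below is a minimal standalone sketch of the same idea; Payload and the buffer size are made up for illustration and are not GEM5 types.

#include <cstring>
#include <new>      // placement new
#include <string>
#include <vector>

struct Payload {
    std::string note;   // non-trivial member, so construction/destruction matter
    int count = 0;
};

int main()
{
    const unsigned size = 4;

    // Preallocate raw storage once, like TimeBuffer's data array.
    char *data = new char[size * sizeof(Payload)];
    std::vector<char *> index(size);

    char *ptr = data;
    for (unsigned i = 0; i < size; i++) {
        index[i] = ptr;
        std::memset(ptr, 0, sizeof(Payload));
        new (ptr) Payload;          // placement new: construct in place
        ptr += sizeof(Payload);
    }

    // ... use the objects through index[i] ...

    // Tear-down mirrors the TimeBuffer destructor: explicit destructor
    // calls for each slot, then a single delete[] of the raw buffer.
    for (unsigned i = 0; i < size; i++)
        reinterpret_cast<Payload *>(index[i])->~Payload();
    delete [] data;
    return 0;
}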

advance TimeBuffer

 542     //Tick each of the stages
 543     fetch.tick();
 544 
 545     decode.tick();
 546 
 547     rename.tick();
 548 
 549     iew.tick();
 550 
 551     commit.tick();
 552 
 553     // Now advance the time buffers
 554     timeBuffer.advance();
 555 
 556     fetchQueue.advance();
 557     decodeQueue.advance();
 558     renameQueue.advance();
 559     iewQueue.advance();
 560 
 561     activityRec.advance();

The most important function of the TimeBuffer is advance. As shown above, it is invoked once per processor clock cycle to move every time buffer forward. Let’s take a look at how the advance function emulates the next clock tick.

178     void
179     advance()
180     {
181         if (++base >= size)
182             base = 0;
183 
184         int ptr = base + future;
185         if (ptr >= (int)size)
186             ptr -= size;
187         (reinterpret_cast<T *>(index[ptr]))->~T();
188         std::memset(index[ptr], 0, sizeof(T));
189         new (index[ptr]) T;
190     }

The base member field is initialized to zero at construction and incremented once per clock cycle, because advance is invoked every cycle. Because the buffer emulates circular storage, base wraps back to zero when it reaches size (lines 181-182). future is a fixed constant determined when the buffer is constructed (ultimately derived from the CPU's configuration parameters). Every cycle, the slot at offset future from the new base is recycled: the old object's destructor is invoked (line 187), the memory is cleared (line 188), and a fresh object of type T is constructed there with placement new (line 189).
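As a sanity check of this circular bookkeeping, here is a stripped-down standalone sketch that only tracks base and the recycled slot index, assuming a hypothetical buffer with past = 2 and future = 0 (three slots):

#include <cstdio>

int main()
{
    const int past = 2, future = 0;
    const int size = past + future + 1;   // 3 slots
    int base = 0;

    // Simulate a few clock ticks of TimeBuffer::advance().
    for (int cycle = 1; cycle <= 5; cycle++) {
        if (++base >= size)
            base = 0;                      // wrap around: circular storage

        int ptr = base + future;           // slot that becomes the fresh entry
        if (ptr >= size)
            ptr -= size;

        // In the real code this slot's old object is destroyed and a new one
        // is placement-new'ed here; we just report which slot gets recycled.
        std::printf("cycle %d: base = %d, recycled slot = %d\n",
                    cycle, base, ptr);
    }
    return 0;
}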

Wire

Example motivating interaction between fetch and decode

gem5/src/cpu/o3/cpu.cc

 182     // Also setup each of the stages' queues.
 183     fetch.setFetchQueue(&fetchQueue);
 184     decode.setFetchQueue(&fetchQueue);

gem5/src/cpu/o3/fetch_impl.hh

 312 template<class Impl>
 313 void
 314 DefaultFetch<Impl>::setFetchQueue(TimeBuffer<FetchStruct> *ftb_ptr)
 315 {
 316     // Create wire to write information to proper place in fetch time buf.
 317     toDecode = ftb_ptr->getWire(0);
 318 }

gem5/src/cpu/o3/decode_impl.hh

195 template<class Impl>
196 void
197 DefaultDecode<Impl>::setFetchQueue(TimeBuffer<FetchStruct> *fq_ptr)
198 {
199     fetchQueue = fq_ptr;
200 
201     // Setup wire to read information from fetch queue.
202     fromFetch = fetchQueue->getWire(-fetchToDecodeDelay);
203 }

gem5/src/cpu/timebuf.hh

234     wire getWire(int idx)
235     {
236         valid(idx);
237 
238         return wire(this, idx);
239     }

As shown in the above code, the two stages fetch and decode invoke their setFetchQueue functions with the same TimeBuffer, fetchQueue. Note, however, that the two calls are serviced by different member functions of different classes. Both of them invoke getWire, but with different arguments: 0 for fetch and -fetchToDecodeDelay for decode. The getWire function returns a wire object initialized with this and idx. Here, this is the TimeBuffer itself and is assigned to the buffer member field of the wire, while idx is assigned to its index member field. Because index is a fixed offset used on every access to the registers managed by the buffer, this arrangement introduces a delay of fetchToDecodeDelay clock cycles between the fetch and decode stages. Let's see in detail how this timing delay is imposed on the register accesses.
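To make the pattern concrete, here is a self-contained toy version of the same mechanism. MiniTimeBuffer and the Wire struct below are simplified stand-ins written for this post, not GEM5's classes, and the one-cycle fetchToDecodeDelay is just an assumed value.

#include <cassert>
#include <cstdio>
#include <vector>

// Minimal stand-in for TimeBuffer<T>: just enough to show how two wires
// with different offsets create a fixed delay between producer and consumer.
struct MiniTimeBuffer {
    std::vector<int> slots;
    int past, future, base = 0;

    MiniTimeBuffer(int p, int f)
        : slots(p + f + 1, 0), past(p), future(f) {}

    int *access(int idx) {
        assert(idx >= -past && idx <= future);
        int v = (idx + base) % (int)slots.size();
        if (v < 0) v += (int)slots.size();
        return &slots[v];
    }

    void advance() {
        base = (base + 1) % (int)slots.size();
        *access(future) = 0;               // recycle the slot that rotates in
    }
};

// A "wire" is just the buffer plus a fixed offset, like getWire(idx).
struct Wire {
    MiniTimeBuffer *buf;
    int idx;
    int &operator*() { return *buf->access(idx); }
};

int main()
{
    const int fetchToDecodeDelay = 1;              // assumed one-cycle delay
    MiniTimeBuffer queue(fetchToDecodeDelay, 0);

    Wire toDecode{&queue, 0};                      // producer side (fetch)
    Wire fromFetch{&queue, -fetchToDecodeDelay};   // consumer side (decode)

    for (int cycle = 0; cycle < 4; cycle++) {
        *toDecode = 100 + cycle;                   // fetch writes this cycle
        std::printf("cycle %d: decode reads %d\n", cycle, *fromFetch);
        queue.advance();                           // end of the cycle
    }
    return 0;
}

Running it, decode reads 0 in cycle 0 and then 100, 101, 102: always the value fetch wrote one cycle earlier, which is exactly the delay the index offset encodes.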

Wire overloads the member access operators to reach the TimeBuffer

Remember that the wire has a buffer member field, which is the TimeBuffer that actually maintains all the register values to be passed to the next stage. In hardware, however, a pipeline register is built from flip-flops and cannot be read and written in the same cycle; the next stage naturally sees the written data only after some number of clock cycles have elapsed. This register behavior is what the wire and TimeBuffer emulate.

 57   public:
 58     friend class wire;
 59     class wire
 60     {
 61         friend class TimeBuffer;
 62       protected:
 63         TimeBuffer<T> *buffer;
 64         int index;
 65 
 66         void set(int idx)
 67         {   
 68             buffer->valid(idx);
 69             index = idx;
 70         }
 71 
 72         wire(TimeBuffer<T> *buf, int i)
 73             : buffer(buf), index(i)
 74         { }
......
134         T &operator*() const { return *buffer->access(index); }
135         T *operator->() const { return buffer->access(index); }
136     };

When the wire is dereferenced with the -> (or *) operator, it invokes the access function of the TimeBuffer stored in its buffer member field. Note that it passes the index value that was set when the wire was constructed.

192   protected:
193     //Calculate the index into this->index for element at position idx
194     //relative to now
195     inline int calculateVectorIndex(int idx) const
196     {
197         //Need more complex math here to calculate index.
198         valid(idx);
199 
200         int vector_index = idx + base;
201         if (vector_index >= (int)size) {
202             vector_index -= size;
203         } else if (vector_index < 0) {
204             vector_index += size;
205         }
206 
207         return vector_index;
208     }
209 
210   public:
211     T *access(int idx)
212     {
213         int vector_index = calculateVectorIndex(idx);
214 
215         return reinterpret_cast<T *>(index[vector_index]);
216     }

When access is invoked, it first calculates the position in the index vector. Note that it adds two variables, idx and base. The base member field is increased by one every clock cycle, as we saw in the advance function, while idx comes from the wire that wraps the TimeBuffer; with the default fetchToDecodeDelay of one, it is 0 for the fetch stage and -1 for the decode stage. Suppose base is 5 in some cycle: fetch writes through index[5], and in the next cycle base becomes 6, so decode's wire (idx = -1) reads index[6 - 1] = index[5], exactly the entry fetch filled one cycle earlier. In other words, the decode stage reads the register value the fetch stage wrote on the previous clock cycle, and by choosing the wire's index at initialization we can set the register-access delay between any two stages.

Decode stage pipeline analysis

gem5/src/cpu/o3/decode_impl.hh

tick of the decode stage

567 template<class Impl>
568 void
569 DefaultDecode<Impl>::tick()
570 {
571     wroteToTimeBuffer = false;
572 
573     bool status_change = false;
574 
575     toRenameIndex = 0;
576 
577     list<ThreadID>::iterator threads = activeThreads->begin();
578     list<ThreadID>::iterator end = activeThreads->end();
579 
580     sortInsts();
581 
582     //Check stall and squash signals.
583     while (threads != end) {
584         ThreadID tid = *threads++;
585 
586         DPRINTF(Decode,"Processing [tid:%i]\n",tid);
587         status_change =  checkSignalsAndUpdate(tid) || status_change;
588 
589         decode(status_change, tid);
590     }
591 
592     if (status_change) {
593         updateStatus();
594     }
595 
596     if (wroteToTimeBuffer) {
597         DPRINTF(Activity, "Activity this cycle.\n");
598 
599         cpu->activityThisCycle();
600     }
601 }

As we’ve seen before, the tick function of each stage is the most important one because it is executed every core clock cycle. The decode stage’s tick consists of three important functions: sortInsts, checkSignalsAndUpdate, and decode.

sortInsts

At the end of its tick, the fetch stage pushes the fetched instructions into the toDecode register buffer. Therefore, the decode stage should pull those instructions out of the same register located between the fetch and decode stages.

483 template <class Impl>
484 void
485 DefaultDecode<Impl>::sortInsts()
486 {
487     int insts_from_fetch = fromFetch->size;
488     for (int i = 0; i < insts_from_fetch; ++i) {
489         insts[fromFetch->insts[i]->threadNumber].push(fromFetch->insts[i]);
490     }
491 }   

sortInsts extracts the instructions stored in the register (read through fromFetch) and saves them, per thread, in the local instruction buffer insts. Note that the register content changes every tick, so each stage must copy the register data into its own local storage before processing it.

checkSignalsAndUpdate

507 template <class Impl>
508 bool
509 DefaultDecode<Impl>::checkSignalsAndUpdate(ThreadID tid)
510 {
511     // Check if there's a squash signal, squash if there is.
512     // Check stall signals, block if necessary.
513     // If status was blocked
514     //     Check if stall conditions have passed
515     //         if so then go to unblocking
516     // If status was Squashing
517     //     check if squashing is not high.  Switch to running this cycle.
518
519     // Update the per thread stall statuses.
520     readStallSignals(tid);
521
522     // Check squash signals from commit.
523     if (fromCommit->commitInfo[tid].squash) {
524
525         DPRINTF(Decode, "[tid:%i] Squashing instructions due to squash "
526                 "from commit.\n", tid);
527
528         squash(tid);
529
530         return true;
531     }
532
533     if (checkStall(tid)) {
534         return block(tid);
535     }

Before executing the decode function, the stage first checks whether the other stages have sent a stall or squash signal.

readStallSignals

493 template<class Impl>
494 void
495 DefaultDecode<Impl>::readStallSignals(ThreadID tid)
496 {
497     if (fromRename->renameBlock[tid]) {
498         stalls[tid].rename = true;
499     }
500 
501     if (fromRename->renameUnblock[tid]) {
502         assert(stalls[tid].rename);
503         stalls[tid].rename = false;
504     }
505 }

The rename stage can send two signals to the decode stage through the fromRename wire: a block signal and an unblock signal. Based on which signal arrives, decode sets or clears the corresponding entry of the stalls member field.

On a stall, just block the decode stage and return

234 template<class Impl>
235 bool
236 DefaultDecode<Impl>::checkStall(ThreadID tid) const
237 {
238     bool ret_val = false;
239 
240     if (stalls[tid].rename) {
241         DPRINTF(Decode,"[tid:%i] Stall fom Rename stage detected.\n", tid);
242         ret_val = true;
243     }
244 
245     return ret_val;
246 }

When the decode stage has received a stall signal, checkStall returns true, which results in invoking the block function and returning its result.

255 template<class Impl>
256 bool
257 DefaultDecode<Impl>::block(ThreadID tid)
258 {
259     DPRINTF(Decode, "[tid:%i] Blocking.\n", tid);
260 
261     // Add the current inputs to the skid buffer so they can be
262     // reprocessed when this stage unblocks.
263     skidInsert(tid);
264 
265     // If the decode status is blocked or unblocking then decode has not yet
266     // signalled fetch to unblock. In that case, there is no need to tell
267     // fetch to block.
268     if (decodeStatus[tid] != Blocked) {
269         // Set the status to Blocked.
270         decodeStatus[tid] = Blocked;
271 
272         if (toFetch->decodeUnblock[tid]) {
273             toFetch->decodeUnblock[tid] = false;
274         } else {
275             toFetch->decodeBlock[tid] = true;
276             wroteToTimeBuffer = true;
277         }
278 
279         return true;
280     }
281 
282     return false;
283 }

When the decode stage blocks while it still has instructions delivered from the fetch stage, those instructions need to be kept in the skid buffer so they can be reprocessed once the decode stage is unblocked. Note that the other pipeline stages can still work even though the decode stage is blocked, so input can continue to arrive at decode.
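To illustrate the role of the skid buffer, here is a rough standalone sketch. The names, the one-pass drain, and the unbounded buffer are all simplifications made for this post, not GEM5's actual skidInsert/skidsEmpty logic.

#include <cstdio>
#include <queue>

// Illustrative only: a stage that parks its inputs while blocked and
// replays them from a skid buffer once it is unblocked.
struct ToyDecodeStage {
    std::queue<int> incoming;    // what fetch delivered this cycle
    std::queue<int> skidBuffer;  // instructions parked while blocked
    bool blocked = false;

    void tick() {
        if (blocked) {
            // Inputs keep arriving even though the stage cannot process
            // them this cycle; park them so nothing is lost.
            while (!incoming.empty()) {
                skidBuffer.push(incoming.front());
                incoming.pop();
            }
            return;
        }
        // Once unblocked, replay the parked instructions first so program
        // order is preserved, then handle this cycle's fresh inputs.
        while (!skidBuffer.empty()) {
            std::printf("decoding parked inst %d\n", skidBuffer.front());
            skidBuffer.pop();
        }
        while (!incoming.empty()) {
            std::printf("decoding inst %d\n", incoming.front());
            incoming.pop();
        }
    }
};

int main()
{
    ToyDecodeStage decode;
    decode.blocked = true;
    decode.incoming.push(1);
    decode.incoming.push(2);
    decode.tick();            // cycle 1: blocked, insts 1 and 2 are parked

    decode.blocked = false;
    decode.incoming.push(3);
    decode.tick();            // cycle 2: parked insts drain before inst 3
    return 0;
}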

Squashing the pipeline when a squash signal is received

After reading the stall signals, decode should also check whether the commit stage has sent a squash signal, which it does by reading the fromCommit wire; in that case checkSignalsAndUpdate calls a single-argument squash(tid). The two-argument squash shown below is the closely related path used when decode itself detects a branch misprediction, which additionally sends the mispredict information back to fetch.

304 template<class Impl>
305 void
306 DefaultDecode<Impl>::squash(const DynInstPtr &inst, ThreadID tid)
307 {
308     DPRINTF(Decode, "[tid:%i] [sn:%llu] Squashing due to incorrect branch "
309             "prediction detected at decode.\n", tid, inst->seqNum);
310
311     // Send back mispredict information.
312     toFetch->decodeInfo[tid].branchMispredict = true;
313     toFetch->decodeInfo[tid].predIncorrect = true;
314     toFetch->decodeInfo[tid].mispredictInst = inst;
315     toFetch->decodeInfo[tid].squash = true;
316     toFetch->decodeInfo[tid].doneSeqNum = inst->seqNum;
317     toFetch->decodeInfo[tid].nextPC = inst->branchTarget();
318     toFetch->decodeInfo[tid].branchTaken = inst->pcState().branching();
319     toFetch->decodeInfo[tid].squashInst = inst;
320     if (toFetch->decodeInfo[tid].mispredictInst->isUncondCtrl()) {
321             toFetch->decodeInfo[tid].branchTaken = true;
322     }
323
324     InstSeqNum squash_seq_num = inst->seqNum;
325
326     // Might have to tell fetch to unblock.
327     if (decodeStatus[tid] == Blocked ||
328         decodeStatus[tid] == Unblocking) {
329         toFetch->decodeUnblock[tid] = 1;
330     }
331
332     // Set status to squashing.
333     decodeStatus[tid] = Squashing;
334
335     for (int i=0; i<fromFetch->size; i++) {
336         if (fromFetch->insts[i]->threadNumber == tid &&
337             fromFetch->insts[i]->seqNum > squash_seq_num) {
338             fromFetch->insts[i]->setSquashed();
339         }
340     }
341
342     // Clear the instruction list and skid buffer in case they have any
343     // insts in them.
344     while (!insts[tid].empty()) {
345         insts[tid].pop();
346     }
347
348     while (!skidBuffer[tid].empty()) {
349         skidBuffer[tid].pop();
350     }
351
352     // Squash instructions up until this one
353     cpu->removeInstsUntil(squash_seq_num, tid);
354 }

Note that a squash incurs much more complex operations than a stall. When a stall signal is received, the decode stage just waits until the stall is lifted by the unblock signal. When a squash signal is received, however, it must clear out the pipeline and the associated data structures: the incoming instructions, the local instruction list, and the skid buffer.

When the decode stage finishes a blocking or squashing operation

508 bool
509 DefaultDecode<Impl>::checkSignalsAndUpdate(ThreadID tid)
510 {
......
537     if (decodeStatus[tid] == Blocked) {
538         DPRINTF(Decode, "[tid:%i] Done blocking, switching to unblocking.\n",
539                 tid);
540
541         decodeStatus[tid] = Unblocking;
542
543         unblock(tid);
544
545         return true;
546     }
547
548     if (decodeStatus[tid] == Squashing) {
549         // Switch status to running if decode isn't being told to block or
550         // squash this cycle.
551         DPRINTF(Decode, "[tid:%i] Done squashing, switching to running.\n",
552                 tid);
553
554         decodeStatus[tid] = Running;
555
556         return false;
557     }
558
559     // If we've reached this point, we have not gotten any signals that
560     // cause decode to change its status.  Decode remains the same as before.
561     return false;
562 }

After the decode stage recovers from a stall or a squash, it needs to leave the Blocked or Squashing state so that it can again accept instructions to decode from the fetch stage. For the Blocked state, it executes lines 537-546 and moves to Unblocking; when the squash signal from the commit stage is deasserted, it executes the rest of the code (lines 548-557) and returns to Running.
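The status handling above boils down to a small per-thread state machine. The sketch below compresses it into one hypothetical helper function; it ignores several states and signals the real DefaultDecode tracks (Idle, the unblock handshake with fetch, and so on).

#include <cstdio>

// Simplified view of the per-thread status transitions discussed above.
enum class DecodeStatus { Running, Blocked, Unblocking, Squashing };

DecodeStatus updateAfterSignals(DecodeStatus cur, bool stalled, bool squash)
{
    if (squash)
        return DecodeStatus::Squashing;   // squash overrides everything else
    if (stalled)
        return DecodeStatus::Blocked;     // park instructions in the skid buffer
    if (cur == DecodeStatus::Blocked)
        return DecodeStatus::Unblocking;  // stall cleared: drain the skid buffer
    if (cur == DecodeStatus::Squashing)
        return DecodeStatus::Running;     // squash done: back to normal operation
    return cur;                           // otherwise keep the current status
}

int main()
{
    DecodeStatus s = DecodeStatus::Blocked;
    s = updateAfterSignals(s, /*stalled=*/false, /*squash=*/false);
    std::printf("blocked thread is now %s\n",
                s == DecodeStatus::Unblocking ? "Unblocking" : "something else");
    return 0;
}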

Why do we need another decode even though we already decoded?

This may be confusing because instruction decoding was already done in the fetch stage; we already know which instructions sit in the fetch buffer. Why do we need another decode function? The decode stage does not do much, but it checks that predictions for PC-relative branches are correct. Most of its work is done by the decodeInsts function.

600 template<class Impl>
601 void
602 DefaultDecode<Impl>::decode(bool &status_change, ThreadID tid)
603 {
604     // If status is Running or idle,
605     //     call decodeInsts()
606     // If status is Unblocking,
607     //     buffer any instructions coming from fetch
608     //     continue trying to empty skid buffer
609     //     check if stall conditions have passed
610 
611     if (decodeStatus[tid] == Blocked) {
612         ++decodeBlockedCycles;
613     } else if (decodeStatus[tid] == Squashing) {
614         ++decodeSquashCycles;
615     }
616 
617     // Decode should try to decode as many instructions as its bandwidth
618     // will allow, as long as it is not currently blocked.
619     if (decodeStatus[tid] == Running ||
620         decodeStatus[tid] == Idle) {
621         DPRINTF(Decode, "[tid:%i] Not blocked, so attempting to run "
622                 "stage.\n",tid);
623 
624         decodeInsts(tid);
625     } else if (decodeStatus[tid] == Unblocking) {
626         // Make sure that the skid buffer has something in it if the
627         // status is unblocking.
628         assert(!skidsEmpty());
629 
630         // If the status was unblocking, then instructions from the skid
631         // buffer were used.  Remove those instructions and handle
632         // the rest of unblocking.
633         decodeInsts(tid);
634 
635         if (fetchInstsValid()) {
636             // Add the current inputs to the skid buffer so they can be
637             // reprocessed when this stage unblocks.
638             skidInsert(tid);
639         }
640 
641         status_change = unblock(tid) || status_change;
642     }
643 }

Decode stage checks buffers to retrieve instructions to decode

645 template <class Impl>
646 void
647 DefaultDecode<Impl>::decodeInsts(ThreadID tid)
648 {
649     // Instructions can come either from the skid buffer or the list of
650     // instructions coming from fetch, depending on decode's status.
651     int insts_available = decodeStatus[tid] == Unblocking ?
652         skidBuffer[tid].size() : insts[tid].size();
653 
654     if (insts_available == 0) {
655         DPRINTF(Decode, "[tid:%i] Nothing to do, breaking out"
656                 " early.\n",tid);
657         // Should I change the status to idle?
658         ++decodeIdleCycles;
659         return;
660     } else if (decodeStatus[tid] == Unblocking) {
661         DPRINTF(Decode, "[tid:%i] Unblocking, removing insts from skid "
662                 "buffer.\n",tid);
663         ++decodeUnblockCycles;
664     } else if (decodeStatus[tid] == Running) {
665         ++decodeRunCycles;
666     }
667 
668     std::queue<DynInstPtr>
669         &insts_to_decode = decodeStatus[tid] == Unblocking ?
670         skidBuffer[tid] : insts[tid];
671 
672     DPRINTF(Decode, "[tid:%i] Sending instruction to rename.\n",tid);

Note that decodeInsts can be invoked in two different states of the decode stage: Running and Unblocking. Running means the decode stage keeps receiving instructions directly from the fetch stage. Unblocking means it was blocked and is now recovering, so the instructions that arrived in the meantime are still sitting in the skidBuffer. In that case it decodes the instructions that piled up in the skidBuffer while it was blocked.

Forwarding decoded instructions to the rename stage

185 template<class Impl>
186 void
187 DefaultDecode<Impl>::setDecodeQueue(TimeBuffer<DecodeStruct> *dq_ptr)
188 {
189     decodeQueue = dq_ptr;
190 
191     // Setup wire to write information to proper place in decode queue.
192     toRename = decodeQueue->getWire(0);
193 }

Similar to the toDecode wire in the fetch stage, the decode stage needs a wire to send the decoded instructions to the register connected to the rename stage. For that purpose, it declares the toRename wire.

674     while (insts_available > 0 && toRenameIndex < decodeWidth) {
675         assert(!insts_to_decode.empty());
676 
677         DynInstPtr inst = std::move(insts_to_decode.front());
678 
679         insts_to_decode.pop();
680 
681         DPRINTF(Decode, "[tid:%i] Processing instruction [sn:%lli] with "
682                 "PC %s\n", tid, inst->seqNum, inst->pcState());
683 
684         if (inst->isSquashed()) {
685             DPRINTF(Decode, "[tid:%i] Instruction %i with PC %s is "
686                     "squashed, skipping.\n",
687                     tid, inst->seqNum, inst->pcState());
688             
689             ++decodeSquashedInsts;
690             
691             --insts_available;
692             
693             continue;
694         }
695 
696         // Also check if instructions have no source registers.  Mark
697         // them as ready to issue at any time.  Not sure if this check
698         // should exist here or at a later stage; however it doesn't matter
699         // too much for function correctness.
700         if (inst->numSrcRegs() == 0) {
701             inst->setCanIssue();
702         }
703 
704         // This current instruction is valid, so add it into the decode
705         // queue.  The next instruction may not be valid, so check to
706         // see if branches were predicted correctly.
707         toRename->insts[toRenameIndex] = inst;
708 
709         ++(toRename->size);
710         ++toRenameIndex;
711         ++decodeDecodedInsts;
712         --insts_available;

The while loop pops instructions from the chosen buffer one by one, up to decodeWidth per cycle, and sends each one to the rename stage through the toRename wire.

720         // Ensure that if it was predicted as a branch, it really is a
721         // branch.
722         if (inst->readPredTaken() && !inst->isControl()) {
723             panic("Instruction predicted as a branch!");
724 
725             ++decodeControlMispred;
726 
727             // Might want to set some sort of boolean and just do
728             // a check at the end
729             squash(inst, inst->threadNumber);
730 
731             break;
732         }
733 
734         // Go ahead and compute any PC-relative branches.
735         // This includes direct unconditional control and
736         // direct conditional control that is predicted taken.
737         if (inst->isDirectCtrl() &&
738            (inst->isUncondCtrl() || inst->readPredTaken()))
739         {
740             ++decodeBranchResolved;
741 
742             if (!(inst->branchTarget() == inst->readPredTarg())) {
743                 ++decodeBranchMispred;
744 
745                 // Might want to set some sort of boolean and just do
746                 // a check at the end
747                 squash(inst, inst->threadNumber);
748                 TheISA::PCState target = inst->branchTarget();
749 
750                 DPRINTF(Decode,
751                         "[tid:%i] [sn:%llu] "
752                         "Updating predictions: PredPC: %s\n",
753                         tid, inst->seqNum, target);
754                 //The micro pc after an instruction level branch should be 0
755                 inst->setPredTarg(target);
756                 break;
757             }
758         }
759     } //end of the while loop

One thing to note is that this is where decode verifies the branch prediction against the actual instruction. If an instruction was predicted taken but is not a control instruction at all, the stage panics, since predicting a non-branch as a branch should not happen. For direct control instructions it also computes the PC-relative target and, if it differs from the predicted target, squashes the younger instructions and redirects fetch with the corrected target.

761     // If we didn't process all instructions, then we will need to block
762     // and put all those instructions into the skid buffer.
763     if (!insts_to_decode.empty()) {
764         block(tid);
765     }
766 
767     // Record that decode has written to the time buffer for activity
768     // tracking.
769     if (toRenameIndex) {
770         wroteToTimeBuffer = true;
771     }
