O3 CPU IEW
IEW: Issue/Execute/Writeback
GEM5 performs both execute and writeback when the execute() function is called on an instruction. Therefore, GEM5 combines the Issue, Execute, and Writeback stages into a single stage called IEW. This stage handles dispatching instructions to the instruction queue, telling the instruction queue to issue instructions, and executing and writing back instructions.
The GEM5 documentation provides a nice description of the IEW stage, and it also lists which functions are mainly responsible for those three operations:
Rename::tick()->Rename::RenameInsts()
IEW::tick()->IEW::dispatchInsts()
IEW::tick()->InstructionQueue::scheduleReadyInsts()
IEW::tick()->IEW::executeInsts()
IEW::tick()->IEW::writebackInsts()
In this posting, I will explain dispatch, schedule, execute, and writeback in detail. The commit stage will be studied in another posting. As in the other stages, the tick function of the IEW stage is the main body of execution, so I will explain each part of the IEW stage by following the tick implementation. Every cycle, dispatch tries to dispatch renamed instructions to the LSQ/IQ (note that the rename stage already checked the availability of the LSQ and IQ), and the instruction queue actually issues instructions. The execute latency is tied to the issue latency so that the IQ can do back-to-back scheduling without having to speculatively schedule instructions. The IEW separates memory instructions from non-memory instructions by issuing them to different queues (LSQ or IQ). The writeback portion of IEW completes the instructions, wakes up any dependents, and marks the registers as ready on the scoreboard. With this information, the IQ can tell which instructions can be woken up and issued.
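Before going into each part, here is a condensed paraphrase of DefaultIEW::tick() that shows the per-cycle ordering. This is a sketch assembled from the listings quoted below, not the verbatim source.

```cpp
// Condensed paraphrase of DefaultIEW<Impl>::tick(): dispatch runs first
// for every active thread, then execute/writeback/schedule run unless
// the stage is squashing. The real code is quoted piece by piece below.
template <class Impl>
void
DefaultIEW<Impl>::tick()
{
    ldstQueue.tick();             // advance LSQ state
    sortInsts();                  // sort incoming renamed insts per thread
    fuPool->processFreeUnits();   // reclaim FUs freed this cycle

    std::list<ThreadID>::iterator threads = activeThreads->begin();
    while (threads != activeThreads->end()) {
        ThreadID tid = *threads++;
        checkSignalsAndUpdate(tid);  // handle block/squash signals
        dispatch(tid);               // renamed insts -> IQ / LSQ
    }

    if (exeStatus != Squashing) {
        executeInsts();                  // execute insts issued last cycle
        writebackInsts();                // mark regs ready, wake dependents
        instQueue.scheduleReadyInsts();  // pick insts for the next cycle
        issueToExecQueue.advance();
    }
    // (commit-side bookkeeping and activity tracking omitted here)
}
```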
Dispatch
1502 template<class Impl>
1503 void
1504 DefaultIEW<Impl>::tick()
1505 {
1506 wbNumInst = 0;
1507 wbCycle = 0;
1508
1509 wroteToTimeBuffer = false;
1510 updatedQueues = false;
1511
1512 ldstQueue.tick();
1513
1514 sortInsts();
1515
1516 // Free function units marked as being freed this cycle.
1517 fuPool->processFreeUnits();
1518
1519 list<ThreadID>::iterator threads = activeThreads->begin();
1520 list<ThreadID>::iterator end = activeThreads->end();
1521
1522 // Check stall and squash signals, dispatch any instructions.
1523 while (threads != end) {
1524 ThreadID tid = *threads++;
1525
1526 DPRINTF(IEW,"Issue: Processing [tid:%i]\n",tid);
1527
1528 checkSignalsAndUpdate(tid);
1529 dispatch(tid);
1530 }
As shown in the tick function, after checking signals such as block and squash, the first job done by the IEW is dispatching the renamed instructions. The main goal of dispatch is to insert the renamed instructions into the IQ and LSQ based on each instruction's type.
Dispatch implementation
911 template<class Impl>
912 void
913 DefaultIEW<Impl>::dispatch(ThreadID tid)
914 {
915 // If status is Running or idle,
916 // call dispatchInsts()
917 // If status is Unblocking,
918 // buffer any instructions coming from rename
919 // continue trying to empty skid buffer
920 // check if stall conditions have passed
921
922 if (dispatchStatus[tid] == Blocked) {
923 ++iewBlockCycles;
924
925 } else if (dispatchStatus[tid] == Squashing) {
926 ++iewSquashCycles;
927 }
928
929 // Dispatch should try to dispatch as many instructions as its bandwidth
930 // will allow, as long as it is not currently blocked.
931 if (dispatchStatus[tid] == Running ||
932 dispatchStatus[tid] == Idle) {
933 DPRINTF(IEW, "[tid:%i] Not blocked, so attempting to run "
934 "dispatch.\n", tid);
935
936 dispatchInsts(tid);
937 } else if (dispatchStatus[tid] == Unblocking) {
938 // Make sure that the skid buffer has something in it if the
939 // status is unblocking.
940 assert(!skidsEmpty());
941
942 // If the status was unblocking, then instructions from the skid
943 // buffer were used. Remove those instructions and handle
944 // the rest of unblocking.
945 dispatchInsts(tid);
946
947 ++iewUnblockCycles;
948
949 if (validInstsFromRename()) {
950 // Add the current inputs to the skid buffer so they can be
951 // reprocessed when this stage unblocks.
952 skidInsert(tid);
953 }
954
955 unblock(tid);
956 }
957 }
The dispatch function is just a wrapper around dispatchInsts. Based on the current status of the dispatch stage, some associated operations must be executed in addition to the main dispatch function, dispatchInsts. Because dispatchInsts is fairly complex, I will explain it piece by piece.
Checking availability of resources to dispatch instruction
959 template <class Impl>
960 void
961 DefaultIEW<Impl>::dispatchInsts(ThreadID tid)
962 {
963 // Obtain instructions from skid buffer if unblocking, or queue from rename
964 // otherwise.
965 std::queue<DynInstPtr> &insts_to_dispatch =
966 dispatchStatus[tid] == Unblocking ?
967 skidBuffer[tid] : insts[tid];
968
969 int insts_to_add = insts_to_dispatch.size();
970
971 DynInstPtr inst;
972 bool add_to_iq = false;
973 int dis_num_inst = 0;
974
975 // Loop through the instructions, putting them in the instruction
976 // queue.
977 for ( ; dis_num_inst < insts_to_add &&
978 dis_num_inst < dispatchWidth;
979 ++dis_num_inst)
980 {
981 inst = insts_to_dispatch.front();
982
983 if (dispatchStatus[tid] == Unblocking) {
984 DPRINTF(IEW, "[tid:%i] Issue: Examining instruction from skid "
985 "buffer\n", tid);
986 }
987
988 // Make sure there's a valid instruction there.
989 assert(inst);
990
991 DPRINTF(IEW, "[tid:%i] Issue: Adding PC %s [sn:%lli] [tid:%i] to "
992 "IQ.\n",
993 tid, inst->pcState(), inst->seqNum, inst->threadNumber);
994
995 // Be sure to mark these instructions as ready so that the
996 // commit stage can go ahead and execute them, and mark
997 // them as issued so the IQ doesn't reprocess them.
998
999 // Check for squashed instructions.
1000 if (inst->isSquashed()) {
1001 DPRINTF(IEW, "[tid:%i] Issue: Squashed instruction encountered, "
1002 "not adding to IQ.\n", tid);
1003
1004 ++iewDispSquashedInsts;
1005
1006 insts_to_dispatch.pop();
1007
1008 //Tell Rename That An Instruction has been processed
1009 if (inst->isLoad()) {
1010 toRename->iewInfo[tid].dispatchedToLQ++;
1011 }
1012 if (inst->isStore() || inst->isAtomic()) {
1013 toRename->iewInfo[tid].dispatchedToSQ++;
1014 }
1015
1016 toRename->iewInfo[tid].dispatched++;
1017
1018 continue;
1019 }
1020
1021 // Check for full conditions.
1022 if (instQueue.isFull(tid)) {
1023 DPRINTF(IEW, "[tid:%i] Issue: IQ has become full.\n", tid);
1024
1025 // Call function to start blocking.
1026 block(tid);
1027
1028 // Set unblock to false. Special case where we are using
1029 // skidbuffer (unblocking) instructions but then we still
1030 // get full in the IQ.
1031 toRename->iewUnblock[tid] = false;
1032
1033 ++iewIQFullEvents;
1034 break;
1035 }
1036
1037 // Check LSQ if inst is LD/ST
1038 if ((inst->isAtomic() && ldstQueue.sqFull(tid)) ||
1039 (inst->isLoad() && ldstQueue.lqFull(tid)) ||
1040 (inst->isStore() && ldstQueue.sqFull(tid))) {
1041 DPRINTF(IEW, "[tid:%i] Issue: %s has become full.\n",tid,
1042 inst->isLoad() ? "LQ" : "SQ");
1043
1044 // Call function to start blocking.
1045 block(tid);
1046
1047 // Set unblock to false. Special case where we are using
1048 // skidbuffer (unblocking) instructions but then we still
1049 // get full in the IQ.
1050 toRename->iewUnblock[tid] = false;
1051
1052 ++iewLSQFullEvents;
1053 break;
1054 }
First, it checks whether the current instruction has already been squashed. If so, it ignores the current instruction and jumps to the next one. If the instruction is not squashed, it checks the availability of the resources required to issue the instruction. Regardless of the instruction type, it requires one entry in the instruction queue. Also, a memory-related instruction requires one entry in the load queue or the store queue, depending on whether it is a load or a store.
Checking instruction type
1056 // Otherwise issue the instruction just fine.
1057 if (inst->isAtomic()) {
1058 DPRINTF(IEW, "[tid:%i] Issue: Memory instruction "
1059 "encountered, adding to LSQ.\n", tid);
1060
1061 ldstQueue.insertStore(inst);
1062
1063 ++iewDispStoreInsts;
1064
1065 // AMOs need to be set as "canCommit()"
1066 // so that commit can process them when they reach the
1067 // head of commit.
1068 inst->setCanCommit();
1069 instQueue.insertNonSpec(inst);
1070 add_to_iq = false;
1071
1072 ++iewDispNonSpecInsts;
1073
1074 toRename->iewInfo[tid].dispatchedToSQ++;
1075 } else if (inst->isLoad()) {
1076 DPRINTF(IEW, "[tid:%i] Issue: Memory instruction "
1077 "encountered, adding to LSQ.\n", tid);
1078
1079 // Reserve a spot in the load store queue for this
1080 // memory access.
1081 ldstQueue.insertLoad(inst);
1082
1083 ++iewDispLoadInsts;
1084
1085 add_to_iq = true;
1086
1087 toRename->iewInfo[tid].dispatchedToLQ++;
1088 } else if (inst->isStore()) {
1089 DPRINTF(IEW, "[tid:%i] Issue: Memory instruction "
1090 "encountered, adding to LSQ.\n", tid);
1091
1092 ldstQueue.insertStore(inst);
1093
1094 ++iewDispStoreInsts;
1095
1096 if (inst->isStoreConditional()) {
1097 // Store conditionals need to be set as "canCommit()"
1098 // so that commit can process them when they reach the
1099 // head of commit.
1100 // @todo: This is somewhat specific to Alpha.
1101 inst->setCanCommit();
1102 instQueue.insertNonSpec(inst);
1103 add_to_iq = false;
1104
1105 ++iewDispNonSpecInsts;
1106 } else {
1107 add_to_iq = true;
1108 }
1109
1110 toRename->iewInfo[tid].dispatchedToSQ++;
1111 } else if (inst->isMemBarrier() || inst->isWriteBarrier()) {
1112 // Same as non-speculative stores.
1113 inst->setCanCommit();
1114 instQueue.insertBarrier(inst);
1115 add_to_iq = false;
1116 } else if (inst->isNop()) {
1117 DPRINTF(IEW, "[tid:%i] Issue: Nop instruction encountered, "
1118 "skipping.\n", tid);
1119
1120 inst->setIssued();
1121 inst->setExecuted();
1122 inst->setCanCommit();
1123
1124 instQueue.recordProducer(inst);
1125
1126 iewExecutedNop[tid]++;
1127
1128 add_to_iq = false;
1129 } else {
1130 assert(!inst->isExecuted());
1131 add_to_iq = true;
1132 }
Although the details will not be clear until we understand the internals of the instQueue and ldstQueue, the above code pushes each instruction to a queue based on its type. For example, a load instruction is pushed to the ldstQueue with the insertLoad function, and a store is inserted into the same queue through the insertStore function. Normal instructions are simply enqueued to the instQueue.
Issuing instruction
1134 if (add_to_iq && inst->isNonSpeculative()) {
1135 DPRINTF(IEW, "[tid:%i] Issue: Nonspeculative instruction "
1136 "encountered, skipping.\n", tid);
1137
1138 // Same as non-speculative stores.
1139 inst->setCanCommit();
1140
1141 // Specifically insert it as nonspeculative.
1142 instQueue.insertNonSpec(inst);
1143
1144 ++iewDispNonSpecInsts;
1145
1146 add_to_iq = false;
1147 }
1148
1149 // If the instruction queue is not full, then add the
1150 // instruction.
1151 if (add_to_iq) {
1152 instQueue.insert(inst);
1153 }
1154
1155 insts_to_dispatch.pop();
1156
1157 toRename->iewInfo[tid].dispatched++;
1158
1159 ++iewDispatchedInsts;
1160
1161 #if TRACING_ON
1162 inst->dispatchTick = curTick() - inst->fetchTick;
1163 #endif
1164 ppDispatch->notify(inst);
1165 }
After the instructions are handled by inserting them into the corresponding queues with the methods those queues provide, some of them should also be inserted into the instruction queue. Note that the add_to_iq flag is set based on the instruction type; when this flag is set, the instruction is added to the instQueue (line 1151-1153).
End of the dispatching
1167 if (!insts_to_dispatch.empty()) {
1168 DPRINTF(IEW,"[tid:%i] Issue: Bandwidth Full. Blocking.\n", tid);
1169 block(tid);
1170 toRename->iewUnblock[tid] = false;
1171 }
1172
1173 if (dispatchStatus[tid] == Idle && dis_num_inst) {
1174 dispatchStatus[tid] = Running;
1175
1176 updatedQueues = true;
1177 }
1178
1179 dis_num_inst = 0;
1180 }
After the dispatch loop, it checks whether any instructions remain in the queue. When the remaining instructions cannot be processed this cycle because the dispatch bandwidth is exhausted, the stage blocks and handles the rest of the instructions in the next cycle.
Instruction Queue and Load/Store queue
Before moving on to the next stage, I'd like to cover some parts of the IQ and LSQ.
The instruction queue has several lists to keep track of instructions
The main job of the queue is to manage instructions and to provide interfaces for processing the enqueued instructions.
gem5/src/cpu/o3/inst_queue.hh
311 //////////////////////////////////////
312 // Instruction lists, ready queues, and ordering
313 //////////////////////////////////////
314
315 /** List of all the instructions in the IQ (some of which may be issued). */
316 std::list<DynInstPtr> instList[Impl::MaxThreads];
317
318 /** List of instructions that are ready to be executed. */
319 std::list<DynInstPtr> instsToExecute;
320
321 /** List of instructions waiting for their DTB translation to
322 * complete (hw page table walk in progress).
323 */
324 std::list<DynInstPtr> deferredMemInsts;
325
326 /** List of instructions that have been cache blocked. */
327 std::list<DynInstPtr> blockedMemInsts;
328
329 /** List of instructions that were cache blocked, but a retry has been seen
330 * since, so they can now be retried. May fail again go on the blocked list.
331 */
332 std::list<DynInstPtr> retryMemInsts;
Insert new entries to the instruction queue
The insert function is the essential example of these interfaces. It inserts new entries into the instruction list managed by the instruction queue.
578 template <class Impl>
579 void
580 InstructionQueue<Impl>::insert(const DynInstPtr &new_inst)
581 {
582 if (new_inst->isFloating()) {
583 fpInstQueueWrites++;
584 } else if (new_inst->isVector()) {
585 vecInstQueueWrites++;
586 } else {
587 intInstQueueWrites++;
588 }
589 // Make sure the instruction is valid
590 assert(new_inst);
591
592 DPRINTF(IQ, "Adding instruction [sn:%llu] PC %s to the IQ.\n",
593 new_inst->seqNum, new_inst->pcState());
594
595 assert(freeEntries != 0);
596
597 instList[new_inst->threadNumber].push_back(new_inst);
598
599 --freeEntries;
600
601 new_inst->setInIQ();
602
603 // Look through its source registers (physical regs), and mark any
604 // dependencies.
605 addToDependents(new_inst);
606
607 // Have this instruction set itself as the producer of its destination
608 // register(s).
609 addToProducers(new_inst);
610
611 if (new_inst->isMemRef()) {
612 memDepUnit[new_inst->threadNumber].insert(new_inst);
613 } else {
614 addIfReady(new_inst);
615 }
616
617 ++iqInstsAdded;
618
619 count[new_inst->threadNumber]++;
620
621 assert(freeEntries == (numEntries - countInsts()));
622 }
Inserting the instruction into the list is done by a simple push_back operation. However, insert also invokes two important functions: addToDependents and addToProducers. These two functions record the producer and consumer dependencies among the instructions' operands, i.e., registers. When one instruction waits for a specific register's value to become ready (a consumer), it must be tracked by some hardware component. Also, when the data becomes ready as the result of executing an instruction (a producer), it must be forwarded to the consumers waiting for that value. For this purpose, GEM5 utilizes the DependencyGraph. After recording dependencies for the unavailable registers, if the instruction references memory during its execution, it is specially handled by the memory dependence unit. The details will be explained together with the DependencyGraph later.
1450 template <class Impl>
1451 void
1452 InstructionQueue<Impl>::addIfReady(const DynInstPtr &inst)
1453 {
1454 // If the instruction now has all of its source registers
1455 // available, then add it to the list of ready instructions.
1456 if (inst->readyToIssue()) {
1457
1458 //Add the instruction to the proper ready list.
1459 if (inst->isMemRef()) {
1460
1461 DPRINTF(IQ, "Checking if memory instruction can issue.\n");
1462
1463 // Message to the mem dependence unit that this instruction has
1464 // its registers ready.
1465 memDepUnit[inst->threadNumber].regsReady(inst);
1466
1467 return;
1468 }
1469
1470 OpClass op_class = inst->opClass();
1471
1472 DPRINTF(IQ, "Instruction is ready to issue, putting it onto "
1473 "the ready list, PC %s opclass:%i [sn:%llu].\n",
1474 inst->pcState(), op_class, inst->seqNum);
1475
1476 readyInsts[op_class].push(inst);
1477
1478 // Will need to reorder the list if either a queue is not on the list,
1479 // or it has an older instruction than last time.
1480 if (!queueOnList[op_class]) {
1481 addToOrderList(op_class);
1482 } else if (readyInsts[op_class].top()->seqNum <
1483 (*readyIt[op_class]).oldestInst) {
1484 listOrder.erase(readyIt[op_class]);
1485 addToOrderList(op_class);
1486 }
1487 }
1488 }
At the end of the insert function, addIfReady adds the instruction to the readyInsts buffer if all of its source registers are available (line 1476). If the instruction is not ready, meaning some source registers are not yet available, it should not be enqueued to the readyInsts buffer. Instructions waiting for a source register to become available are added to the readyInsts buffer when the instructions they depend on complete.
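The wakeup path will be covered in detail together with the DependencyGraph, but a condensed paraphrase of InstructionQueue::wakeDependents (not the verbatim source) shows how a completing producer feeds addIfReady:

```cpp
// When an instruction completes, every consumer chained on one of its
// destination registers in the dependency graph gets a source marked
// ready; addIfReady() then moves fully-ready consumers to readyInsts.
template <class Impl>
int
InstructionQueue<Impl>::wakeDependents(const DynInstPtr &completed_inst)
{
    int dependents = 0;
    for (int i = 0; i < completed_inst->numDestRegs(); i++) {
        PhysRegIdPtr dest_reg = completed_inst->renamedDestRegIdx(i);

        // Pop every instruction waiting on this physical register.
        DynInstPtr dep_inst = dependGraph.pop(dest_reg->flatIndex());
        while (dep_inst) {
            dep_inst->markSrcRegReady();  // one fewer outstanding source
            addIfReady(dep_inst);         // to readyInsts if all are ready
            dep_inst = dependGraph.pop(dest_reg->flatIndex());
            ++dependents;
        }

        // Record on the scoreboard that the value is now available.
        regScoreboard[dest_reg->flatIndex()] = true;
    }
    return dependents;
}
```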
Execute
To understand what should be done after dispatching the instructions, let's go back to the tick function of the IEW stage.
1532 if (exeStatus != Squashing) {
1533 executeInsts();
1534
1535 writebackInsts();
1536
1537 // Have the instruction queue try to schedule any ready instructions.
1538 // (In actuality, this scheduling is for instructions that will
1539 // be executed next cycle.)
1540 instQueue.scheduleReadyInsts();
1541
1542 // Also should advance its own time buffers if the stage ran.
1543 // Not the best place for it, but this works (hopefully).
1544 issueToExecQueue.advance();
1545 }
If the execute stage is not in the squashing state, it executes the instructions stored in the instQueue, particularly those in the readyInsts queue. Here the execute() function of a compute instruction is invoked and the instruction is sent to commit. Note that execute() writes the results to the destination registers; after executeInsts runs, writebackInsts is called to mark those destination registers as ready and wake up dependents. Furthermore, when there are instructions dependent on the currently executed one, those instructions are added to the ready list for scheduling.
executeInsts
1205 template <class Impl>
1206 void
1207 DefaultIEW<Impl>::executeInsts()
1208 {
1209 wbNumInst = 0;
1210 wbCycle = 0;
1211
1212 list<ThreadID>::iterator threads = activeThreads->begin();
1213 list<ThreadID>::iterator end = activeThreads->end();
1214
1215 while (threads != end) {
1216 ThreadID tid = *threads++;
1217 fetchRedirect[tid] = false;
1218 }
1219
1220 // Uncomment this if you want to see all available instructions.
1221 // @todo This doesn't actually work anymore, we should fix it.
1222 // printAvailableInsts();
1223
1224 // Execute/writeback any instructions that are available.
1225 int insts_to_execute = fromIssue->size;
1226 int inst_num = 0;
1227 for (; inst_num < insts_to_execute;
1228 ++inst_num) {
1229
1230 DPRINTF(IEW, "Execute: Executing instructions from IQ.\n");
1231
1232 DynInstPtr inst = instQueue.getInstToExecute();
1233
1234 DPRINTF(IEW, "Execute: Processing PC %s, [tid:%i] [sn:%llu].\n",
1235 inst->pcState(), inst->threadNumber,inst->seqNum);
1236
1237 // Notify potential listeners that this instruction has started
1238 // executing
1239 ppExecute->notify(inst);
1240
1241 // Check if the instruction is squashed; if so then skip it
1242 if (inst->isSquashed()) {
1243 DPRINTF(IEW, "Execute: Instruction was squashed. PC: %s, [tid:%i]"
1244 " [sn:%llu]\n", inst->pcState(), inst->threadNumber,
1245 inst->seqNum);
1246
1247 // Consider this instruction executed so that commit can go
1248 // ahead and retire the instruction.
1249 inst->setExecuted();
1250
1251 // Not sure if I should set this here or just let commit try to
1252 // commit any squashed instructions. I like the latter a bit more.
1253 inst->setCanCommit();
1254
1255 ++iewExecSquashedInsts;
1256
1257 continue;
1258 }
The executeInsts function executes as many instructions as it can afford, which is implemented by the loop at line 1227 and after. First it retrieves an instruction that can be executed by invoking the getInstToExecute function of the instQueue. After an instruction is retrieved, it checks whether the instruction has been squashed. Although a squashed instruction is not really executed, it must be treated as executed so that it can be committed. After this condition is checked, the instruction is processed differently depending on its type.
Execute memory instruction
1259
1260         Fault fault = NoFault;
1261
1262         // Execute instruction.
1263         // Note that if the instruction faults, it will be handled
1264         // at the commit stage.
1265         if (inst->isMemRef()) {
1266             DPRINTF(IEW, "Execute: Calculating address for memory "
1267                     "reference.\n");
1268
1269             // Tell the LDSTQ to execute this instruction (if it is a load).
1270             if (inst->isAtomic()) {
1271                 // AMOs are treated like store requests
1272                 fault = ldstQueue.executeStore(inst);
1273
1274                 if (inst->isTranslationDelayed() &&
1275                     fault == NoFault) {
1276                     // A hw page table walk is currently going on; the
1277                     // instruction must be deferred.
1278                     DPRINTF(IEW, "Execute: Delayed translation, deferring "
1279                             "store.\n");
1280                     instQueue.deferMemInst(inst);
1281                     continue;
1282                 }
1283             } else if (inst->isLoad()) {
1284                 // Loads will mark themselves as executed, and their writeback
1285                 // event adds the instruction to the queue to commit
1286                 fault = ldstQueue.executeLoad(inst);
1287
1288                 if (inst->isTranslationDelayed() &&
1289                     fault == NoFault) {
1290                     // A hw page table walk is currently going on; the
1291                     // instruction must be deferred.
1292                     DPRINTF(IEW, "Execute: Delayed translation, deferring "
1293                             "load.\n");
1294                     instQueue.deferMemInst(inst);
1295                     continue;
1296                 }
1297
1298                 if (inst->isDataPrefetch() || inst->isInstPrefetch()) {
1299                     inst->fault = NoFault;
1300                 }
1301             } else if (inst->isStore()) {
1302                 fault = ldstQueue.executeStore(inst);
1303
1304                 if (inst->isTranslationDelayed() &&
1305                     fault == NoFault) {
1306                     // A hw page table walk is currently going on; the
1307                     // instruction must be deferred.
1308                     DPRINTF(IEW, "Execute: Delayed translation, deferring "
1309                             "store.\n");
1310                     instQueue.deferMemInst(inst);
1311                     continue;
1312                 }
1313
1314                 // If the store had a fault then it may not have a mem req
1315                 if (fault != NoFault || !inst->readPredicate() ||
1316                     !inst->isStoreConditional()) {
1317                     // If the instruction faulted, then we need to send it along
1318                     // to commit without the instruction completing.
1319                     // Send this instruction to commit, also make sure iew stage
1320                     // realizes there is activity.
1321                     inst->setExecuted();
1322                     instToCommit(inst);
1323                     activityThisCycle();
1324                 }
1325
1326                 // Store conditionals will mark themselves as
1327                 // executed, and their writeback event will add the
1328                 // instruction to the queue to commit.
1329             } else {
1330                 panic("Unexpected memory type!\n");
1331             }
1332
1333         } else {
A memory operation can be one of three instruction types: atomic, load, or store. The load/store queue is basically in charge of executing memory instructions, but based on the type of the memory operation, it handles the instruction differently. Let's take a look at how load and store instructions are processed.
Execute load instruction
1283 } else if (inst->isLoad()) {
1284 // Loads will mark themselves as executed, and their writeback
1285 // event adds the instruction to the queue to commit
1286 fault = ldstQueue.executeLoad(inst);
1287
1288 if (inst->isTranslationDelayed() &&
1289 fault == NoFault) {
1290 // A hw page table walk is currently going on; the
1291 // instruction must be deferred.
1292 DPRINTF(IEW, "Execute: Delayed translation, deferring "
1293 "load.\n");
1294 instQueue.deferMemInst(inst);
1295 continue;
1296 }
1297
1298 if (inst->isDataPrefetch() || inst->isInstPrefetch()) {
1299 inst->fault = NoFault;
1300 }
The main execution of the load instruction is done by the executeLoad function of the ldstQueue. After the execution, it needs to check whether the translation is the bottleneck preventing progress on the load. Note that when the virtual-to-physical address resolution is delayed because of a long TLB latency, the load must be executed in a later cycle when the TLB result is ready. Therefore, when the instruction cannot be executed at this moment, the load instruction is marked as deferred (deferMemInst). Also, when the load operation is just a prefetch, any fault generated by it is ignored (lines 1298-1299). A sketch of the defer-and-retry path follows below; after that, let's take a look at our important function executeLoad in detail!
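The sketch below paraphrases (not verbatim) how the IQ later retries such a deferred instruction: deferMemInst parks it on the deferredMemInsts list shown earlier, and a later tick pulls it back out once its translation has completed.

```cpp
// Scan the deferred list for an instruction whose page table walk has
// finished (or that was squashed and just needs to drain), remove it,
// and hand it back to IEW for re-execution.
template <class Impl>
typename Impl::DynInstPtr
InstructionQueue<Impl>::getDeferredMemInstToExecute()
{
    for (auto it = deferredMemInsts.begin();
         it != deferredMemInsts.end(); ++it) {
        if ((*it)->translationCompleted() || (*it)->isSquashed()) {
            DynInstPtr mem_inst = std::move(*it);
            deferredMemInsts.erase(it);
            return mem_inst;
        }
    }
    return nullptr;
}
```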
gem5/src/cpu/o3/lsq_impl.hh
251 template<class Impl>
252 Fault
253 LSQ<Impl>::executeLoad(const DynInstPtr &inst)
254 {
255 ThreadID tid = inst->threadNumber;
256
257 return thread[tid].executeLoad(inst);
258 }
gem5/src/cpu/o3/lsq.hh
63 template <class Impl>
64 class LSQ
65
66 {
......
1104 /** Total Size of LQ Entries. */
1105 unsigned LQEntries;
1106 /** Total Size of SQ Entries. */
1107 unsigned SQEntries;
1108
1109 /** Max LQ Size - Used to Enforce Sharing Policies. */
1110 unsigned maxLQEntries;
1111
1112 /** Max SQ Size - Used to Enforce Sharing Policies. */
1113 unsigned maxSQEntries;
1114
1115 /** Data port. */
1116 DcachePort dcachePort;
1117
1118 /** The LSQ units for individual threads. */
1119 std::vector<LSQUnit> thread;
1120
1121 /** Number of Threads. */
1122 ThreadID numThreads;
1123 };
gem5/src/cpu/o3/lsq_unit_impl.hh
558 template <class Impl>
559 Fault
560 LSQUnit<Impl>::executeLoad(const DynInstPtr &inst)
561 {
562 using namespace TheISA;
563 // Execute a specific load.
564 Fault load_fault = NoFault;
565
566 DPRINTF(LSQUnit, "Executing load PC %s, [sn:%lli]\n",
567 inst->pcState(), inst->seqNum);
568
569 assert(!inst->isSquashed());
570
571 load_fault = inst->initiateAcc();
572
573 if (load_fault == NoFault && !inst->readMemAccPredicate()) {
574 assert(inst->readPredicate());
575 inst->setExecuted();
576 inst->completeAcc(nullptr);
577 iewStage->instToCommit(inst);
578 iewStage->activityThisCycle();
579 return NoFault;
580 }
581
582 if (inst->isTranslationDelayed() && load_fault == NoFault)
583 return load_fault;
584
585 if (load_fault != NoFault && inst->translationCompleted() &&
586 inst->savedReq->isPartialFault() && !inst->savedReq->isComplete()) {
587 assert(inst->savedReq->isSplit());
588 // If we have a partial fault where the mem access is not complete yet
589 // then the cache must have been blocked. This load will be re-executed
590 // when the cache gets unblocked. We will handle the fault when the
591 // mem access is complete.
592 return NoFault;
593 }
594
595 // If the instruction faulted or predicated false, then we need to send it
596 // along to commit without the instruction completing.
597 if (load_fault != NoFault || !inst->readPredicate()) {
598 // Send this instruction to commit, also make sure iew stage
599 // realizes there is activity. Mark it as executed unless it
600 // is a strictly ordered load that needs to hit the head of
601 // commit.
602 if (!inst->readPredicate())
603 inst->forwardOldRegs();
604 DPRINTF(LSQUnit, "Load [sn:%lli] not executed from %s\n",
605 inst->seqNum,
606 (load_fault != NoFault ? "fault" : "predication"));
607 if (!(inst->hasRequest() && inst->strictlyOrdered()) ||
608 inst->isAtCommit()) {
609 inst->setExecuted();
610 }
611 iewStage->instToCommit(inst);
612 iewStage->activityThisCycle();
613 } else {
614 if (inst->effAddrValid()) {
615 auto it = inst->lqIt;
616 ++it;
617
618 if (checkLoads)
619 return checkViolations(it, inst);
620 }
621 }
622
623 return load_fault;
624 }
initiateAcc: handling TLB request
I already covered initiateAcc of the memory instructions before. However, compared to the simple processors, the O3 CPU processes initiateAcc in a different way.
147 template <class Impl>
148 Fault
149 BaseO3DynInst<Impl>::initiateAcc()
150 {
151 // @todo: Pretty convoluted way to avoid squashing from happening
152 // when using the TC during an instruction's execution
153 // (specifically for instructions that have side-effects that use
154 // the TC). Fix this.
155 bool no_squash_from_TC = this->thread->noSquashFromTC;
156 this->thread->noSquashFromTC = true;
157
158 this->fault = this->staticInst->initiateAcc(this, this->traceData);
159
160 this->thread->noSquashFromTC = no_squash_from_TC;
161
162 return this->fault;
163 }
Because the staticInst stored in the dynamic instruction is the class object of a specific micro-operation, this call invokes the initiateAcc function of that micro load/store operation. For a memory read, initiateAcc invokes the initiateMemRead function on the architecture side, which ends up invoking the initiateMemRead function on the CPU side.
42 namespace X86ISA
43 {
44
45 /// Initiate a read from memory in timing mode.
46 static Fault
47 initiateMemRead(ExecContext *xc, Trace::InstRecord *traceData, Addr addr,
48 unsigned dataSize, Request::Flags flags)
49 {
50 return xc->initiateMemRead(addr, dataSize, flags);
51 }
962 template<class Impl>
963 Fault
964 BaseDynInst<Impl>::initiateMemRead(Addr addr, unsigned size,
965 Request::Flags flags,
966 const std::vector<bool>& byte_enable)
967 {
968 assert(byte_enable.empty() || byte_enable.size() == size);
969 return cpu->pushRequest(
970 dynamic_cast<typename DynInstPtr::PtrType>(this),
971 /* ld */ true, nullptr, size, addr, flags, nullptr, nullptr,
972 byte_enable);
973 }
Because an instruction of the O3 CPU is an instance of BaseO3DynInst, which inherits from BaseDynInst, when the instruction implementation invokes initiateMemRead (called from the initiateAcc implementation of the instruction), it invokes the corresponding method implemented in the BaseDynInst class.
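Putting the pieces together, the call chain for a timing-mode load looks roughly like this (paraphrased from the listings in this section):

```cpp
// Rough call chain for a timing-mode load in the O3 CPU (paraphrased):
//
// LSQUnit::executeLoad(inst)
//   -> BaseO3DynInst::initiateAcc()                    // dyn-inst wrapper
//      -> StaticInst::initiateAcc(xc, traceData)       // the micro-op
//         -> X86ISA::initiateMemRead(xc, ...)          // ISA-side helper
//            -> BaseDynInst::initiateMemRead(addr, ...)
//               -> FullO3CPU::pushRequest(...)         // CPU-side wrapper
//                  -> LSQ::pushRequest(...)            // translation + access
```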
pushRequest
713 /** CPU pushRequest function, forwards request to LSQ. */
714 Fault pushRequest(const DynInstPtr& inst, bool isLoad, uint8_t *data,
715 unsigned int size, Addr addr, Request::Flags flags,
716 uint64_t *res, AtomicOpFunctorPtr amo_op = nullptr,
717 const std::vector<bool>& byte_enable =
718 std::vector<bool>())
719
720 {
721 return iew.ldstQueue.pushRequest(inst, isLoad, data, size, addr,
722 flags, res, std::move(amo_op), byte_enable);
723 }
Instead of directly handling the load operation, initiateMemRead pushes the request to the load queue through the pushRequest function. This design may seem odd: the initiateAcc function was invoked by the LSQ in the first place, and now the instruction forwards the request to the load/store queue once again. It could have been implemented as a simple function that handles the request directly without going through multiple different units. In any case, initiateMemRead invokes pushRequest on the CPU side, which ends up invoking pushRequest of the LSQ.
693 template<class Impl>
694 Fault
695 LSQ<Impl>::pushRequest(const DynInstPtr& inst, bool isLoad, uint8_t *data,
696 unsigned int size, Addr addr, Request::Flags flags,
697 uint64_t *res, AtomicOpFunctorPtr amo_op,
698 const std::vector<bool>& byte_enable)
699 {
700 // This comming request can be either load, store or atomic.
701 // Atomic request has a corresponding pointer to its atomic memory
702 // operation
703 bool isAtomic M5_VAR_USED = !isLoad && amo_op;
704
705 ThreadID tid = cpu->contextToThread(inst->contextId());
706 auto cacheLineSize = cpu->cacheLineSize();
707 bool needs_burst = transferNeedsBurst(addr, size, cacheLineSize);
708 LSQRequest* req = nullptr;
709
710 // Atomic requests that access data across cache line boundary are
711 // currently not allowed since the cache does not guarantee corresponding
712 // atomic memory operations to be executed atomically across a cache line.
713 // For ISAs such as x86 that supports cross-cache-line atomic instructions,
714 // the cache needs to be modified to perform atomic update to both cache
715 // lines. For now, such cross-line update is not supported.
716 assert(!isAtomic || (isAtomic && !needs_burst));
717
718 if (inst->translationStarted()) {
719 req = inst->savedReq;
720 assert(req);
721 } else {
722 if (needs_burst) {
723 req = new SplitDataRequest(&thread[tid], inst, isLoad, addr,
724 size, flags, data, res);
725 } else {
726 req = new SingleDataRequest(&thread[tid], inst, isLoad, addr,
727 size, flags, data, res, std::move(amo_op));
728 }
729 assert(req);
730 if (!byte_enable.empty()) {
731 req->_byteEnable = byte_enable;
732 }
733 inst->setRequest();
734 req->taskId(cpu->taskId());
735
736 // There might be fault from a previous execution attempt if this is
737 // a strictly ordered load
738 inst->getFault() = NoFault;
739
740 req->initiateTranslation();
741 }
742
743 /* This is the place were instructions get the effAddr. */
744 if (req->isTranslationComplete()) {
745 if (req->isMemAccessRequired()) {
746 inst->effAddr = req->getVaddr();
747 inst->effSize = size;
748 inst->effAddrValid(true);
749
750 if (cpu->checker) {
751 inst->reqToVerify = std::make_shared<Request>(*req->request());
752 }
753 Fault fault;
754 if (isLoad)
755 fault = cpu->read(req, inst->lqIdx);
756 else
757 fault = cpu->write(req, data, inst->sqIdx);
758 // inst->getFault() may have the first-fault of a
759 // multi-access split request at this point.
760 // Overwrite that only if we got another type of fault
761 // (e.g. re-exec).
762 if (fault != NoFault)
763 inst->getFault() = fault;
764 } else if (isLoad) {
765 inst->setMemAccPredicate(false);
766 // Commit will have to clean up whatever happened. Set this
767 // instruction as executed.
768 inst->setExecuted();
769 }
770 }
771
772 if (inst->traceData)
773 inst->traceData->setMem(addr, size, flags);
774
775 return inst->getFault();
776 }
The dynamic instruction tracks whether it has already started its TLB translation with a flag stored in the instruction; the interface to access that information is called translationStarted. When the instruction has set that flag, it means the instruction already started the TLB access and is waiting for the response. In the delayed-TLB-response case, the instruction stores the request information in its instruction object, so it can retrieve the request that was previously sent to the TLB. However, if this is the first execution attempt, it should generate a new request. As shown in lines 722-728, if the request must access two separate cache blocks, a SplitDataRequest object is generated; if it only accesses one block, a SingleDataRequest object is generated instead. After the request has been produced, the proper flags of the instruction object are set to indicate that the instruction initiated the TLB access (line 733). After that, the initiateTranslation function provided by the request object is invoked to actually generate accesses to the TLBs.
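The single-versus-split decision comes from transferNeedsBurst, which tests whether the access crosses a cache-line boundary. Here is a minimal sketch of that check's semantics (assumed, not the verbatim gem5 helper):

```cpp
#include <cstdint>

using Addr = uint64_t;

// An access needs to be split into two requests (a "burst") when its
// last byte falls outside the cache line containing its first byte.
// Assumes cacheLineSize is a power of two.
static bool
transferNeedsBurst(Addr addr, unsigned size, unsigned cacheLineSize)
{
    Addr offset_in_line = addr & (Addr(cacheLineSize) - 1);
    return offset_in_line + size > cacheLineSize;
}
```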
860 template<class Impl>
861 void
862 LSQ<Impl>::SingleDataRequest::initiateTranslation()
863 {
864 assert(_requests.size() == 0);
865
866 this->addRequest(_addr, _size, _byteEnable);
867
868 if (_requests.size() > 0) {
869 _requests.back()->setReqInstSeqNum(_inst->seqNum);
870 _requests.back()->taskId(_taskId);
871 _inst->translationStarted(true);
872 setState(State::Translation);
873 flags.set(Flag::TranslationStarted);
874
875 _inst->savedReq = this;
876 sendFragmentToTranslation(0);
877 } else {
878 _inst->setMemAccPredicate(false);
879 }
880 }
The addRequest function just generates the packet that needs to be sent to the TLB unit. Although the current object might be interpreted as a request that could be sent directly to the TLB unit, it is really a wrapper around all the interfaces and data structures required to resolve a TLB access. For example, it holds the port connected to the TLB unit so that the generated request and its response can be communicated through that port. In any case, addRequest just generates the real packet understandable by the TLB unit.
407 void
408 addRequest(Addr addr, unsigned size,
409 const std::vector<bool>& byte_enable)
410 {
411 if (byte_enable.empty() ||
412 isAnyActiveElement(byte_enable.begin(), byte_enable.end())) {
413 auto request = std::make_shared<Request>(_inst->getASID(),
414 addr, size, _flags, _inst->masterId(),
415 _inst->instAddr(), _inst->contextId(),
416 std::move(_amo_op));
417 if (!byte_enable.empty()) {
418 request->setByteEnable(byte_enable);
419 }
420 _requests.push_back(request);
421 }
422 }
The addRequest function of the LSQRequest class just generates the request and saves it in the _requests vector so it can be sent later. After the request packets are generated, initiateTranslation invokes sendFragmentToTranslation to send the generated packet(s) to the TLB.
980 template<class Impl>
981 void
982 LSQ<Impl>::LSQRequest::sendFragmentToTranslation(int i)
983 {
984 numInTranslationFragments++;
985 _port.dTLB()->translateTiming(
986 this->request(i),
987 this->_inst->thread->getTC(), this,
988 this->isLoad() ? BaseTLB::Read : BaseTLB::Write);
989 }
Remember that the SingleDataRequest has only one request packet, so there is only one entry in the _requests vector. This function sends the request stored in the _requests vector to the TLB; note that the argument is used to index into the _requests vector. You can see that it invokes translateTiming on the dTLB connected to the LSQ. The details of the TLB's translateTiming function were explained in a previous posting. Also, note that it passes this as the translation-object parameter, because the translation object is used to invoke the finish function when the TLB access is resolved.
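The translation object works as a callback interface between the LSQ request and the TLB. A condensed sketch of that contract (paraphrasing gem5's BaseTLB::Translation interface) looks like this:

```cpp
// The requester hands an object implementing this interface to
// translateTiming(); the TLB calls back into it when the walk resolves.
class Translation
{
  public:
    virtual ~Translation() = default;

    // Called by the TLB when the translation cannot complete this cycle,
    // so the requester knows the walk was delayed.
    virtual void markDelayed() = 0;

    // Called by the TLB once the translation resolves; for the LSQ this
    // is LSQ::SingleDataRequest::finish(), shown in the next listing.
    virtual void finish(const Fault &fault, const RequestPtr &req,
                        ThreadContext *tc, BaseTLB::Mode mode) = 0;
};
```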
Response of LSQ for the TLB resolution
778 template<class Impl>
779 void
780 LSQ<Impl>::SingleDataRequest::finish(const Fault &fault, const RequestPtr &req,
781 ThreadContext* tc, BaseTLB::Mode mode)
782 {
783 _fault.push_back(fault);
784 numInTranslationFragments = 0;
785 numTranslatedFragments = 1;
786 /* If the instruction has been squahsed, let the request know
787 * as it may have to self-destruct. */
788 if (_inst->isSquashed()) {
789 this->squashTranslation();
790 } else {
791 _inst->strictlyOrdered(req->isStrictlyOrdered());
792
793 flags.set(Flag::TranslationFinished);
794 if (fault == NoFault) {
795 _inst->physEffAddr = req->getPaddr();
796 _inst->memReqFlags = req->getFlags();
797 if (req->isCondSwap()) {
798 assert(_res);
799 req->setExtraData(*_res);
800 }
801 setState(State::Request);
802 } else {
803 setState(State::Fault);
804 }
805
806 LSQRequest::_inst->fault = fault;
807 LSQRequest::_inst->translationCompleted(true);
808 }
809 }
When the translation is completed, the finish function provided by the request generated by the LSQ is invoked at the end of the translation. As shown in the above code, it first checks whether the instruction has been squashed while the TLB was processing the request. If it has not been squashed, it sets the required flags indicating that the translation is complete for this specific instruction. Note that it sets various fields of the instruction that initiated the TLB request (_inst in lines 790-808). One of the most important fields changed by the finish function is the _state field of the request. This field indicates the current status of the TLB request and can be set to another state using the setState function. Recall how the SimpleCPU starts its memory access after the TLB access is resolved: it initiates the memory operation at the end of the finish function. However, the O3 CPU does not invoke any function to generate the actual memory request when the TLB access is resolved. Then when and where does the O3 CPU, especially the LSQ, initiate the memory operation? The answer is in pushRequest!
693 template<class Impl>
694 Fault
695 LSQ<Impl>::pushRequest(const DynInstPtr& inst, bool isLoad, uint8_t *data,
696 unsigned int size, Addr addr, Request::Flags flags,
697 uint64_t *res, AtomicOpFunctorPtr amo_op,
698 const std::vector<bool>& byte_enable)
699 {
......
743 /* This is the place were instructions get the effAddr. */
744 if (req->isTranslationComplete()) {
745 if (req->isMemAccessRequired()) {
746 inst->effAddr = req->getVaddr();
747 inst->effSize = size;
748 inst->effAddrValid(true);
749
750 if (cpu->checker) {
751 inst->reqToVerify = std::make_shared<Request>(*req->request());
752 }
753 Fault fault;
754 if (isLoad)
755 fault = cpu->read(req, inst->lqIdx);
756 else
757 fault = cpu->write(req, data, inst->sqIdx);
758 // inst->getFault() may have the first-fault of a
759 // multi-access split request at this point.
760 // Overwrite that only if we got another type of fault
761 // (e.g. re-exec).
762 if (fault != NoFault)
763 inst->getFault() = fault;
764 } else if (isLoad) {
765 inst->setMemAccPredicate(false);
766 // Commit will have to clean up whatever happened. Set this
767 // instruction as executed.
768 inst->setExecuted();
769 }
770 }
771
772 if (inst->traceData)
773 inst->traceData->setMem(addr, size, flags);
774
775 return inst->getFault();
776 }
It first checks whether the TLB translation is finished by invoking isTranslationComplete.
586 bool
587 isInTranslation()
588 {
589 return _state == State::Translation;
590 }
591
592 bool
593 isTranslationComplete()
594 {
595 return flags.isSet(Flag::TranslationStarted) &&
596 !isInTranslation();
597 }
You might remember that the _state field was changed when the finish function of the TLB request was invoked. Therefore, if the TLB request has already been resolved, the isTranslationComplete function returns true. Then the actual memory read or write operation is made based on the instruction type. Because the translated request req now carries the physical address translated from the virtual address, it is also passed to the operation, since the memory operation must target the physical address, not the virtual address. Because we are currently dealing with the read operation, let's take a look at how the O3 CPU accesses the real memory.
CPU->read->LSQ::read->LSQUnit::read
725 /** CPU read function, forwards read to LSQ. */
726 Fault read(LSQRequest* req, int load_idx)
727 {
728 return this->iew.ldstQueue.read(req, load_idx);
729 }
1125 template <class Impl>
1126 Fault
1127 LSQ<Impl>::read(LSQRequest* req, int load_idx)
1128 {
1129 ThreadID tid = cpu->contextToThread(req->request()->contextId());
1130
1131 return thread.at(tid).read(req, load_idx);
1132 }
The processor load function handles four different kinds of memory load operations: LLSC (load-locked/store-conditional), MappedIPR (memory-mapped registers), store forwarding, and the plain memory load. I will cover the plain memory load, which tries to access the data from the cache and memory. The store-forwarding case will be handled in another posting.
621 LSQUnit<Impl>::read(LSQRequest *req, int load_idx)
622 {
623 LQEntry& load_req = loadQueue[load_idx];
624 const DynInstPtr& load_inst = load_req.instruction();
625
626 load_req.setRequest(req);
627 assert(load_inst);
628
629 assert(!load_inst->isExecuted());
630
631 // Make sure this isn't a strictly ordered load
632 // A bit of a hackish way to get strictly ordered accesses to work
633 // only if they're at the head of the LSQ and are ready to commit
634 // (at the head of the ROB too).
635
636 if (req->mainRequest()->isStrictlyOrdered() &&
637 (load_idx != loadQueue.head() || !load_inst->isAtCommit())) {
638 // Tell IQ/mem dep unit that this instruction will need to be
639 // rescheduled eventually
640 iewStage->rescheduleMemInst(load_inst);
641 load_inst->clearIssued();
642 load_inst->effAddrValid(false);
643 ++lsqRescheduledLoads;
644 DPRINTF(LSQUnit, "Strictly ordered load [sn:%lli] PC %s\n",
645 load_inst->seqNum, load_inst->pcState());
646
647 // Must delete request now that it wasn't handed off to
648 // memory. This is quite ugly. @todo: Figure out the proper
649 // place to really handle request deletes.
650 load_req.setRequest(nullptr);
651 req->discard();
652 return std::make_shared<GenericISA::M5PanicFault>(
653 "Strictly ordered load [sn:%llx] PC %s\n",
654 load_inst->seqNum, load_inst->pcState());
655 }
656
657 DPRINTF(LSQUnit, "Read called, load idx: %i, store idx: %i, "
658 "storeHead: %i addr: %#x%s\n",
659 load_idx - 1, load_inst->sqIt._idx, storeQueue.head() - 1,
660 req->mainRequest()->getPaddr(), req->isSplit() ? " split" : "");
661
662 if (req->mainRequest()->isLLSC()) {
663 // Disable recording the result temporarily. Writing to misc
664 // regs normally updates the result, but this is not the
665 // desired behavior when handling store conditionals.
666 load_inst->recordResult(false);
667 TheISA::handleLockedRead(load_inst.get(), req->mainRequest());
668 load_inst->recordResult(true);
669 }
670
671 if (req->mainRequest()->isMmappedIpr()) {
672 assert(!load_inst->memData);
673 load_inst->memData = new uint8_t[MaxDataBytes];
674
675 ThreadContext *thread = cpu->tcBase(lsqID);
676 PacketPtr main_pkt = new Packet(req->mainRequest(), MemCmd::ReadReq);
677
678 main_pkt->dataStatic(load_inst->memData);
679
680 Cycles delay = req->handleIprRead(thread, main_pkt);
681
682 WritebackEvent *wb = new WritebackEvent(load_inst, main_pkt, this);
683 cpu->schedule(wb, cpu->clockEdge(delay));
684 return NoFault;
685 }
686
687 // Check the SQ for any previous stores that might lead to forwarding
......
840 // If there's no forwarding case, then go access memory
841 DPRINTF(LSQUnit, "Doing memory access for inst [sn:%lli] PC %s\n",
842 load_inst->seqNum, load_inst->pcState());
843
844 // Allocate memory if this is the first time a load is issued.
845 if (!load_inst->memData) {
846 load_inst->memData = new uint8_t[req->mainRequest()->getSize()];
847 }
848
849 // For now, load throughput is constrained by the number of
850 // load FUs only, and loads do not consume a cache port (only
851 // stores do).
852 // @todo We should account for cache port contention
853 // and arbitrate between loads and stores.
854
855 // if we the cache is not blocked, do cache access
856 if (req->senderState() == nullptr) {
857 LQSenderState *state = new LQSenderState(
858 loadQueue.getIterator(load_idx));
859 state->isLoad = true;
860 state->inst = load_inst;
861 state->isSplit = req->isSplit();
862 req->senderState(state);
863 }
864 req->buildPackets();
865 req->sendPacketToCache();
866 if (!req->isSent())
867 iewStage->blockMemInst(load_inst);
868
869 return NoFault;
870 }
Execute store instruction
Execute non-memory instruction
1333 } else {
1334 // If the instruction has already faulted, then skip executing it.
1335 // Such case can happen when it faulted during ITLB translation.
1336 // If we execute the instruction (even if it's a nop) the fault
1337 // will be replaced and we will lose it.
1338 if (inst->getFault() == NoFault) {
1339 inst->execute();
1340 if (!inst->readPredicate())
1341 inst->forwardOldRegs();
1342 }
1343
1344 inst->setExecuted();
1345
1346 instToCommit(inst);
1347 }
1348
1349 updateExeInstStats(inst);
1351         // Check if branch prediction was correct, if not then we need
1352         // to tell commit to squash in flight instructions.  Only
1353         // handle this if there hasn't already been something that
1354         // redirects fetch in this group of instructions.
1355
1356         // This probably needs to prioritize the redirects if a different
1357         // scheduler is used.  Currently the scheduler schedules the oldest
1358         // instruction first, so the branch resolution order will be correct.
1359         ThreadID tid = inst->threadNumber;
1360
1361         if (!fetchRedirect[tid] ||
1362             !toCommit->squash[tid] ||
1363             toCommit->squashedSeqNum[tid] > inst->seqNum) {
1364
1365             // Prevent testing for misprediction on load instructions,
1366             // that have not been executed.
1367             bool loadNotExecuted = !inst->isExecuted() && inst->isLoad();
1368
1369             if (inst->mispredicted() && !loadNotExecuted) {
1370                 fetchRedirect[tid] = true;
1371
1372                 DPRINTF(IEW, "[tid:%i] [sn:%llu] Execute: "
1373                         "Branch mispredict detected.\n",
1374                         tid,inst->seqNum);
1375                 DPRINTF(IEW, "[tid:%i] [sn:%llu] "
1376                         "Predicted target was PC: %s\n",
1377                         tid,inst->seqNum,inst->readPredTarg());
1378                 DPRINTF(IEW, "[tid:%i] [sn:%llu] Execute: "
1379                         "Redirecting fetch to PC: %s\n",
1380                         tid,inst->seqNum,inst->pcState());
1381                 // If incorrect, then signal the ROB that it must be squashed.
1382                 squashDueToBranch(inst, tid);
1383
1384                 ppMispredict->notify(inst);
1385
1386                 if (inst->readPredTaken()) {
1387                     predictedTakenIncorrect++;
1388                 } else {
1389                     predictedNotTakenIncorrect++;
1390                 }
1391             } else if (ldstQueue.violation(tid)) {
1392                 assert(inst->isMemRef());
1393                 // If there was an ordering violation, then get the
1394                 // DynInst that caused the violation.  Note that this
1395                 // clears the violation signal.
1396                 DynInstPtr violator;
1397                 violator = ldstQueue.getMemDepViolator(tid);
1398
1399                 DPRINTF(IEW, "LDSTQ detected a violation.  Violator PC: %s "
1400                         "[sn:%lli], inst PC: %s [sn:%lli].  Addr is: %#x.\n",
1401                         violator->pcState(), violator->seqNum,
1402                         inst->pcState(), inst->seqNum, inst->physEffAddr);
1403
1404                 fetchRedirect[tid] = true;
1405
1406                 // Tell the instruction queue that a violation has occured.
1407                 instQueue.violation(inst, violator);
1408
1409                 // Squash.
1410                 squashDueToMemOrder(violator, tid);
1411
1412                 ++memOrderViolationEvents;
1413             }
1414         } else {
1415             // Reset any state associated with redirects that will not
1416             // be used.
1417             if (ldstQueue.violation(tid)) {
1418                 assert(inst->isMemRef());
1419
1420                 DynInstPtr violator = ldstQueue.getMemDepViolator(tid);
1421
1422                 DPRINTF(IEW, "LDSTQ detected a violation.  Violator PC: "
1423                         "%s, inst PC: %s.  Addr is: %#x.\n",
1424                         violator->pcState(), inst->pcState(),
1425                         inst->physEffAddr);
1426                 DPRINTF(IEW, "Violation will not be handled because "
1427                         "already squashing\n");
1428
1429                 ++memOrderViolationEvents;
1430             }
1431         }
1432     }
1433
1434     // Update and record activity if we processed any instructions.
1435     if (inst_num) {
1436         if (exeStatus == Idle) {
1437             exeStatus = Running;
1438         }
1439
1440         updatedQueues = true;
1441
1442         cpu->activityThisCycle();
1443     }
1444
1445     // Need to reset this in case a writeback event needs to write into the
1446     // iew queue.  That way the writeback event will write into the correct
1447     // spot in the queue.
1448     wbNumInst = 0;
1449
1450 }
Schedule
Schedule (InstructionQueue::scheduleReadyInsts()): the IQ manages the ready instructions (those whose operands are ready) in a ready list and schedules them onto an available FU. The FU latency is set here, and instructions are sent to execution when the FU is done. A condensed sketch of this loop follows below.
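Since this post stops short of the scheduling code, here is a condensed paraphrase of what InstructionQueue::scheduleReadyInsts() does each cycle; oldestReadyInst, haveReadyInsts, and scheduleFuCompletion are hypothetical helpers standing in for the ready-list ordering and the FUCompletion event of the real code.

```cpp
// Pick ready instructions in age order, reserve a functional unit for
// each, and model the FU's op latency before the instruction shows up
// in instsToExecute for executeInsts().
template <class Impl>
void
InstructionQueue<Impl>::scheduleReadyInsts()
{
    int issued = 0;

    while (issued < totalWidth && haveReadyInsts()) {
        DynInstPtr inst = oldestReadyInst();     // age-ordered ready inst
        OpClass op_class = inst->opClass();

        int fu_idx = fuPool->getUnit(op_class);  // negative if none free
        if (fu_idx < 0)
            break;                               // structural hazard

        Cycles op_latency = fuPool->getOpLatency(op_class);
        if (op_latency == Cycles(1)) {
            // Single-cycle op: it can execute next cycle.
            instsToExecute.push_back(inst);
            fuPool->freeUnitNextCycle(fu_idx);
        } else {
            // Multi-cycle op: an event pushes it to instsToExecute and
            // frees the FU after op_latency cycles have elapsed.
            scheduleFuCompletion(inst, fu_idx, op_latency);
        }

        inst->setIssued();
        ++issued;
    }
}
```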