TD VM Life Cycle Part 2
Deep dive into TD VCPU creation (TDH_VP_CREATE-TDH_VP_INIT)
Instantiating TD VCPU
After the TD VM has been initialized (but not yet finalized), VCPUs can be created and assigned to it. TD VCPU creation consists of two main parts: creating the VCPU (KVM_CREATE_VCPU) and initializing it as a TDX VCPU (KVM_TDX_INIT_VCPU), which ultimately issues the TDH_VP_INIT SEAMCALL.
VCPU Creation for TDX
The first step is creating a VCPU instance, just as for a vanilla VM's VCPU. QEMU uses the same interface, the KVM_CREATE_VCPU command of kvm_vm_ioctl. kvm_vm_ioctl_create_vcpu is the main function handling this ioctl; it invokes the following functions to initialize VCPU-related features, including the MMU. MMU initialization details are described in [here].
int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
......
	r = static_call(kvm_x86_vcpu_create)(vcpu);
	if (r)
		goto free_guest_fpu;

	vcpu->arch.arch_capabilities = kvm_get_arch_capabilities();
	vcpu->arch.msr_platform_info = MSR_PLATFORM_INFO_CPUID_FAULT;
	kvm_xen_init_vcpu(vcpu);
	kvm_vcpu_mtrr_init(vcpu);
	vcpu_load(vcpu);
	kvm_set_tsc_khz(vcpu, vcpu->kvm->arch.default_tsc_khz);
	kvm_vcpu_reset(vcpu, false);
	kvm_init_mmu(vcpu);
	vcpu_put(vcpu);
	return 0;
kvm_arch_vcpu_create is the architecture-specific VCPU initialization function.
struct kvm_x86_ops vt_x86_ops __initdata = {
.vcpu_create = vt_vcpu_create,
static int vt_vcpu_create(struct kvm_vcpu *vcpu)
{
if (is_td_vcpu(vcpu))
return tdx_vcpu_create(vcpu);
return vmx_vcpu_create(vcpu);
}
The kvm_x86_vcpu_create static call resolves to vt_vcpu_create, and for a TD VM it is tdx_vcpu_create that actually creates the TD VM's VCPU instance.
Trust Domain Virtual Processor State (TDVPS)
Trust Domain Virtual Processor Root (TDVPR) is the 4KB root page of TDVPS. Its physical address serves as a unique identifier of the VCPU (as long as it resides in memory).
Just as the TDR identifies the TD VM, each TD VCPU requires associated TDX metadata called the Trust Domain Virtual Processor State (TDVPS). For example, the TD VCPU state and the TD VMCS are stored in the TDVPS, as shown in the figure.
Trust Domain Virtual Processor eXtension (TDVPX) 4KB pages extend TDVPR to help provide enough physical space for the logical TDVPS structure.
TDX logically views the TDVPS as a contiguous memory region containing the standard VMX control structures, such as the TD VMCS, together with TD VCPU state; in particular, the TD VCPU Management fields control the operation of the VCPU. Physically, however, it consists of multiple pages: the TDVPR root page and its TDVPX child pages.
The required number of 4KB TDVPR/TDVPX pages in TDVPS is enumerated to the VMM by the TDH.SYS.INFO function.
TDVPS management in TDX Module
typedef struct tdvps_management_s
{
uint8_t state; /**< The activity state of the VCPU */
/**
* A boolean flag, indicating whether the TD VCPU has been VMLAUNCH’ed
* on this LP since it has last been associated with this VCPU. If TRUE,
* VM entry should use VMRESUME. Else, VM entry should use VMLAUNCH.
*/
bool_t launched;
/**
* Sequential index of the VCPU in the parent TD. VCPU_INDEX indicates the order
* of VCPU initialization (by TDHVPINIT), starting from 0, and is made available to
* the TD via TDINFO. VCPU_INDEX is in the range 0 to (MAX_VCPUS_PER_TD - 1)
*/
uint32_t vcpu_index;
uint8_t num_tdvpx; /**< A counter of the number of child TDVPX pages associated with this TDVPR */
uint8_t reserved_0[1]; /**< Reserved for aligning the next field */
/**
* An array of (TDVPS_PAGES) physical address pointers to the TDVPX pages
*
* PA is without HKID bits
* Page 0 is the PA of the TDVPR page
* Pages 1,2,... are PAs of the TDVPX pages
*/
uint64_t tdvps_pa[MAX_TDVPS_PAGES];
/**
* The (unique hardware-derived identifier) of the logical processor on which this VCPU
* is currently associated (either by TDHVPENTER or by other VCPU-specific SEAMCALL flow).
* A value of 0xffffffff (-1 in signed) indicates that VCPU is not associated with any LP.
* Initialized by TDHVPINIT to the LP_ID on which it ran
*/
uint32_t assoc_lpid;
/**
* The TD's ephemeral private HKID at the last time this VCPU was associated (either
* by TDHVPENTER or by other VCPU-specific SEAMCALL flow) with an LP.
* Initialized by TDHVPINIT to the current TD ephemeral private HKID.
*/
uint32_t assoc_hkid;
/**
* The value of TDCS.TD_EPOCH, sampled at the time this VCPU entered TDX non-root mode
*/
uint64_t vcpu_epoch;
bool_t cpuid_supervisor_ve;
bool_t cpuid_user_ve;
bool_t is_shared_eptp_valid;
uint8_t reserved_1[5]; /**< Reserved for aligning the next field */
uint64_t last_exit_tsc;
bool_t pend_nmi;
uint8_t reserved_2[7]; /**< Reserved for aligning the next field */
uint64_t xfam;
uint8_t last_epf_gpa_list_idx;
uint8_t possibly_epf_stepping;
uint8_t reserved_3[150]; /**< Reserved for aligning the next field */
uint64_t last_epf_gpa_list[EPF_GPA_LIST_SIZE]; // Array of GPAs that caused EPF at this TD vCPU instruction
uint8_t reserved_4[256]; /**< Reserved for aligning the next field */
} tdvps_management_t;
Allocating TDVPR page and TDVPX pages
The VMM should prepare the TDVPR and TDVPX pages before it binds them to the VCPU. The tdx_vcpu_create function allocates one TDVPR page and multiple TDVPX pages.
int tdx_vcpu_create(struct kvm_vcpu *vcpu)
{
......
	ret = tdx_alloc_td_page(&tdx->tdvpr);
	if (ret)
		return ret;

	tdx->tdvpx = kcalloc(tdx_caps.tdvpx_nr_pages, sizeof(*tdx->tdvpx),
			     GFP_KERNEL_ACCOUNT);
	if (!tdx->tdvpx) {
		ret = -ENOMEM;
		goto free_tdvpr;
	}
	for (i = 0; i < tdx_caps.tdvpx_nr_pages; i++) {
		ret = tdx_alloc_td_page(&tdx->tdvpx[i]);
		if (ret)
			goto free_tdvpx;
	}
To make the created VCPU usable, a TDVPR page must be bound to the VCPU assigned to the TD VM. The physical page for the TDVPR is allocated when the VCPU is created, along with multiple physical pages for the TDVPX. The allocated TDVPR page is used when registering the new VCPU to a specific TD VM through TDH_VP_CREATE, which adds the TDVPR page as a child of the TDR page. TDH.VP.ADDCX then adds each TDVPX page as a child of the given TDVPR (tdh_vp_addcx). We will cover later when and how the TDVPX pages are initialized according to their semantics, for example as the TD VMCS.
Create VCPU for TD VM
Note that we have not created a TDX VCPU instance yet, only the TDVPS pages required for creating one. Because a TD VCPU must belong to exactly one TD VM, creating it also requires the TDR page, which denotes the TD VM that will own the new TD VCPU. You will see that the TDH_VP_CREATE SEAMCALL receives these two data structures to instantiate a new TDX VCPU.
void kvm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
......
	static_call(kvm_x86_vcpu_reset)(vcpu, init_event);
static struct kvm_x86_ops vt_x86_ops __initdata = {
......
	.vcpu_reset = vt_vcpu_reset,
static void vt_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
{
	if (is_td_vcpu(vcpu))
		return tdx_vcpu_reset(vcpu, init_event);

	return vmx_vcpu_reset(vcpu, init_event);
}
If the current kvm_vcpu is a VCPU of a TD VM, vt_vcpu_reset invokes the tdx_vcpu_reset function, which issues the TDH_VP_CREATE SEAMCALL.
void tdx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
{
......
	err = tdh_vp_create(kvm_tdx->tdr.pa, tdx->tdvpr.pa);
	if (WARN_ON_ONCE(err)) {
		pr_tdx_error(TDH_VP_CREATE, err, NULL);
		goto td_bugged;
	}
	tdx_mark_td_page_added(&tdx->tdvpr);

	for (i = 0; i < tdx_caps.tdvpx_nr_pages; i++) {
		err = tdh_vp_addcx(tdx->tdvpr.pa, tdx->tdvpx[i].pa);
		if (WARN_ON_ONCE(err)) {
			pr_tdx_error(TDH_VP_ADDCX, err, NULL);
			goto td_bugged;
		}
		tdx_mark_td_page_added(&tdx->tdvpx[i]);
	}
The TDH_VP_CREATE SEAMCALL creates the VCPU by registering the TDVPR page as a child of the TDR. Note that the TDR page is owned by the TDX module, so only the TDX module can register the TDVPR page as its child. Because the TDVPS is a logical structure spanning multiple pages, after the TDVPR page is added, the additional TDVPX pages must be added through the TDH_VP_ADDCX SEAMCALL.
TDH_VP_CREATE (TDX Module side)
Because the TDVPR page address is passed from the VMM, the page needs to be converted into a private page of the TD VM. We already covered how this conversion is done by the TDX module ([XXX]). After the private mapping is created, the TDVPR page is initialized and registered as a child of the TDR. Through this process, the TD VCPU is created and bound to a specific TD VM.
api_error_type tdh_vp_create(uint64_t target_tdvpr_pa, uint64_t target_tdr_pa)
{
......
// Check, lock and map the new TDVPR page
return_val = check_lock_and_map_explicit_private_4k_hpa(tdvpr_pa,
OPERAND_ID_RCX,
tdr_ptr,
TDX_RANGE_RW,
TDX_LOCK_EXCLUSIVE,
PT_NDA,
&tdvpr_pamt_block,
&tdvpr_pamt_entry_ptr,
&tdvpr_locked_flag,
(void**)&tdvps_ptr);
......
tdvps_ptr->management.assoc_lpid = (uint32_t)-1;
tdvps_ptr->management.tdvps_pa[0] = tdvpr_pa.raw;
// Register the new TDVPR page in its owner TDR
_lock_xadd_64b(&(tdr_ptr->management_fields.chldcnt), 1);
// Set the new TDVPR page PAMT fields
tdvpr_pamt_entry_ptr->pt = PT_TDVPR;
set_pamt_entry_owner(tdvpr_pamt_entry_ptr, tdr_pa);
TDH_VP_ADDCX (TDX Module side)
Physically, the TDVPS consists of the TDVPR page plus multiple TDVPX pages, and TDH_VP_ADDCX adds a TDVPX page as a child of the TDVPR. Note that this function receives the TDVPR and TDVPX pages, not the TDR page.
api_error_type tdh_vp_addcx(uint64_t target_tdvpx_pa, uint64_t target_tdvpr_pa)
{
......
// Check and lock the parent TDVPR page
return_val = check_and_lock_explicit_4k_private_hpa(tdvpr_pa,
OPERAND_ID_RDX,
TDX_LOCK_EXCLUSIVE,
PT_TDVPR,
&tdvpr_pamt_block,
&tdvpr_pamt_entry_ptr,
&page_leaf_size,
&tdvpr_locked_flag);
// Get and lock the owner TDR page
tdr_pa = get_pamt_entry_owner(tdvpr_pamt_entry_ptr);
return_val = lock_and_map_implicit_tdr(tdr_pa,
OPERAND_ID_TDR,
TDX_RANGE_RW,
TDX_LOCK_SHARED,
&tdr_pamt_entry_ptr,
&tdr_locked_flag,
&tdr_ptr);
......
// Map the TDVPS structure. Note that only the 1st page (TDVPR) is
// accessible at this point.
tdvps_ptr = (tdvps_t*)map_pa((void*)(set_hkid_to_pa(tdvpr_pa, td_hkid).full_pa), TDX_RANGE_RW);
......
// Check, lock and map the new TDVPX page
return_val = check_lock_and_map_explicit_private_4k_hpa(tdvpx_pa,
OPERAND_ID_RCX,
tdr_ptr,
TDX_RANGE_RW,
TDX_LOCK_EXCLUSIVE,
PT_NDA,
&tdvpx_pamt_block,
&tdvpx_pamt_entry_ptr,
&tdvpx_locked_flag,
(void**)&tdvpx_ptr);
......
// Clear the content of the TDVPX page using direct writes
zero_area_cacheline(tdvpx_ptr, TDX_PAGE_SIZE_IN_BYTES);
// Register the new TDVPX in its parent TDVPS structure
// Note that tdvpx_pa[0] is the PA of TDVPR, so TDVPX
// pages start from index 1
tdvpx_index_num++;
tdvps_ptr->management.num_tdvpx = (uint8_t)tdvpx_index_num;
tdvps_ptr->management.tdvps_pa[tdvpx_index_num] = tdvpx_pa.raw;
// Register the new TDVPX page in its owner TDR
_lock_xadd_64b(&(tdr_ptr->management_fields.chldcnt), 1);
// Set the new TDVPX page PAMT fields
tdvpx_pamt_entry_ptr->pt = PT_TDVPX;
set_pamt_entry_owner(tdvpx_pamt_entry_ptr, tdr_pa);
Because the TDVPR page was already initialized and registered as a private page of the TD VM by the earlier TDH_VP_CREATE SEAMCALL, its corresponding PAMT block is retrieved as a result of the check_and_lock_explicit_4k_private_hpa function. Since each PAMT entry records which TD VM owns the physical address it covers, the owner TDR can easily be retrieved from it. The function then maps both the TDVPR and the TDVPX; because the TDVPX page is being mapped as a private page for the first time, the TDVPR is mapped with map_pa while the TDVPX goes through check_lock_and_map_explicit_private_4k_hpa. It then updates the TDVPS management fields and zeroes the TDVPX page; note that the TDVPX contents are still not initialized. We will see next how each TDVPX page is initialized to carry VCPU information.
Initialize registered VCPU (TDH_VP_INIT)
After all VCPU-related physical pages have been added to the TD VM, it is ready for VCPU initialization. When the TDX module finishes initializing the VCPU, the VCPU's status changes to initialized.
static int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp)
......
	if (cmd.metadata || cmd.id != KVM_TDX_INIT_VCPU)
		return -EINVAL;

	err = tdh_vp_init(tdx->tdvpr.pa, cmd.data);
	if (TDX_ERR(err, TDH_VP_INIT, NULL))
		return -EIO;

	tdx->initialized = true;

	td_vmcs_write16(tdx, POSTED_INTR_NV, POSTED_INTR_VECTOR);
	td_vmcs_write64(tdx, POSTED_INTR_DESC_ADDR, __pa(&tdx->pi_desc));
	td_vmcs_setbit32(tdx, PIN_BASED_VM_EXEC_CONTROL, PIN_BASED_POSTED_INTR);
It invokes the tdh_vp_init function, which issues the TDH_VP_INIT SEAMCALL. In other words, VCPU initialization amounts to TDVPS initialization. Let's see how the TDX Module initializes the TDVPS pages and the related data structures used to manage them.
Initialize TDVPS page
typedef enum
{
TDVPS_VE_INFO_PAGE_INDEX = 0,
TDVPS_VMCS_PAGE_INDEX = 1,
TDVPS_VAPIC_PAGE_INDEX = 2,
MAX_TDVPS_PAGES = 6
} tdvps_pages_e;
typedef struct ALIGN(TDX_PAGE_SIZE_IN_BYTES) tdvps_s
{
tdvps_ve_info_t ve_info;
uint8_t reserved_0[128]; /**< Reserved for aligning the next field */
tdvps_management_t management;
tdvps_guest_state_t guest_state;
tdvps_guest_msr_state_t guest_msr_state;
uint8_t reserved_1[2432]; /**< Reserved for aligning the next field */
tdvps_td_vmcs_t td_vmcs;
uint8_t reserved_2[TDX_PAGE_SIZE_IN_BYTES - SIZE_OF_TD_VMCS_IN_BYTES]; /**< Reserved for aligning the next field */
tdvps_vapic_t vapic;
tdvps_guest_extension_state_t guest_extension_state;
} tdvps_t;
The TDVPS can semantically be divided into distinct pages, as shown in tdvps_pages_e: the VE_INFO, VMCS, and VAPIC pages. Let's see how the TDX Module initializes these pages as part of the TDH_VP_INIT SEAMCALL.
api_error_type tdh_vp_init(uint64_t target_tdvpr_pa, uint64_t td_vcpu_rcx)
{
......
// Get the TD's ephemeral HKID
curr_hkid = tdr_ptr->key_management_fields.hkid;
// Map the multi-page TDVPS structure
tdvps_ptr = map_tdvps(tdvpr_pa, curr_hkid, TDX_RANGE_RW);
......
/**
* Initialize the TD VCPU GPRs. Default GPR value is 0.
* Initialize the TD VCPU non-GPR register state in TDVPS:
* CRs, DRs, XCR0, IWK etc.
*/
init_vcpu_gprs_and_registers(tdvps_ptr, tdcs_ptr, init_rcx, vcpu_index);
/**
* Initialize the TD VCPU MSR state in TDVPS
*/
init_vcpu_msrs(tdvps_ptr);
/**
* No need to explicitly initialize TD VCPU extended state pages.
* Since the pages are initialized to 0 on TDHVPCREATE/TDVPADDCX.
*/
// Bit 63 of XCOMP_BV should be set to 1, to indicate compact format.
// Otherwise XSAVES and XRSTORS won't work
tdvps_ptr->guest_extension_state.xbuf.xsave_header.xcomp_bv = BIT(63);
// Initialize TDVPS.LBR_DEPTH to MAX_LBR_DEPTH supported on the core
if (((ia32_xcr0_t)tdcs_ptr->executions_ctl_fields.xfam).lbr)
{
tdvps_ptr->guest_msr_state.ia32_lbr_depth = (uint64_t)get_global_data()->max_lbr_depth;
}
/**
* No need to explicitly initialize VAPIC page.
* Since the pages are initialized to 0 on TDHVPCREATE/TDVPADDCX,
* VAPIC page is already 0.
*/
Note that it receives cmd.data, which is set as the initial RCX (and R8) value of the VCPU. This value is consumed by the virtual BIOS when the TD VM first starts.
_STATIC_INLINE_ void init_vcpu_gprs_and_registers(tdvps_t * tdvps_ptr, tdcs_t * tdcs_ptr, uint64_t init_rcx, uint32_t vcpu_index)
{
/**
* GPRs init
*/
if (tdcs_ptr->executions_ctl_fields.gpaw)
{
tdvps_ptr->guest_state.rbx = MAX_PA_FOR_GPAW;
}
else
{
tdvps_ptr->guest_state.rbx = MAX_PA_FOR_GPA_NOT_WIDE;
}
// Set RCX and R8 to the input parameter's value
tdvps_ptr->guest_state.rcx = init_rcx;
tdvps_ptr->guest_state.r8 = init_rcx;
// CPUID(1).EAX - returns Family/Model/Stepping in EAX - take the saved value by TDHSYSINIT
tdx_debug_assert(get_cpuid_lookup_entry(0x1, 0x0) < MAX_NUM_CPUID_LOOKUP);
tdvps_ptr->guest_state.rdx = (uint64_t)get_global_data()->cpuid_values[get_cpuid_lookup_entry(0x1, 0x0)].values.eax;
/**
* Registers init
*/
tdvps_ptr->guest_state.xcr0 = XCR0_RESET_STATE;
tdvps_ptr->guest_state.dr6 = DR6_RESET_STATE;
// Set RSI to the VCPU index
tdvps_ptr->guest_state.rsi = vcpu_index & BITS(31,0);
/**
* All other GPRs/Registers are set to 0 or
* that their INIT state is 0
* Doesn’t include values initialized in VMCS
*/
}
VMCS initialization
The most important page of the TDVPS is the VMCS of the TD VCPU. Its format is the same as the VMCS of a vanilla VM's VCPU. Let's see how the TDX Module sets up the VMCS for the TD VCPU.
vmcs_pa = set_hkid_to_pa((pa_t)tdvps_ptr->management.tdvps_pa[TDVPS_VMCS_PAGE_INDEX], curr_hkid);
/**
* Map the TD VMCS page.
*
* @note This is the only place the VMCS page is directly accessed.
*/
vmcs_ptr = map_pa((void*)vmcs_pa.raw, TDX_RANGE_RW);
vmcs_ptr->revision.vmcs_revision_identifier =
get_global_data()->plt_common_config.ia32_vmx_basic.vmcs_revision_id;
// Clear the TD VMCS
ia32_vmclear((void*)vmcs_pa.raw);
/**
* No need to explicitly initialize VE_INFO.
* Since the pages are initialized to 0 on TDHVPCREATE/TDVPADDCX,
* VE_INFO.VALID is already 0.
*/
// Mark the VCPU as initialized and ready
tdvps_ptr->management.state = VCPU_READY_ASYNC;
/**
* Save the host VMCS fields before going to TD VMCS context
*/
save_vmcs_host_fields(&td_vmcs_host_values);
/**
* Associate the VCPU - no checks required
*/
associate_vcpu_initial(tdvps_ptr, tdcs_ptr, tdr_ptr, &td_vmcs_host_values);
td_vmcs_loaded = true;
/**
* Initialize the TD VMCS fields
*/
init_td_vmcs(tdcs_ptr, tdvps_ptr, &td_vmcs_host_values);
Because the TDX Module does not keep all metadata pages of a TD VM permanently mapped, the TD VMCS page must first be mapped to obtain a virtual address for it. The management structure maintains the physical addresses of the TDVPS pages. After the mapping is done, init_td_vmcs initializes the VMCS for the TD VCPU.
void save_vmcs_host_fields(vmcs_host_values_t* host_fields_ptr)
{
read_vmcs_field_info(VMX_HOST_CR0_ENCODE, &host_fields_ptr->CR0);
read_vmcs_field_info(VMX_HOST_CR3_ENCODE, &host_fields_ptr->CR3);
read_vmcs_field_info(VMX_HOST_CR4_ENCODE, &host_fields_ptr->CR4);
read_vmcs_field_info(VMX_HOST_CS_SELECTOR_ENCODE, &host_fields_ptr->CS);
read_vmcs_field_info(VMX_HOST_SS_SELECTOR_ENCODE, &host_fields_ptr->SS);
read_vmcs_field_info(VMX_HOST_FS_SELECTOR_ENCODE, &host_fields_ptr->FS);
read_vmcs_field_info(VMX_HOST_GS_SELECTOR_ENCODE, &host_fields_ptr->GS);
read_vmcs_field_info(VMX_HOST_TR_SELECTOR_ENCODE, &host_fields_ptr->TR);
read_vmcs_field_info(VMX_HOST_IA32_S_CET_ENCODE, &host_fields_ptr->IA32_S_CET);
read_vmcs_field_info(VMX_HOST_SSP_ENCODE, &host_fields_ptr->SSP);
read_vmcs_field_info(VMX_HOST_IA32_PAT_FULL_ENCODE, &host_fields_ptr->IA32_PAT);
read_vmcs_field_info(VMX_HOST_IA32_EFER_FULL_ENCODE, &host_fields_ptr->IA32_EFER);
read_vmcs_field_info(VMX_HOST_FS_BASE_ENCODE, &host_fields_ptr->FS_BASE);
read_vmcs_field_info(VMX_HOST_RSP_ENCODE, &host_fields_ptr->RSP);
read_vmcs_field_info(VMX_HOST_GS_BASE_ENCODE, &host_fields_ptr->GS_BASE);
}
The VMCS needs to be configured for two directions: host-to-VM and VM-to-host. All information required to control the VM-to-host interface is maintained by the TDX Module. Recall that the processor is still in VMX root operation while the TDX Module executes, and the VMREAD instruction reads from the current VMCS when executed in VMX root operation. If executed in VMX non-root operation, the instruction reads from the VMCS referenced by the VMCS link pointer field of the current VMCS.
void associate_vcpu_initial(tdvps_t * tdvps_ptr,
tdcs_t * tdcs_ptr,
tdr_t * tdr_ptr,
vmcs_host_values_t * host_values)
{
uint32_t curr_lp_id = get_local_data()->lp_info.lp_id;
uint16_t curr_hkid;
pa_t vmcs_addr;
tdvps_ptr->management.assoc_lpid = curr_lp_id;
curr_hkid = tdr_ptr->key_management_fields.hkid;
// Set the TD VMCS as the current VMCS
vmcs_addr = set_hkid_to_pa((pa_t)tdvps_ptr->management.tdvps_pa[TDVPS_VMCS_PAGE_INDEX], curr_hkid);
ia32_vmptrld((void*)vmcs_addr.raw);
/**
* Update multiple TD VMCS physical address fields with the new HKID.
*/
init_guest_td_address_fields(tdr_ptr, tdvps_ptr, curr_hkid);
/**
* Update the TD VMCS LP-dependent host state fields.
* Applicable fields are HOST_RSP, HOST_SSP and HOST_GS_BASE
*/
ia32_vmwrite(host_values->RSP.encoding, host_values->RSP.value);
ia32_vmwrite(host_values->SSP.encoding, host_values->SSP.value);
ia32_vmwrite(host_values->GS_BASE.encoding, host_values->GS_BASE.value);
// Atomically increment the number of associated VCPUs
_lock_xadd_32b(&(tdcs_ptr->management_fields.num_assoc_vcpus), 1);
}
To initialize the TD VMCS of the target TD, the VMCS must first be loaded into the processor. associate_vcpu_initial retrieves the VMCS address from the TDVPS and executes the VMPTRLD instruction to make the TD VMCS the current VMCS.
After the TD VMCS becomes current, most of the VCPU initialization is done by init_td_vmcs. I will not cover all the details, but the previously saved host registers of the SEAM VMCS are written to the TD VMCS's host-state fields, because a VM exit from the TD should jump back to the TDX Module. The initialization also covers the private EPTP; note that the EPTP address is stored in the TDCS (refer to part 1). However, only the EPT root has been set up at this point, not the entire EPT hierarchy.
#define VMX_GUEST_EPT_POINTER_FULL_ENCODE 0x201AULL
#define VMX_GUEST_EPT_POINTER_HIGH_ENCODE 0x201bULL
#define VMX_GUEST_SHARED_EPT_POINTER_FULL_ENCODE 0x203C
#define VMX_GUEST_SHARED_EPT_POINTER_HIGH_ENCODE 0x203D
Also, note that there are two different types of EPTP for a TD VM, so the VMCS needs to contain pointers to both the shared and the private EPT. The private EPTP must be protected from non-TD software layers, so it is initialized by the TDX Module and written to the VMCS as part of TDH_VP_INIT. The shared EPTP, on the other hand, is provided by the host VMM (see details in part 3) and does not need to be secure; its purpose is sharing data with the host VMM. Therefore, another SEAMCALL (TDH.VP.WR) is used to write it to the VMCS located in the TDX-protected memory.
Read/Write to TD VMCS from Host VMM
Depending on whether the TD runs in debug or production mode, TDX allows the VMM to read/write some pages belonging to the TD VM, for example the TDVPS and the VMCS. For this, TDX provides two SEAMCALLs: TDH.VP.RD and TDH.VP.WR. To utilize these two SEAMCALLs, KVM defines the macro functions below.
#define TDX_BUILD_TDVPS_ACCESSORS(bits, uclass, lclass) \
static __always_inline u##bits td_##lclass##_read##bits(struct vcpu_tdx *tdx, \
							u32 field) \
{ \
	struct tdx_ex_ret ex_ret; \
	u64 err; \
\
	tdvps_##lclass##_check(field, bits); \
	err = tdh_vp_rd(tdx->tdvpr.pa, TDVPS_##uclass(field), &ex_ret); \
	if (unlikely(err)) { \
		pr_err("TDH_VP_RD["#uclass".0x%x] failed: %s (0x%llx)\n", \
		       field, tdx_seamcall_error_name(err), err); \
		return 0; \
	} \
	return (u##bits)ex_ret.regs.r8; \
} \
static __always_inline void td_##lclass##_write##bits(struct vcpu_tdx *tdx, \
						      u32 field, u##bits val) \
{ \
	struct tdx_ex_ret ex_ret; \
	u64 err; \
\
	tdvps_##lclass##_check(field, bits); \
	err = tdh_vp_wr(tdx->tdvpr.pa, TDVPS_##uclass(field), val, \
			GENMASK_ULL(bits - 1, 0), &ex_ret); \
	if (unlikely(err)) \
		pr_err("TDH_VP_WR["#uclass".0x%x] = 0x%llx failed: %s (0x%llx)\n", \
		       field, (u64)val, tdx_seamcall_error_name(err), err); \
} \
static __always_inline void td_##lclass##_setbit##bits(struct vcpu_tdx *tdx, \
						       u32 field, u64 bit) \
{ \
	struct tdx_ex_ret ex_ret; \
	u64 err; \
\
	tdvps_##lclass##_check(field, bits); \
	err = tdh_vp_wr(tdx->tdvpr.pa, TDVPS_##uclass(field), bit, bit, \
			&ex_ret); \
	if (unlikely(err)) \
		pr_err("TDH_VP_WR["#uclass".0x%x] |= 0x%llx failed: %s (0x%llx)\n", \
		       field, bit, tdx_seamcall_error_name(err), err); \
} \
static __always_inline void td_##lclass##_clearbit##bits(struct vcpu_tdx *tdx, \
							 u32 field, u64 bit) \
{ \
	struct tdx_ex_ret ex_ret; \
	u64 err; \
\
	tdvps_##lclass##_check(field, bits); \
	err = tdh_vp_wr(tdx->tdvpr.pa, TDVPS_##uclass(field), 0, bit, \
			&ex_ret); \
	if (unlikely(err)) \
		pr_err("TDH_VP_WR["#uclass".0x%x] &= ~0x%llx failed: %s (0x%llx)\n", \
		       field, bit, tdx_seamcall_error_name(err), err); \
}

TDX_BUILD_TDVPS_ACCESSORS(16, VMCS, vmcs);
TDX_BUILD_TDVPS_ACCESSORS(32, VMCS, vmcs);
TDX_BUILD_TDVPS_ACCESSORS(64, VMCS, vmcs);

TDX_BUILD_TDVPS_ACCESSORS(64, APIC, apic);
TDX_BUILD_TDVPS_ACCESSORS(64, GPR, gpr);
TDX_BUILD_TDVPS_ACCESSORS(64, DR, dr);
TDX_BUILD_TDVPS_ACCESSORS(64, STATE, state);
TDX_BUILD_TDVPS_ACCESSORS(64, STATE_NON_ARCH, state_non_arch);
TDX_BUILD_TDVPS_ACCESSORS(64, MSR, msr);
TDX_BUILD_TDVPS_ACCESSORS(8, MANAGEMENT, management);
The generated accessors are in turn used by the vmread and vmwrite helper macros below.
#define VT_BUILD_VMCS_HELPERS(type, bits, tdbits) \
static __always_inline type vmread##bits(struct kvm_vcpu *vcpu, \
					 unsigned long field) \
{ \
	if (unlikely(is_td_vcpu(vcpu))) { \
		if (KVM_BUG_ON(!is_debug_td(vcpu), vcpu->kvm)) \
			return 0; \
		return td_vmcs_read##tdbits(to_tdx(vcpu), field); \
	} \
	return vmcs_read##bits(field); \
} \
static __always_inline void vmwrite##bits(struct kvm_vcpu *vcpu, \
					  unsigned long field, type value) \
{ \
	if (unlikely(is_td_vcpu(vcpu))) { \
		if (KVM_BUG_ON(!is_debug_td(vcpu), vcpu->kvm)) \
			return; \
		return td_vmcs_write##tdbits(to_tdx(vcpu), field, value); \
	} \
	vmcs_write##bits(field, value); \
}