KVM: vPMU

PMU architecture support

We have the following registers:

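| Register | Address | Purpose |
|---|---|---|
| PMC0..PMCm | 0xc1 + i | |
| PERFEVTSEL0..PERFEVTSELm | 0x186 + i | Configure which event PMCi counts |
| FIXED_PMC0..FIXED_PMCn | 0x309 + i | |
| FIXED_CTR_CTL | 0x38d | Per-counter enable (OS/user) and PMI control for all FIXED_PMCi |
| PERF_CAPABILITIES | 0x345 | (Read-only) Enumerates PMU capabilities, e.g. the PEBS record format and full-width counter writes |
| GLOBAL_STATUS | 0x38e | |
| GLOBAL_CTRL | 0x38f | |
| GLOBAL_STATUS_RESET/GLOBAL_OVF_CTRL | 0x390 | |
| GLOBAL_STATUS_SET | 0x391 | |
| GLOBAL_INUSE | 0x392 | |
| PEBS_ENABLE | 0x3f1 | |
| PEBS_DATA_CFG | 0x3f2 | |
| DS_AREA | 0x600 | |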
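The "base + i" addressing in the table can be written down as a few helpers. This is a minimal sketch: the macro and function names are hypothetical (they are not the kernel's), and the bound of 32 counters per class is only the theoretical limit, not what a given CPU implements.

```c
#include <assert.h>
#include <stdint.h>

/* Base addresses taken from the register table above. */
#define MSR_PMC_BASE        0x0c1u  /* PMC0 */
#define MSR_PERFEVTSEL_BASE 0x186u  /* PERFEVTSEL0 */
#define MSR_FIXED_PMC_BASE  0x309u  /* FIXED_PMC0 */

/* Assumed theoretical limit of counters per class. */
#define MAX_COUNTERS_PER_CLASS 32u

/* Hypothetical helper: MSR address of general-purpose counter i. */
static inline uint32_t pmc_msr(unsigned int i)
{
	assert(i < MAX_COUNTERS_PER_CLASS);
	return MSR_PMC_BASE + i;
}

/* Hypothetical helper: MSR address of the event selector for PMCi. */
static inline uint32_t perfevtsel_msr(unsigned int i)
{
	assert(i < MAX_COUNTERS_PER_CLASS);
	return MSR_PERFEVTSEL_BASE + i;
}

/* Hypothetical helper: MSR address of fixed-purpose counter i. */
static inline uint32_t fixed_pmc_msr(unsigned int i)
{
	assert(i < MAX_COUNTERS_PER_CLASS);
	return MSR_FIXED_PMC_BASE + i;
}
```

For example, `perfevtsel_msr(7)` evaluates to 0x18d and `fixed_pmc_msr(3)` to 0x30c.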

In theory, there can be up to 64 PMU counters: 32 general-purpose counters and 32 fixed-purpose counters. In practice, our Icelake 8360Y has 8 general-purpose and 4 fixed-purpose counters per logical processor.

This is reflected in the layout of some control registers, such as the 64-bit GLOBAL_CTRL. Its low bits, starting at bit 0, enable or disable the general-purpose counters, and the bits starting at bit 32 do the same for the fixed-purpose counters.
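To make the bit layout concrete, here is a small sketch that builds the GLOBAL_CTRL value enabling every counter for a given counter count. The function name `global_ctrl_mask` and the macro are hypothetical, chosen for illustration only.

```c
#include <assert.h>
#include <stdint.h>

/* Fixed-purpose counter enable bits start at bit 32 of GLOBAL_CTRL. */
#define GLOBAL_CTRL_FIXED_SHIFT 32

/*
 * Hypothetical helper: GLOBAL_CTRL value with all nr_gp general-purpose
 * and all nr_fixed fixed-purpose counters enabled.
 */
static inline uint64_t global_ctrl_mask(unsigned int nr_gp, unsigned int nr_fixed)
{
	assert(nr_gp <= 32 && nr_fixed <= 32);

	uint64_t gp_bits = (1ULL << nr_gp) - 1;        /* bits 0..nr_gp-1 */
	uint64_t fixed_bits = (1ULL << nr_fixed) - 1;  /* bits 0..nr_fixed-1 */

	return gp_bits | (fixed_bits << GLOBAL_CTRL_FIXED_SHIFT);
}
```

With 8 general-purpose and 3 fixed-purpose counters this gives 0x00000007000000ff, which is the "event mask" the guest kernel prints in its boot log. With 8 and 4 it gives 0x0000000f000000ff; the host's event mask additionally has bit 48 set, which I believe is the perf metrics enable bit on Icelake, so treat that part as an assumption.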

I have made a cheatsheet:

[Image: cheatsheet]

[Image: pmu-msrs]

Linux perf core

The job of the Linux perf core is to facilitate sharing of the PMU hardware. It should let each user process think it owns the PMU exclusively, and save or switch state seamlessly.

Concepts

The initialization code of the Linux perf core, intel_pmu_init(), emits the following dmesg output on kernel boot:

[    0.709089] Performance Events: PEBS fmt4+-baseline,  AnyThread deprecated, Icelake events, 32-deep LBR, full-width counters, Intel PMU driver.
[    0.709089] ... version:                5
[    0.709089] ... bit width:              48
[    0.709089] ... generic registers:      8
[    0.709089] ... value mask:             0000ffffffffffff
[    0.709089] ... max period:             00007fffffffffff
[    0.709089] ... fixed-purpose events:   4
[    0.709089] ... event mask:             0001000f000000ff
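The "value mask" and "max period" lines follow from the bit width. The sketch below reproduces them, assuming the max period is half of the counter range, which is consistent with the values printed above. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: mask covering a counter of the given bit width. */
static inline uint64_t counter_value_mask(unsigned int bit_width)
{
	/* A shift by 64 would be undefined, so restrict to 1..63 bits. */
	assert(bit_width > 0 && bit_width < 64);
	return (1ULL << bit_width) - 1;
}

/* Hypothetical helper: largest sampling period, assumed to be half the range. */
static inline uint64_t counter_max_period(unsigned int bit_width)
{
	return counter_value_mask(bit_width) >> 1;
}
```

For a 48-bit counter, `counter_value_mask(48)` is 0x0000ffffffffffff and `counter_max_period(48)` is 0x00007fffffffffff, matching the log.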

| Name | Meaning | Where |
|---|---|---|
| Version | Architectural perfmon version | cpuid(10) |
| Format | PEBS record format | |
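As a rough illustration of where the version and counter counts come from, the sketch below decodes the EAX and EDX values of CPUID leaf 10 (0xA). The field positions follow the architectural perfmon layout as I understand it; the struct and function names are hypothetical, and the decoder takes raw register values so it does not depend on any particular way of executing CPUID.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the fields decoded from CPUID leaf 0xA. */
struct perfmon_info {
	unsigned int version;      /* EAX[7:0]:   architectural perfmon version */
	unsigned int nr_gp;        /* EAX[15:8]:  general-purpose counters */
	unsigned int gp_width;     /* EAX[23:16]: general-purpose counter width */
	unsigned int nr_fixed;     /* EDX[4:0]:   fixed-purpose counters */
	unsigned int fixed_width;  /* EDX[12:5]:  fixed-purpose counter width */
};

/* Hypothetical helper: decode raw EAX/EDX of CPUID leaf 0xA. */
static inline struct perfmon_info decode_cpuid_0a(uint32_t eax, uint32_t edx)
{
	struct perfmon_info info = {
		.version     = eax & 0xff,
		.nr_gp       = (eax >> 8) & 0xff,
		.gp_width    = (eax >> 16) & 0xff,
		.nr_fixed    = edx & 0x1f,
		.fixed_width = (edx >> 5) & 0xff,
	};

	/* Sanity check: counters cannot be wider than the 64-bit MSR. */
	assert(info.gp_width <= 64 && info.fixed_width <= 64);
	return info;
}
```

For instance, EAX = 0x08300805 and EDX = 0x00000604 (values constructed for this example, not read from hardware) decode to version 5, 8 general-purpose counters of 48 bits, and 4 fixed-purpose counters of 48 bits, which is the configuration the host log reports.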

Data structures

Global initialization

Event creation

PEBS buffer draining

KVM vPMU architecture

KVM initializes the vPMU in kvm_init_pmu_capability() and kvm_ops_update()->kvm_pmu_ops_update(). kvm_pmu_ops_update() updates the ops according to intel_pmu_ops, listed below:

struct kvm_pmu_ops intel_pmu_ops __initdata = {
	.hw_event_available = intel_hw_event_available,
	.pmc_idx_to_pmc = intel_pmc_idx_to_pmc,
	.rdpmc_ecx_to_pmc = intel_rdpmc_ecx_to_pmc,
	.msr_idx_to_pmc = intel_msr_idx_to_pmc,
	.is_valid_rdpmc_ecx = intel_is_valid_rdpmc_ecx,
	.is_valid_msr = intel_is_valid_msr,
	.get_msr = intel_pmu_get_msr,
	.set_msr = intel_pmu_set_msr,
	.refresh = intel_pmu_refresh,
	.init = intel_pmu_init,
	.reset = intel_pmu_reset,
	.deliver_pmi = intel_pmu_deliver_pmi,
	.cleanup = intel_pmu_cleanup,
	.EVENTSEL_EVENT = ARCH_PERFMON_EVENTSEL_EVENT,
	.MAX_NR_GP_COUNTERS = KVM_INTEL_PMC_MAX_GENERIC,
	.MIN_NR_GP_COUNTERS = 1,
};

These ops can be called via static_call(kvm_x86_pmu_<OP>)(<ARGS>), e.g. static_call(kvm_x86_pmu_get_msr)(vcpu, msr_info).
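The dispatch pattern can be sketched in userspace with a plain function-pointer table. This is only an analogy: every `toy_` name below is hypothetical, the signatures are simplified, and the real static_call mechanism patches call sites into direct calls rather than going through a pointer at run time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for a vendor ops table, with simplified signatures. */
struct toy_pmu_ops {
	bool (*is_valid_msr)(uint32_t msr);
	int (*get_msr)(uint32_t msr, uint64_t *data);
};

/* Toy state: a single emulated GLOBAL_CTRL (0x38f) value. */
static uint64_t toy_global_ctrl = 0xff;

static bool toy_intel_is_valid_msr(uint32_t msr)
{
	return msr == 0x38f;
}

/* Returns 0 on success, 1 if the MSR is not handled by this toy PMU. */
static int toy_intel_get_msr(uint32_t msr, uint64_t *data)
{
	if (!toy_intel_is_valid_msr(msr))
		return 1;
	*data = toy_global_ctrl;
	return 0;
}

static const struct toy_pmu_ops toy_intel_ops = {
	.is_valid_msr = toy_intel_is_valid_msr,
	.get_msr = toy_intel_get_msr,
};

/* The table selected at init time, analogous to the ops update step. */
static const struct toy_pmu_ops *toy_ops = &toy_intel_ops;

/* Generic entry point: callers do not know which vendor is behind it. */
static inline int toy_pmu_get_msr(uint32_t msr, uint64_t *data)
{
	assert(toy_ops && toy_ops->get_msr);
	return toy_ops->get_msr(msr, data);
}
```

Calling `toy_pmu_get_msr(0x38f, &v)` returns 0 and stores 0xff in `v`, while an unhandled MSR returns 1. The generic code stays vendor-neutral, which is the same separation the ops table gives KVM.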

Linux perf core in guests

This is what the same perf core initialization code emits in a guest:

[    0.499971] Performance Events: PEBS fmt4+-baseline, Icelake events, 32-deep LBR, full-width counters, Intel PMU driver.
[    0.500059] ... version:                2
[    0.500061] ... bit width:              48
[    0.500062] ... generic registers:      8
[    0.500063] ... value mask:             0000ffffffffffff
[    0.500064] ... max period:             00007fffffffffff
[    0.500064] ... fixed-purpose events:   3
[    0.500065] ... event mask:             00000007000000ff