Performance counter 3
Published:
In this post, we will explore the three monitoring modes that perf_event_open() supports: counting, sampling by event occurrence, and sampling by frequency.
Counting and Sampling Mode
Counting mode is the default mode for perf_event_open(). When we are only interested in the total number of times an event occurs, counting mode is the best choice. perf_event_open() returns a file descriptor that can be used to read the counter value.
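uint64_t count; // with the default read_format, read() returns one 64-bit value
read(fd, &count, sizeof(count)); // pass the address of count, not its value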
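For context, here is a minimal sketch of the whole counting-mode flow around that read(). It assumes a Linux system with the kernel headers installed and sufficient privileges (see /proc/sys/kernel/perf_event_paranoid). The helper names init_counting_attr and count_event, and the choice of PERF_COUNT_HW_INSTRUCTIONS as the event, are illustrative assumptions and not something the post prescribes.

```c
#define _GNU_SOURCE
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* glibc has no wrapper for perf_event_open(), so go through syscall(). */
static long perf_event_open(struct perf_event_attr *attr, pid_t pid, int cpu,
                            int group_fd, unsigned long flags)
{
    return syscall(SYS_perf_event_open, attr, pid, cpu, group_fd, flags);
}

/* Fill attr for plain counting of user-space instructions (assumed event). */
void init_counting_attr(struct perf_event_attr *attr)
{
    memset(attr, 0, sizeof(*attr));
    attr->type = PERF_TYPE_HARDWARE;
    attr->size = sizeof(*attr);
    attr->config = PERF_COUNT_HW_INSTRUCTIONS;
    attr->disabled = 1;       /* start stopped; enable explicitly below */
    attr->exclude_kernel = 1;
    attr->exclude_hv = 1;
    /* sample_period and sample_freq stay 0: counting mode, no sampling */
}

/* Count the event while work() runs. Returns 0 on success, -1 on error. */
int count_event(void (*work)(void), uint64_t *count)
{
    struct perf_event_attr attr;
    init_counting_attr(&attr);

    /* pid = 0, cpu = -1: measure the calling thread on any CPU */
    int fd = (int)perf_event_open(&attr, 0, -1, -1, 0);
    if (fd < 0)
        return -1;

    ioctl(fd, PERF_EVENT_IOC_RESET, 0);
    ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
    work();
    ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);

    ssize_t n = read(fd, count, sizeof(*count));
    close(fd);
    return n == (ssize_t)sizeof(*count) ? 0 : -1;
}
```

A caller would pass the function to be measured as work and check the return value before trusting *count, since opening the event can fail on systems that restrict perf access.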
Sampling mode has two sub-modes: sampling by event-occurrence and sampling by frequency.
In sampling by event-occurrence mode, we preset a threshold (overflow) value for the performance counter. Each time the event has occurred that many times, a snapshot of information such as the timestamp, current event count, thread id, and CPU id is written to an mmap ring buffer that is accessible from user space.
Here is how to configure attr struct for sampling by event-occurrence mode:
attr.sample_period = 10; // take a snapshot every 10 occurrences of the event
attr.freq = 0; // 0 selects event-occurrence mode, 1 selects frequency mode
// we want the snapshot to contain the following information:
attr.sample_type = PERF_SAMPLE_IP | PERF_SAMPLE_TID | PERF_SAMPLE_TIME | PERF_SAMPLE_READ | PERF_SAMPLE_CPU | PERF_SAMPLE_PERIOD;
attr.wakeup_events = 1; //wake up after N new samples/snapshots have been written to the mmap ring buffer
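To make the snapshot contents concrete, the sketch below shows how one sample record would be laid out in the ring buffer for this sample_type. It assumes attr.read_format is 0 and the event is not part of a group, so PERF_SAMPLE_READ contributes a single 64-bit value. The struct name sample_record and the helper as_sample are hypothetical; the field order follows the PERF_RECORD_SAMPLE layout documented in perf_event_open(2), which is fixed regardless of the order in which the flags are OR-ed.

```c
#include <linux/perf_event.h>
#include <stddef.h>
#include <stdint.h>

/*
 * One PERF_RECORD_SAMPLE for
 *   sample_type = IP | TID | TIME | READ | CPU | PERIOD
 * assuming read_format == 0 and a single (non-group) event.
 */
struct sample_record {
    struct perf_event_header header; /* header.type == PERF_RECORD_SAMPLE */
    uint64_t ip;        /* PERF_SAMPLE_IP */
    uint32_t pid, tid;  /* PERF_SAMPLE_TID */
    uint64_t time;      /* PERF_SAMPLE_TIME, nanoseconds */
    uint32_t cpu, res;  /* PERF_SAMPLE_CPU (res is reserved) */
    uint64_t period;    /* PERF_SAMPLE_PERIOD */
    uint64_t value;     /* PERF_SAMPLE_READ: current event count */
};

_Static_assert(sizeof(struct sample_record) == 56, "layout has no padding");

/* Return the record as a sample, or NULL if it is some other record type. */
static const struct sample_record *as_sample(const struct perf_event_header *h)
{
    if (h->type != PERF_RECORD_SAMPLE || h->size < sizeof(struct sample_record))
        return NULL;
    return (const struct sample_record *)h;
}
```

A real reader also has to handle records that wrap around the end of the ring buffer, which this sketch leaves out.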
Sampling by frequency mode
Sampling by frequency mode is a little counterintuitive.
Let’s see how to configure attr struct for sampling by frequency mode first.
// Frequency-based sampling: sample at N Hz
attr.sample_freq = 30000;
attr.freq = 1; // Enable frequency mode
attr.sample_type = PERF_SAMPLE_IP | PERF_SAMPLE_TID | PERF_SAMPLE_TIME | PERF_SAMPLE_READ | PERF_SAMPLE_CPU | PERF_SAMPLE_PERIOD;
attr.wakeup_events = 1;
A sample_freq of 30000 Hz means we expect 30000 samples (snapshots) per second. But the PMU cannot trigger on time; it can only trigger on event counts. Let’s define:
- λ = event rate = events per second for your hardware event
- P = period = events per sample
- fₛ = sample rate = samples per second (what we care about)
By definition:
\[ \text{events per second} = \text{samples per second} \times \text{events per sample} \]
So:
\[ \lambda \approx f_s \times P \]
and therefore:
\[ f_s \approx \frac{\lambda}{P} \]
To achieve the user-requested sample rate fₛ, the kernel dynamically adjusts the period P, which is the number of occurrences of the event to wait for before taking a snapshot. The event rate λ is not something the kernel can control, since it depends on the program itself and the underlying hardware.
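Rearranging the relation for P shows the value the adjustment is steering toward. The numbers in the second line are made up purely for illustration and are not measurements from this post.

```latex
\[ P \approx \frac{\lambda}{f_s} \]
\[ \lambda = 3 \times 10^{7}\ \text{events/s},\quad f_s = 3 \times 10^{4}\ \text{samples/s}
   \;\Rightarrow\; P \approx \frac{3 \times 10^{7}}{3 \times 10^{4}} = 1000\ \text{events per sample} \]
```

If λ doubles while the target fₛ stays the same, the period has to double as well to keep the sample rate constant.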
Sampling Feedback Loop
On every sampling overflow:
- Read the hardware counter:
current_count = pmu_read()
- Compute time since last sample:
delta_time = now - last_time
- Estimate actual sampling frequency:
actual_freq = 1 / delta_time
- Compare with the target frequency:
target_freq = sample_freq
- Adjust the event period:
- If
actual_freq > target_freq
→ Sampling too fast → Increase period
- If
actual_freq < target_freq
→ Sampling too slow → Decrease period
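The steps above can be sketched as a single function. This is a simplified illustration of the idea and not the kernel's actual implementation: the name next_period is hypothetical, and the sketch jumps straight to the estimated period, while a real implementation may smooth the change over several samples.

```c
#include <stdint.h>

/*
 * Simplified period adjustment (hypothetical helper, for illustration only).
 *
 * period      - events per sample used for the interval that just ended
 * delta_ns    - time between the last two samples, in nanoseconds
 * target_freq - requested attr.sample_freq, in Hz
 */
static uint64_t next_period(uint64_t period, uint64_t delta_ns,
                            uint64_t target_freq)
{
    if (delta_ns == 0 || target_freq == 0)
        return period;  /* nothing sensible to compute; keep the old period */

    /* Estimate the actual sampling frequency from the last interval. */
    double actual_freq = 1e9 / (double)delta_ns;

    /* Too fast (ratio > 1) grows the period; too slow (ratio < 1) shrinks it. */
    double scaled = (double)period * actual_freq / (double)target_freq;

    uint64_t next = (uint64_t)(scaled + 0.5);  /* round to nearest */
    return next ? next : 1;  /* keep at least one event per sample */
}
```

For instance, with a period of 1, a 4291 ns gap between samples, and a 30000 Hz target, the function returns a larger period because sampling is far too fast. With a period of 1389 and a 33352 ns gap it returns a value within one or two events of the input, meaning the loop has essentially settled.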
Example
Here is an example to illustrate the concept.
{
"ip": "0x5591f152694d",
"pid": 5440,
"tid": 5440,
"cpu": 0,
"time": 34369394238916,
"count": 5,
"period": 1
},
{
"ip": "0x5591f1526c23",
"pid": 5440,
"tid": 5440,
"cpu": 0,
"time": 34369394243207,
"count": 7,
"period": 2
},
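PERF_SAMPLE_TIME timestamps are in nanoseconds. By default they come from the kernel's internal perf clock; a specific clock such as CLOCK_MONOTONIC can be selected by setting attr.use_clockid and attr.clockid.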
Δt = 34369394243207 - 34369394238916 = 4291 ns = 4.291 × 10⁻⁶ seconds
Δcount = 7 - 5 = 2 events
The event rate is λ = Δcount / Δt = 2 / (4.291 × 10⁻⁶) ≈ 466091.82 events per second. With the reported period of 1, the sampling frequency is fₛ = λ / period = 466091.82 / 1 = 466091.82 samples per second. That is far above the target frequency of 30000, so the kernel needs to increase P for the next sample. This explains why the period is 2 in the second snapshot. In the next snapshot below, we can observe that count = previous count + the period updated in the previous snapshot = 7 + 2 = 9, which is consistent with the current event count.
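Note: the period actually in effect between the first two snapshots is 2, not 1, because the count advanced by 2 (from 5 to 7). The 1 reported in the first snapshot is what the kernel thought it wanted for the period, and it does not line up with the count in the next snapshot. So in reality fₛ = λ / period = 466091.82 / 2 ≈ 233046 samples/s.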
{
"ip": "0x5591f1526c21",
"pid": 5440,
"tid": 5440,
"cpu": 0,
"time": 34369394247374,
"count": 9,
"period": 4
},
As the kernel keeps adjusting the period, the sampling frequency eventually converges to the target frequency of 30000 and the period stabilizes. In the snapshots at the end of this section, taken after many more samples, the period stays around 1389 to 1390. The sampling frequency at that point works out as follows:
Δt = 34369396620000 - 34369396586648 = 33352 ns = 3.3352 × 10⁻⁵ seconds
Δcount = 96097 - 94708 = 1389 events
Event Rate Calculation
The event rate is:
\[ \lambda = \frac{\Delta \text{count}}{\Delta t} = \frac{1389}{3.3352 \times 10^{-5}} \approx 4.1647 \times 10^{7}\ \text{events/s} \]
Result:
≈ 41,647,000 events per second
Sampling Frequency Calculation
Now plug into:
\[ f_s \approx \frac{\lambda}{P} \approx \frac{4.1647 \times 10^{7}}{1389} \approx 2.998 \times 10^{4}\ \text{samples/s} \]
Result:
→ ≈ 29,980 samples/sec
→ ≈ 30 kHz, which matches the requested 30000 Hz.
{
"ip": "0x5591f1526c47",
"pid": 5440,
"tid": 5440,
"cpu": 0,
"time": 34369396586648,
"count": 94708,
"period": 1389
},
{
"ip": "0x5591f1526c21",
"pid": 5440,
"tid": 5440,
"cpu": 0,
"time": 34369396620000,
"count": 96097,
"period": 1390
},
{
"ip": "0x5591f1526c28",
"pid": 5440,
"tid": 5440,
"cpu": 0,
"time": 34369396653155,
"count": 97487,
"period": 1390
},
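The arithmetic in this section can be checked mechanically from any two consecutive snapshots. The sketch below is one way to do it; the struct snapshot type and both function names are hypothetical, and only the "time" and "count" fields of each snapshot are used.

```c
#include <stdint.h>

/* The two fields of a snapshot that the rate calculations need. */
struct snapshot {
    uint64_t time_ns;  /* "time" field, nanoseconds */
    uint64_t count;    /* "count" field, running event count */
};

/* Event rate lambda: events per second between snapshots a and b. */
static double event_rate(struct snapshot a, struct snapshot b)
{
    return (double)(b.count - a.count) * 1e9 / (double)(b.time_ns - a.time_ns);
}

/* Sampling frequency f_s between two consecutive snapshots: 1 / delta_t. */
static double sample_rate(struct snapshot a, struct snapshot b)
{
    return 1e9 / (double)(b.time_ns - a.time_ns);
}
```

Feeding in the first two converged snapshots above (time 34369396586648 with count 94708, then time 34369396620000 with count 96097) gives roughly 4.16 × 10⁷ events per second and just under 30000 samples per second, in line with the hand calculation.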
