in this post i will try to estimate the resolution and correctness of performance counters on intel processors. note that these tests were conducted in a single-threaded application; multi-threaded programs can lead to very different results.
perf is a powerful tool for measuring hardware performance counters on linux. the same counters, such as llc misses, can also be read from inside a program through the perf_event_open syscall:
syscall(__NR_perf_event_open, ....)
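the call above is elided, so here is a minimal sketch of how such a counter could be opened and read. it is not necessarily the setup used for the measurements in this post: the event selection (last-level cache read misses via PERF_TYPE_HW_CACHE) and the helper names open_llc_miss_counter, counter_start and counter_stop are assumptions made for illustration.

```c
#include <assert.h>
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* glibc ships no wrapper for perf_event_open, so go through syscall(2). */
static long perf_event_open(struct perf_event_attr *attr, pid_t pid, int cpu,
                            int group_fd, unsigned long flags)
{
    return syscall(__NR_perf_event_open, attr, pid, cpu, group_fd, flags);
}

/* hypothetical helper: open a counter for last-level cache read misses
 * on the calling thread, on any cpu. returns a file descriptor, or -1 if
 * the kernel refuses (errno is set, e.g. by perf_event_paranoid). */
static int open_llc_miss_counter(void)
{
    struct perf_event_attr attr;

    memset(&attr, 0, sizeof(attr));
    attr.type = PERF_TYPE_HW_CACHE;
    attr.size = sizeof(attr);
    /* assumed event: cache id | (operation << 8) | (result << 16) */
    attr.config = PERF_COUNT_HW_CACHE_LL |
                  (PERF_COUNT_HW_CACHE_OP_READ << 8) |
                  (PERF_COUNT_HW_CACHE_RESULT_MISS << 16);
    attr.disabled = 1;       /* start stopped, enable explicitly */
    attr.exclude_kernel = 1; /* count user-space events only */
    attr.exclude_hv = 1;

    return (int)perf_event_open(&attr, 0, -1, -1, 0);
}

/* hypothetical helper: zero the counter and start counting. */
static void counter_start(int fd)
{
    assert(fd >= 0); /* caller must check that the open succeeded */
    ioctl(fd, PERF_EVENT_IOC_RESET, 0);
    ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
}

/* hypothetical helper: stop counting and return the count, or -1 on error. */
static int64_t counter_stop(int fd)
{
    uint64_t count = 0;

    assert(fd >= 0);
    ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);
    if (read(fd, &count, sizeof(count)) != (ssize_t)sizeof(count))
        return -1;
    return (int64_t)count;
}
```

the code under test goes between counter_start(fd) and counter_stop(fd). starting the counter disabled and resetting it right before the measured region keeps setup work out of the count.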
creating a last-level cache miss is simple. first we create a variable, then we flush it from the cache (using the clflush instruction) and access it again. since the variable is no longer cached, this access causes a cache miss and the cpu has to fetch it from main memory. we can simplify this process as follows:
flush(&k);
__asm__("nop;lfence;");
access_memory(&k);
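the snippet above does not show how flush and access_memory are defined. one possible way to write them on x86 is with the sse2 intrinsics from emmintrin.h. the bodies below, and the wrapper make_one_miss, are assumptions for illustration rather than the exact code behind the measurements.

```c
#include <assert.h>
#include <emmintrin.h> /* _mm_clflush and _mm_lfence (SSE2, x86 only) */
#include <stddef.h>

/* evict the cache line that holds *p from all cache levels. */
static inline void flush(const void *p)
{
    assert(p != NULL);
    _mm_clflush(p);
}

/* load one int from *p. the volatile qualifier keeps the compiler from
 * optimizing the load away or serving it from a register. */
static inline int access_memory(const volatile int *p)
{
    return *p;
}

/* hypothetical wrapper: flush k, fence, then reload it. the reload is
 * expected to miss in the last-level cache and go to main memory. */
static inline int make_one_miss(int *k)
{
    flush(k);
    _mm_lfence(); /* mirrors the lfence in the snippet above */
    return access_memory(k);
}
```

calling make_one_miss(&k) n times inside a counted region should then produce roughly n last-level cache misses, as long as nothing else (a hardware prefetcher, for example) brings the line back in between.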
to measure the correctness of the performance counters, we create n cache misses and compare the observed count with the expected one. the results are shown below:
sample count | expected | observed (mean) | standard deviation | correctness (%)
---|---|---|---|---
3000 | 1 | 1.0026 | 0.051579 | 99.9331551
3000 | 3 | 3.0001 | 0.143783 | 98.1333333
3000 | 10 | 10.0023 | 0.14945 | 98.1666667
3000 | 20 | 20.013 | 0.231334 | 96.4666667
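the post does not spell out how the columns are computed, so here is a sketch of one plausible way to get them from the raw samples. it assumes that the standard deviation is the population standard deviation and that correctness is the percentage of samples whose observed count equals the expected count exactly. both are assumptions, and the names summary, summarize and sqrt_newton are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct summary {
    double mean;            /* "observed (mean)" column */
    double stddev;          /* population standard deviation */
    double correctness_pct; /* assumed: % of samples equal to expected */
};

/* small newton iteration so the sketch builds without linking libm;
 * sqrt() from math.h is the usual choice. */
static double sqrt_newton(double x)
{
    double r;
    int i;

    if (x <= 0.0)
        return 0.0;
    r = x > 1.0 ? x : 1.0;
    for (i = 0; i < 60; i++)
        r = 0.5 * (r + x / r);
    return r;
}

/* summarize n observed miss counts against the expected count. */
static struct summary summarize(const long *observed, size_t n, long expected)
{
    struct summary s;
    double sum = 0.0, sq = 0.0;
    size_t exact = 0, i;

    assert(observed != NULL && n > 0);

    for (i = 0; i < n; i++) {
        sum += (double)observed[i];
        if (observed[i] == expected)
            exact++;
    }
    s.mean = sum / (double)n;

    for (i = 0; i < n; i++) {
        double d = (double)observed[i] - s.mean;
        sq += d * d;
    }
    s.stddev = sqrt_newton(sq / (double)n);
    s.correctness_pct = 100.0 * (double)exact / (double)n;
    return s;
}
```

with this definition, a single run that reports one extra miss lowers correctness by 1/n, no matter how far off the count is, while the mean and standard deviation do reflect the size of the error.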