
Jetpack Microbenchmark - testing code performance

In mobile development, you occasionally need to estimate how long a piece of code takes to execute. In addition to theoretical approaches (for example, Big O analysis), which help weed out obviously poor solutions, there are benchmarks for testing code and catching smaller differences.

In this article, I will explain how Google's Microbenchmark library is structured and how it works, and show examples of its use. With it, you can not only evaluate performance but also settle disputes during code review.

When it comes to estimating code execution time, the first thing that comes to mind is something like this:

val startTime = System.currentTimeMillis()
// Execute the code we want to evaluate
val totalTime = System.currentTimeMillis() - startTime

This approach is simple, but has several drawbacks:

  • does not take into account the "warming up" of the code under study;

  • does not take into account the state of the device, for example, Thermal Throttling;

  • gives only one result, with no idea of the runtime variance (see the sketch after this list);

  • can make it harder to isolate code under test.
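
To make the variance point concrete, here is a purely illustrative sketch (not library code) that runs the same block many times and prints the spread that a single measurement hides:

fun measureManyTimes(runs: Int = 20, block: () -> Unit): List<Long> =
    List(runs) {
        val start = System.nanoTime()
        block()
        System.nanoTime() - start
    }

fun main() {
    // Time a trivial piece of work; the exact workload is just a placeholder
    val timings = measureManyTimes { (1..100_000).sumOf { it.toLong() } }
    println("min=${timings.minOrNull()} max=${timings.maxOrNull()} avg=${timings.average()} ns")
}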

For these reasons, estimating execution time is not as trivial a task as it might seem at first glance. There are solutions such as Firebase Performance Monitoring, but it is aimed at monitoring performance in production and is poorly suited to isolated pieces of code.

Google's library handles this problem much better.

What is Microbenchmark

Microbenchmark is a Jetpack library for quickly evaluating the execution time of Kotlin and Java code. To some extent, it excludes the influence of warm-up, throttling, and other factors on the final result, and it can generate reports to the console or to a JSON file. The tool can also be used on CI, which lets you notice performance problems at an early stage.

Detailed information on setup and configuration can be found in the documentation or in the GitHub repository.

The library gives the best results when profiling code that is used repeatedly. Good examples would be RecyclerView scrolling, data transformation, and so on. 

It is also desirable to exclude the influence of any caching; this can be done by generating unique data before each run. Performance tests also require specific build settings (for example, debuggable disabled), so it makes sense to put them in a separate module, as sketched below.
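
For reference, here is a minimal sketch of such a module's build.gradle.kts, based on the official setup guide; the library version and the suppressed error types are illustrative:

plugins {
    id("com.android.library")
    id("kotlin-android")
    id("androidx.benchmark")
}

android {
    defaultConfig {
        // AndroidBenchmarkRunner performs the IsolationActivity setup described below
        testInstrumentationRunner = "androidx.benchmark.junit4.AndroidBenchmarkRunner"
        // Relax some checks if you only need rough numbers, e.g. on an emulator
        testInstrumentationRunnerArguments["androidx.benchmark.suppressErrors"] = "EMULATOR"
    }
    // Benchmarks must not be debuggable, so run them against a release-like build type
    testBuildType = "release"
}

dependencies {
    androidTestImplementation("androidx.benchmark:benchmark-junit4:1.1.1")
}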

How Microbenchmark works

Let's see how the library is arranged.

All benchmarks are launched inside IsolationActivity (the AndroidBenchmarkRunner class is responsible for its initial launch), and this is where the initial setup takes place. 

It consists of the following steps:

  1. Checking that no other IsolationActivity is running alongside the test. If there is a duplicate, the test fails with the exception "Only one IsolationActivity should exist".

  2. Checking for Sustained Performance Mode support. This is a mode in which the device can maintain a constant level of performance, which is good for the consistency of the results.

  3. Starting a BenchSpinThread thread with THREAD_PRIORITY_LOWEST in parallel with the test. This is done so that at least one core is constantly loaded; it only works in combination with Sustained Performance Mode.

In general terms, the job of a benchmark is to run the code under test a certain number of times and measure the average completion time. But there are subtleties: for example, the first runs take several times longer, because the code under test may have dependencies that spend a lot of time initializing. In a way, this is similar to a car engine that needs some time to warm up.

Before the measured runs, the library needs to make sure that everything is working in its steady state and the warm-up is complete. In the library code, warm-up is considered finished when consecutive runs produce results that fit within a certain error margin. 

All the main logic lives in the WarmupManager class, and this is where the magic happens. The onNextIteration method determines whether the benchmark has stabilized. The fastMovingAvg and slowMovingAvg variables store moving averages of the benchmark's runtime, which converge to a common value within some error margin (the margin is stored in the THRESHOLD constant).

    fun onNextIteration(durationNs: Long): Boolean {
        iteration++
        totalDuration += durationNs

        if (iteration == 1) {
            fastMovingAvg = durationNs.toFloat()
            slowMovingAvg = durationNs.toFloat()
            return false
        }

        fastMovingAvg = FAST_RATIO * durationNs + (1 - FAST_RATIO) * fastMovingAvg
        slowMovingAvg = SLOW_RATIO * durationNs + (1 - SLOW_RATIO) * slowMovingAvg

        // If fast moving avg is close to slow, the benchmark is stabilizing
        val ratio = fastMovingAvg / slowMovingAvg
        if (ratio < 1 + THRESHOLD && ratio > 1 - THRESHOLD) {
            similarIterationCount++
        } else {
            similarIterationCount = 0
        }

        if (iteration >= MIN_ITERATIONS && totalDuration >= MIN_DURATION_NS) {
            if (similarIterationCount > MIN_SIMILAR_ITERATIONS ||
                totalDuration >= MAX_DURATION_NS) {
                // benchmark has stabilized, or we're out of time
                return true
            }
        }
        return false
    }
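
To make the role of the two moving averages more tangible, here is a toy simulation (not library code; the ratios below are illustrative and not the real FAST_RATIO and SLOW_RATIO values). The fast average chases each new sample, the slow one lags behind, and once their ratio stays close to 1 the warm-up is considered over:

fun ema(prev: Float, sample: Float, ratio: Float) = ratio * sample + (1 - ratio) * prev

fun main() {
    // Imaginary per-iteration timings in ns: slow warm-up runs, then a stable plateau
    val samples = listOf(900f, 600f, 400f, 330f, 310f, 305f, 303f, 302f)
    var fast = samples.first()
    var slow = samples.first()
    for (s in samples.drop(1)) {
        fast = ema(fast, s, ratio = 0.5f)   // reacts quickly to new samples
        slow = ema(slow, s, ratio = 0.05f)  // changes slowly
        println("sample=$s fast=$fast slow=$slow ratio=${fast / slow}")
    }
}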

In addition to code warm-up, the library implements thermal throttling detection. This state should not be allowed to affect tests, because throttled clock speeds increase the average execution time. 

Overheating detection works much more simply than WarmupManager. The isDeviceThermalThrottled method measures the execution time of a small, fixed piece of work (after copying a small ByteArray to reset the cache state) and compares it with a baseline captured earlier.

private fun measureWorkNs(): Long {
    // Access a non-trivial amount of data to try and 'reset' any cache state.
    // Have observed this to give more consistent performance when clocks are unlocked.
    copySomeData()

    val state = BenchmarkState()
    state.performThrottleChecks = false
    val input = FloatArray(16) { System.nanoTime().toFloat() }
    val output = FloatArray(16)

    while (state.keepRunningInline()) {
        // Benchmark a simple, fixed unit of work (a small matrix translation)
        Matrix.translateM(output, 0, input, 0, 1F, 2F, 3F)
    }

    return state.stats.min
}

/**
 * Called to calculate throttling baseline, will be ignored after first call.
 */
fun computeThrottleBaseline() {
    if (initNs == 0L) {
        initNs = measureWorkNs()
    }
}

/**
 * Makes a guess as to whether the device is currently thermal throttled based on performance
 * of single-threaded CPU work.
 */
fun isDeviceThermalThrottled(): Boolean {
    if (initNs == 0L) {
        // not initialized, so assume not throttled.
        return false
    }

    val workNs = measureWorkNs()
    return workNs > initNs * 1.10
}

This data is used when running the main tests: it helps exclude warm-up runs and runs affected by throttling (if any). By default, 50 significant runs are performed; if desired, this number and other constants can be changed, for example via reflection as shown below. But be careful: changing them can greatly affect how the library works.

@Before
fun init() {
	val field = androidx.benchmark.BenchmarkState::class.java.getDeclaredField("REPEAT_COUNT")
	field.isAccessible = true
	field.set(benchmarkRule, GLOBAL_REPEAT_COUNT)
}

A little practice

Let's try working with the library as ordinary users and test the speed of reading and writing JSON with GSON and Kotlin Serialization. 

@RunWith(AndroidJUnit4::class)
class KotlinSerializationBenchmark {
	
	private val context = ApplicationProvider.getApplicationContext<Context>()
	private val simpleJsonString = Utils.readJsonAsStringFromDisk(context, R.raw.simple)
	
	@get:Rule val benchmarkRule = BenchmarkRule()
	
	@Before
	fun init() {
		val field = androidx.benchmark.BenchmarkState::class.java.getDeclaredField("REPEAT_COUNT")
		field.isAccessible = true
		field.set(benchmarkRule, Utils.GLOBAL_REPEAT_COUNT)
	}
	
	@Test
	fun testRead() {
		benchmarkRule.measureRepeated {
			Json.decodeFromString<List<SmallObject>>(simpleJsonString ?: "")
		}
	}
	
	@Test
	fun testWrite() {
		val testObjects = Json.decodeFromString<List<SmallObject>>(simpleJsonString ?: "")
		benchmarkRule.measureRepeated {
			Json.encodeToString(testObjects)
		}
	}
}
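
For the GSON side of the comparison, a counterpart class might look like the sketch below; it assumes the same Utils helper, SmallObject model, and R.raw.simple resource as above, and uses only the standard Gson and TypeToken API:

@RunWith(AndroidJUnit4::class)
class GsonBenchmark {

	private val context = ApplicationProvider.getApplicationContext<Context>()
	private val simpleJsonString = Utils.readJsonAsStringFromDisk(context, R.raw.simple)

	private val gson = Gson()
	private val listType = object : TypeToken<List<SmallObject>>() {}.type

	@get:Rule val benchmarkRule = BenchmarkRule()

	@Test
	fun testRead() {
		benchmarkRule.measureRepeated {
			gson.fromJson<List<SmallObject>>(simpleJsonString ?: "", listType)
		}
	}

	@Test
	fun testWrite() {
		val testObjects: List<SmallObject> = gson.fromJson(simpleJsonString ?: "", listType)
		benchmarkRule.measureRepeated {
			gson.toJson(testObjects)
		}
	}
}

If you override REPEAT_COUNT via reflection as in the class above, add the same @Before block here so both libraries are measured with the same number of runs.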

To evaluate the test results, you can use the Android Studio console or generate a report in a JSON file. The level of detail differs considerably: the console shows only the average execution time, while the file contains a full report with the time of each run (useful for plotting) and other information.

The report settings are located in Edit Run Configuration > Instrumentation Extra Params. The parameter responsible for saving reports is called androidx.benchmark.output.enable. You can also pass these values from Gradle, which is useful when running on CI.
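
A minimal sketch for the benchmark module's build.gradle.kts that passes the same flag through Gradle so it also applies on CI (the argument name is taken from the documentation, everything else is illustrative):

android {
    defaultConfig {
        // Write a JSON report for every benchmark run in addition to the console output
        testInstrumentationRunnerArguments["androidx.benchmark.output.enable"] = "true"
    }
}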


Now, when the tests run, the reports will be saved to the application's directory, and the file name will match the class name. An example of the report structure can be viewed here.

Conclusion

On our project, this tool was used to find the best JSON parser, and Kotlin Serialization won. At the same time, we really missed CPU and memory profiling during testing; those metrics had to be collected separately.

It may seem that the tool offers little functionality, that its capabilities are limited, and that its scope is very narrow. On the whole that is true, but in some cases it can be very useful:

  • Evaluating the performance of a new library in a project.

  • Resolving disputes during code review, when you need to justify the choice of a particular solution.

  • Collecting statistics and evaluating code quality over a long period when integrated with CI.

Microbenchmark also has a big brother, Macrobenchmark, which is designed to evaluate UI-level operations such as application startup, scrolling, and animation. But that is a topic for a separate article.
