Automating UI testing requires predicting how humans, both users and developers, will interact with the UI. Although the procedures are designed to be straightforward, human interaction with computers is notoriously idiosyncratic and therefore difficult to automate: if a field will not accept an odd value, a human operator will find a way to force it in anyway.
One solution to this problem is simply to record a user's gestures during a test and then replay that recording in future regression suites. But record-and-replay has its own limitations, and Selenium is a popular manifestation of both its successes and its failures.
Selenium is an open-source test automation tool that records a user's gestures during UI testing for later playback, so the app can be retested after future code changes. This reuse of existing tests is one of the holy grails of implementing ML in UI testing. But what if a future modification alters the XPath of a drop-down menu item, so that Selenium can no longer find the element when the recorded test is replayed?
Something always changes, and at precisely that juncture previously recorded tests fail. This is a perfect illustration of why UI testing is difficult to automate. Today, a QA engineer must open the script generated by Selenium and modify it to include the new XPath. This tedious drain consumes days and weeks of modern QA engineers' time, and it is one of the prime targets for ML in testware.
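To make the fragility concrete, here is a minimal sketch of one common mitigation: instead of hardcoding a single recorded XPath, the test tries a list of fallback locators. The helper function and the locator strings are hypothetical illustrations; only the `driver.find_element(by, value)` call shape mirrors Selenium's WebDriver API.

```python
def find_with_fallbacks(driver, locators):
    """Try each (by, value) locator in order; return the first element found.

    `locators` is a list such as:
        [("xpath", "//select[@id='country']/option[3]"),
         ("css selector", "select#country option[value='us']")]
    If every locator fails, re-raise the last exception, so the test
    still fails loudly when the element is truly gone.
    """
    last_error = None
    for by, value in locators:
        try:
            # In a real Selenium script this is WebDriver.find_element;
            # any driver-like object with the same method works here.
            return driver.find_element(by, value)
        except Exception as exc:  # Selenium raises NoSuchElementException
            last_error = exc
    raise last_error
```

This does not remove the maintenance burden, it only defers it: someone must still curate the fallback list, which is exactly the kind of locator-repair work ML-based tools aim to take over.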
Now the true complexity of UI testing is revealed: engineers remain intricately involved at every step. This is why QA lags behind the rest of the Agile workflow, and it is a powerful motivator to develop ML programs that can relieve some of the burden of UI testing. Clearly, existing testware, including recorders like Selenium, is not adequate, and machine learning may be the sharpest arrow in the quiver.