
Utilising machine learning and augmented reality technology to create a real-world adblock - Master's thesis

gergokocsis


Grade awarded: Distinction (81%)


Supervisor's feedback:

"Good source material for a journal paper."

"Detailed appraisal of different ML libraries and a thorough discussion of their evaluation justifying choices, can’t fault it."

"Detailed discussion of possible libraries and their interconnection, how they are evaluated, well documented and interesting to the professional."

"Implementation is professional and the deliverable should be of interest to industry. AI applications are a hot topic and this work is very much of the moment."



Abstract

There is growing concern regarding the psychological effects of the advertising humans are subjected to on a daily basis, which is reflected in the population's perspective on potentially harmful activities and products such as gambling and fast food. While exposure can be prevented online by using adblocking software, there exists no real-world alternative to censor traditional methods of advertising. Previous research identified potential approaches to identifying advertisements and their geometrical extent in the real world using machine learning techniques, but these were limited to pre-recorded data such as videos and static images, as opposed to real-time application that could be used on a daily basis. This research examined the applicability of previously utilised machine learning techniques, such as ADNet and UNET, and their potential to work in combination with each other, alongside augmented reality object-tracking techniques such as the Lucas-Kanade algorithm, to create a software prototype able to identify, segment, and track areas of interest in real time. Findings suggest that these cutting-edge techniques can be applied in the real world to block advertisements in real time, achieving previously established, industry-standard benchmarks in terms of supervised evaluation (accuracy, Dice score, and Rand error) and unsupervised evaluation (examining the geometrical shape of the image segmentation, as well as the segmentation mask reconstructed from point tracking). Execution times of the machine learning and augmented reality techniques further support the potential for real-time application. Although augmented reality technology is not yet advanced enough for widespread adoption, the attitude in the investigated literature remains optimistic about the future of this technology. While more technological development is necessary to fully materialise this theoretical approach, the results gathered from this software prototype strongly support the hypothesised approach's applicability to this real-world problem.


Demonstration


Demonstration of hypothesised pipeline. Performance (FPS) not evaluated.
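The hypothesised pipeline runs once per frame: detect whether an advertisement is present, segment its extent with UNET, then track the segmented region across subsequent frames with the Lucas-Kanade algorithm and censor it. The sketch below illustrates that per-frame loop under stated assumptions: `detect_ad` and `segment_ad` are hypothetical callables standing in for the trained ADNet and UNET models, and the convex-hull censoring is only one possible way to blank out the tracked region.

```python
import cv2
import numpy as np

def censor_stream(capture, detect_ad, segment_ad):
    """Per-frame loop of the hypothesised pipeline (illustrative sketch only).

    `detect_ad(frame) -> bool` and `segment_ad(frame) -> uint8 mask` are
    hypothetical callables standing in for the trained ADNet and UNET models.
    """
    prev_gray, points = None, None
    while True:
        ok, frame = capture.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

        if points is None:
            # Steps 1-2: detect an advertisement and segment its extent,
            # then seed the tracker with corner points inside the mask.
            if detect_ad(frame):
                mask = segment_ad(frame)
                points = cv2.goodFeaturesToTrack(
                    gray, maxCorners=50, qualityLevel=0.01,
                    minDistance=5, mask=mask)
        else:
            # Step 3: propagate the points with pyramidal Lucas-Kanade
            # instead of re-running segmentation on every frame.
            points, status, _ = cv2.calcOpticalFlowPyrLK(
                prev_gray, gray, points, None)
            points = points[status.flatten() == 1].reshape(-1, 1, 2)
            if len(points) < 4:
                points = None  # track lost; fall back to detection

        if points is not None:
            # Step 4: censor the tracked region (here: fill its convex hull).
            hull = cv2.convexHull(points.astype(np.int32))
            cv2.fillConvexPoly(frame, hull, (0, 0, 0))

        prev_gray = gray
        cv2.imshow("real-world adblock", frame)
        if cv2.waitKey(1) & 0xFF == 27:  # Esc to quit
            break
```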


Results

ADNet performance


UNET performance


Model performance against test dataset

| Metric     | Minimum | Maximum | Mean   | Required |
|------------|---------|---------|--------|----------|
| Accuracy   | 99.69%  | 99.99%  | 99.95% | 97.39%   |
| Dice score | 98.24%  | 99.90%  | 99.49% | 96.5%    |
| Rand error | 1.2e-4  | 3.7e-3  | 8.4e-4 | 4.0e-2   |

Example mask generated
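For reference, the supervised metrics reported in the table above can be computed directly from a predicted binary mask and its ground truth. Below is a minimal NumPy sketch, assuming both masks are boolean arrays of the same shape; the Rand error is taken here as one minus the Rand index over all pixel pairs of the two-class (advertisement/background) labelling, which may differ in detail from the exact formulation used in the thesis.

```python
import numpy as np

def pixel_accuracy(pred, truth):
    """Fraction of pixels whose predicted label matches the ground truth."""
    return np.mean(pred == truth)

def dice_score(pred, truth):
    """Dice coefficient: 2|A ∩ B| / (|A| + |B|) for boolean masks."""
    intersection = np.logical_and(pred, truth).sum()
    total = pred.sum() + truth.sum()
    return 1.0 if total == 0 else 2.0 * intersection / total

def rand_error(pred, truth):
    """1 - Rand index over all pixel pairs of the two-class labelling."""
    n = pred.size
    # Contingency table of the two binary labellings (background/advert).
    contingency = np.array([
        [np.sum(~pred & ~truth), np.sum(~pred & truth)],
        [np.sum(pred & ~truth),  np.sum(pred & truth)],
    ], dtype=np.float64)
    comb = lambda x: x * (x - 1) / 2          # pairs within a group
    same_both = comb(contingency).sum()
    same_pred = comb(contingency.sum(axis=1)).sum()
    same_truth = comb(contingency.sum(axis=0)).sum()
    total_pairs = comb(np.float64(n))
    # Agreements = pairs grouped together in both labellings plus pairs
    # separated in both labellings.
    agreements = same_both + (total_pairs - same_pred - same_truth + same_both)
    return 1.0 - agreements / total_pairs
```

For example, `dice_score(predicted > 0.5, truth)` would score a thresholded UNET output against its ground-truth mask.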

Lucas-Kanade performance

Reconstructed (tracked) mask performance compared to true mask and predicted mask

| Metric     | True to predicted | True to tracked | Predicted to tracked | Required |
|------------|-------------------|-----------------|----------------------|----------|
| Accuracy   | 99.88%            | 99.56%          | 99.5%                | 97.39%   |
| Dice score | 99.56%            | 98.27%          | 98.03%               | 96.5%    |
| Rand error | 2.3e-3            | 8.8e-3          | 1.0e-2               | 4.0e-2   |

Mask generated from first frame of video


Reconstructed mask following tracking compared to ground truth and UNET predicted mask
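The reconstructed mask evaluated above could, for instance, be rebuilt from the tracked points by filling their convex hull, as sketched below. This reconstruction is an illustrative assumption rather than the thesis's exact method, and the commented comparison reuses the metric helpers sketched in the previous section.

```python
import cv2
import numpy as np

def reconstruct_mask(tracked_points, frame_shape):
    """Rebuild a binary mask from tracked points by filling their convex hull.

    `tracked_points` is an (N, 1, 2) float array as returned by
    cv2.calcOpticalFlowPyrLK; the hull fill is an illustrative assumption.
    """
    mask = np.zeros(frame_shape[:2], dtype=np.uint8)
    hull = cv2.convexHull(tracked_points.astype(np.int32))
    cv2.fillConvexPoly(mask, hull, 255)
    return mask > 0

# Example (names are placeholders): score the tracked mask against the
# ground-truth and UNET-predicted masks with the helpers sketched earlier.
# tracked = reconstruct_mask(points, frame.shape)
# print(dice_score(tracked, true_mask), rand_error(tracked, predicted_mask))
```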


Conclusion

In this paper, a novel approach was presented to identify, segment, and track areas of interest in real time, utilising machine learning and augmented reality techniques for real-world application. The pipeline steps (identification, segmentation, and tracking) all performed within the expectations outlined, following industry-standard evaluation.

Image classification using ADNet achieved 100% accuracy on the test dataset, performing better than the required 93.06% and suggesting real-world applicability. Furthermore, it outperformed Inception-ResNet-v2, which was identified as a potential contender. It also achieved an average prediction time of 16.88ms, which highlights its suitability for real-time application, as required in Section 1.3.3.
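For illustration, per-frame prediction times such as the 16.88ms reported here can be measured by timing repeated forward passes after a short warm-up. The sketch below assumes a PyTorch model running on the CPU; the framework, input shape, and `model` placeholder are assumptions rather than details taken from the thesis.

```python
import time
import torch

def mean_inference_ms(model, input_shape=(1, 3, 224, 224), runs=100):
    """Return the mean per-frame prediction time of `model` in milliseconds.

    `model` is a hypothetical placeholder for the trained classifier; the
    input shape and number of runs are illustrative assumptions.
    """
    model.eval()
    dummy = torch.randn(*input_shape)
    with torch.no_grad():
        for _ in range(10):                 # warm-up passes, not timed
            model(dummy)
        start = time.perf_counter()
        for _ in range(runs):
            model(dummy)
        elapsed = time.perf_counter() - start
    return 1000.0 * elapsed / runs
```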


Image segmentation using UNET also outperformed the required metrics. On average, it achieved 99.95% accuracy, a 99.49% Dice score, and a Rand error of 8.4e-4, and all masks generated for the test set visually appeared to have four sharp corners. These metrics are all within the requirements outlined in Section 4.3.2 and suggest that it generalises well to unseen problems and is appropriate for real-world application. In addition, its average prediction time of 18.88ms also supports the argument that it is applicable to real-time applications. This timeframe, combined with the detection time, totals around 35.76ms on average, as required in Section 1.3.3.


Totalling the maximum recorded times of 21ms (image classification) and 24ms (image segmentation) gives 45ms in a worst-case scenario, which is less than the allocated 50ms. This suggests that the chosen approaches to pipeline steps 1 and 2 are capable of censoring areas of interest before the human brain has time to recognise their content, reinforcing the potential for real-world application, as highlighted in Section 1.3.3.


Following point tracking with the Lucas-Kanade algorithm, the reconstructed segmentation mask also performed within the required parameters for segmentation evaluation. On average, it achieved 99.56% accuracy, a 98.27% Dice score, and a Rand error of 8.8e-3 compared to the ground truth. Its average prediction time of 6.97ms outperforms generating a new segmentation mask every frame, justifying its inclusion in the pipeline. It also supports the potential for real-time application, as its average processing time is within the allocated timeframe of 10ms needed to allow for a framerate of at least 50 FPS. However, during testing the tracking time reached as high as 18.86ms, which could introduce fluctuating processing speed on a per-frame basis. While this may be distracting for users, human trials are necessary to definitively answer this question, as highlighted in Section 6.6.


Finally, these steps were visualised to show how the pipeline would look if implemented on augmented reality glasses in the future, once technology makes this application plausible.

In conclusion, the metrics gathered by evaluating the implemented machine learning and augmented reality techniques, alongside the evidence presented during the research, suggest that this prototype software may be a possible solution to the outlined problem. However, deploying this software, which is able to identify, segment, track, and censor areas of interest in real time, on AR hardware is not technologically possible at this point in time.



