Bridge inspection is an important operation that must be performed periodically by public road administrations or similar entities. Inspections are often carried out manually, sometimes in hazardous conditions, and the process can be very expensive and time consuming. Recently, companies such as Orbiton AS have started providing bridge inspection services using drones (multicopters) equipped with high-resolution cameras. Drones can inspect bridges in many adverse conditions, for example after a bridge collapse, and can reach the underside of elevated bridges. The videos and images acquired this way are stored and subsequently reviewed manually by bridge administration engineers, who decide which actions are needed. Even though this sort of automation provides clear advantages, it is
still very time consuming, since a person must sit and watch hours and hours of acquired video and images. The problem with this approach is twofold: man-hours are an issue for infrastructure asset managers, and so is human subjectivity. Infrastructure operators are now requesting methods to analyse pixel-based datasets without human intervention and interpretation, so that they can objectively conclude whether their assets present a fault or not. Currently, this conclusion varies with the person doing the image interpretation and analysis: where one individual sees a fault, another may not, so results are inconsistent. An objective fault recognition system would add value to existing datasets by providing a reliable baseline for infrastructure asset managers.

One of the key indicators most asset managers look for during inspections is corrosion, so this feasibility study focused on automatic rust detection. The project created an autonomous classifier that detects rust in pictures or video frames. The main challenge with this task is that rust has no defined shape or colour. In addition, changing landscapes and misleading objects (red leaves, houses, road signs, etc.) can lead to misclassification of images. Finally, classification must remain relatively fast in order to process large amounts of video in a reasonable time.
We decided to implement one classic computer vision approach (based on the red colour component) and one deep learning model, and to run a comparison test between the two.
For the classic approach we used the OpenCV library to detect, filter and count the red pixels in each image. If red pixels made up more than 0.3% of the image, it was classified as rust. This approach produced few false negatives (images containing rust that were classified as non-rust), but it produced many false positives: a lot of images without rust were classified as rust. For example, a red apple was classified as rust, simply because it is red!
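The red-pixel counting idea can be sketched in a few lines. This is a minimal illustration using NumPy rather than OpenCV's actual colour-space functions, and the per-channel thresholds below are assumptions for demonstration only; the 0.3% ratio is the one from the project.

```python
import numpy as np

def classify_rust(image_rgb, red_fraction_threshold=0.003):
    """Classify an RGB image (H x W x 3, uint8) as rust / non-rust by
    counting 'red-ish' pixels. Channel thresholds are illustrative."""
    r = image_rgb[..., 0].astype(int)
    g = image_rgb[..., 1].astype(int)
    b = image_rgb[..., 2].astype(int)
    # A pixel counts as red when the red channel clearly dominates.
    red_mask = (r > 100) & (r > g + 40) & (r > b + 40)
    red_fraction = red_mask.mean()
    return red_fraction > red_fraction_threshold, red_fraction

# Synthetic example: a grey 100x100 image with a small red patch
img = np.full((100, 100, 3), 128, dtype=np.uint8)
img[:5, :10] = [200, 30, 30]   # 50 red pixels = 0.5% of the image
label, frac = classify_rust(img)
print(label, frac)  # prints: True 0.005
```

The sketch also shows why red apples fool this method: any sufficiently large red region passes the ratio test, regardless of what it depicts.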
For the deep learning approach, we used Caffe as the framework. Caffe is specifically suited for image processing, offering good speed and great flexibility. It also makes it easy to train models on clusters of GPUs, which can be useful for large networks, and it is released under a BSD 2-Clause license. The first step was to collect a good dataset for training the network. We collected around 1300 images for the “rust” class and 2200 images for the “non-rust” class. Around 80% of the images were used for the training set, while the rest formed the validation set. Since the dataset was relatively small, we decided to fine-tune an existing model, “bvlc_reference_caffenet”, which is based on the AlexNet architecture and released under a license for unrestricted use. In fine-tuning, the framework takes an already trained network and adjusts it (resuming the training) using the new data as input.
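A typical Caffe fine-tuning setup of this kind is driven by a solver configuration; the fragment below is a sketch with illustrative paths and hyperparameters, not the exact values used in this project. The key points are the reduced base learning rate (since training resumes from pretrained weights) and a network definition whose final fully connected layer has been renamed and given two outputs (rust / non-rust).

```
# solver.prototxt (illustrative values, not the project's exact settings)
net: "models/rust_finetune/train_val.prototxt"  # last layer renamed, num_output: 2
test_iter: 100
test_interval: 500
base_lr: 0.001          # low, since we resume from pretrained weights
lr_policy: "step"
stepsize: 2000
gamma: 0.1
max_iter: 10000
momentum: 0.9
weight_decay: 0.0005
snapshot_prefix: "models/rust_finetune/rust"
solver_mode: GPU
```

Training is then launched with Caffe's command-line tool, passing the pretrained weights, e.g. `caffe train -solver solver.prototxt -weights bvlc_reference_caffenet.caffemodel`.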
Test results showed that the deep learning model generally performed better than the OpenCV model, reaching an accuracy of 88% (19% higher than the OpenCV-based solution). The results were also presented at the “3rd International Conference on Artificial Intelligence and Applications (AIAP-2016)” in Vienna, Austria. The full paper is available here –> full_paper.pdf
We also have a presentation available on YouTube, where the techniques are explained in more detail –>
If you have questions, suggestions, or if you are just curious about the technology/business case behind this project, just contact us!