
Innovating Civil Infrastructure with Computer Vision

Written by Claire Butcher | Sep 5, 2024 10:27:55 AM

Managing civil infrastructure is a challenge that requires innovative new approaches and solutions to keep up with demands on time and resources. With computer vision, we can harness the power of AI to revolutionise how our built assets are inspected.


Civil engineering literally transforms the world around us and represents centuries of generational investment in our societies. But as our civil infrastructure ages, it begins to deteriorate, and as more assets decline, the process of maintaining and managing them becomes more difficult, more costly, and more time-sensitive. It’s time to apply the latest technological advancements to this vital sector and bring infrastructure management into the 21st century.

Keeping the structures we rely on safe and operational for the ever-evolving demands of modern life is no small feat for engineers and asset owners alike. After all, we often expose them to demands that the original designers could not have foreseen. Classic examples include modern Heavy Goods Vehicles (HGVs) driving over bridges that were constructed before the invention of the lorry. There are also more recent examples, such as multi-storey car parks built in the 1960s that take unplanned additional loading from the rise of electric vehicles.

While the pace of innovation has resulted in greater demands on these ageing assets, it has also led to the development of new solutions to help manage them. Some of these solutions have emerged through combining expertise from other sectors with traditional civil engineering knowledge to tackle old problems from new angles. A prime example of this is the use of computer vision for the intelligent monitoring of structural assets. 

What is Computer Vision?

Computer vision is a rapidly evolving field that focuses on enabling computers and systems to interpret and process visual information from the world, such as images or videos, in a way that mimics human vision. Computer vision allows autonomous vehicles to “see” the road in front of them. It allows spatial computing platforms like the Apple Vision Pro to seamlessly blend augmented reality graphics with the real world. It also allows your local car park to read your number plate as soon as you pass a camera. The field exists within an ever-developing ecosystem, and the varying terminology and overlapping definitions within this sphere make it challenging to understand from a layman’s perspective. To understand computer vision, we first need to familiarise ourselves with the terms that surround it.

The first of these terms is Artificial Intelligence (AI), of which computer vision is a subfield. There is no general definition of AI that enjoys widespread consensus, but at Mind Foundry, we think of AI as a machine performing tasks that were previously possible only through human intelligence. Computer vision overlaps with AI because it uses algorithms and models to enable machines to understand and interpret visual data intelligently. Essentially, computer vision gives machines the ability to "see" and make decisions based on visual inputs. The threshold determining which computer vision tasks count as “intelligent”, however, is blurred and is evolving as our interactions with AI increase. For instance, tasks like automatic number plate recognition are so widely used, and predate the recent surge in AI interest by so long, that many people forget they are examples of AI.

Machine Learning is a branch of AI that focuses on writing “software by example”: instead of a person providing rules and data and the computer producing answers, as in classical programming, you provide the computer with data and answers, and it learns the rules. The algorithms and models that give computer vision its capabilities for specific tasks are frequently based on Machine Learning, especially deep learning. Deep learning is a type of modelling within Machine Learning in which multi-layered neural networks are used to solve tasks. These networks are formed from interconnected nodes, loosely resembling a simplified representation of the neurons in our brains, and are particularly adept at dealing with data such as imagery and text.
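The distinction between the two approaches can be made concrete with a toy sketch. Everything below is illustrative and invented, not a real model: the “learned” rule is simply a threshold placed midway between the class means of some labelled examples.

```python
# Classical programming vs. "software by example" (a deliberately tiny sketch).
# The task: flag a measured crack width (mm) as severe. All numbers are invented.

def classify_classical(width_mm):
    """Classical programming: a person supplies the rule explicitly."""
    return width_mm > 2.0  # hand-written threshold

def learn_threshold(examples):
    """Machine Learning in miniature: supply data and answers, recover the rule.

    `examples` is a list of (width_mm, is_severe) pairs. The 'model' here is
    just a threshold placed midway between the two class means.
    """
    severe = [w for w, label in examples if label]
    benign = [w for w, label in examples if not label]
    return (sum(severe) / len(severe) + sum(benign) / len(benign)) / 2

# Data and answers in, rule out.
training_data = [(0.3, False), (0.8, False), (1.1, False),
                 (2.6, True), (3.4, True), (4.0, True)]
threshold = learn_threshold(training_data)

def classify_learned(width_mm):
    return width_mm > threshold
```

Real computer vision models replace the single threshold with millions of learned parameters, but the workflow is the same: examples go in, a decision rule comes out.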

None of these terms exists in isolation; there are many instances where they overlap.


It is also worth noting that the above diagram is inherently subjective. Each term it uses is an umbrella term for concepts or techniques whose definitions vary slightly depending on the source. Hence, there are many variations of this diagram, depending on when the source was published and the context of the article or paper.

We often think of these concepts and technologies as relatively modern inventions. However, computer vision actually has its roots in the late 1950s, while the concept of AI dates back even further. Recent developments in data availability, hardware, software, and Machine Learning techniques have converged to move computer vision from science projects to actionable solutions, making the present day an ideal time to consider how computer vision can be used to address complex problems.

The Key Concepts That Underlie Computer Vision

In its simplest form, computer vision is made up of two core concepts:

  1. Image formation - Receiving visual inputs.
  2. Machine perception - Processing and interpreting these inputs to extract information. It is at this stage that Machine Learning, deep learning, or photogrammetric techniques are applied.

Image Formation

Image formation is governed by our choice of hardware: any sensing device capable of producing images or other visual outputs. There is an ever-increasing choice, from phone cameras, vehicle-mounted rigs, and infrared (IR) sensors to Light Detection and Ranging (LiDAR) and Synthetic Aperture Radar (SAR).

One key advantage of computer vision is that it can provide us with data from wavelengths beyond the visible spectrum, such as infrared and microwave, and it can do so in greater detail than the human eye can perceive. When choosing the best device or sensor for a computer vision system, there is no one-size-fits-all approach. The right sensor for one solution is the wrong one for another. One must first understand the problem to be solved and then weigh the pros and cons of cost, speed, data quality, and many other factors.

Navigating this tightrope is one of the components required to make computer vision solutions an operational reality. Once you have decided on the most appropriate sensor for your problem, you can begin to capture images or, to use the more technical term, visual outputs. These could be as simple as a 2D image or as complex as a 3D point cloud.

Images are composed of pixel arrays, with each pixel carrying associated information such as its Red-Green-Blue (RGB) colour values and its location. Point clouds, on the other hand, are sets of points located in a 3D coordinate system, representing the surfaces of objects. These 3D point clouds can be created by applying photogrammetric techniques to overlapping images or through a LiDAR scan. In more advanced cases, each point may also carry an RGB colour value, allowing us to perceive colour as well as shape.
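As a minimal illustration, the two representations might look like this in code. The values are invented, and real systems would use dedicated array and point-cloud libraries rather than plain lists:

```python
# A toy 2x3 image: rows of pixels, each pixel an (R, G, B) triple 0-255.
# A pixel's location is implicit in its row and column indices.
image = [
    [(255, 0, 0), (0, 255, 0), (0, 0, 255)],
    [(10, 10, 10), (128, 128, 128), (250, 250, 250)],
]
height, width = len(image), len(image[0])

# A toy point cloud: each point carries explicit 3D coordinates, plus an
# optional RGB value so colour is perceived as well as shape.
point_cloud = [
    # (x, y, z, r, g, b) -- coordinates in metres, colour 0-255
    (0.00, 0.00, 1.20, 180, 180, 175),
    (0.05, 0.00, 1.21, 182, 181, 176),
    (0.05, 0.05, 1.19, 90, 60, 50),   # a darker point, e.g. a stain or defect
]

# Unlike the image, the points are unordered and sparse, and we can query
# geometry directly, e.g. the depth range of the scanned surface.
zs = [p[2] for p in point_cloud]
print(f"{height}x{width} image, {len(point_cloud)} points, "
      f"z-range {max(zs) - min(zs):.2f} m")
```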

Machine Perception

But how do you go from the abstract world of a 3D point cloud to the boots-on-the-ground, real-world problems civil engineers must solve daily? Myriad techniques can be used to interpret our visual data and transform it into meaningful outputs. These vary depending on the data available and the end goal of the solution. In the case of condition intelligence of structures, various methods are available to coax out information from visual data.

In the case of existing structures, there are varying asset types made of a wide range of materials, providing different functions and, consequently, prone to varying types of deterioration. Using relevant image datasets, we can utilise computer vision to categorise the materials used, the key components (be that from a structural perspective or safety perspective) and the defects present. By identifying the visual signs of defects and contextual information, the condition of the asset can be determined to enable engineers to make diagnoses and recommendations.

In practice, this may look like computer vision identifying that an image contains concrete, that spalling is present, and that, compared to a paired photo of the same defect from the last inspection, the spalling has increased by approximately 30cm². By presenting engineers with clear, objective, quantifiable metrics, better engineering decisions can be made more efficiently.
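A simplified sketch of that quantification step, assuming the two inspections have already been segmented into binary defect masks and aligned to the same viewpoint; the masks and the pixel footprint are invented for illustration:

```python
# Estimate how much a spalling defect has grown between two inspections,
# given aligned binary masks (1 = defect pixel) and the real-world area
# each pixel covers. All values here are illustrative, not inspection data.

def defect_area_cm2(mask, pixel_area_cm2):
    """Area of a defect: count defect pixels, scale by pixel footprint."""
    return sum(cell for row in mask for cell in row) * pixel_area_cm2

previous = [[0, 1, 1, 0],
            [0, 1, 1, 0],
            [0, 0, 0, 0]]
current  = [[1, 1, 1, 0],
            [1, 1, 1, 1],
            [0, 1, 1, 0]]

PIXEL_AREA_CM2 = 4.0  # e.g. each pixel covers 2 cm x 2 cm on the surface

growth = defect_area_cm2(current, PIXEL_AREA_CM2) - \
         defect_area_cm2(previous, PIXEL_AREA_CM2)
print(f"Spalling increased by approximately {growth:.0f} cm²")
```

In a deployed system, the masks would come from a segmentation model and the pixel footprint from camera calibration or photogrammetry, but the arithmetic at the end is this simple.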

Visual historical information about an asset can also be incorporated into this assessment to highlight and quantify what has changed based on visual metrics. For instance, we can highlight that an area of spalling on a structurally significant element has increased by 20% in a time series of photos captured routinely. This allows paths of deterioration to be predicted and the rates at which they occur to be calculated. Gathering this data allows us to predict what will occur if no action is taken.
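As a hedged sketch of the trend-and-extrapolate step, a linear fit over a series of measured areas is the simplest possible deterioration model; the inspection intervals and measurements below are invented, and real deterioration is rarely this linear:

```python
# Fit a linear trend to a defect's measured area over routine inspections
# and project forward if no action is taken. Data points are illustrative.

def linear_fit(ts, ys):
    """Least-squares slope and intercept for y ≈ a*t + b."""
    n = len(ts)
    t_mean, y_mean = sum(ts) / n, sum(ys) / n
    a = sum((t - t_mean) * (y - y_mean) for t, y in zip(ts, ys)) \
        / sum((t - t_mean) ** 2 for t in ts)
    return a, y_mean - a * t_mean

# Spalling area (cm²) measured at each routine inspection (months).
months = [0, 6, 12, 18, 24]
areas  = [100.0, 112.0, 121.0, 135.0, 144.0]

rate, start = linear_fit(months, areas)
projection = rate * 36 + start  # projected area at month 36 with no action
print(f"~{rate:.2f} cm²/month, projected {projection:.0f} cm² at 3 years")
```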

Understanding how a defect has changed over time can help predict how it will deteriorate in the future.

Finally, the results of this process facilitate human-AI collaboration for portfolio-level analysis, removing subjectivity and increasing the quality of results presented to aid in decision-making. This allows any concerning trends to be highlighted and enables asset stock owners to prioritise remedial works where they are objectively needed most. This prioritisation can be combined with constraining factors, such as budget limitations, to optimise remedial works to effectively tackle the needs of the asset stock on a macro-scale.
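One deliberately naive way to sketch budget-constrained prioritisation is a greedy ranking by urgency per unit cost; the asset names, scores, and costs below are invented, and a real system would use richer optimisation than this:

```python
# Greedy portfolio prioritisation under a budget constraint (a sketch).
# Urgency scores would come from the condition analysis described above.

def prioritise(assets, budget):
    """Fund the works offering the most urgency per unit cost, in order."""
    ranked = sorted(assets, key=lambda a: a["urgency"] / a["cost"],
                    reverse=True)
    funded, remaining = [], budget
    for asset in ranked:
        if asset["cost"] <= remaining:
            funded.append(asset["name"])
            remaining -= asset["cost"]
    return funded

portfolio = [
    {"name": "Bridge A",   "urgency": 9.0, "cost": 500_000},
    {"name": "Car park B", "urgency": 6.0, "cost": 150_000},
    {"name": "Culvert C",  "urgency": 4.0, "cost": 40_000},
    {"name": "Bridge D",   "urgency": 8.0, "cost": 900_000},
]
print(prioritise(portfolio, budget=700_000))
```

The value of the computer vision pipeline is that the urgency scores feeding this step are objective and consistent across the whole asset stock, rather than varying with each inspector.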

There are further civil engineering solutions outside of asset condition management that machine perception can contribute towards. An example would be analysing earth observation images to categorise land use as part of the infrastructure planning process. Alternatively, on construction sites, computer vision can be used to automatically quantify material pile sizes or detect workers entering exclusion zones in plant operators’ blind spots. The possibilities are manifold, but there are certain constraints and considerations to bear in mind.

Data Considerations

As a civil engineer, the detailed mechanisms of the sensors may seem irrelevant. However, it is important to understand the method by which data is collected and the associated impacts and limitations. Understanding the hardware in a data collection pipeline is a key part of the considerations required to build a system suitable for use in the real world. 

As an overly simple rule of thumb, you want your image formation stage to capture the objects and metrics you care about in sufficient detail and with sufficient accuracy. In general, it is useful to consider what you, as a human, can see in the images to approximate their quality and the solution’s feasibility. For instance, is the object of interest barely a pixel wide in your photographs? Or is the subject so zoomed in that there is no discernible context to locate it within a wider scene?
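That sanity check can be approximated with a back-of-the-envelope pinhole-camera calculation; the crack width, standoff distance, and camera parameters below are assumptions for illustration, not a formal survey calculation:

```python
# Roughly how many pixels does an object of a given size span at a given
# standoff distance? Assumes an idealised pinhole camera with a known
# horizontal field of view; all parameter values here are invented.
import math

def pixels_across(object_m, distance_m, image_width_px, hfov_deg):
    """Pixels spanned by an object lying flat across the image centre."""
    scene_width_m = 2 * distance_m * math.tan(math.radians(hfov_deg) / 2)
    return image_width_px * object_m / scene_width_m

# A 2 mm crack photographed from 3 m with a 4000-px-wide, 60° FOV camera.
px = pixels_across(0.002, 3.0, 4000, 60.0)
print(f"~{px:.1f} px")  # barely resolvable, so move closer or zoom in
```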

All solutions in this space require visual data during the training or calibration stages, and the availability of this data directly impacts solution feasibility. As such, high-quality, readily available data is crucial to the workflow in creating a solution suitable for deployment in safety-critical environments. 

What does good training data look like? In an ideal world, the dataset would be as large as possible, cover every possible variation and combination of variations, be perfectly representative of operational scenarios, and be accurately labelled at the pixel level. Unsurprisingly, this is not the reality we find ourselves in. Datasets are frequently imperfect and do not always hold value; being able to distil value and maximise the performance and explainability of models trained on imperfect data is therefore one of the key challenges to rise to.

The Benefits of Computer Vision for Infrastructure Inspections

Computer vision has the potential to enhance almost every visual aspect of civil engineering and equip us with new and exciting abilities. It can improve how we monitor and analyse assets in hazardous environments by processing vast quantities of visual data without the errors that result from boredom or fatigue. It can allow us to “see” in spectrums outside of human vision, quantify objects of interest, and detect small defects and changes not visible to the human eye. Ultimately, if adopted and integrated successfully, it can revolutionise the asset inspection process.

Although the spectrum of possibilities is vast, the challenges with implementing computer vision solutions are also considerable. By working with expert partners in the AI field, you can collaborate to curate potential solutions, considering aspects from the full spectrum of available techniques and their suitability for the available data. With timely action, strategic partnerships, and an innovation mindset, infrastructure managers will feel the full benefits of AI as it transforms their processes for the better.


Enjoyed this blog? Read our piece on Addressing the Infrastructure Condition Crisis.