Predicting Energy Efficiency across all Welsh Homes

Post by Joseph HC, Data Science Unit, Welsh Government

Information on the energy efficiency of homes across Wales gives policy makers evidence for fuel poverty, energy strategy and green initiative policies. In the Data Science Unit, we have been working on a project to build a stronger evidence base for the energy efficiency of Welsh homes.

The most up-to-date source of information on the energy efficiency of homes is the Energy Performance Certificates (EPC) Register. However, the Register covers only half of all homes in Wales and over-represents newer, more energy-efficient homes. This is because legislation introduced in 2008 made an EPC survey mandatory when a property is sold or rented, so homes that have not been sold or rented since then are unlikely to have one.

This blog post describes how we have been using the EPC Register to build a dataset that covers every residential property in Wales. The work builds on a project by the Office for National Statistics’ Data Science Campus: Using machine learning to predict energy efficiency | Data Science Campus.

Information that is currently available

Despite its bias toward newer, more energy-efficient properties, the EPC Register is a good starting point for building an all-Wales energy efficiency dataset. EPC energy ratings are calculated using a method based on the UK Building Research Establishment’s Domestic Energy Model (BREDEM). A property receives an EPC after an energy assessment carried out by a qualified, accredited energy assessor, who visits the property and records key building features such as:

  • wall, floor, and loft insulation;
  • boiler efficiency;
  • type of heating system;
  • heating controls; and,
  • windows.

The assessment produces a numeric score, called the Energy Efficiency Rating (EER), which falls into one of seven bands, A-G, as shown in Figure 1.

Figure 1 – Energy Efficiency Ratings and A-G scores
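Grouping an EER into a band is a simple threshold lookup. The sketch below assumes the standard domestic EPC thresholds (A: 92 and above, B: 81-91, C: 69-80, D: 55-68, E: 39-54, F: 21-38, G: 1-20), which should be checked against Figure 1; `eer_to_band` is a hypothetical helper, not part of any published tool.

```python
# Lower bound (inclusive) of each EPC band, best band first.
# Assumption: standard domestic EPC thresholds; check against Figure 1.
BAND_THRESHOLDS = [
    ("A", 92),
    ("B", 81),
    ("C", 69),
    ("D", 55),
    ("E", 39),
    ("F", 21),
    ("G", 1),
]


def eer_to_band(eer: int) -> str:
    """Map a numeric Energy Efficiency Rating to its A-G band (hypothetical helper)."""
    if eer < 1:
        raise ValueError(f"EER must be at least 1, got {eer}")
    # Walk from the best band down and return the first whose lower bound is met.
    for band, lower_bound in BAND_THRESHOLDS:
        if eer >= lower_bound:
            return band
    # Unreachable: any eer >= 1 satisfies band G.
    raise AssertionError("no band matched")


if __name__ == "__main__":
    for score in (95, 72, 60, 15):
        print(score, eer_to_band(score))
```

Band A has no upper bound in this sketch, since a rating can exceed 100 for homes that export more energy than they use.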

The Data Science Unit is developing a machine learning model to predict the EERs of homes that have not yet had an EPC assessment. This means finding other information that can stand in for the data collected during a manual EPC assessment.
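This is a supervised-learning set-up: homes with an EPC supply the training labels (their EERs), and the fitted model is then applied to homes without one. The sketch below illustrates only that split, using a deliberately simple stand-in model (the mean EER for each property type) rather than the model the project uses; the record fields and the names `split_by_epc`, `fit_baseline` and `predict` are hypothetical.

```python
from collections import defaultdict
from statistics import mean
from typing import Optional, TypedDict


class Home(TypedDict):
    uprn: int              # hypothetical property identifier
    property_type: str     # a single stand-in feature, e.g. "detached"
    eer: Optional[int]     # None when the home has no EPC


def split_by_epc(homes: list[Home]) -> tuple[list[Home], list[Home]]:
    """Separate homes with an EPC (training data) from those without (to predict)."""
    labelled = [h for h in homes if h["eer"] is not None]
    unlabelled = [h for h in homes if h["eer"] is None]
    return labelled, unlabelled


def fit_baseline(labelled: list[Home]) -> dict[str, float]:
    """Stand-in 'model': the mean EER of certified homes for each property type."""
    by_type: dict[str, list[int]] = defaultdict(list)
    for home in labelled:
        by_type[home["property_type"]].append(home["eer"])
    return {ptype: mean(scores) for ptype, scores in by_type.items()}


def predict(model: dict[str, float], home: Home, fallback: float) -> float:
    """Predict an EER, using `fallback` for property types never seen in training."""
    return model.get(home["property_type"], fallback)


if __name__ == "__main__":
    # Toy records for illustration only; not real data.
    homes: list[Home] = [
        {"uprn": 1, "property_type": "detached", "eer": 60},
        {"uprn": 2, "property_type": "detached", "eer": 70},
        {"uprn": 3, "property_type": "flat", "eer": 75},
        {"uprn": 4, "property_type": "detached", "eer": None},
        {"uprn": 5, "property_type": "bungalow", "eer": None},
    ]
    labelled, unlabelled = split_by_epc(homes)
    model = fit_baseline(labelled)
    overall = mean(h["eer"] for h in labelled)
    for home in unlabelled:
        print(home["uprn"], round(predict(model, home, overall), 1))
```

Because certified homes skew newer, a model trained only on them may be less accurate for older homes; holding out some certified homes for testing gives a first check of how well predictions generalise.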

Finding substitute information

A key challenge in predicting a home’s EER is obtaining the information that determines it, such as wall and floor insulation, which usually comes from a manual assessment and so is only available for properties that already have an EPC. To predict EERs for properties that haven’t been assessed, we need other data sources to act as substitutes. We call this substitute data “proxy data”.

Current proxy data sources

Our proxy data set combines information from multiple sources:

  1. Ordnance Survey’s AddressBase dataset
  2. Ordnance Survey’s National Geographic Database (NGD)
  3. Land Registry’s Price Paid Dataset
  4. Welsh Index of Multiple Deprivation
  5. The Rent Smart Wales register
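Combining sources like these needs a common key for each property. The sketch below assumes property-level records can be linked on a Unique Property Reference Number (UPRN), the identifier AddressBase assigns to each address, and that area-level data such as the Welsh Index of Multiple Deprivation is joined through an LSOA code already attached to each property. Whether each listed dataset carries these keys directly or must first be address-matched is an assumption, as are all field names.

```python
from typing import Any

# Hypothetical record shape: a flat mapping of field name to value.
Record = dict[str, Any]


def index_by(records: list[Record], key: str) -> dict[Any, Record]:
    """Index records by one field. If a key repeats, the last record wins,
    so pre-sort a source (e.g. sales by date) to keep the record you want."""
    return {record[key]: record for record in records}


def build_proxy_rows(
    properties: list[Record],
    property_sources: list[list[Record]],
    area_sources: list[list[Record]],
) -> list[Record]:
    """Left-join property-level sources on UPRN and area-level sources on LSOA code.

    Every record in `properties` (assumed to be the full address list) yields
    exactly one output row. Fields from later sources overwrite same-named
    fields from earlier ones.
    """
    by_uprn = [index_by(source, "uprn") for source in property_sources]
    by_lsoa = [index_by(source, "lsoa_code") for source in area_sources]
    rows: list[Record] = []
    for prop in properties:
        row = dict(prop)  # copy so the input records are not mutated
        for index in by_uprn:
            row.update(index.get(prop["uprn"], {}))
        for index in by_lsoa:
            row.update(index.get(prop.get("lsoa_code"), {}))
        rows.append(row)
    return rows


if __name__ == "__main__":
    # Toy records for illustration only; not real data.
    properties = [
        {"uprn": 1, "lsoa_code": "LSOA-A", "property_type": "terraced"},
        {"uprn": 2, "lsoa_code": "LSOA-B", "property_type": "detached"},
    ]
    rentals = [{"uprn": 1, "bedrooms": 3}]
    deprivation = [{"lsoa_code": "LSOA-A", "wimd_rank": 120}]
    for row in build_proxy_rows(properties, [rentals], [deprivation]):
        print(row)
```

A left join from the full address list keeps a row for every home, including those absent from the rental or sales data, which matches the goal of covering every residential property.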

Using these datasets, we have so far been able to substitute some EPC features directly, such as property type, building age and floor area. Other manually collected information is harder to substitute, so we have used approximations: for example, we use the number of bedrooms from Rent Smart Wales in place of the number of rooms, because homes with more bedrooms generally have more rooms overall. For some features, proxies are harder still to find, and these may not be represented in the final model.
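One way to sanity-check a proxy such as bedrooms-for-rooms is to measure how closely the two move together on properties where both are known, for example homes with both an EPC and a Rent Smart Wales record. The sketch computes Pearson’s correlation coefficient by hand to stay dependency-free (Python 3.10+ also offers `statistics.correlation`); the function name and toy values are illustrative only, and the project may assess its proxies differently.

```python
import math


def pearson(target: list[float], proxy: list[float]) -> float:
    """Pearson correlation between a feature and its proxy, paired by property."""
    if len(target) != len(proxy):
        raise ValueError("target and proxy must be paired by property")
    if len(target) < 2:
        raise ValueError("need at least two properties")
    n = len(target)
    mean_t = sum(target) / n
    mean_p = sum(proxy) / n
    # Co-variation and spread around each mean.
    s_tp = sum((t - mean_t) * (p - mean_p) for t, p in zip(target, proxy))
    s_tt = sum((t - mean_t) ** 2 for t in target)
    s_pp = sum((p - mean_p) ** 2 for p in proxy)
    if s_tt == 0 or s_pp == 0:
        raise ValueError("correlation is undefined when either input is constant")
    return s_tp / math.sqrt(s_tt * s_pp)


if __name__ == "__main__":
    # Toy paired values (illustrative only): room counts and bedroom counts.
    rooms = [3, 4, 5, 5, 6, 7]
    bedrooms = [1, 2, 2, 3, 3, 4]
    print(f"r = {pearson(rooms, bedrooms):.2f}")
```

A value close to 1 suggests bedrooms track rooms well; Pearson’s r only measures linear association, so a scatter plot of the same pairs is a useful companion check.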

As well as looking for substitute data, we treat neighbouring properties that have an EPC as an approximation for a nearby property that does not: the EERs of nearby certified properties are included in the proxy dataset.
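A neighbour feature of this kind could be computed as the mean EER of the k nearest certified properties. The sketch assumes each property has easting and northing coordinates in metres (for instance British National Grid coordinates, which AddressBase provides) and uses a brute-force search; `neighbour_mean_eer`, k = 5 and the simple unweighted mean are illustrative choices, not the project’s documented settings.

```python
import heapq
import math
from statistics import mean


def neighbour_mean_eer(
    target: tuple[float, float],
    certified: list[tuple[float, float, int]],
    k: int = 5,
) -> float:
    """Mean EER of the k certified properties nearest to `target`.

    `target` is an (easting, northing) pair in metres; each entry in
    `certified` is (easting, northing, eer) for a property with an EPC.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    if not certified:
        raise ValueError("no certified properties to compare against")
    tx, ty = target
    # Brute force: keep the k certified properties with the smallest
    # straight-line distance to the target.
    nearest = heapq.nsmallest(
        k, certified, key=lambda p: math.hypot(p[0] - tx, p[1] - ty)
    )
    return mean(p[2] for p in nearest)


if __name__ == "__main__":
    # Toy coordinates and ratings for illustration only; not real data.
    certified = [(0, 0, 50), (10, 0, 60), (0, 10, 70), (1000, 1000, 20)]
    print(neighbour_mean_eer((1.0, 1.0), certified, k=3))
```

When building training rows for homes that do have an EPC, each home’s own certificate should be left out of `certified`, otherwise the feature leaks the answer the model is meant to predict. At national scale, a spatial index such as SciPy’s `KDTree` would replace the brute-force search.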

Next Steps

The proxy data will be used to train a machine learning model. We hope the model will give us a wider view of energy efficiency ratings across Wales that can be used to support policy development. If you’d like to know more about the work, please contact us at: DataScienceUnit@gov.wales