Some customer data is missing

How often does this come up as a problem to solve? It may happen more frequently than you think.

Clean, comprehensive, and consistent data is essential for appropriate customer engagement and interaction. If your business is also an advocate and heavy user of automation, machine learning, and artificial intelligence, your technical teams will tell you that their results depend not only on the effort they put in but also on the quality of the data they work with.

Without the best possible customer data, your staff and systems see only a partial picture, which can result in bad decisions, model bias, and skewed results.

PubMed Central (PMC), the archive run by the US National Library of Medicine at the National Institutes of Health, hosts an article from May 2013 describing three types of potential data deficiency in any given data set. While the author, Hyun Kang, focuses on suitability for studies, the classification is a useful basis for considering customer master data in general.

The three types are Missing Completely at Random (MCAR), Missing at Random (MAR), and Missing Not at Random (MNAR), each with its own causes and potential solutions.

We’ll look at this through the lens of a customer master data management system.

Missing Completely at Random (MCAR)

MCAR is data that may be missing due to sheer bad luck, such as damage to physical devices, missing migration data or integration glitches.

This could also be data that would have been captured manually but was not available at the time because the sources or forms made no provision for it. In some cases the data may have been captured later as the data set evolved.

Another cause could be data corruption in feeds. You can often tell that this data is missing by looking at the source of the data, if you have lineage, or at its vintage. An entire swathe of uncaptured data may be absent from every record of a given era.
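As a rough illustration of checking by vintage, the sketch below computes the share of records missing one attribute for each capture year. The record layout and the names `created_year`, `mobile`, and `missing_rate_by_group` are assumptions made for this example and do not come from any particular product.

```python
from collections import defaultdict

def is_missing(value):
    """Treat None and blank strings as missing."""
    return value is None or (isinstance(value, str) and value.strip() == "")

def missing_rate_by_group(records, group_key, field):
    """Return {group: share of records in that group where `field` is missing}."""
    totals = defaultdict(int)
    missing = defaultdict(int)
    for record in records:
        group = record.get(group_key)
        totals[group] += 1
        if is_missing(record.get(field)):
            missing[group] += 1
    return {group: missing[group] / totals[group] for group in totals}

# Hypothetical customer records; the field names are illustrative only.
customers = [
    {"id": 1, "created_year": 1999, "mobile": None},
    {"id": 2, "created_year": 1999, "mobile": ""},
    {"id": 3, "created_year": 2015, "mobile": "555-0101"},
    {"id": 4, "created_year": 2015, "mobile": None},
    {"id": 5, "created_year": 2021, "mobile": "555-0102"},
]

rates = missing_rate_by_group(customers, "created_year", "mobile")
for year in sorted(rates):
    print(year, f"{rates[year]:.0%}")
```

A year in which the rate sits at or near 100% suggests that the attribute was simply not captured in that era, rather than being lost record by record.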

Either way, the data may misalign with the rest of the data in terms of what should be there. This can be particularly problematic when you want to use a particular attribute to enrich, filter, or sort the customer data for some secondary or downstream activity. Your objective here might be to rebuild the missing data through specific data collection approaches such as customer outreach, augmentation from transactional data, or layering in another data set.
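Augmentation from a second data set could look something like the following minimal sketch. It assumes both data sets share a key (here a hypothetical `customer_id`), never overwrites a value the master record already holds, and notes which fields were filled so the change can be reviewed. All names are illustrative.

```python
def is_missing(value):
    """Treat None and blank strings as missing."""
    return value is None or (isinstance(value, str) and value.strip() == "")

def fill_from_secondary(master, secondary, key, fields):
    """Return copies of master records with blank `fields` filled from `secondary`.

    Existing master values are kept as they are. Each filled field is listed
    under "_filled_from_secondary" so the change can be audited.
    If `secondary` holds several rows for one key, the last row wins.
    """
    lookup = {row[key]: row for row in secondary}
    enriched = []
    for record in master:
        updated = dict(record)
        donor = lookup.get(record[key], {})
        filled = []
        for field in fields:
            if is_missing(updated.get(field)) and not is_missing(donor.get(field)):
                updated[field] = donor[field]
                filled.append(field)
        updated["_filled_from_secondary"] = filled
        enriched.append(updated)
    return enriched

# Hypothetical master and transactional records.
master = [
    {"customer_id": "C1", "email": "ana@example.com", "postcode": None},
    {"customer_id": "C2", "email": "", "postcode": "90210"},
    {"customer_id": "C3", "email": None, "postcode": None},
]
orders = [
    {"customer_id": "C1", "email": "other@example.com", "postcode": "10001"},
    {"customer_id": "C2", "email": "ben@example.com", "postcode": "90210"},
]

enriched = fill_from_secondary(master, orders, "customer_id", ["email", "postcode"])
for row in enriched:
    print(row)
```

Records with no match in the second data set, such as C3 here, stay incomplete and remain candidates for customer outreach.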

Missing at Random (MAR)

When data are MAR, you know it because you detect the gaps in the data that you are examining and care about. The same may be true of data that you are less concerned about, but you might not notice because you are less dependent on that data. Perhaps you store gender, but you don’t really care whether you have it in the system or not.

Conceptually, MAR is a bit more specific than MCAR, as MAR is data that’s missing within a certain variable. Here we may have some records with full first names and some with just the first letter; some with honorifics and some without. There is not much you can do here, but it becomes problematic when you examine the overall quality of the master data. In particular, it can lead to duplicate records, and it may undermine effective customer communication or customization of the customer experience.
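One way to gauge how inconsistent a single variable is would be to profile the shapes its values take. The sketch below sorts first-name values into a few rough categories; the honorific list and the category names are assumptions for illustration, and real data would need a fuller set of rules.

```python
from collections import Counter

# Illustrative list only; a real deployment would maintain its own.
HONORIFICS = {"mr", "mrs", "ms", "miss", "dr", "prof"}

def classify_first_name(value):
    """Label a first-name value by its rough shape."""
    if value is None or not value.strip():
        return "missing"
    first_token = value.strip().split()[0]
    stem = first_token.rstrip(".")
    if stem.lower() in HONORIFICS:
        return "has_honorific"
    if len(stem) == 1:
        return "initial_only"
    return "full_name"

# Hypothetical first-name values as they might appear in a master record.
names = ["Maria", "M.", "Dr. Maria", "", "J", None, "Mr John"]

profile = Counter(classify_first_name(name) for name in names)
for label, count in sorted(profile.items()):
    print(label, count)
```

A profile dominated by `full_name` suggests the variable is in reasonable shape, while a large share of initials or honorifics points to records that are likely to match poorly during de-duplication.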

Missing Not at Random (MNAR)

MNAR data is data that is missing for a specific reason, such as deliberate omission or a lack of definition in the source schema or source system. An example might be data that has come from a very old system, some of which may predate automated, systematic capture.

Say, all the data that came from the mainframe in 1999. Data from migrations may suffer acutely from this type of deficiency, as may data that has been created through a derivation process.
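If each record carries its source system, a simple check is to list the attributes that a given source never populated at all, which hints that the source schema had no place for them. This is a minimal sketch; the `source` field and the source names are hypothetical.

```python
def never_populated(records, source_key, fields):
    """Map each source to the sorted list of `fields` it never populated."""
    populated_by_source = {}
    for record in records:
        populated = populated_by_source.setdefault(record[source_key], set())
        for field in fields:
            value = record.get(field)
            # None and blank strings both count as not populated.
            if value is not None and str(value).strip() != "":
                populated.add(field)
    return {
        source: sorted(set(fields) - populated)
        for source, populated in populated_by_source.items()
    }

# Hypothetical records tagged with the system they came from.
records = [
    {"source": "mainframe_1999", "email": None, "birth_date": "1970-01-01"},
    {"source": "mainframe_1999", "email": None, "birth_date": None},
    {"source": "web_2020", "email": "ana@example.com", "birth_date": "1985-06-30"},
]

gaps = never_populated(records, "source", ["email", "birth_date"])
for source, fields in gaps.items():
    print(source, fields)
```

An attribute that is empty in every record from one source is unlikely to be fixed by re-running the feed, so remediation has to come from elsewhere.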

Pretectum C-MDM helps you not only to identify these kinds of issues in your data but also to remediate them, either by drawing on other data sources you have available or by engaging directly with customers to provide the missing data.

The original posting of this piece can be found at

https://www.pretectum.com/2021/11/30/some-customer-data-is-missing/