Why You Need Data Governance

In this blog I’m going to look at why you really should do data governance. When I tell people what I do, I get a mixed response. Some people seem genuinely surprised that everyone isn’t already doing Data Governance, and an awful lot of people ask, “Why would you need that?”

Now I’m biased, as I believe that every organization would benefit from implementing data governance. It may not solve all problems, but it really does provide a framework which can be used to proactively manage your data.

A few years ago the main driver of Data Governance initiatives was regulatory compliance, and while that is definitely still a factor, there is a move towards companies embracing Data Governance for the business value it can enable. For example, if your organisation is starting a digital transformation or wants to become “data driven”, you are not going to be successful if your data is poorly understood, poorly managed and of poor quality.

If you embrace Data Governance and achieve better quality data, all sorts of benefits start to be seen. But you don’t have to take my word for it; take the DAMA DMBoK Wheel for instance: 

[Image: the DAMA DMBoK Wheel]

As you can see, it lists all the Data Management disciplines around the outside of the wheel. There in the middle, at the heart of it all, is Data Governance.  Now it didn’t just get put in the middle because there were no more spaces on the outside of the wheel – it’s there for a reason. Data Governance provides the foundation for all other data management disciplines.

Let’s look at a few of these disciplines to illustrate the point:

Data Quality

Without Data Governance all data quality efforts tend to be tactical at best. This means a company will be constantly cleaning or fixing data, perhaps adding default values when a key field has been left blank. With Data Governance in place, you will have processes, roles, and responsibilities to ensure that the root causes of poor data quality are identified and fixed so that data cleansing is not necessary on an on-going basis.
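
To make the difference between tactical fixing and root-cause work a little more concrete, here is a minimal Python sketch. It assumes a made-up set of customer records with hypothetical `source` and `postcode` fields. Rather than filling blanks with a default value, it counts where the blanks come from, so that the people responsible can ask why they occur.

```python
from collections import Counter

# Hypothetical customer records; "source" and "postcode" are made-up field names.
records = [
    {"id": 1, "source": "web_form", "postcode": "AB1 2CD"},
    {"id": 2, "source": "call_centre", "postcode": ""},
    {"id": 3, "source": "call_centre", "postcode": None},
    {"id": 4, "source": "web_form", "postcode": "EF3 4GH"},
    {"id": 5, "source": "call_centre", "postcode": "IJ5 6KL"},
]


def blank_counts_by_source(rows, field):
    """Count rows where `field` is missing or empty, grouped by source."""
    counts = Counter()
    for row in rows:
        value = row.get(field)
        if value is None or str(value).strip() == "":
            counts[row["source"]] += 1
    return dict(counts)


# A cluster of blanks from one source points at a process to fix,
# not at a default value to add.
blanks = blank_counts_by_source(records, "postcode")
print(blanks)
```

A report like this does not fix anything by itself; it simply gives whoever is accountable for the data a starting point for the root-cause conversation.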

Reference and Master Data

Anyone who has been involved in any master data projects will have no doubt heard or read numerous dire warnings about the dangers of attempting these without having Data Governance in place. While I am not a fan of wholesale scaremongering to get people to embrace Data Governance, these warnings are genuine. For master data projects to be successful, you need data owners identified and definitions of all the fields involved drafted and agreed, as well as processes for how suspect matches will be dealt with. Without these things (which of course Data Governance provides) you are likely to be faced with a mess of under-, over- or mismatching!
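
As an illustration of why agreed rules for suspect matches matter, here is a minimal Python sketch. The thresholds, the function names and the use of the standard library's `difflib` for similarity scoring are all assumptions made for the example, not a recommended matching method; in practice the thresholds are exactly the kind of decision a Data Owner would need to agree.

```python
from difflib import SequenceMatcher

# Illustrative thresholds only; set them too low and you overmatch,
# too high and you undermatch.
MATCH_THRESHOLD = 0.95
SUSPECT_THRESHOLD = 0.80


def similarity(a, b):
    """Return a 0..1 similarity score for two names, ignoring case and outer spaces."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def classify_pair(name_a, name_b):
    """Classify a pair of records as a match, a suspect match or no match."""
    score = similarity(name_a, name_b)
    if score >= MATCH_THRESHOLD:
        return "match"
    if score >= SUSPECT_THRESHOLD:
        return "suspect"  # routed to a person, following the agreed process
    return "no_match"


for pair in [("Acme Ltd", "acme ltd "), ("Jon Smith", "John Smyth"), ("Acme Ltd", "Zenith Plc")]:
    print(pair, classify_pair(*pair))
```

The middle band is the important part: without an agreed process and an owner for those suspect pairs, they tend either to be merged automatically or ignored.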

Data Security

Of course Data Security is primarily an IT managed area, but it makes things a lot easier to manage consistently if there are agreed Data Owners in place to make decisions on who should and should not have access to a given set of data.

I hope you agree that these examples and explanations make sense, but don’t forget that this is theory, and explaining it in data management terms to your senior stakeholders in order to get agreement to start a Data Governance initiative is unlikely to be successful. Instead, you are going to need to explain it in terms of the benefits it will bring. The primary reason to do Data Governance is to improve the quality of data, so the benefits of Data Governance are those things that will improve if the quality of your data improves. This can cover a whole range of areas, including the following:

Improved Efficiency

Have a look around your company. How many “work-arounds” exist because of issues with data? What costs could be reduced if all the manual cleansing and fixing of data were reduced or even eliminated?

Better Decisions

We have to assume that the senior management in your organization intends to make the best decisions. But what happens if they make those decisions based on reports that contain poor quality data? Better quality data leads to more accurate reporting.

Compliance

Very few organizations operate in an industry that does not have to comply with some regulation, and many regulations now require that you manage your data better. Indeed, GDPR (the General Data Protection Regulation) impacts everyone who holds data on EU Citizens (customers and employees), and having a solid Data Governance Framework in place will enable you to manage your data better and meet regulatory requirements.

So, at this point you are probably thinking, “isn’t it just a generic best practice thing that everyone ought to do?” And the answer is, yes – I do believe that every organization could benefit from having a Data Governance Framework that is appropriate for its needs.

What Happens if you Don’t Have Data Governance?

Well, I’ll leave it to you to have a look around and decide what the likely consequences for your company could be, but they are usually the opposite of the benefits that can be achieved.

Remember, data is used for dealing with your customers, making decisions, generating reports, and understanding revenue and expenditure. Everyone from the Customer Service Team to your Senior Executive Team uses data and relies on it being good enough to use.

Data Governance provides the foundation so that everything else can work. This will include obvious “data” activities like Master Data Management, Business Intelligence, Big Data Analytics, Machine Learning, and Artificial Intelligence. But don’t get stuck thinking only in terms of data activities. Lots of processes in your organization can go wrong if the data is wrong, leading to customer complaints, damaged stock, and halted production lines.

If your organization is using data (and to be honest, which companies aren’t?) you need Data Governance.  Some people may not believe that Data Governance is sexy, but it is important for everyone.  It need not (in fact it should not) be an overly complex burden that adds controls and obstacles to getting things done. Data Governance should be a practical thing, designed to proactively manage the data that is important to your organization.

Just one final word of advice: I hope that this article has convinced you that your organization needs to embrace Data Governance; but if that is the case, please don’t just spout the generic benefits and examples I have shared here in your efforts to gain stakeholder buy in. It is very important to spend time working out the specific reasons your company should be doing Data Governance. You can find more advice on that and how to engage your senior stakeholders here.

Does it have to be called Data Governance?

This is a question that I get asked fairly regularly. After all it is not an exciting title and in no way conveys the benefits that an organisation can achieve by implementing Data Governance. Sadly however, there is no easy yes or no answer. There are a number of reasons for this:

  1. Data governance is a misunderstood and misused data management term

Naturally I am biased, but in my view, data governance is the foundation of all other data management disciplines (and of course therefore the most important). But the fact remains that despite an increasing focus on the topic, it remains a largely misunderstood discipline.

On top of this, it is a term which is frequently misused. A few years ago, a number of Data Security software vendors were using the term to describe their products. More recently the focus on meeting the EU GDPR requirements has led to a lot of confusion as to whether Data Protection and Data Governance are the same thing and I find that the terms are being used interchangeably. (For the record, having Data Governance in place does help you meet a chunk of the GDPR requirements, but they are not the same thing).

Having more people talking about Data Governance is definitely a good thing, but unless they are all meaning the same thing, it leads to much confusion over what data governance really is.

I explored this topic in a bit more detail in this blog: Why are there so many Data Governance Definitions?

In order to understand whether Data Governance is the right title for your organisation to call it, I would start with looking at how you define data governance. And this step leads nicely to the next item for consideration.

  2. Sometimes it is right to include things which are not pure data governance in the scope of your data governance initiative.

This is a topic that I covered in my last blog which you can read here.

To summarize that article, it is just not possible to have one or more people focus purely on Data Governance in smaller organisations. It’s a luxury of large organizations to be able to have separate teams responsible for each different data management discipline (e.g. Data Architecture, Data Modelling or Data Security).  Going back to my point above, if data governance is the foundation for all other data management disciplines, it is only natural that the line between them can sometimes get a little blurred. As a result of this, the responsibilities of the Data Governance Team can get expanded.

So consider what is included within the scope of your data governance initiative and decide whether it would be more appropriate to name the initiative and your team (either or both) something that is more aligned to the wider scope of the initiative and the activities of the team.

Is the name going to make cultural change harder to achieve?

Achieving a sustainable cultural change is one of the biggest challenges in implementing data governance, and insisting on calling it “data governance” could make that change more difficult to achieve if the term doesn’t resonate within your organization. This is related to a topic that I explored in an older blog: Do we have to call them Data Owners?

Whether we’re talking about the roles, the team, or even the initiative, the same principles apply. It is better to choose a name that works for the culture in your organization than to waste considerable effort trying to convince people that the “correct” terminology is the only one to use.

It would be my preference to explain that the initiative is to design and implement a Data Governance Framework, but if the primary reason for implementing data governance is to improve the quality of your data, perhaps calling it the “Data Quality Team” and “Data Quality Initiative” would fit better? After all, that very much focuses on the outcome of what you’re doing.  It also addresses the question that everybody asks (or should ask) when approached to get involved in data governance of “why are we doing this,” which is usually followed by “what’s in it for me?”

When having these conversations, I explain the initiative in terms of its outcomes (e.g. better quality data which will lead to more efficient ways of working, reduced costs and better customer service). That is a far easier concept to sell rather than implementing a governance structure, which can sound dull and boring.

Is the name causing confusion?

In the early days of a data governance initiative, the talk is all about designing and implementing a data governance framework. Once this work has been achieved you start designing and implementing processes which have “Data Quality” in their titles:

  • Data Quality Issue Resolution

  • Data Quality Reporting

I have been fortunate enough to work with organizations in the past that have had both a Data Governance Team (supporting the Data Owners and Data Stewards) and a Data Quality Team (responsible for the processes mentioned above), but that is fairly unusual in my experience. It is more common for the Data Governance Team to support the above processes. So it is worth considering whether it would confuse people if they had to report data quality issues to the Data Governance Team.

In summary, I would not want to miss the opportunity to educate more people on what Data Governance really is. But the banner under which it is delivered can be altered to make your data governance implementation both more successful and more sustainable. So if, having considered all the points above in respect of your organization, you want to call it something else, then that is fine with me.

Deciding what to call your initiative is only the start of many things you need to do to make your Data Governance initiative successful. You can download a free checklist of the things you need to do here. (Don't forget this is a high level summary view, but everyone who attends either my face-to-face or online training gets a copy of the complete detailed checklist which I use when working with my clients.)

What should you include in a Data Governance initiative?

Scope of a Data Governance initiative

One of the many challenges you will have to face when implementing Data Governance is agreeing the scope of the initial phase of your initiative. By this I don’t just mean which data domains or business functions are going to be in scope. I’m thinking of associated activities like data retention, end-user computing, and data protection. Being a bit of a Data Governance purist I maintain that such activities are most definitely NOT data governance. It is easy therefore to make the logical conclusion that they should not be in the scope of your initiative. So what I say next may surprise you:

Do not immediately go on the defensive and refuse to take any (or even all) of these activities into the scope of your initiative!

Now you may be wondering why someone who spends her time educating people on what Data Governance is would say that! Well, when I’m training and coaching people it is important that they understand what Data Governance is, but when I’m implementing Data Governance in practice, I take a pragmatic approach.

However, I would not want you to think that I would just say yes to an ever-expanding scope. There are a number of factors that would make me consider bringing these additional data activities into the scope of my data governance work, which include:

  • If you work for a small organization that does not have the luxury of separate specialist teams to cover each data management discipline;

  • If they overlap with other projects ongoing at the same time;

  • Or if a senior stakeholder requests it.

Whilst you may become aware of other activities that you want to bring into scope, they are most likely to come to your attention through your senior stakeholders – so let’s consider this question:

How do you manage senior stakeholders who ask you to extend the scope of your initiative?

Now whilst it may be tempting to protect the scope of your initiative, remember they have their own agenda. They are not trying to derail your plans; they just have concerns of their own or issues that they need addressed. The first thing you are going to need to do is to listen and understand what their concerns are before you try to educate or influence them. After all, how can you properly allay their concerns if you don’t fully understand them?

But remember whilst it is imperative that you understand why they’re asking you to extend the scope, when I say educate or influence them, I don’t mean your initial stance is to say no! When talking to your senior stakeholder, ask lots of questions and constantly consider the following:

  • What exactly does this person need done?

  • Does it have any alignment or overlap with your data governance work?

  • What will happen if this additional work does not get done? (And in particular will it cause a problem for your data governance initiative?)

Even if the answer to this last question is no, you may still need to consider that, if you say no, this senior stakeholder could divert resources currently allocated to your initiative to address this other issue.

Are there benefits and/or efficiencies to be achieved by taking on this work? This can be especially true if you are talking to the same stakeholders.

My advice is to look for solutions that help everyone. This is not about you or them winning. This is about doing the right thing for your organization. Find out why they are concerned about these other topics. Is it because they are not being done, or is it that they are being done but are not visible, or are being done but not well enough or quickly enough?

Now obviously I’m biased, but I truly believe that well implemented data governance can be the framework against which you align an awful lot of other activities in your organization (well at least those concerning data)! Once in place, you can use your data governance framework to coordinate, oversee, and escalate other data matters to the appropriate people. That said, it is not the answer to everything and you should resist taking on everything (unless of course you are Superman/Superwoman), or at least agree to timescales for adding additional scope once the implementation of your data governance framework has reached a certain stage.

If you do take on something that perhaps you feel is not in the area of your expertise, that is ok – just be honest and clear on the matter. Explain that whilst, for example, you may not be a data retention expert, you see how including that in your data governance initiative has benefits for the organization. Confirm that you are happy to do the necessary research and support the work if you are given the necessary expert support (for example from your Legal Department).

Remember that whether your data governance initiative is small and focused or has gained additional scope, stakeholder engagement is absolutely vital for success. You need to spend a lot of effort engaging your stakeholders. If you could lose their support by not addressing their other concerns, it’s got to be worth considering whether the additional work is something that you can take on.

Finally, if you want ideas on how to go about engaging your stakeholders, you can download my top tips on stakeholder engagement for free if you click here.

Originally posted on TDAN.com

Data Governance Interview - Bonny McClain

I haven’t done a Data Governance interview for quite a while, but while preparing for a webinar I am doing with Bonny on Data Governance in Healthcare (you can register here) it became clear that I had to ask her to do an interview as she has so much expertise to share.

Bonny curates data from the intersection of health policy, health economics, and healthcare to create powerful storytelling narratives. Real insights come from the ability to hold tensions and bring multiple data sources to the conversation.

The data revolution is here, and her expertise is tackling industry-specific problems and rendering them solvable with data, relying on a wide variety of tools like Python, R, SQL, and Tableau data visualization.

A big advocate of data literacy, Bonny is a life-cycle data consultant. The ability to appreciate the overall concept while simultaneously thinking about detailed aspects of implementation allows different levels of abstraction to be curated creatively and empathetically.

How long have you been working in Data Governance?

Since I first began working with electronic medical record (EMR) systems. Working with relational databases it became painfully obvious that data systems and data assets were not being managed across their lifecycle. Data quality issues were impacted by non-existent information governance often exacerbated by the move from paper records and charts to electronic databases.

 

Some people view Data Governance as an unusual career choice, would you mind sharing how you got into this area of work?

In smaller community medical practices and even larger health systems—information was being generated and shared across brick and mortar institutions. As physicians became curious about data questions that extended beyond administrative concerns there was a palpable need to understand “data dictionaries” and the schema and architecture of data storage. Understanding data assets was a natural evolution and requirement to glean insights from the digital data being created.

 

What characteristics do you have that make you successful at Data Governance and why?

Because I work with the populations that rely on the quality of data—the emphasis is on usability but not at the expense of safety—I have the data analytic skills (recent certificate in applied data analytics from Columbia School of Engineering) to identify which measures and variables are needed to answer a data question. Knowing that pieces of information often live in different parts of a database generates concern and an evolution of skills to ensure that patient records are matched to avoid duplicate records, missing data, inappropriately merged records, current medications and procedures—all with an eye to seek out potential opportunities for harm if data is incomplete, incorrect, missing, or low value.

 

You work a lot with the Healthcare Industry – how mature would you say they are in Data Governance?

You should mention that I snorted and began laughing. Although, in full disclosure, I work with many smaller organizations that haven’t been able to prioritize data governance for one reason or another. Our conversations about many data governance policies being articulated as “best endeavors” or a best effort illuminated for me why much of the best-intentioned guidance falls short without a complete and accountable data governance strategy.

I do stress scalability with new clients so they don’t feel overwhelmed and tempted to park the whole process until “later”. I meet clients where they are, and most make significant progress relatively quickly. Healthcare is unique, as bad data here can lead to deleterious outcomes and harms.

 The typical data client actually has low data literacy and maturity and often answers a data question before the analyses have been initiated. They want data that shows what they actually believe to be true—and in the face of a contrary outcome—move on to the average analyst that will gladly only report the curated outcome they seek.

Given that this is the typical healthcare professional—you can imagine that governance strategies that sit upstream from information governance are not recognized or prioritized.

 

How clear do you believe the Healthcare Industry as a whole is on the difference between Data Governance and Information Governance?

I may be wrong, but I feel the labels are not applied to the different behaviors correctly. Although I do believe that compliance, quality care (value), cost containment, evolving payment models, and safety are understood as information, and rely on proper custodial processes of information assets, the industry as a whole is at a crossroads of how data governance influences the rigor of information downstream from policies and processes.

 

As a final reminder, Bonny and I will be having a conversation about Data Governance in Healthcare on 1st May. The webinar is free and you can sign up here:

https://usergroups.tableau.com/datagovernanceforhealthcare

 

How to Write a Good Data Governance Policy

It won’t surprise you to learn that I sometimes find myself writing a data governance policy for my clients.  Sometimes my clients assume that I will just take a previous policy I've written and tweak it for them. However, this really wouldn't be very helpful for the new client, as it wouldn't be designed to meet their needs!

As with all things data governance, I don't think that there is such a thing as a standard approach, and there certainly is not one for a data governance policy. If there is no such thing as a standard data governance framework, why would you think that a policy written for another organization would work for you?

Unfortunately, a lot of people don’t realise this, and I’m often asked if I would share a template or an example of a data governance policy that they can copy.

For a policy to be really useful (i.e. help you implement Data Governance successfully) it needs to be written with your organisation in mind, and consider the following:

  • What is the scope of your data governance programme?

  • What is it that your organization is going to do to manage its data better?

  • What roles and responsibilities are you going to have to manage your data better?

  • What kind of processes are you going to implement as a result of having data governance?

Now, the answers to these questions will not be the same for all companies and I can honestly say that every organization I have ever worked with has been unique in its approach to data governance.

 I admit sometimes the differences are subtle, but for a policy to be valuable, these subtleties really do need to be addressed.

So how do you write a good Data Governance Policy from scratch?

If you are just starting to draft a data governance policy, then I recommend that you take the following approach:

Assemble key senior stakeholders in a room together and get them to tell you what principles they want included in the policy.  What are the high level things that you want to achieve by having a data governance framework and policy in place? This could be things like “all data has a data owner”, or it could be describing which data will have data quality standards and monitoring in place, or which data will have definitions in your data glossary.

For some clients this list has run to twelve principles; for others it has been as short as six.

I find that getting principles agreed is a lot easier than asking a group of people what they want included in a data governance policy. Plus, the conversation around the principles will give you a really good idea about what they want covered in their policy. (And this additional information will also be important input when designing a more detailed data governance framework for your organization.)

Once you've drafted and circulated those principles for feedback, you should be able to make amendments and agree a list of principles. With the principles agreed, drafting your policy in accordance with them is fairly straightforward.

However, don't make the mistake of believing that once it is drafted everyone will immediately approve it because they already agreed the principles. Seeing the detail in black and white often gives rise to more questions, suggestions or changes from your key stakeholders.

At this point, I really have to emphasize that for data governance to be successful, you need the senior stakeholders engaged. So the answer is not to tell them they're wrong, or to railroad them into accepting what you want to have in the data governance policy. If this framework is to be successful, it needs to have buy in from everyone. So you need to take all input at this stage very seriously.

Drafting a data governance policy is one of the many things on my data governance checklist.  

You can download a free version of the checklist here.

What’s the difference between Data Owners and Data Custodians?

I often get asked about the difference between Data Owners and Data Custodians.

If you've read my other blogs, you'll know that I'm not fixated on sticking rigidly to the standard role names, but for the purposes of this blog, I’ll stick with what I consider “best practice” role names and consider both roles in turn:

A Data Owner is a senior business stakeholder who is accountable for the quality of one or more data sets. They usually have the resources, budget and authority to make changes to that data if necessary.

Data Custodians are very much an IT role. They are responsible for maintaining data on the IT infrastructure in accordance with business requirements. I think the confusion between the roles and who should be making decisions about data is rooted in a long term lack of Data Governance.   

Before a company has Data Governance in place, it's common that the business has not been trained in articulating their data requirements and therefore it is often down to IT to interpret or make decisions in order to help the business.  However, once you have a Data Governance framework in place, the business usually gets much better at articulating data requirements and IT’s job gets easier. 

Another difference between the roles is that for Data Owners, I'm looking for named individuals who own one or more data sets. When it comes to Data Custodians, however, I use the term more loosely as a collective name for everyone in the IT department who supports your infrastructure. That said, it's not impossible to have named custodians. I've worked with a number of clients who have named their Data Custodians. These have usually been smaller organisations, where there is only one subject matter expert for each system and that person has been named as the Data Custodian.

Another frequent question is whether IT is ever a Data Owner. Now this is an interesting question, as generally I would say no. They might own metadata, or performance data around the systems, but nothing more than that. However, recently, I have come across circumstances where IT might own some data.

One such example might be your Data Security or Information Security Team, who may be monitoring telephone calls or internet activity. This often isn't being done for any direct business purpose, but rather to protect the business as a whole. If this data isn't being collected to meet the requirements of a business Data Owner, then it could easily be argued that IT own that data.

I also often get asked whether only IT can be Data Custodians. Generally, Data Custodians do sit within IT, but there are instances where business teams or functions may also act as Data Custodians. For instance, you may have a BI or Analytics Team sitting in the business reporting line, who perhaps manage and support your data warehouse. Because of the work they do, it is quite common for such a team to be considered the Data Owners for all data in the data warehouse. However, this is not the correct approach, because all that data has a Data Owner when it sits in the source system, and that person still owns it when it is in a data warehouse. Of course, in a data warehouse a lot of data is aggregated and combined, and often new data is derived or calculated. It is common for the BI or Analytics Team to be involved in these activities, but that does not mean that they own the new or aggregated data. It is more appropriate to consider them as Data Custodians for the data whilst it is being managed, manipulated and reported on by that team. After all, they should be carrying out these activities to meet business requirements, and the person who approved those requirements is the Data Owner.
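
The data warehouse example can be sketched in a few lines of Python. Everything here is hypothetical (the table names, the role titles and the `roles_for` helper), and it is only one way of recording the idea: data copied from a source system keeps its source Data Owner, derived data is owned by whoever approved the requirements for it, and the BI Team is recorded as Data Custodian in both cases.

```python
# Hypothetical register of source data sets and their named Data Owners.
SOURCE_OWNERS = {
    "crm.customers": "Head of Sales",
    "billing.invoices": "Finance Director",
}

# Hypothetical warehouse tables: either copied from a source system,
# or derived to meet requirements that a business person approved.
WAREHOUSE_TABLES = {
    "dw.customers": {"copied_from": "crm.customers"},
    "dw.revenue_by_customer": {"requirements_approved_by": "Finance Director"},
}

CUSTODIAN = "BI Team"


def roles_for(table):
    """Return the Data Owner and Data Custodian for a warehouse table."""
    entry = WAREHOUSE_TABLES[table]
    if "copied_from" in entry:
        # Ownership follows the data from the source system.
        owner = SOURCE_OWNERS[entry["copied_from"]]
    else:
        # Derived data: the approver of the requirements is the owner.
        owner = entry["requirements_approved_by"]
    return {"data_owner": owner, "data_custodian": CUSTODIAN}


for name in WAREHOUSE_TABLES:
    print(name, roles_for(name))
```

Note that the BI Team never appears as a Data Owner in this sketch, however much work it does on the data.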

I hope that this has clarified the difference between Data Owners and Data Custodians.  Roles and responsibilities are only one of many things that you have to address as part of implementing Data Governance. If you are struggling to get your head around everything else you should be doing and the order in which to do it, please download my free checklist to help you plan your initiative.

Data Quality Issues - Who Is Responsible for Resolving Them?

One of the first processes I believe you should introduce in your Data Governance initiative is a Data Quality Issue Resolution process. In fact, my last blog covered what you should include in a Data Quality Issue Log to help you get started with collating and resolving issues.

But the log itself is only part of the answer and I have been asked on numerous occasions to clarify what the Data Governance (or Data Quality) Team are responsible for doing when managing and resolving data quality issues.  This often gets asked by newly formed teams.

There seems to be a lot of confusion around who does what, and I have often come across the expectation that the Data Governance Team will do or solve everything. There are times when I truly wish that I had a magic wand and could simply fix all the data problems, but sadly that isn’t the case. Over time, of course, your Data Governance Team will develop knowledge and expertise about the data your organisation creates and uses, but they are not responsible for deciding what the remedial actions should be, and especially not for undertaking any manual data cleansing that may be required.

However, I am not saying that they have no part to play in the process. The best way to understand what the Data Governance Team are responsible for is to look at a simple, high-level data quality issue resolution process:

Raise Data Quality Issue

It will usually be a Data Consumer (a business user of that data) who spots an issue and notifies the Data Governance Team.

The Data Governance Team will then log the issue on the Data Quality Issue log and identify the data owner(s) of the data concerned.

The Data Governance Team notifies the Data Owner of the issue, who will advise whether or not they are the correct owner of the issue.

In addition, the Data Governance Team reports the current status of all open material data quality issues to the Data Governance Committee (usually as part of their regular agenda).

The Data Governance Committee reviews the open material data quality issues and prioritizes/directs on the remedial activities if needed.

Impact Assess and Root Cause Analysis

The business user who raised the issue, the Data Governance Team, and the Data Owner(s) assess and agree on the impact of the issue. If the issue is agreed to have a material impact, its resolution will be prioritised.

The Data Governance Team works with the Data Owner(s) to identify the cause of the issue.

The Data Owner(s) consider possible remedial actions to rectify the issue.

Remedial Action Plan

The Data Owner(s) propose an approach to resolve the issue and prevent it from re-occurring.

The business user who raised the issue agrees whether the proposed action plan is appropriate (the Data Governance Team can facilitate discussions between the parties if needed).

The Data Governance Team updates the Data Quality Issue Log with the agreed actions and target dates.

The Data Owner(s) plans how and when the remedial activities will take place.

Monitor and Report on Action Plans

The Data Owner(s) and their team(s) undertake the agreed remedial actions (N.B. this may need the support of IT).

The Data Governance Team monitors progress on remedial actions against agreed target dates and reports on progress to both the impacted business user(s) and the Data Governance Committee.

The impacted business user(s) advise if timescales for resolution are not appropriate.

The Data Governance Committee then reviews progress and prioritizes/directs on the remedial activities if needed.

Of course, in practice, solving data quality issues takes more than the four steps listed above, but the additional steps will be sub-sets of the stages discussed above.  In addition, keeping it simple like this will help your stakeholders quickly understand who is responsible for what.
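If you want something you can keep alongside your diagram, the four stages above can be sketched as a simple lookup of who does what. This is only an illustrative sketch in Python: the names `PROCESS` and `stages_for` are assumptions made for the example, and the responsibilities are abbreviated from the steps listed above.

```python
# Illustrative only: stage and role names follow the process described above;
# the structure and the function name are hypothetical.
PROCESS = {
    "Raise Data Quality Issue": {
        "Data Consumer": "spots the issue and notifies the Data Governance Team",
        "Data Governance Team": "logs the issue, identifies and notifies the Data Owner, reports open material issues",
        "Data Owner": "advises whether they are the correct owner",
        "Data Governance Committee": "reviews open material issues and directs remedial activity if needed",
    },
    "Impact Assess and Root Cause Analysis": {
        "Data Consumer": "helps assess and agree the impact",
        "Data Governance Team": "helps assess the impact and works with the Data Owner on the root cause",
        "Data Owner": "helps assess the impact and considers possible remedial actions",
    },
    "Remedial Action Plan": {
        "Data Owner": "proposes the approach and plans the remedial activities",
        "Data Consumer": "agrees whether the proposed plan is appropriate",
        "Data Governance Team": "facilitates if needed and updates the log with actions and target dates",
    },
    "Monitor and Report on Action Plans": {
        "Data Owner": "undertakes the agreed remedial actions (possibly with IT support)",
        "Data Governance Team": "monitors progress against target dates and reports on it",
        "Data Consumer": "advises if timescales are not appropriate",
        "Data Governance Committee": "reviews progress and directs remedial activity if needed",
    },
}


def stages_for(role):
    """Return the stages, in order, in which the given role has something to do."""
    return [stage for stage, roles in PROCESS.items() if role in roles]


if __name__ == "__main__":
    for stage in stages_for("Data Governance Team"):
        print(stage, "->", PROCESS[stage]["Data Governance Team"])
```

Notice that nowhere in the lookup does the Data Governance Team decide on or carry out the fix; that always sits with the Data Owner.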

As with all things Data Governance, communication is key. I would recommend creating a simple high-level diagram of your data quality issue resolution process and using that in your communications to help people not only understand the process but also everyone’s role in it.

The Data Governance Team’s role in the data quality issue resolution process can be summarized as follows:

  • Identifying which Data Owner is responsible for the data which needs fixing and liaising with them

  • Maintaining the Data Quality Issue Log

  • Monitoring and reporting on open issues and associated action plans

I hope that has clarified the situation for you. Remember, you can find out more about what to include in a Data Quality Issue Log here or download a free template for an issue log here.

Setting up the Data Quality Issue Process is just one of many things you need to do when starting a Data Governance Initiative - you can get a summary of all the things you need to consider by downloading my free Data Governance Checklist here.

What do you include in a Data Quality Issue Log?

Whenever I am helping clients implement a Data Governance Framework, a Data Quality Issue Resolution process is top of my list of the processes to implement. After all, if you are implementing Data Governance because you want to improve the quality of your data, it makes sense to have a central process to enable people to flag known issues, and to have a consistent approach for investigating and resolving them.

At the heart of such a process is the log you keep of the issues.  The log is what the Data Governance Team will be using while they help investigate and resolve data quality issues, as well as for monitoring and reporting on progress.  So, it is no surprise that I am often asked what should be included in this log.

For each client, I design a Data Quality Issue Resolution process that is as simple as possible (why create an overly complex process which only adds bureaucracy?) that meets their needs. Then, I create a Data Quality Issue Log to support that process.  Each log I design is, therefore, unique to that client.  That said, there are some column headings that I typically include on all logs.

Let’s have a look at each of these and consider why you might want to include them in your Data Quality Issue Log:

ID

Typically, I just use sequential numbers for an identifier (001, 002, 003 etc).  This has the advantage of being both simple and giving you an instant answer to how many issues have been identified since we introduced the process (a question that your senior stakeholders will ask you sooner or later).

If you are creating your log in an Excel spreadsheet, then it is up to you to decide how you record ID numbers or letters. If, however, you are recording your issues on an existing system (e.g. an Operational Risk System or Helpdesk System), you will need to follow their existing protocols.
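If you ever move the log out of a spreadsheet and into a script, sequential IDs are easy to generate. A minimal sketch in Python, assuming your IDs are zero-padded numbers like the ones above; the function name `next_issue_id` and the default width of three digits are assumptions for illustration only.

```python
def next_issue_id(existing_ids, width=3):
    """Return the next sequential ID, zero-padded to `width` digits.

    Assumes every existing ID is a purely numeric string such as "001".
    """
    highest = max((int(issue_id) for issue_id in existing_ids), default=0)
    return f"{highest + 1:0{width}d}"


if __name__ == "__main__":
    print(next_issue_id(["001", "002", "003"]))
```

Because the IDs never get reused, the highest one is also your running count of issues raised since the process began.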

Date Raised

Now this is important for tracking how long an issue has been open and monitoring average resolution times.  Just one small reminder: be sure to decide on and stick to a standard date format – it doesn’t look good for dates to have inconsistent formats in your Data Quality Issue log!
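To show why a consistent date format matters, here is a minimal Python sketch of the two calculations mentioned above: how long an issue has been open and the average resolution time. It assumes dates are held in the ISO format YYYY-MM-DD (one sensible standard, not the only one), and the function names are made up for the example.

```python
from datetime import date


def days_open(date_raised, date_closed=None, today=None):
    """Days between the date raised and the date closed (or today if still open)."""
    end = date_closed or today or date.today()
    return (end - date_raised).days


def average_resolution_days(issues):
    """Average days to resolve, over closed issues only.

    `issues` is a list of (date_raised, date_closed) pairs; date_closed is
    None while an issue is still open. Returns None if nothing is closed yet.
    """
    durations = [(closed - raised).days for raised, closed in issues if closed is not None]
    return sum(durations) / len(durations) if durations else None


if __name__ == "__main__":
    # One agreed format means every date in the log parses the same way.
    raised = date.fromisoformat("2024-01-01")
    print(days_open(raised, today=date.fromisoformat("2024-01-31")))
```

If the dates in your log are typed in a mix of formats, a calculation like this either fails or, worse, quietly gives the wrong answer.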

Raised By (Name and Department)

This is a good way to start to identify your key data consumers (it is usually the people using the data who notify you when there are issues with it) for each data set.  This is something you should also log in your Data Glossary for future reference (if you have one). More importantly, you need to know who to report progress to and agree on remedial action plans with.

Short Name of Issue

This is not essential and some of my clients prefer not to have it, but I do like to include this one. It makes referring to the Data Quality Issue easy and understandable.

If you are presenting a report to your Data Governance Committee or chasing Data Owners for a progress update, everyone will know what you mean if you refer to the “Duplicate Customer Issue”. They may not remember what “Data Quality Issue 067” is about, and “System x has an issue whereby duplicate customers are created if a field on a record is changed after the initial creation date of a record” is a bit wordy (this is the detail that can be supplied when it is needed).

Detailed Description

As I mentioned above, I don’t want to use the detailed description as the label for an issue, but the detailed description is needed. This is the full detail of the issue as supplied by the person who raised it and drives the investigation and remedial activities.

Impact

Again, this is supplied by the person who identified the issue. This field is useful in prioritizing your efforts when investigating and resolving issues. It is unlikely that your team will have unlimited resources and be able to action every single issue as soon as you are aware of it. Therefore, you need a way to prioritize which issues you investigate first. Understanding the impact of an issue means that you focus on resolving those issues that have the biggest impact on your organization.

I like to have defined classifications for this field. Something simple like High, Medium and Low is fine, but make sure that you define what these mean in business terms. I was once told about a ‘High’ impact issue and spent a fair amount of time on it before I discovered that in fact just a handful of records had the wrong geocode. The percentage of incorrect records made it seem more likely that human error was to blame, rather than there being some major systemic issue that needed to be fixed! This small percentage of incorrect codes was indeed causing a problem for the team who reported them. They had to stop time-critical month-end processes to fix them, but the impact category they chose had more to do with their level of frustration at the time they reported it than the true impact of the issue.
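Once the classifications are agreed, using them to decide what to look at first is straightforward. A small Python sketch, assuming the three levels above and ISO-formatted dates held as text; the field names (`impact`, `date_raised`) and the tie-break of "oldest first" are assumptions for the example rather than a rule.

```python
# Lower number = investigate sooner. Only the three agreed levels are allowed.
IMPACT_ORDER = {"High": 0, "Medium": 1, "Low": 2}


def prioritise(issues):
    """Sort issues by impact (High first), then by date raised (oldest first).

    Each issue is a dict with an "impact" key and an ISO "date_raised" string.
    An impact value outside the agreed classifications raises a KeyError.
    """
    return sorted(issues, key=lambda i: (IMPACT_ORDER[i["impact"]], i["date_raised"]))


if __name__ == "__main__":
    log = [
        {"id": "001", "impact": "Low", "date_raised": "2024-01-02"},
        {"id": "002", "impact": "High", "date_raised": "2024-02-10"},
        {"id": "003", "impact": "High", "date_raised": "2024-01-15"},
    ]
    print([issue["id"] for issue in prioritise(log)])
```

The ordering is only as good as the classification behind it, which is exactly why the definitions need to be in business terms.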

Data Owner

With all things (not just data), I find that activities don’t tend to happen unless it is very clear who is responsible for doing them. One of the first things I do after being notified of a data quality issue is to find out who the Data Owner for the affected data is and agree with them that they are responsible for investigating and fixing the issue (with support from the Data Governance Team of course).

Status

Status is another good field to use when monitoring and reporting on data quality issues. You may want to consider using more than just the obvious ‘open’ and ‘closed’ statuses.

From time to time, you will come across issues that you either cannot fix, or that would be too costly to fix. In these situations, a business decision has to be made to accept the situation. You do not want to lose sight of these, but neither do you want to skew your numbers of ‘open’ issues by leaving them open indefinitely. I like to use ‘accepted’ as a status for these and have a regular review to see if solutions are possible at a later date. For example, the replacement of an old system can provide the answer to some outstanding issues.
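To illustrate how a third status keeps your numbers honest, here is a minimal Python sketch. The three status values come from the description above; the names `Status`, `count_by_status` and `due_for_review` are assumptions made for the example.

```python
from collections import Counter
from enum import Enum


class Status(Enum):
    OPEN = "open"
    ACCEPTED = "accepted"  # cannot be fixed, or too costly to fix, for now
    CLOSED = "closed"


def count_by_status(statuses):
    """Count issues per status, so 'accepted' is reported separately from 'open'."""
    return Counter(statuses)


def due_for_review(issues):
    """Return the IDs of 'accepted' issues: the list to revisit at each regular review.

    `issues` is a list of (issue_id, Status) pairs.
    """
    return [issue_id for issue_id, status in issues if status is Status.ACCEPTED]


if __name__ == "__main__":
    log = [("001", Status.CLOSED), ("002", Status.ACCEPTED), ("003", Status.OPEN)]
    print(count_by_status(status for _, status in log)[Status.OPEN])
    print(due_for_review(log))
```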

Update

This is where you keep notes on progress to date and details of the next steps to be taken (and by whom).

Target Resolution Date

Finally, I like to keep a note of when we expect (and/or wish) the issue to be fixed by. This is a useful field for reporting and monitoring purposes. It also means that you don’t waste effort chasing for updates when issues won’t be fixed until a project delivers next year.
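Pulling the column headings together, one row of the log could be sketched in Python as below. Treat it as an illustration under stated assumptions rather than a template: the class name, the field names, the 14-day chasing window and the choice to chase issues with no target date are all assumptions for the example, and the person named in the sample row is made up.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class DataQualityIssue:
    issue_id: str
    date_raised: date
    raised_by: str  # name and department
    short_name: str
    detailed_description: str
    impact: str  # e.g. "High", "Medium" or "Low"
    data_owner: str
    status: str = "open"  # "open", "accepted" or "closed"
    update: str = ""
    target_resolution_date: Optional[date] = None


def worth_chasing(issue, today, window_days=14):
    """True if an open issue is due (or overdue) within the next `window_days`.

    Open issues with no target date are chased too (an assumption), so that a
    date gets agreed. Issues not due until much later are left alone.
    """
    if issue.status != "open":
        return False
    if issue.target_resolution_date is None:
        return True
    return issue.target_resolution_date <= today + timedelta(days=window_days)


if __name__ == "__main__":
    issue = DataQualityIssue(
        issue_id="067",
        date_raised=date(2024, 3, 1),
        raised_by="A. Example, Finance",  # hypothetical person
        short_name="Duplicate Customer Issue",
        detailed_description="Duplicate customers are created if a field is changed after a record is created.",
        impact="Medium",
        data_owner="Head of Customer Services",  # hypothetical owner
        target_resolution_date=date(2025, 6, 30),
    )
    print(worth_chasing(issue, today=date(2024, 4, 1)))
```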

I hope this has given you a useful insight into the items you might want to include in your Data Quality Issue Log. You can download a template with these fields for free by clicking here.

Running and managing a Data Quality Issue Log using Excel and email is an easy place to start, but it can get time-consuming once volumes increase – especially when it comes to chasing those responsible! That’s why I was delighted to be involved recently with helping Atticus Associates create their latest product in this space, DQLog. The Atticus team are launching their beta version in Spring this year and they are keen to hear feedback from anyone interested in trying it. If you are interested in testing the beta, please email me and I can put you in touch.