What do you include in a Data Quality Issue Log?

Whenever I am helping clients implement a Data Governance Framework, a Data Quality Issue Resolution process is top of my list of the processes to implement. After all, if you are implementing Data Governance because you want to improve the quality of your data, it makes sense to have a central process to enable people to flag known issues, and to have a consistent approach for investigating and resolving them.

At the heart of such a process is the log you keep of the issues.  The log is what the Data Governance Team will be using while they help investigate and resolve data quality issues, as well as for monitoring and reporting on progress.  So, it is no surprise that I am often asked what should be included in this log.

For each client, I design a Data Quality Issue Resolution process that is as simple as possible while still meeting their needs (why create an overly complex process which only adds bureaucracy?). Then, I create a Data Quality Issue Log to support that process.  Each log I design is, therefore, unique to that client.  That said, there are some column headings that I typically include on all logs.

Let’s have a look at each of these and consider why you might want to include them in your Data Quality Issue Log:

ID

Typically, I just use sequential numbers for an identifier (001, 002, 003 etc).  This has the advantage of being simple and of giving you an instant answer to how many issues have been identified since you introduced the process (a question that your senior stakeholders will ask you sooner or later).

If you are creating your log in an Excel spreadsheet, then it is up to you to decide how you record ID numbers or letters.  If, however, you are recording your issues in an existing system (e.g. an Operational Risk System or Helpdesk System), you will need to follow that system's existing protocols.

Date Raised

Now this is important for tracking how long an issue has been open and monitoring average resolution times.  Just one small reminder: be sure to decide on and stick to a standard date format – it doesn’t look good for dates to have inconsistent formats in your Data Quality Issue log!
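To illustrate why a single date format matters, here is a minimal sketch. It assumes the log is held as a list of Python dictionaries and that dates are stored as ISO 8601 text (YYYY-MM-DD). The key names (date_raised, date_resolved) and the sample rows are invented for this example and are not part of any template.

```python
from datetime import date
from statistics import mean

# Hypothetical sample rows; keys and values are illustrative only.
issues = [
    {"id": "001", "date_raised": "2024-01-08", "date_resolved": "2024-01-18"},
    {"id": "002", "date_raised": "2024-01-15", "date_resolved": "2024-02-04"},
    {"id": "003", "date_raised": "2024-02-01", "date_resolved": None},
]

def days_open(issue, today):
    """Days from raising to resolution, or to `today` if still unresolved."""
    raised = date.fromisoformat(issue["date_raised"])
    resolved = issue["date_resolved"]
    end = date.fromisoformat(resolved) if resolved else today
    return (end - raised).days

def average_resolution_days(issues):
    """Mean resolution time over resolved issues; None if nothing is resolved."""
    resolved = [i for i in issues if i["date_resolved"]]
    if not resolved:
        return None
    # `today` is never used here because every issue in `resolved` has an end date.
    return mean(days_open(i, date.today()) for i in resolved)
```

Because every date is in the same format, one parsing call works for the whole log. Mixed formats would need guessing or manual clean-up before any of these figures could be trusted.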

Raised By (Name and Department)

This is a good way to start to identify your key data consumers (it is usually the people using the data who notify you when there are issues with it) for each data set.  This is something you should also log in your Data Glossary for future reference (if you have one). More importantly, you need to know who to report progress to and agree on remedial action plans with.

Short Name of Issue

This is not essential and some of my clients prefer not to have it, but I do like to include this one. It makes referring to the Data Quality Issue easy and understandable.

If you are presenting a report to your Data Governance Committee or chasing Data Owners for a progress update, everyone will know what you mean if you refer to the “Duplicate Customer Issue”. They may not remember what “Data Quality Issue 067” is about, and “System x has an issue whereby duplicate customers are created if a field on a record is changed after the initial creation date of a record” is a bit wordy (this is the detail that can be supplied when it is needed).

Detailed Description

As I mentioned above, I don’t want to use the detailed description as the label for an issue, but the detailed description is needed. This is the full detail of the issue as supplied by the person who raised it and drives the investigation and remedial activities.

Impact

Again, this is supplied by the person who identified the issue. This field is useful in prioritizing your efforts when investigating and resolving issues. It is unlikely that your team will have unlimited resources and be able to action every single issue as soon as you are aware of it. Therefore, you need a way to prioritize which issues you investigate first. Understanding the impact of an issue means that you focus on resolving those issues that have the biggest impact on your organization.

I like to have defined classifications for this field. Something simple like High, Medium and Low is fine; just make sure that you define what these mean in business terms. I was once told about a ‘High’ impact issue and spent a fair amount of time on it before I discovered that in fact just a handful of records had the wrong geocode. The small percentage of incorrect records made it seem more likely that human error was to blame, rather than there being some major systemic issue that needed to be fixed! This small percentage of incorrect codes was indeed causing a problem for the team who reported them. They had to stop time-critical month-end processes to fix them, but the impact category they chose had more to do with their level of frustration at the time they reported it than the true impact of the issue.
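As a rough sketch of how defined classifications can drive prioritization, the example below ranks issues by impact and then by date raised. The wording of the definitions is a placeholder made up for illustration; your own definitions should come from your business stakeholders. The dictionary keys are also assumptions.

```python
# Placeholder definitions, invented for this example only.
IMPACT_DEFINITIONS = {
    "High": "Stops a critical business process or affects external reporting",
    "Medium": "Causes regular manual rework for one or more teams",
    "Low": "Cosmetic or occasional inconvenience with an easy workaround",
}

# Lower number means investigate sooner.
IMPACT_RANK = {"High": 0, "Medium": 1, "Low": 2}

def prioritize(issues):
    """Order issues by impact first, then oldest first (ISO dates sort as text)."""
    return sorted(issues, key=lambda i: (IMPACT_RANK[i["impact"]], i["date_raised"]))

# Hypothetical rows to show the ordering.
sample = [
    {"id": "001", "impact": "Low", "date_raised": "2024-01-08"},
    {"id": "002", "impact": "High", "date_raised": "2024-01-15"},
    {"id": "003", "impact": "Medium", "date_raised": "2024-01-10"},
    {"id": "004", "impact": "High", "date_raised": "2024-01-09"},
]
```

An impact value outside the three agreed categories raises a KeyError here. That is deliberate in this sketch, since it forces whoever logs the issue to pick one of the defined classifications.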

Data Owner

With all things (not just data), I find that activities don’t tend to happen unless it is very clear who is responsible for doing them. One of the first things I do after being notified of a data quality issue is to find out who the Data Owner for the affected data is and agree with them that they are responsible for investigating and fixing the issue (with support from the Data Governance Team of course).

Status

Status is another good field to use when monitoring and reporting on data quality issues. You may want to consider using more than just the obvious “open” and “closed” statuses.

From time to time, you will come across issues that you either cannot fix, or that would be too costly to fix. In these situations, a business decision has to be made to accept the situation. You do not want to lose sight of these, but neither do you want to skew your numbers of ‘open’ issues by leaving them open indefinitely. I like to use ‘accepted’ as a status for these and have a regular review to see if solutions are possible at a later date. For example, the replacement of an old system can provide the answer to some outstanding issues.
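The three statuses described above could be sketched as follows, so that accepted issues are reported separately from open ones but remain easy to pull out for the regular review. The names Status, status_counts and due_for_review, and the sample rows, are assumptions for illustration.

```python
from collections import Counter
from enum import Enum

class Status(Enum):
    OPEN = "open"
    ACCEPTED = "accepted"  # a business decision was made to live with the issue
    CLOSED = "closed"

# Hypothetical log rows.
log = [
    {"id": "001", "status": Status.OPEN},
    {"id": "002", "status": Status.ACCEPTED},
    {"id": "003", "status": Status.CLOSED},
    {"id": "004", "status": Status.OPEN},
]

def status_counts(log):
    """Count issues per status so accepted issues do not inflate the open total."""
    return Counter(row["status"] for row in log)

def due_for_review(log):
    """IDs of accepted issues to revisit at the regular review."""
    return [row["id"] for row in log if row["status"] is Status.ACCEPTED]
```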

Update

This is where you keep notes on progress to date and details of the next steps to be taken (and by whom).

Target Resolution Date

Finally, I like to keep a note of when we expect (and/or wish) the issue to be fixed by. This is a useful field for reporting and monitoring purposes. It also means that you don’t waste effort chasing for updates when issues won’t be fixed until a project delivers next year.
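Pulling the column headings together, one row of the log might look like the sketch below. The class name, field names, the 30-day window and the sample values are all assumptions for illustration (only the “Duplicate Customer Issue” label comes from the example earlier in this post). Treating an issue with no target date as one to chase is also just one possible choice.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class IssueLogEntry:
    issue_id: str
    date_raised: date
    raised_by: str             # name and department
    short_name: str
    detailed_description: str
    impact: str                # e.g. High, Medium or Low
    data_owner: str
    status: str                # e.g. open, accepted or closed
    update: str = ""
    target_resolution_date: Optional[date] = None

def needs_chasing(entry, today, window_days=30):
    """True for open issues whose target date is past or within `window_days`.

    Assumption: an open issue with no target date is chased so one can be agreed.
    """
    if entry.status != "open":
        return False
    if entry.target_resolution_date is None:
        return True
    return (entry.target_resolution_date - today).days <= window_days

# Hypothetical entry that depends on a project delivering next year.
example = IssueLogEntry(
    issue_id="067",
    date_raised=date(2024, 8, 1),
    raised_by="A. Analyst (Finance)",
    short_name="Duplicate Customer Issue",
    detailed_description="Duplicate customers are created when a record is edited.",
    impact="Medium",
    data_owner="Head of Sales",
    status="open",
    target_resolution_date=date(2025, 6, 30),
)
```

With these sample values, needs_chasing(example, date(2024, 9, 1)) returns False, because the target date is still many months away.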

I hope this has given you a useful insight on the items you might want to include in your Data Quality Issue Log. You can download a template with these fields for free by clicking here.

Running and managing a Data Quality Log using Excel and email is an easy place to start, but it can get time-consuming once volumes increase, especially when it comes to chasing those responsible!  That’s why I was delighted to be involved recently in helping Atticus Associates create their latest product in this space, DQLog.  The Atticus team are launching their beta version in Spring this year and they are keen to get feedback from anyone interested in trying it.  If you are interested in testing the beta, please email me and I can put you in touch.

Make Sure you Follow These Practical Steps for Creating a Business Glossary

I’ve recently launched a new course: An Introduction to Data Governance Using Collibra. To ensure that attendees have access to the best combination of business skills (my focus) and technical skills, I have teamed up with Carl White, a leading Collibra expert and Implementation Partner. As you know, I like to use this blog to share practical advice to help you with your Data Governance initiatives. This new collaboration gave me an opportunity to ask Carl for his views on the best way to approach a typical activity for organisations embracing data governance: creating a Business Glossary.

Firstly, what is a business glossary?

In a nutshell, it’s the place where important business terms are clearly owned, articulated, contextualised and linked to other information assets (e.g. reports).  For example, you will have a list of terms, what each one means in business terms, who owns that data, and information such as which systems and processes it is used in.

A Business Glossary seems a fairly straightforward deliverable, surely it’s very easy to create one?

It seems straightforward but there will inevitably be many stakeholders, all of whom differ in their understanding, expectations, requirements and commitment. Enthusiastic stakeholders will expect the Business Glossary to store everything and solve all problems related to business semantics. Uncommitted stakeholders might see it as a valueless exercise. If it is not carefully positioned, the glossary can quickly become an unstructured dumping ground, ironically reflecting the reason the organisation needed one in the first place.

So what do you recommend that anyone creating a Business Glossary does first?

It’s critical to identify a focus area within the organisation where sponsorship is strong but a lack of clarity has caused problems. Canny sponsors will usually be aware of a particular domain or business area where terms are problematic, for instance, a certain set of management reports where Finance and Sales teams don’t even realise they define terms differently.

Once you have agreed a focus for your pilot what should you do next?

Starting with the sponsor, engage key stakeholders within the focus area to define a limited scope with clear and measurable outcomes that all stakeholders see as valuable to them.

Who do you consider a ‘key stakeholder’? Do you mean the really senior people in that area or the more junior people who really do the work?

Both senior and junior people have a part to play. Senior people will be accountable for terms and will want to review and approve definitions. Junior people will tend to be more involved on a day-to-day basis, so they often know more about the issues. There’s a collaboration to set up through the glossary in which the junior people begin articulating terms and the senior people review and approve. The collaboration is as important as the final definitions, in my opinion, as it leads on to generally better practice like clear accountability with data.

Once you have your area for your pilot identified and stakeholders engaged, what’s next?

Collect a small volume of the most problematic terms, perhaps in an Excel workbook. Identify stakeholders who are willing to act as owners of the term and others who are willing to articulate the term. Encourage stakeholders to be rigorous with their definitions and the information they keep on the terms. I’ve seen so many definitions along the lines of ‘Customer Type - the type of customer’, but this tells me nothing about the possible values, who uses the term, why it matters, who wrote the definition, who approved the definition, when it might no longer apply and so on.
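To make the point about rigour concrete, here is a small sketch of a glossary entry that has a slot for each piece of information listed above, plus a check that reports which slots are still empty. The class and field names are hypothetical and do not reflect how Collibra or any other tool stores terms.

```python
from dataclasses import dataclass, field, fields
from typing import List

@dataclass
class GlossaryTerm:
    name: str
    definition: str = ""
    possible_values: List[str] = field(default_factory=list)
    used_by: List[str] = field(default_factory=list)
    why_it_matters: str = ""
    written_by: str = ""
    approved_by: str = ""
    review_date: str = ""  # when the definition might no longer apply

def missing_details(term):
    """Names of the fields that are still empty, in declaration order."""
    return [f.name for f in fields(term) if not getattr(term, f.name)]

# The weak definition quoted above fills in very little.
weak = GlossaryTerm(name="Customer Type", definition="The type of customer")
```

Calling missing_details(weak) lists every field after the definition, which is a quick way to show a stakeholder how much is still unsaid. It does not judge whether the definition itself is circular; that still needs a human reviewer.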

And once you’ve got them working, you move onto another area? 

Not quite. Creating data glossaries is very much an iterative process. Once your stakeholders become involved, they are likely to think of more information that they would like to add to the glossary. So after the pilot stage it is important that you review the pilot to determine whether all the required information has been collected and whether changes are required before rolling the process out across the rest of your organisation.

And can all of this be done in Microsoft Excel?

You can get a fair way along the journey with Microsoft Excel, but it has limits: the collaboration we talked about earlier includes an element of workflow, terms need to be very easily accessible to all users, and changes to the glossary need to be tracked and understood. However, an organisation can start the process using Excel in order to begin their journey and really understand what they need. I would recommend starting small to understand the benefits. Once these are clear and there’s a head of steam, I’d strongly recommend making an investment in a tool.

I hope you have found this advice from Carl useful. If you want to learn where a Business Glossary fits in a data governance framework, and even have an attempt at creating your own in Collibra, why not come along and join us both on An Introduction to Data Governance Using Collibra on 7 September in Central London?

 

My free report reveals why companies struggle to successfully implement data governance. Discover how to quickly get your data governance initiative on track by downloading this free report.

Building Relationships and Rapport

Data Governance relationships

I spent last week at two amazing data conferences in the US. Firstly I was at Enterprise Dataversity in Chicago and from there I flew to Richmond, Virginia to join the International Data Quality Summit.  Both were excellent events and gave me the opportunity to meet in person some data friends that until then had only been “virtual” friends via the wonders of social media.  Of course I also got to meet up with others who I had been lucky enough to meet previously and finally there are my new data friends who I would never have come across if I had not met them at the conferences.

Going back to my virtual data friends, I have always considered that I had good relationships with these people, but it's amazing how much better I know these people now that I have actually met them and spent some time with them.  It really doesn’t take long to build rapport with people, just a chat over coffee or a meal makes a huge difference in building relationships.

It bears out something that I taught on one of my tutorials this week - the importance of building relationships and rapport with your stakeholders (especially your Data Owners and Data Stewards).  Sending an email asking them to be a Data Owner is unlikely to be successful, but meeting them face to face and explaining what data governance is and why you think they should be a data owner will be much more successful.  Especially when you take the time to get to know them and the challenges that they are facing, so that you can articulate what being a Data Owner will mean to them.

Sometimes it just isn’t possible to meet up face to face and in those circumstances you will need to work hard to make the most of the communication options that you have available.  But as I experienced on numerous occasions last week, good long distance relationships can very quickly become so much stronger when you can meet up in person.

So if it is possible to meet your senior stakeholders when implementing data governance, make sure that you do and make the most of that opportunity to build relationships and rapport.  And of course if you get the opportunity to attend a data conference, make sure that you take it. It really is an excellent environment for learning from others’ experiences and meeting and networking with your peers.

 

Data Governance Interview - Jim Harris

I'm very pleased that Jim Harris agreed to be interviewed for the first blog in my new website...

Jim Harris is a recognized industry thought leader with more than 20 years of enterprise data management experience, specializing in data quality, data integration, data warehousing, business intelligence, master data management, data governance, and big data analytics.

As Blogger-in-Chief at Obsessive Compulsive Data Quality, Jim Harris offers an independent, vendor-neutral perspective and hosts the popular audio podcast OCDQ Radio. Jim Harris is an independent consultant and freelance writer for hire.
