Once data is entered or captured in computer systems, issues begin to appear: it rapidly becomes obsolete and needs updating.
Data mastering is another big issue facing most organizations. You can easily find six different versions of customer records, each considered to be the 'master customer data'. In fact, you will find tens of sources for certain data sets, such as the list of products or the list of company offices. Applications such as ERP, CRM, production systems, supply chain systems and corporate intranets each hold hundreds of data sets, all vulnerable to quality issues: duplicated, out of date, or out of sync with similar sets elsewhere in the organization.
Therefore, data management is a strategic topic for the CIO of every large organization. Data quality is equally strategic, since disciplined quality management is the only proven path to mastering data, a practice referred to as Corporate Data Quality Management (CDQM).
The systems, practices, architectures and tools used to manage data and ensure its quality can be categorized as follows:
- Master Data Management (MDM): Several frameworks and tools are available in the software market to ensure proper master data management and provide the "single version of truth" for key data entities such as customer, employee, product, partner or supplier. These frameworks provide the ability to manage master data and typically offer various means of making it available to systems within or outside the enterprise.
- ETL (Extract, Transform, Load): Tools used to extract, transform and load data across various systems.
- Big Data: Tools built on big data technologies, such as Hadoop's MapReduce, that analyze large volumes of structured and unstructured data to produce findings and reports.
- Migration Tools: Organizations often change systems, upgrade from one version of a system to another, or introduce new systems to manage the business. Effective data migration tools help populate newly introduced systems or system revisions with correct, clean data.
- Synchronization Tools: Systems come from different vendors and sometimes run on different technology platforms; even so, similar data sets such as customer data need to be present across different databases. The software market therefore offers tools that keep data synchronized across systems while applying the transformations required to fit each target schema.
- Data Governance: A combination of people, processes and technology that ensures the accuracy and value of the data entered into the organization's various systems.
- Service Oriented Architecture (SOA): SOA ensures that systems are built in a loosely coupled manner: each business process is embedded in its own service logic, and well-defined service interfaces allow processes and modules to interact in order to deliver end-to-end business value. By design, this practice yields systems with better data quality and less data duplication and redundancy.
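The MDM idea above, consolidating several source records into a "single version of truth", can be sketched in a few lines of Python. This is a minimal illustration only: the field names, the shared key (email), and the survivorship rule (newest non-empty value wins) are assumptions for the example, not the behavior of any particular MDM product.

```python
from datetime import date

# Customer records as captured by three hypothetical source systems
# (e.g. CRM, ERP, billing), keyed by a shared natural key: the email.
sources = [
    {"email": "ada@example.com", "name": "Ada Lovelace", "phone": None,
     "updated": date(2011, 3, 1)},
    {"email": "ada@example.com", "name": "A. Lovelace", "phone": "555-0100",
     "updated": date(2011, 6, 15)},
    {"email": "grace@example.com", "name": "Grace Hopper", "phone": "555-0199",
     "updated": date(2011, 5, 2)},
]

def consolidate(records):
    """Group records by key; per field, keep the newest non-empty value."""
    golden = {}
    for rec in sorted(records, key=lambda r: r["updated"]):
        master = golden.setdefault(rec["email"], {})
        for field, value in rec.items():
            if value is not None:  # survivorship: newest non-null value wins
                master[field] = value
    return golden

masters = consolidate(sources)
print(masters["ada@example.com"])    # one golden record per customer
```

A real MDM platform adds fuzzy matching (records rarely share a clean key), configurable survivorship rules per attribute, and distribution of the golden record back to the source systems; the sketch shows only the core merge step.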
"In 2011 alone, 1.8 zettabytes (or 1.8 trillion gigabytes) of data will be created, the equivalent of every U.S. citizen writing 3 tweets per minute for 26,976 years. And over the next decade, the number of servers managing the world's data stores will grow by ten times." IDC study, cited by Computerworld magazine. "The IDC study predicts that overall data will grow by 50 times by 2020, driven in large part by more embedded systems such as sensors in clothing, medical devices and structures like buildings and bridges."