As with any task carried out in the ArcGIS software suite, data is the foundation. The old saying "garbage in, garbage out" still applies. Whether data is in a text format or a geographic format (shapefile or file geodatabase feature class), its quality will affect its behaviour, use and display in ArcGIS.
I do have one proviso: I've given up on perfection. Deviation from the real world is expected. The coordinate system chosen, the method of capture and the scale are some of the sources of error in geographic data. What matters is what level of error is acceptable for the task you are carrying out.
Once you decide to acquire good-quality data, quality checks can help you determine whether the available data meets your criteria. So what are the major checks you can make?
- Currency and Credibility – Does the data come with metadata recording when it was collected and from what source? Is the source reliable?
- Completeness – Can you see gaps in the features, e.g. missing roads?
- Consistency – Do the attributes match the features both logically and physically? For example, a feature that is a road should not carry water-flow attributes.
- Accuracy – Do the features match the real world (remembering that perfection is not expected)?
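Completeness and consistency checks like those above lend themselves to simple scripting. Below is a minimal plain-Python sketch of the idea; the field names (`road_name`, `flow_rate`) are hypothetical examples, not from any particular schema.

```python
def check_completeness(records, required_field):
    """Return records missing a value for a required field."""
    return [r for r in records if not r.get(required_field)]

def check_consistency(records, feature_type, forbidden_field):
    """Return records of a given type carrying an attribute that is
    logically inconsistent with that type (e.g. a road with water flow)."""
    return [r for r in records
            if r.get("type") == feature_type and r.get(forbidden_field) is not None]

roads = [
    {"type": "road", "road_name": "Main St", "flow_rate": None},
    {"type": "road", "road_name": "", "flow_rate": None},        # incomplete
    {"type": "road", "road_name": "High St", "flow_rate": 3.2},  # inconsistent
]

print(len(check_completeness(roads, "road_name")))         # flags the blank name
print(len(check_consistency(roads, "road", "flow_rate")))  # flags the water-flow attribute
```

Even toy checks like these scale far better than eyeballing an attribute table row by row.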
Some of the above checks can be done manually, but that can be a long process. Automated processes make it much easier to check your data for errors.
Data quality and features functionality in ArcGIS Desktop
The first option is to use geodatabase topology. Topology rules in a geodatabase let you set how the features in a feature class should behave, for example that no polygons may overlap each other. When the topology is validated, ArcGIS identifies the features that break the rules, and the errors can then be addressed by the user. This is an easier and more accurate method than trying to find such problems manually.
The topology is created in a feature dataset in a geodatabase that also holds all the feature classes that are part of the topology.
However, it’s important to know what data you are going to use in the topology and design the rules to match the data. Time spent planning and designing is an important part of quality control.
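To make the rule-based idea concrete, here is a simplified plain-Python stand-in for a "must not overlap" rule, using axis-aligned bounding boxes instead of true polygon geometry. This is only an illustration of rule-based error detection; geodatabase topology in ArcGIS operates on the real geometries and does much more.

```python
def boxes_overlap(a, b):
    """a and b are (xmin, ymin, xmax, ymax) extents."""
    return not (a[2] <= b[0] or b[2] <= a[0] or a[3] <= b[1] or b[3] <= a[1])

def find_overlap_errors(features):
    """Return index pairs of features whose extents overlap,
    analogous to topology flagging rule violations for review."""
    errors = []
    for i in range(len(features)):
        for j in range(i + 1, len(features)):
            if boxes_overlap(features[i], features[j]):
                errors.append((i, j))
    return errors

# Hypothetical parcel extents: the third overlaps both of the others.
parcels = [(0, 0, 2, 2), (3, 0, 5, 2), (1, 1, 4, 3)]
print(find_overlap_errors(parcels))  # → [(0, 2), (1, 2)]
```

The value of the rule-based approach is that the errors are enumerated for you, pair by pair, rather than left to a visual sweep of the map.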
Data quality for features, attributes and relationships using Data Reviewer extension
Another method for checking data quality is the Data Reviewer extension. This extension provides a series of tools that support both automated and visual analysis of your data. It can be used to find issues with features, attributes and relationships in your database. It isn’t a replacement for geodatabase topology but rather an enhancement of that functionality.
Data Reviewer has a number of analysis tools and data checks. Checks are tools that allow you to validate the data against certain conditions, including topology, attribute and database validation.
A full list of the Checks is listed in the Data Reviewer poster.
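The general pattern behind such checks can be sketched in plain Python: a set of independent validation rules run over a dataset, each producing a list of flagged records for review. The check names and record fields below are invented for illustration; Data Reviewer's actual checks are configured within ArcGIS, not written by hand.

```python
def duplicate_geometry_check(records):
    """Flag ids of records whose geometry duplicates an earlier one."""
    seen, flagged = set(), []
    for r in records:
        if r["shape"] in seen:
            flagged.append(r["id"])
        seen.add(r["shape"])
    return flagged

def domain_check(records, field, allowed):
    """Flag ids of records whose attribute falls outside an allowed domain."""
    return [r["id"] for r in records if r[field] not in allowed]

# A hypothetical battery of checks, run together like a review session.
checks = [
    ("Duplicate geometry", duplicate_geometry_check),
    ("Surface type domain",
     lambda recs: domain_check(recs, "surface", {"sealed", "unsealed"})),
]

data = [
    {"id": 1, "shape": (0, 0, 10, 0), "surface": "sealed"},
    {"id": 2, "shape": (0, 0, 10, 0), "surface": "gravel"},  # duplicate + bad domain
]

for name, check in checks:
    print(name, check(data))
```

The appeal of this pattern is that each check stays small and testable, while the battery as a whole gives you a repeatable review of the dataset.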
One advantage of Data Reviewer is that, unlike geodatabase topology, it does not require the data to be in a feature dataset, or even in the same geodatabase. It can also validate shapefiles. This is helpful when data from a variety of sources is being incorporated into your database.
Perfection isn’t obtainable, but we can get close. All the data quality tools discussed here are available to help create and maintain high-quality data for mapping and analysis. Knowing your data, understanding its limitations, spending time to plan and design its management, and knowing what tools are available to improve its integrity will result in a product that is consistent and reliable.