Sunday, September 11, 2016

Road Networks and the Spectre of Exhaustive Data Completeness

Ideally, spatial data would completely represent the real-world entities it portrays. In reality, some amount of generalization is inherent in any spatial data, a consequence of translating and scaling it for practical use. A perfectly complete data set is very rare, and road networks such as TIGER, TeleAtlas, and locally sourced street centerlines more often than not omit some road segments that actually exist. These omissions result from conscious decisions, such as leaving smaller, privately owned roads out of the data set, and from errors of omission in digitizing and data collection procedures.



The figure above shows a county and two different road network data sets: one from the Census Bureau's TIGER database, and another of locally collected street centerlines. A 1 km by 1 km grid is overlaid to derive a systematic, spatially comparable measure of each data set's relative completeness. The total length of road segments within each grid cell can be compared between the two data sets, and the choropleth map above depicts the magnitude of the difference in each cell. By this measure, the TIGER data set is more complete than the street centerlines, since it contains a greater total length of road segments, which is our sole criterion for completeness. The caveat, of course, is that we may wish to consider other factors when gauging the data's relative "completeness." If the TIGER data contains more driveways and non-navigable road segments, for example, we may need to reconsider our definition of "complete": the superior accuracy of the locally collected centerlines arguably makes them the more "complete" data set, so to speak.
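The per-cell comparison described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the workflow used to produce the map: it assumes both networks are already in the same projected coordinate system with metre units, represents each road as a list of (x, y) vertices, and uses a simple per-cell difference (A minus B) as the magnitude measure, since the post does not specify the exact measure. The names `CELL_SIZE`, `length_per_cell`, and `compare` are hypothetical.

```python
import math
from collections import defaultdict

# Assumption: coordinates are projected and in metres, so 1000 = 1 km cells.
CELL_SIZE = 1000.0


def _cell_of(x, y, size):
    """Return the (column, row) index of the grid cell containing (x, y)."""
    return (math.floor(x / size), math.floor(y / size))


def _split_segment(p0, p1, size):
    """Yield (cell, length) pieces of the segment p0->p1, cut at grid lines."""
    (x0, y0), (x1, y1) = p0, p1
    dx, dy = x1 - x0, y1 - y0
    ts = {0.0, 1.0}
    # Parameter values t where the segment crosses vertical lines x = k*size.
    if dx != 0:
        lo, hi = sorted((x0, x1))
        for k in range(math.ceil(lo / size), math.floor(hi / size) + 1):
            ts.add((k * size - x0) / dx)
    # ...and where it crosses horizontal lines y = k*size.
    if dy != 0:
        lo, hi = sorted((y0, y1))
        for k in range(math.ceil(lo / size), math.floor(hi / size) + 1):
            ts.add((k * size - y0) / dy)
    ts = sorted(t for t in ts if 0.0 <= t <= 1.0)
    seg_len = math.hypot(dx, dy)
    for ta, tb in zip(ts, ts[1:]):
        if tb <= ta:
            continue
        # Each piece lies inside a single cell, so its midpoint identifies it.
        tm = (ta + tb) / 2
        cell = _cell_of(x0 + tm * dx, y0 + tm * dy, size)
        yield cell, (tb - ta) * seg_len


def length_per_cell(roads, size=CELL_SIZE):
    """Total road length (in coordinate units) falling in each grid cell."""
    totals = defaultdict(float)
    for line in roads:
        for p0, p1 in zip(line, line[1:]):
            for cell, length in _split_segment(p0, p1, size):
                totals[cell] += length
    return dict(totals)


def compare(roads_a, roads_b, size=CELL_SIZE):
    """Per-cell difference in road length, A minus B (positive: A has more)."""
    a = length_per_cell(roads_a, size)
    b = length_per_cell(roads_b, size)
    return {cell: a.get(cell, 0.0) - b.get(cell, 0.0)
            for cell in a.keys() | b.keys()}


if __name__ == "__main__":
    # Toy data standing in for the two networks (not real TIGER data).
    tiger = [[(0, 500), (2000, 500)], [(500, 0), (500, 900)]]
    centerlines = [[(0, 500), (1500, 500)]]
    for cell, d in sorted(compare(tiger, centerlines).items()):
        print(cell, round(d, 1))
```

With the toy data, cell (0, 0) comes out roughly 900 m in favour of the first network and cell (1, 0) roughly 500 m; those per-cell values are what a choropleth would shade. To address the caveat about driveways and non-navigable segments, one option is to filter each network by road class before calling `length_per_cell`; TIGER/Line road features carry an MTFCC class code (for example, S1740 for private roads) that could serve that purpose, though which classes count as "navigable" remains a judgment call.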
