Why is the topology of polygon shapefiles often incorrect?

The Internal Structure of Shapefiles:

The big disadvantage of shapefiles is that they carry no topology information (i.e. the relations between the individual points, lines and polygons), unlike ARC/INFO coverages. A coverage stores the boundary lines (so-called arcs) between the polygons, and each polygon results from all arcs that enclose a so-called label point.

In a shapefile, however, each polygon is stored as the complete list of points of its contour. The boundary shared by two adjacent polygons is therefore stored twice. Because the two copies of these vertices are not necessarily identical, polygons can overlap or leave gaps. These errors may lead to unpleasant surprises and serious problems when processing the data.
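The effect of storing a shared boundary twice can be illustrated with a minimal Python sketch. The polygons, the 1e-9 offset and the helper names (shared_vertices, ring_area) are made up for illustration and are not part of any shapefile library; rings are plain lists of (x, y) tuples.

```python
def shared_vertices(ring_a, ring_b):
    """Return the vertices that occur, bit for bit, in both rings."""
    return set(ring_a) & set(ring_b)


def ring_area(ring):
    """Unsigned area of a closed ring (shoelace formula)."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


# Two adjacent unit squares. Each one stores its full contour,
# so the border x = 1 exists once per polygon.
left = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0), (0.0, 0.0)]
right = [(1.0, 0.0), (2.0, 0.0), (2.0, 1.0), (1.0, 1.0), (1.0, 0.0)]

# The same neighbour after its copy of the border drifted by a tiny amount.
dx = 1e-9
right_drifted = [(1.0 + dx, 0.0), (2.0, 0.0), (2.0, 1.0),
                 (1.0 + dx, 1.0), (1.0 + dx, 0.0)]

print(len(shared_vertices(left, right)))          # border vertices match
print(len(shared_vertices(left, right_drifted)))  # no vertex matches any more

# Area of the sliver that is now covered by neither polygon.
gap = 2.0 - (ring_area(left) + ring_area(right_drifted))
print(gap)
```

In a coverage the border would exist only once as an arc, so a drift of this kind could not separate the two polygons.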

(In-)Accuracy of Floating Point Numbers:

The limited precision of floating point arithmetic is often the cause of errors that become visible only at extreme magnification. One reason is that the number format differs between shapefiles, AutoCAD drawings, ARC/INFO coverages and E00 export files. In particular, vertices of AutoCAD drawings that were originally "identical" differ slightly after import. This may lead to improper display of a theme with unstable polygons (see Truncating Coordinates).
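How a change of number format can move a vertex is sketched below in Python. The sketch assumes, purely for illustration, that one format holds coordinates as 32-bit floats and the other as 64-bit floats; the text above does not say which formats use which precision, and the coordinate value is made up.

```python
import struct


def to_single(value):
    """Round-trip a 64-bit Python float through a 32-bit float."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


x = 4512345.678          # made-up projected coordinate, e.g. in metres
x_single = to_single(x)  # the same coordinate after a single-precision hop

print(x, x_single)
print(x == x_single)     # the two "identical" vertices no longer compare equal
print(abs(x - x_single)) # size of the shift introduced by the conversion
```

At this magnitude a 32-bit float can only represent steps of 0.5, so the converted coordinate lands on the nearest such step rather than on the original value.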

A similar problem occurs when intersecting lines or polygons. If two vertices are very close together, a calculation error (division by zero) may produce a vertex with the coordinate 0/0. The contour of the polygon then jumps to a point far outside the visible extent of the view and back again.
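The following Python sketch shows where such a division by zero can arise in a generic line intersection and how a guard avoids emitting a bogus vertex. It is not ArcView's algorithm, which is not described here; the function name intersect_lines and the threshold eps are assumptions chosen for illustration.

```python
def intersect_lines(p1, p2, p3, p4, eps=1e-12):
    """Intersection of the infinite lines p1-p2 and p3-p4.

    Returns None instead of a point when the denominator is (almost) zero,
    which happens for parallel lines or when p1 and p2 (or p3 and p4)
    practically coincide. eps is an illustrative absolute threshold.
    """
    d1x, d1y = p2[0] - p1[0], p2[1] - p1[1]
    d2x, d2y = p4[0] - p3[0], p4[1] - p3[1]
    denom = d1x * d2y - d1y * d2x
    if abs(denom) < eps:
        return None  # degenerate case: do not invent a vertex such as 0/0
    t = ((p3[0] - p1[0]) * d2y - (p3[1] - p1[1]) * d2x) / denom
    return (p1[0] + t * d1x, p1[1] + t * d1y)


# Two diagonals of a square cross in its centre.
print(intersect_lines((0.0, 0.0), (2.0, 2.0), (0.0, 2.0), (2.0, 0.0)))

# Two coincident vertices define no direction, so no intersection is returned.
print(intersect_lines((1.0, 1.0), (1.0, 1.0), (0.0, 2.0), (2.0, 0.0)))
```

Without the guard, the second call would divide by zero; code that catches such an error and substitutes a default value is one plausible way a stray 0/0 vertex could end up in a contour.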

There are even differences between the way ArcView handles coordinates internally and the way the shapefile format stores them. Two apparently identical vertices (the difference is so small that Avenue ignores it) may differ in the last decimal place after they have been stored in a shapefile and reloaded.

Variable Fuzzy Tolerance of ArcView:

Because floating point numbers are inexact, there must be a limit below which vertices or sections of lines or polygons are treated as equal and removed during cleaning. In ARC/INFO this is the so-called Fuzzy Tolerance. At first sight there is no such value in ArcView. In fact, ArcView has this kind of tolerance too and applies it in every manipulation of a line or polygon. The limit simply cannot be set by the user; it is calculated variably from the polygon extent.

As a result, many different tolerances are in use within a single shapefile. Delicate structures between adjacent polygons are removed from the larger polygon (bigger tolerance) but kept in the smaller one (smaller tolerance), which causes tiny gaps and overlaps. This often happens when polygons are divided into parts of different sizes. Because ArcView also removes vertices whose distance is below the tolerance limit, vertices on nearly straight borderlines are often missing. The contours of adjacent polygons are then no longer completely identical.
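The mechanism can be made concrete with a small Python sketch. ArcView's actual formula for deriving the tolerance from the extent is not given here, so the factor REL_TOL, the helper names and the sample polygons are all hypothetical; the sketch only assumes that the tolerance grows with the size of the polygon's extent. It requires Python 3.8 or later for math.dist.

```python
import math

REL_TOL = 1e-6  # hypothetical factor, NOT ArcView's real value


def extent_tolerance(ring, rel_tol=REL_TOL):
    """Tolerance proportional to the diagonal of the ring's extent."""
    xs = [x for x, _ in ring]
    ys = [y for _, y in ring]
    return math.hypot(max(xs) - min(xs), max(ys) - min(ys)) * rel_tol


def drop_close_vertices(ring, tol):
    """Remove every vertex closer than tol to the previously kept vertex."""
    kept = [ring[0]]
    for pt in ring[1:]:
        if math.dist(pt, kept[-1]) >= tol:
            kept.append(pt)
    return kept


# A small jog on the border shared by a large and a small polygon.
jog = (1000.0005, 500.0008)

big = [(0.0, 0.0), (1000.0, 0.0), (1000.0, 500.0), jog,
       (1000.0, 501.0), (1000.0, 1000.0), (0.0, 1000.0), (0.0, 0.0)]
small = [(1000.0, 500.0), (1001.0, 500.0), (1001.0, 501.0),
         (1000.0, 501.0), jog, (1000.0, 500.0)]

cleaned_big = drop_close_vertices(big, extent_tolerance(big))
cleaned_small = drop_close_vertices(small, extent_tolerance(small))

print(jog in cleaned_big)    # the large polygon loses the jog
print(jog in cleaned_small)  # the small polygon keeps it
```

After cleaning, the two polygons no longer agree on their common border: the small one still bends through the jog while the large one runs straight past it, which leaves a sliver of the kind described above.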

The resulting gaps and overlaps are very small and mostly invisible, and they cannot be detected or cleaned with the built-in functions of ArcView because every intersection between the polygons is itself subject to the internal fuzzy tolerance. If these polygons are divided further, their fuzzy tolerance becomes finer too, and gaps or overlaps may appear where the polygon has not been edited. We therefore call these errors Fuzzy Vertices, which cause Hidden Gaps and Overlaps.

2003 WLM Klosterhuber & Partner OEG