VITAL: Progress to Date

VEHICLE INTELLIGENCE & TRANSPORTATION ANALYSIS LABORATORY

University of California, Santa Barbara

LRMS Test: Cross Streets Profile (XSP)

The Cross Streets Profile is one of the messaging protocols in the LRMS. It communicates a location in terms of an offset distance along a principal street between two cross streets. For example, the incident below is located at “875 metres [or a relative distance: 55% of the way] along Birch St between Main St and State St.” Our task is to test this profile, i.e. to evaluate its ability to transmit a message meaningfully and unambiguously between databases.

Figure 1: The Cross Streets Profile
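To make the structure of a message concrete, here is a minimal Python sketch of an XSP message as a record. The class and field names are hypothetical, not taken from the LRMS specification. The sketch assumes a message carries either an absolute or a relative offset but not both, and it omits the coordinates added in Phase II.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class XSPMessage:
    """Hypothetical in-memory form of a Cross Streets Profile message."""
    on_street: str                        # principal street, e.g. "Birch St"
    from_street: str                      # cross street the offset is measured from
    to_street: str                        # cross street at the other end
    offset_m: Optional[float] = None      # absolute offset in metres (assumed field)
    offset_frac: Optional[float] = None   # relative offset, 0.0 to 1.0 (assumed field)

    def __post_init__(self):
        # This sketch requires exactly one form of offset.
        if (self.offset_m is None) == (self.offset_frac is None):
            raise ValueError("supply exactly one of offset_m or offset_frac")
        if self.offset_frac is not None and not 0.0 <= self.offset_frac <= 1.0:
            raise ValueError("offset_frac must lie in [0, 1]")

    def describe(self) -> str:
        where = (f"{self.offset_m:.0f} metres" if self.offset_m is not None
                 else f"{self.offset_frac:.0%} of the way")
        return f"{where} along {self.on_street} between {self.from_street} and {self.to_street}"

msg = XSPMessage("Birch St", "Main St", "State St", offset_frac=0.55)
print(msg.describe())  # 55% of the way along Birch St between Main St and State St
```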

Initial findings (Phase I) were that the XSP was not a realistic messaging solution in the short term. We recommended that it be reinforced with location coordinates. This led to a further series of tests (Phase II). The following is a condensed report on the results of both Phases I and II.

With the inclusion of coordinates, the XSP acquires an important characteristic. Since names and coordinates are entirely independent and could be in conflict, agreement between the two is a measure of reliability.
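A minimal sketch of such an agreement check, assuming both components have already been decoded to points in a projected coordinate system measured in metres. The function name and the 50 m default are assumptions; the default borrows the distance above which the Test Set D transfers below are generally associated with errors.

```python
import math

def components_agree(name_point, coord_point, tolerance_m=50.0):
    """Return True when the name-decoded and coordinate-decoded positions agree.

    Points are (x, y) tuples in metres. name_point is None when the street names
    could not be matched at all, which is treated as disagreement.
    tolerance_m is an assumed, application-specific threshold.
    """
    if name_point is None:
        return False
    return math.dist(name_point, coord_point) <= tolerance_m

print(components_agree((1000.0, 2000.0), (1020.0, 2010.0)))  # True  (about 22 m apart)
print(components_agree((1000.0, 2000.0), (1300.0, 2000.0)))  # False (300 m apart)
```

A disagreement does not say which component is wrong, only that the transfer should not be trusted.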

Unfortunately, the test results cannot be encapsulated in a single figure or a short statement. Success of the XSP depends on a number of factors, including:

inherent effectiveness of the Profile
quality of implementation — flexibility and smarts of algorithms
quality of databases
vendor/municipal compliance with standard practices
application context and user requirements — some (e.g. emergency management) are more exacting than others.

Clearly a time-constrained test effort must steer a course between a rigid, unintelligent interpretation of the Profile, and a sophisticated implementation driven by substantial investment and evolution. The test results below reflect current data sets and vendor practices, which will inevitably improve over time. The results should be seen as an indication of the types of errors that can be encountered, as much as a pronouncement on the probability of success.

What's to Test?

Message failure could be caused by:

inadequacies in the profile itself
name matching problems
database errors

Inadequacies in the Profile

The Profile does not specify algorithms for composing or decoding the message; location must be inferred at the receiving end using only the information provided. In Figures 2a and 2b below, the XSP message reads "Birch between Cedar and Cedar," and in Figure 2d, "Birch between Ash and Cedar," both of which are ambiguous. In Figure 2c the message is "Birch between Birch and Birch."

Figure 2: Some of the ambiguities not resolvable with the XSP

Furthermore, the profile does not specify the municipality or jurisdiction in which the names occur. A matching name could be found in another area. Results are therefore dependent on the geographic extent of the test area. In all these cases, coordinates are a fall-back or tie-breaker.
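The decoding ambiguity, and the role of coordinates as a tie-breaker, can be sketched as follows. This is a deliberately simplified model assumed for illustration: each candidate is a single segment carrying its on-street name, the names of the cross streets at its two ends, and a midpoint in projected metres. The `decode` function and its status labels are hypothetical. Because the message names no jurisdiction, restricting `segments` to the right area is left to the caller.

```python
import math
from typing import NamedTuple, Optional, Tuple

class Segment(NamedTuple):
    seg_id: str
    on: str
    from_cross: str
    to_cross: str
    midpoint: Tuple[float, float]  # projected metres

def decode(segments, on, frm, to, hint_xy: Optional[Tuple[float, float]] = None):
    """Find the segment an XSP name triple refers to.

    Returns (segment or None, status), where status is 'unique', 'tie-broken',
    'ambiguous' or 'no match'. Either digitizing direction is accepted.
    """
    wanted = {(frm, to), (to, frm)}
    candidates = [s for s in segments
                  if s.on == on and (s.from_cross, s.to_cross) in wanted]
    if not candidates:
        return None, "no match"
    if len(candidates) == 1:
        return candidates[0], "unique"
    if hint_xy is None:
        return None, "ambiguous"  # names alone cannot decide
    # Coordinates break the tie: choose the candidate nearest the hint.
    return min(candidates, key=lambda s: math.dist(s.midpoint, hint_xy)), "tie-broken"

segments = [
    Segment("s1", "Birch St", "Cedar St", "Cedar St", (0.0, 0.0)),
    Segment("s2", "Birch St", "Cedar St", "Cedar St", (500.0, 0.0)),
    Segment("s3", "Birch St", "Ash St", "Cedar St", (900.0, 0.0)),
]
print(decode(segments, "Birch St", "Cedar St", "Cedar St")[1])                        # ambiguous
print(decode(segments, "Birch St", "Cedar St", "Cedar St", (480.0, 10.0))[0].seg_id)  # s2
```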

Name Matching Problems

Name matching works well only if databases are accurate. The first problem with the data is the large proportion of blank name fields. Among the four databases tested initially, 20–45% of all records were blank. Since the XSP requires three non-blank names, a message could be successfully composed in only about 33% of all attempts. In 2–31% of all cases (depending on the database), all three streets in a transfer attempt were blank.
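Counting composable attempts is simple once "blank" is defined. The sketch below treats `None`, empty and whitespace-only names as blank; the function names are hypothetical.

```python
def is_composable(on, frm, to):
    """An XSP message needs all three street names to be non-blank."""
    return all(name is not None and name.strip() for name in (on, frm, to))

def composable_share(triples):
    """Fraction of (on, from, to) transfer attempts for which a message can be composed."""
    triples = list(triples)
    if not triples:
        return 0.0
    return sum(is_composable(*t) for t in triples) / len(triples)
```

As a rough illustration only: if each name were blank independently with probability p, the composable share would be (1 − p)^3, and p = 0.3 gives about 34%. Blanks are unlikely to be independent in practice, and the report's 33% is presumably an empirical count rather than a figure derived this way.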

Note:

To a certain extent it is possible to get around this problem: as long as the on-street is named, an algorithm can search outward for the nearest named cross-street in either direction. However, this relies on the destination database also having populated name fields for those cross-streets.
It could be argued that most of the unnamed streets are remote ranch roads and private tracks, which are irrelevant to ITS needs. This is true, but there are two counter-arguments:

For mission critical applications such as emergency management services (EMS), remote roads are just as relevant as major highways.
On average, 10% of "major" streets have blank names (vendors differ in their characterization of "major" streets). Freeway ramps — on which many highway incidents are located — are typically unnamed.
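The outward search mentioned in the note might look like this. The data model is an assumption: the on-street is represented as an ordered list of nodes, and each node lists the names of the streets that meet it there, with `""` marking an unnamed street. Function names are hypothetical.

```python
def nearest_named_cross(node_cross_names, on_name, start, step):
    """Walk along the on-street from node `start` in direction `step` (+1 or -1).

    Returns (node_index, name) for the first node with a named cross street,
    or None if the street ends first.
    """
    i = start
    while 0 <= i < len(node_cross_names):
        for name in node_cross_names[i]:
            if name and name != on_name:
                return i, name
        i += step
    return None

def bracket_event(node_cross_names, on_name, link_index):
    """Named cross streets bracketing an event on the link from node link_index to link_index + 1."""
    before = nearest_named_cross(node_cross_names, on_name, link_index, -1)
    after = nearest_named_cross(node_cross_names, on_name, link_index + 1, +1)
    return before, after

nodes = [["Main St"], [""], ["", ""], ["State St"]]
print(bracket_event(nodes, "Birch St", 1))  # ((0, 'Main St'), (3, 'State St'))
```

Here the event would be reported relative to Main St and State St, and the offset would then be measured from node 0 rather than from the start of the event's own link.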

When street names are non-blank, there are other problems.

Alias exists: Ventura Freeway appears as Hwy 101
Spelling/Typographic error: Venture Freeway
Vendor practice

conflict in abbreviation or coding: Fwy vs Frwy; US-101 vs Hwy-101
prefix/suffix: some vendors distinguish between street name and street type (prefix or suffix). In the case of Main Street, Main is the proper name and Street is the street type (suffix). In the case of Via Del Monaco, Via Del is a street type (prefix) and Monaco is the proper name. Some vendors use suffixes but not prefixes; others use both, or neither.

Vendor interpretation: Ward Memorial Blvd vs Clarence Ward Memorial Blvd.
Human error: Birch St coded as Birch Av

Algorithms can be designed around these problems. However, the more forgiving the algorithm, the more likely it is to find the wrong instance of the intended location. Again, coordinates offer a cross-check.
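One way to build such a forgiving matcher is to normalize names before comparing them, and to fall back to a similarity ratio only when the normalized forms differ. The abbreviation and alias tables below are tiny illustrative assumptions, the 0.85 threshold is arbitrary, and Python's standard `difflib.SequenceMatcher` stands in for whatever fuzzy matcher a real implementation would use.

```python
import difflib
import re

# Illustrative abbreviation table; real vendor conventions vary widely.
ABBREVIATIONS = {
    "street": "st", "avenue": "av", "ave": "av",
    "freeway": "fwy", "frwy": "fwy", "highway": "hwy", "boulevard": "blvd",
}
# Illustrative alias table, keyed by normalized name.
ALIASES = {"ventura fwy": "us 101", "hwy 101": "us 101"}

def normalize(name):
    """Lower-case, drop punctuation, and unify abbreviations (no alias lookup)."""
    tokens = re.findall(r"[a-z0-9]+", name.lower())
    return " ".join(ABBREVIATIONS.get(t, t) for t in tokens)

def canonical(name):
    """Normalized name, replaced by its alias target if one is known."""
    n = normalize(name)
    return ALIASES.get(n, n)

def names_match(a, b, threshold=0.85):
    """True if names are equal after normalization and aliasing, or similar enough."""
    if canonical(a) == canonical(b):
        return True
    return difflib.SequenceMatcher(None, normalize(a), normalize(b)).ratio() >= threshold
```

Lowering `threshold` recovers more misspellings such as "Venture Freeway", but it also raises the chance of matching a different street with a similar name. At 0.85, "Birch St" versus "Birch Av" is rejected.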

Database Errors

Database problems may occur in position, inclusion, topology or attribute.

Positional errors are documented elsewhere on the VITAL web page. An absolute or relative offset measured along Birch St inevitably translates to slightly different positions in the transmitting and receiving databases. The extent of the discrepancy depends on the resolution and accuracy of the databases, and the tolerance to error depends on the application. Examples:

airport, hospital, museum (no specific entrance specified): 100–1000 metres
hotel, gas station: 25–50 metres
parking spot, speed restriction sign: 5–10 metres
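The effect of transferring a relative offset can be sketched as follows, assuming each database stores Birch St as a polyline in projected metres. The geometry and function names are invented for illustration; only the tolerance figures come from the examples above, using the lower end of each range.

```python
import math

def polyline_length(line):
    """Total length of a polyline given as [(x, y), ...] in metres."""
    return sum(math.dist(a, b) for a, b in zip(line, line[1:]))

def point_at_distance(line, d):
    """Point lying d metres along the polyline, clamped to its ends."""
    if d <= 0:
        return line[0]
    travelled = 0.0
    for (x0, y0), (x1, y1) in zip(line, line[1:]):
        seg = math.hypot(x1 - x0, y1 - y0)
        if seg > 0 and travelled + seg >= d:
            t = (d - travelled) / seg
            return (x0 + t * (x1 - x0), y0 + t * (y1 - y0))
        travelled += seg
    return line[-1]

def place_relative(line, fraction):
    """Apply a relative offset (0 to 1) to a polyline."""
    return point_at_distance(line, fraction * polyline_length(line))

# Hypothetical geometry: the receiver's Birch St is 30 m shorter and shifted 8 m.
sender = [(0.0, 0.0), (1590.0, 0.0)]
receiver = [(0.0, 8.0), (1560.0, 8.0)]
gap = math.dist(place_relative(sender, 0.55), place_relative(receiver, 0.55))

# Lower bounds of the tolerance ranges quoted in the text, in metres.
TOLERANCE_M = {"airport": 100.0, "hotel": 25.0, "parking spot": 5.0}
print(round(gap, 1), {use: gap <= tol for use, tol in TOLERANCE_M.items()})
```

The two placements end up roughly 18 m apart: acceptable for a hotel or gas station, not for a parking spot.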

Errors of inclusion/exclusion create topological errors, expanded in the next point.
Topological errors are serious because the XSP is fundamentally topological. In Figure 3, Birch St comes within a few metres of Main St, but does not intersect it. Suppose the transmitting database erroneously records an intersection (a common error), whereas the receiving database does not. Then a message that uses this intersection cannot be interpreted at the receiving end. In the converse situation, where an intersection exists in the receiving database but not in the transmitting one, intelligent receiver software can infer the location correctly, although this is difficult with most link-oriented data structures in current use.
Attribute errors for XSP purposes refer to incorrect or mis-spelled street names, and misclassification (e.g. railroad track classified as street).

Test Design

Broadly there are two test components:

Name matching to determine probability of identifying the correct street segment. Since the result of the transfer is either right or wrong, this can in theory be measured by a “hit rate.” In practice this approach is too exacting, and hit rates are extremely low, in the range of 15%. The Phase II tests are more forgiving, and use a complex reasoning process to find a likely hit — recall that coordinates are part of the profile tested in Phase II.
Measurement of accuracy with which offsets are transferred, using lab and field tests. Tolerance to error depends on the user and application, hence there is no right or wrong result; testing simply documents the degree of accuracy. Offset measurement and error is the focus of a different LRMS profile — the Linear Referencing Profile — which will be tested downstream. Therefore the tests and results on this component are cursory.

Field tests are conducted using 54 points sampled in and around Santa Barbara. For lab tests, about 10,000 sample points are generated around the county, the sampling density proportional to the density of roads. The lab tests apply equal weight to all points, whether freeways or private ranch roads. Field sampled points are more representative of day-to-day driving routes.
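The report does not say how the lab sample was drawn. One plausible reading of "sampling density proportional to the density of roads" is length-weighted sampling, sketched below under the simplifying assumption that every link is a straight two-point segment in projected metres. Each link is weighted only by its length, so a metre of ranch road counts the same as a metre of freeway.

```python
import math
import random

def sample_points(segments, n, seed=0):
    """Draw n points with density proportional to road length.

    segments: list of ((x0, y0), (x1, y1)) straight links in metres.
    Returns (segment_index, (x, y)) pairs: a segment is chosen with probability
    proportional to its length, then a uniform position along it is taken.
    """
    rng = random.Random(seed)
    lengths = [math.dist(a, b) for a, b in segments]
    picks = rng.choices(range(len(segments)), weights=lengths, k=n)
    points = []
    for i in picks:
        (x0, y0), (x1, y1) = segments[i]
        t = rng.random()
        points.append((i, (x0 + t * (x1 - x0), y0 + t * (y1 - y0))))
    return points
```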

Major roads are tested as a separate sample. There are wide differences between vendors on what constitutes a major road (for example, see Vendor A's version of major streets in Santa Barbara, compared with Vendor F). Nevertheless, we assume that the receiving database contains all streets, major and minor; therefore variations in the definition of "major" affect only the sampling process, not the transfer.

Finally, we test all streets and major streets in the Santa Barbara-Goleta urban area.

Findings

Test Set A matches sample points within the originating database. Obviously, using coordinates, such a test is 100% successful. When coordinates are excluded, only 33% of all sample points have non-blank names required for the transfer. Of those, about 5–10% of the transfers are ambiguous (Figure 2).

Test Set B examines the accuracy with which coordinates alone can identify the correct link in another database. The transfer is made using coordinates only, and confirmed by checking cross street names. As documented above, names are unreliable as a means of judging success, so hits are scored on a scale of likelihood of matching, with some tolerance of blank fields. On average, 11% of test points score "likely" matches. About 61% fall into the “possible” category, because (a) they match the wrong street in the right general area, or the wrong segment of the right street, due to coordinate error, or (b) confidence in name matching is diluted by blanks and other name matching problems; this is, unfortunately, an inherent limitation of the test design. Results are far better for field-sampled points, which generally lie on named streets.

**Table 1:** Test Set B

| Match score | Mean | Min | Max |
|---|---|---|---|
| Likely | 11% | 2% | 18% |
| Possible | 61% | 52% | 70% |
| Unlikely | 28% | 18% | 45% |
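The report does not publish its likelihood scale, so the rule below is invented purely to illustrate the idea. After a coordinate-only transfer, it compares the source's (on, from, to) names with those of the destination link and never counts a blank as agreement. A real scorer would normalize names first.

```python
def _clean(name):
    """Lower-cased, trimmed name; None and whitespace become ''."""
    return (name or "").strip().casefold()

def _agree(a, b):
    """Two names agree only if both are non-blank and equal after cleaning."""
    return bool(_clean(a)) and _clean(a) == _clean(b)

def score_transfer(source, dest):
    """Classify a coordinate-only transfer from (on, from, to) name triples.

    Hypothetical rule: 'likely' if all three names agree, 'unlikely' if none
    agree, 'possible' otherwise (for example blanks or a wrong cross street).
    """
    on_s, from_s, to_s = source
    on_d, from_d, to_d = dest
    # The destination link may run the other way, so take the better orientation.
    cross = max(_agree(from_s, from_d) + _agree(to_s, to_d),
                _agree(from_s, to_d) + _agree(to_s, from_d))
    agreements = _agree(on_s, on_d) + cross
    if agreements == 3:
        return "likely"
    if agreements == 0:
        return "unlikely"
    return "possible"
```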

Test Set C transfers locations using names+coordinates. Obviously, transfers are always apparently “successful.” Two questions need to be asked to validate the transfers:

How often does a transfer fall back to coordinates, because a match is not possible using street names? This count is termed “Bin 9,” a reference to Bin 9 on the name processing flowchart.
How far does the destination point lie from the source point? The name match could in some cases point to the wrong street, or if the destination database does not contain the road referenced in the source, transfer by coordinate may erroneously snap to the nearest available entity. These problems are addressed in Test Set D below.

Table 2 summarizes selected results pertaining to Test Sets C and D. The Bin 9 row indicates that 35% of all transfers (68% on major streets) fall back to coordinates. The high fallback for major streets is partly because aliases are not handled in this test set; they are examined in Test Set E, below.

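**Table 2:** Test Sets C and D

| Measure | All streets: mean | All streets: min | All streets: max | Major streets only: mean | Major streets only: min | Major streets only: max |
|---|---|---|---|---|---|---|
| Bin 9 (fallback to coordinates) | 35% | 16% | 45% | 68% | 38% | 91% |
| Snap distance: median (m) | 36.6 | 0.0 | 140.4 | 11.3 | 0.1 | 46.2 |
| Snap distance in [0, 30 m) | 66% | 37% | 96% | 82% | 52% | 97% |
| Snap distance in [30, 50 m) | 4% | 0% | 8% | 9% | 0% | 15% |
| Snap distance ≥ 50 m | 30% | 4% | 58% | 23% | 3% | 88% |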

Test Set D is a study of Euclidean distance between source and destination points, based on the same transfer tests as Test Set C. We note that distances below 30m are usually transfers made to the correct destination link, whereas distances above 50m are generally associated with errors. For single point transfers, the median distance between source and destination points is about 35m. Two thirds of all points differ by less than 30m (probably hitting the correct link), one third by more than 50m (probably hitting the wrong link). The worst deviation observed in any test is 75 kilometres — clearly a referential error.
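Summary statistics of the kind reported in Table 2 can be computed as below. The sketch assumes source and destination points are already in a common projected coordinate system measured in metres; latitude/longitude would need a geodesic distance instead. The function name and dictionary keys are hypothetical.

```python
import math
import statistics

def distance_summary(pairs):
    """Summarize source-to-destination Euclidean distances.

    pairs: non-empty list of ((x, y), (x, y)) source/destination points in metres.
    Bins follow Table 2: [0, 30 m), [30, 50 m), and 50 m or more.
    """
    d = [math.dist(s, t) for s, t in pairs]
    n = len(d)
    return {
        "median_m": statistics.median(d),
        "under_30": sum(x < 30 for x in d) / n,   # probably the correct link
        "30_to_50": sum(30 <= x < 50 for x in d) / n,
        "50_plus": sum(x >= 50 for x in d) / n,   # probably the wrong link
        "max_m": max(d),
    }
```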

Since fallback points are snapped to the nearest arc without verifying the street name, a high "Bin 9" count in Test Set C is typically found in conjunction with an artificially low median distance in Test Set D.
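The mechanism behind that artificially low distance shows up clearly in a sketch of name-blind snapping. The helpers below are assumed for illustration: they project a point onto every segment of every arc in planar coordinates and keep the closest, ignoring street names as a fallback transfer does.

```python
import math

def project_onto_segment(p, a, b):
    """Closest point to p on segment a-b, and its distance (planar coordinates)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    t = 0.0 if length_sq == 0 else max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / length_sq))
    q = (ax + t * dx, ay + t * dy)
    return q, math.dist(p, q)

def snap_to_nearest_arc(p, arcs):
    """Return (arc_id, snapped_point, distance) for the arc closest to p.

    arcs: dict mapping arc_id to a polyline [(x, y), ...]. Names are ignored.
    """
    best = None
    for arc_id, line in arcs.items():
        for a, b in zip(line, line[1:]):
            q, dist = project_onto_segment(p, a, b)
            if best is None or dist < best[2]:
                best = (arc_id, q, dist)
    return best

arcs = {"Birch St": [(0.0, 0.0), (100.0, 0.0)], "Main St": [(0.0, 20.0), (100.0, 20.0)]}
print(snap_to_nearest_arc((50.0, 8.0), arcs))                           # ('Birch St', (50.0, 0.0), 8.0)
print(snap_to_nearest_arc((50.0, 8.0), {"Main St": arcs["Main St"]}))  # ('Main St', (50.0, 20.0), 12.0)
```

If the destination lacks Birch St, the point lands on Main St only 12 m away: comfortably inside the "probably correct" band, yet on the wrong street.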

Test Set E. As an afterthought to the original experimental plan, we implement a "fan-out" algorithm that searches intelligently for any occurrence of the required street names in the destination database, not necessarily as a triad of names associated with a single link. In effect, this approach forgives intervening streets in the destination. The method is particularly applicable to major-streets-only events typical of ITS.

The algorithm locates the two intersections {On, From} and {On, To} — blank names are not admitted at all. It examines all possible paths between these two intersections, constraining the search to links carrying the On-street name (in theory there should be only one path, but due to database errors and odd municipal practices, such as forking streets with the same name, multiple paths are often encountered). It selects the longest such path to be the destination street.
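The fan-out search can be sketched on a simple node-link graph. The data model is an assumption, not the report's: each link is an undirected (node, node, name, length) tuple carrying one street name. The sketch enumerates simple paths by depth-first search, which is exponential in the worst case; that is tolerable for small name-constrained subgraphs but might need a bound on a large network.

```python
from collections import defaultdict

def intersections(links, name_a, name_b):
    """Nodes where a link named name_a meets a link named name_b."""
    names_at = defaultdict(set)
    for u, v, name, _ in links:
        names_at[u].add(name)
        names_at[v].add(name)
    return {n for n, names in names_at.items() if name_a in names and name_b in names}

def fan_out(links, on, frm, to):
    """Longest simple path of `on`-named links from {on, frm} to {on, to}.

    Returns (length_m, [nodes]) or None. Blank names are rejected outright.
    """
    if not (on and frm and to):
        return None
    adj = defaultdict(list)
    for u, v, name, length in links:
        if name == on:  # only links carrying the on-street name
            adj[u].append((v, length))
            adj[v].append((u, length))
    goals = intersections(links, on, to)
    best = None

    def dfs(node, path, total):
        nonlocal best
        if node in goals and len(path) > 1 and (best is None or total > best[0]):
            best = (total, list(path))
        for nxt, length in adj[node]:
            if nxt not in path:
                path.append(nxt)
                dfs(nxt, path, total + length)
                path.pop()

    for start in intersections(links, on, frm):
        dfs(start, [start], 0.0)
    return best

links = [("A", "B", "Birch St", 400), ("B", "C", "Birch St", 500),
         ("A", "X", "Main St", 300), ("C", "Y", "State St", 300),
         ("B", "Z", "Elm St", 200)]
print(fan_out(links, "Birch St", "Main St", "State St"))  # (900.0, ['A', 'B', 'C'])
```

Elm St, an intervening cross street at node B, neither blocks nor changes the match.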

Fan-out is implemented in conjunction with other matching processes, in the sequence:

1. Exact match
2. Fan-out using exact match
3. Fuzzy match
4. Fan-out using fuzzy match
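The ordering above is a cascade: each stage runs only if the previous ones found nothing, and whatever survives all four stages falls back to coordinates ("Bin 9"). A minimal sketch follows, with two toy placeholder stages standing in for the four real ones; the names and matcher logic are hypothetical.

```python
BIN_9 = "bin 9"  # label for a coordinate fallback

def match_with_cascade(message, stages):
    """Run (label, matcher) stages in order; return the first hit, else (BIN_9, None).

    Each matcher takes a message and returns a result or None. On BIN_9 the
    caller would transfer by coordinates instead.
    """
    for label, matcher in stages:
        result = matcher(message)
        if result is not None:
            return label, result
    return BIN_9, None

def fallback_rate(messages, stages):
    """Share of messages that end up in Bin 9."""
    labels = [match_with_cascade(m, stages)[0] for m in messages]
    return sum(label == BIN_9 for label in labels) / len(labels)

# Toy stages for illustration only.
stages = [
    ("exact", lambda m: m if m == "Birch St" else None),
    ("fuzzy", lambda m: "Birch St" if m.lower().startswith("birch") else None),
]
print(fallback_rate(["Birch St", "birch street", "Cedar St", "Birch Av"], stages))  # 0.25
```

Putting exact matching first keeps the more forgiving stages as last resorts; even so, the toy fuzzy stage happily maps "Birch Av" to Birch St, the risk noted under Name Matching Problems.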

The fan-out approach produces the best results in the entire test series, with a mean fallback rate of 33% (compared with 35% without the algorithm). For major streets, the fallback appears to rise from 68% to 72%; however, these numbers are not comparable, because in the fan-out tests major street events are passed using only major cross streets, whereas in the other tests they may use minor cross streets. When major streets are limited to the Santa Barbara urban area, the average fallback is 51%, with a best score of 23%.

Although the magnitude of improvement due to fan-out is disappointing, this method is clearly the most appropriate implementation of the XSP, if only because transfers should not be confounded by the topology of intervening streets, whose presence or absence is often a matter of scale and interpretation. Based on cursory checking of some results, it appears that when fan-out fails, it is because of database errors, for example:

Spelling disagreements in name records, between databases: Sargosso in database X, Sargoso in database Y
Spelling errors or discontinuities in naming within a single database, which prevent a path from being built between the two intersection points.

Some of these errors are easily fixed. We anticipate that intelligent name matching software could lower the fallback rate to the region of 10–20%.

Conclusions

As we emphasized in the opening paragraphs, success is a function of numerous factors, from inherent profile effectiveness to municipal practices, database accuracy and quality of implementation. Because there is no quality metadata explicitly associated with the message, there is potential for propagation of error. Because there are so many dimensions to the tests and results, each potential user group will need to focus on aspects appropriate to its needs.

For mission-critical applications such as EMS, the revised XSP (coordinates included) offers a measure of assurance, in that failure of transfer is clearly indicated by disagreement of coordinate and name components. Clearly EMS should take no comfort in the improvements in results due to “lenient” treatment of blanks in Phase II tests.

It must be emphasized that the bulk of the problem with low success rates is not the fault of the XSP specification, but is a reflection of database quality, particularly the high incidence of blanks, absence of alias fields in many databases, and non-standard name parsing and abbreviations. EMS agencies recognize the need for high quality data; in many areas they are the driving force behind municipal-level street database quality improvement initiatives. For best results EMS agencies must ensure that (a) they operate with reference to a single-source database as far as possible, and (b) they establish appropriate database quality control and testing measures.

The ITS industry expects far better success rates from a messaging standard than those achieved in these tests. The following are constructive recommendations for national-level activities that would lead to better results.

Obviously, the ideal long-term course of action is to re-survey the national street network to uniform quality standards. Piecemeal efforts are already underway. Several municipalities have integrated GIS programs in operation, with varying degrees of coordination between federal, state, county and private agencies. Outstanding hurdles are (a) the technical difficulty of finding a common quality standard that suits the needs of all stakeholders at a reasonable cost, and (b) management challenges to coordinate this activity at a national scale. Even if re-survey is publicly funded, commercial vendors will need to make substantial investments in data reorganization and conflation of nationwide databases. There are two shorter-term alternatives: standardization of databases, and the ITS Datum.

Standardization of databases: Messaging could be simplified if vendors would populate alias name fields and comply with basic standards in street naming, in particular highway and ramp nomenclature, field separation and abbreviation. Standardization of other aspects, such as classification and inclusion, is desirable but may not be readily achievable in the short term.

The ITS Datum is a mid- to long-term strategy that could potentially

alleviate many current messaging problems,
provide an evolutionary framework for a high-quality national database, and
offer a mechanism for continuing update of highway-related coordinates and attributes, that would survive future construction and changes in geodetic datums.

Conceptual design of the ITS Datum is underway; improvement in XSP success may be one of several measures of its ultimate effectiveness.

Update 1998-05-15

The full text of the LRMS Cross Streets Profile report is available under Technical Reports. VITAL acknowledges the support of the Federal Highway Administration, ITS Joint Program Office, Contract DTFH61-91-Y-30066. The project was executed under contract to Viggen Corporation. Infrastructure development that enabled this research was funded by Caltrans.
