New traffic data collection technologies require new verification methods


The ongoing information explosion demands new approaches to selecting and using data. With enormous amounts of information generated constantly, and with data derivatives so technically easy to produce, users who want to avoid costly mistakes must carefully verify the information they consume.

This can be achieved with the help of a powerful ally: a multiplicity of data originating from a variety of sources. In other words, it’s vital to apply data cross-verification methods.

Let’s review several examples of how verification of data quality and information relevance helps to weed out unreliable data and, therefore, avoid errors in the site selection process.

1. Misinterpretation of data from a reliable source

Here’s an example of a reliable source: the US Census. It can justly be characterized as very reliable, verified, and respected, and it provides highly granular data sets, such as average household income. But when a data provider offers to supply you with a “traffic by income” distribution, it helps to remember that a vehicle trajectory originating in an X-income polygon does not guarantee that the specific driver has an income of exactly X. Moreover, people in different occupations, and thus with different incomes, often choose different travel destinations, so there is no guarantee that all segments of the population are uniformly represented in all traffic flows. This is why, while the original Census income data is correct, there's a high probability that data derivatives such as a “traffic by income” distribution will be misleading.
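The effect described above can be illustrated with a small numeric sketch. All figures are hypothetical and chosen only for illustration: a single polygon contains two resident groups with different incomes, and the groups contribute different numbers of trips to one particular traffic flow. For simplicity the sketch uses the mean rather than the median.

```python
# Hypothetical polygon: two resident groups with different incomes that
# contribute different numbers of trips to one observed traffic flow.
groups = [
    {"residents": 600, "income": 40_000, "trips_in_flow": 180},
    {"residents": 400, "income": 100_000, "trips_in_flow": 20},
]


def weighted_mean_income(groups, weight_key):
    """Mean income of the groups, weighted by the given field."""
    total_weight = sum(g[weight_key] for g in groups)
    return sum(g["income"] * g[weight_key] for g in groups) / total_weight


# Income of the polygon as a whole (what the area-level statistic reports).
polygon_mean = weighted_mean_income(groups, "residents")
# Income of the drivers who actually make up this traffic flow.
flow_mean = weighted_mean_income(groups, "trips_in_flow")

print(f"Mean income of polygon residents: {polygon_mean:,.0f}")
print(f"Mean income of drivers in the flow: {flow_mean:,.0f}")
```

With these made-up numbers, the polygon average is 64,000 while the drivers in the flow average 46,000, even though every trip originates in the same polygon. Labeling the flow with the polygon's income would overstate it considerably.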

2. The use of irrelevant data

The use of irrelevant data for traffic flow analysis is another common source of wrong decisions. Estimates of traffic volume based on Location Based Services (LBS) or cell phone data have one key problem: they describe people, not traffic flow. The notorious example is 100 people in one bus, but there is also a less obvious case: two people in one car. As a result, estimates based on cell phone data come with uncontrolled errors that can range from 30% to 100%, varying by location and over time.
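To make the people-versus-vehicles gap concrete, here is a minimal sketch. It assumes, purely for illustration, that every traveler carries exactly one detected device; the function name and the occupancy scenarios are hypothetical and not taken from any provider's methodology.

```python
def vehicles_from_devices(device_count, people_per_vehicle):
    """Convert an observed device count into the implied vehicle count."""
    if people_per_vehicle <= 0:
        raise ValueError("people_per_vehicle must be positive")
    return device_count / people_per_vehicle


# Hypothetical observation: 100 devices detected on one road segment.
devices = 100
scenarios = {
    "one person per car": 1,
    "two people per car": 2,
    "one full bus": 100,
}

# The same device count implies very different vehicle counts.
implied = {
    label: vehicles_from_devices(devices, occupancy)
    for label, occupancy in scenarios.items()
}

for label, vehicles in implied.items():
    print(f"{label}: {vehicles:.0f} vehicles")
```

The same 100 devices correspond to 100, 50, or 1 vehicle depending on occupancy. If an analyst assumes one person per car where the real figure is two, the vehicle count is overstated by 100%, and since occupancy is not observed in the device data, the size of the error cannot be controlled.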

3. The use of incomplete data

This error is exemplified by the many attempts to base traffic volume estimates on probe sizes. It has long been known how rough estimates based on short-term manual counts are. Less obvious are the errors that come from using data from GPS sources such as navigation systems and connected cars. Although the number of such data points runs into the billions, they are still too few compared to the actual number of cars on the road. Currently, even for the best data provider, the average penetration rate (calculated as the ratio of probe size to real traffic volume) is about 10-15%. In other words, only one or two connected cars out of every eight moving along the same road section will be registered by the data provider (illustrated by the picture above).

Since not all kinds of vehicles are represented equally in the probe, penetration rates vary widely from provider to provider and from location to location, depending on the type of fleet observed, how intensively navigation systems are used, the connection technology, and more. The influencing factors are so numerous that even sophisticated ML models do not yet allow practitioners to achieve acceptable accuracy. Our case studies clearly show that not only estimating true volumes but even comparing relative traffic volumes between sites on the basis of probe sizes leads to unacceptable errors.
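The sketch below shows why varying penetration rates distort even relative comparisons. The two sites, their volumes, and their probe counts are hypothetical; the only thing taken from the text is the definition of penetration rate as probe size divided by real traffic volume, and the 10-15% range.

```python
def penetration_rate(probe_count, true_volume):
    """Penetration rate = probe size / real traffic volume."""
    return probe_count / true_volume


def volume_from_probes(probe_count, assumed_rate):
    """Scale a probe count up to a volume estimate using an assumed rate."""
    return probe_count / assumed_rate


# Hypothetical sites with the SAME true daily volume but different
# penetration rates, both inside the 10-15% range.
true_volume = 8000
probes_site_a = 800    # 10% of vehicles observed
probes_site_b = 1200   # 15% of vehicles observed

rate_a = penetration_rate(probes_site_a, true_volume)
rate_b = penetration_rate(probes_site_b, true_volume)

# Comparing raw probe counts suggests site B is 50% busier than site A.
apparent_ratio = probes_site_b / probes_site_a

# Applying site A's rate to site B inflates B's estimated volume.
estimate_b_with_rate_a = volume_from_probes(probes_site_b, rate_a)

print(f"Penetration rates: A={rate_a:.0%}, B={rate_b:.0%}")
print(f"Apparent B/A traffic ratio from probes: {apparent_ratio:.2f}")
print(f"Site B volume estimated with A's rate: {estimate_b_with_rate_a:.0f}")
```

In this made-up case both sites carry 8,000 vehicles, yet the probe counts suggest a 1.5x difference, and scaling site B's probes by site A's rate yields an estimate of about 12,000. Because the true rate at each location is unknown in practice, the error cannot simply be corrected by a single scaling factor.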

Our e-book provides a more granular analysis of the above-mentioned and further examples of typical errors.

In general, we can conclude that to avoid costly mistakes (and businesses keep paying for site selection errors for many years), it is no longer enough to rely on traditional criteria such as “proven technology”, “reputable source”, or “trusted person”. Instead, it is necessary to check not only the source of the information but also the information itself.

Of course, this is a laborious process, but it can be automated, and we have managed to do so.

Ticon’s proprietary algorithm, based on traffic engineering and multifactor analysis, transforms data mined from multiple independent sources into a cross-verified, high-resolution dataset. It lets the user see traffic flow parameters, including true volume values and speed distribution, as well as their derivatives, for very small road segments, i.e., the exact place of interest.

Contact us to request a demo and learn more about our methodology of comprehensive data processing and verification, which makes Ticon data ample, reliable, and available.