The role of statistical scoring models in contact-tracing

Each contact you have with another person potentially infects you with a virus. If the person you have contact with later tests positive for that virus, the contact you had with that person increases the chance that you will also later test positive.

The sooner you are notified that one of your contacts has tested positive, the sooner you can get tested yourself. This lowers the risk of spreading the virus further and makes more efficient use of limited testing resources. This is the principle behind contact-tracing.

We have worked closely over the last 15 years with Dr. Andreas Henking of RiskSIM and Quantic Risk Solutions to develop and implement statistical financial credit scoring models for customers.

Scoring models in contact-tracing

Each contact can be assigned a score in a range from 0 (no reason to believe this contact might have infected you) to 10 (every reason to believe this contact may have infected you). The function which assigns a score to a contact is called a scoring model. When a participant has accumulated a sufficiently high score to constitute a risk of infection (i.e. enough contacts with infected people), they need to be alerted.
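A minimal Python sketch of this idea follows. The `Contact` fields, the weighting inside `score_contact` and the alert threshold of 10 are all hypothetical placeholders chosen for illustration, not a calibrated model. The point is only the shape: score each contact, add up the scores, and compare the total against a threshold.

```python
from dataclasses import dataclass


@dataclass
class Contact:
    # Hypothetical per-contact record; the fields are illustrative assumptions.
    duration_min: float          # length of the contact in minutes
    distance_m: float            # approximate distance in meters
    other_tested_positive: bool  # did the other person later test positive?


def score_contact(c: Contact) -> float:
    """Toy scoring model: map a single contact to a score between 0 and 10."""
    if not c.other_tested_positive:
        return 0.0  # no reason to believe this contact infected you
    # Made-up weighting: longer and closer contacts score higher.
    duration_factor = min(c.duration_min / 30.0, 1.0)
    proximity_factor = max(0.0, 1.0 - c.distance_m / 4.0)
    return round(10.0 * duration_factor * proximity_factor, 2)


def needs_alert(contacts, threshold=10.0):
    """Sum the per-contact scores and report whether the (hypothetical) threshold is reached."""
    total = sum(score_contact(c) for c in contacts)
    return total >= threshold, total
```

For example, `needs_alert([Contact(15, 2.0, True)] * 4)` returns `(True, 10.0)`: each contact scores only 2.5, but together the four reach the threshold.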

Scoring models are used in many areas to predict outcomes based on input factors, most notably the credit scores used by banks to estimate how likely borrowers are to default on their loan repayments.

Heuristic models

Heuristic models are educated guesses about how the outcome is related to the inputs. For instance, a heuristic used in Singapore is that a contact of more than 30 minutes at less than 2 meters with someone who will test positive within the next 7 days is most likely to cause infection. Heuristic models work reasonably well where the relationship between the inputs and the outcomes is relatively simple. However, it's likely that the relationship is more complex: multiple shorter contacts may also contribute to infection, depending on factors such as the time between the contacts.
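The Singapore rule of thumb can be written as a simple predicate. This Python sketch assumes the inputs are the contact duration in minutes, the distance in meters and the number of days until the other person's positive test (`None` if they never tested positive); the function name and the inclusive 7-day boundary are assumptions.

```python
from typing import Optional


def singapore_style_heuristic(duration_min: float,
                              distance_m: float,
                              days_until_positive_test: Optional[int]) -> bool:
    """Flag a contact that meets the >30 min, <2 m, positive-within-7-days rule of thumb."""
    if days_until_positive_test is None:
        return False  # the other person never tested positive
    return (duration_min > 30
            and distance_m < 2
            and 0 <= days_until_positive_test <= 7)
```

Three separate 20-minute contacts at 1 meter with someone who tests positive two days later each return `False`, even though together they add up to 60 minutes of close exposure. A per-contact rule like this cannot capture the cumulative effect of multiple shorter contacts.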

Statistical models

Statistical models, in contrast to heuristic models, don't need to make assumptions or guesses about the relationship between inputs and outcomes. Instead, they rely on the availability of historical data containing both inputs and outcomes, and infer the relationship between them by statistical analysis of this data. The advantage is that this can potentially score any contact more accurately. The disadvantage is that it requires significant amounts of data, covering both inputs and outcomes, to work.
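No particular statistical technique is prescribed here. As one illustration, assuming logistic regression (a technique widely used for credit scorecards), the Python sketch below fits a model by plain batch gradient descent to show how a relationship is inferred from historical inputs and outcomes. The two features (contact duration in hours and distance in meters) and the eight training rows are made-up illustrative data, not real contact-tracing results.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(w, b, x):
    """Estimated probability that a contact with features x leads to a positive test."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def fit_logistic(X, y, lr=0.5, epochs=5000):
    """Fit logistic-regression weights by batch gradient descent on the mean log-loss.

    X: list of feature vectors; y: list of 0/1 outcomes (1 = later tested positive).
    Returns (weights, bias).
    """
    n, k = len(X), len(X[0])
    w, b = [0.0] * k, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * k, 0.0
        for xi, yi in zip(X, y):
            err = predict(w, b, xi) - yi  # gradient of the log-loss w.r.t. the linear score
            for j in range(k):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


# Made-up historical data: [duration_hours, distance_m] -> outcome (1 = tested positive).
X = [[0.1, 3.0], [0.2, 2.5], [0.5, 1.0], [1.0, 0.5],
     [0.05, 4.0], [0.8, 1.5], [1.5, 0.5], [0.3, 3.5]]
y = [0, 0, 1, 1, 0, 1, 1, 0]

w, b = fit_logistic(X, y)
```

A probability from `predict` could be mapped onto the 0 to 10 range, for example as `round(10 * predict(w, b, x))`. With this toy data, a one-hour contact at half a meter ends up scoring higher than a six-minute contact at three meters, a relationship the model learned from the data rather than from a hand-written rule.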

Measuring model effectiveness – the Gini coefficient

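The predictive power (or quality) of a model is commonly measured by its Gini coefficient, which reflects how well the model ranks contacts that led to infection above contacts that did not. A Gini coefficient of 0 means that the model is worthless (no better than random guessing) and a Gini coefficient of 1 means that the model gets the ranking right every time.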
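One common way to compute the Gini coefficient of a scoring model is from the area under the ROC curve, as Gini = 2 × AUC − 1, where AUC is the probability that a randomly chosen contact that led to infection scores higher than a randomly chosen one that did not (ties count as half). The Python sketch below counts pairs directly, which is O(n²) but easy to check by hand; the function name is an illustrative assumption.

```python
def gini_coefficient(scores, outcomes):
    """Gini = 2 * AUC - 1, with AUC computed by comparing every (positive, negative) pair.

    scores: model scores, higher = more likely to lead to infection.
    outcomes: 1 if the contact was followed by a positive test, else 0.
    """
    pos = [s for s, y in zip(scores, outcomes) if y == 1]
    neg = [s for s, y in zip(scores, outcomes) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative outcome")
    concordant = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                concordant += 1.0   # correctly ordered pair
            elif sp == sn:
                concordant += 0.5   # tie counts as half
    auc = concordant / (len(pos) * len(neg))
    return 2.0 * auc - 1.0
```

For example, scores `[9, 3, 6, 1]` with outcomes `[1, 1, 0, 0]` give 3 correctly ordered pairs out of 4, so AUC = 0.75 and Gini = 0.5. A negative Gini means the model ranks contacts in the wrong order, i.e. it does worse than random guessing.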

A model can be back-tested against historical data and continuously validated as new data becomes available. If the Gini coefficient starts to drop over time, the model needs to be recalibrated or redeveloped based on the newer data.
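A minimal Python sketch of that monitoring check, assuming a Gini coefficient has already been computed for the back-test and for each new batch of data (for example as 2 × AUC − 1). The function name, the `max_drop` tolerance of 0.05 and the batch labels are hypothetical.

```python
def batches_needing_recalibration(baseline_gini, gini_by_batch, max_drop=0.05):
    """Return labels of batches whose Gini fell more than max_drop below the back-test baseline.

    baseline_gini: Gini measured when back-testing on historical data.
    gini_by_batch: list of (label, gini) pairs, one per new batch of data, in time order.
    max_drop: hypothetical tolerance before recalibration is triggered.
    """
    return [label for label, gini in gini_by_batch
            if baseline_gini - gini > max_drop]
```

For example, `batches_needing_recalibration(0.60, [("week 1", 0.59), ("week 2", 0.56), ("week 3", 0.52)])` returns `["week 3"]`, the first batch whose drop (0.08) exceeds the tolerance. Comparing against a fixed baseline rather than the previous batch means a slow, steady decline is still caught.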

Inputs and outcomes in contact scoring models

Inputs that could be taken into consideration include:

  1. Stage of the infection in the other person at the time of contact
  2. Duration of contact
  3. Number of contacts
  4. Interval between contacts
  5. Proximity of contact
  6. External temperature
  7. External humidity

The outcome is a positive test result (i.e. confirmed infection).
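One way to organize these inputs and the outcome as training data is sketched below in Python. Every field name, unit and encoding is a hypothetical choice; in particular, encoding the stage of infection as days since the other person's symptom onset is an assumption, and a real model might encode it differently.

```python
from dataclasses import dataclass, astuple
from typing import List, Optional, Tuple


@dataclass
class ContactRecord:
    """Inputs for one contact pair plus the observed outcome (all encodings hypothetical)."""
    infection_stage_days: float   # 1. days since the other person's symptom onset (assumed encoding)
    duration_min: float           # 2. total duration of contact, minutes
    n_contacts: int               # 3. number of separate contacts
    mean_interval_h: float        # 4. average interval between contacts, hours (0 if only one)
    distance_m: float             # 5. typical proximity, meters
    temperature_c: float          # 6. external temperature, degrees Celsius
    humidity_pct: float           # 7. external relative humidity, percent
    tested_positive: Optional[bool] = None  # outcome; None while still unknown

    def features(self) -> List[float]:
        """The seven inputs, in the order listed above."""
        return list(astuple(self)[:7])


def training_rows(records: List[ContactRecord]) -> Tuple[List[List[float]], List[int]]:
    """Split records into (X, y), skipping records whose outcome is not yet known."""
    labelled = [r for r in records if r.tested_positive is not None]
    return [r.features() for r in labelled], [int(r.tested_positive) for r in labelled]
```

Records whose outcome is still unknown can be scored by an existing model but are left out of training until a test result arrives.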

See also

An API-first approach to COVID-19 contact-tracing
