Brief Analysis of Apple & Google’s Contact Tracing specification.


We are currently in the midst of the Coronavirus pandemic. COVID-19, the disease caused by the SARS-CoV-2 virus, has been with us for a while now. Contact tracing, the practice of scrambling to find those who had come into contact with the early cases, was used at first, but once cases began rapidly accelerating it was mostly abandoned.

Countries are in lockdown, but the virus continues to spread. Contact tracing as a method of preventing the spread of SARS-CoV-2 is being discussed again, particularly by Apple and Google.

Google have already set up detailed statistics on the virus. If you Google the virus or a country, you’re presented with general statistics on the spread in that area.

Google search for COVID-19 in the United Kingdom.

Apple have begun planning to add Coronavirus testing centres to the Maps application, allowing for organisations who are testing people for the disease to declare themselves to Apple so users can find their local testing area.

However, Apple and Google have now come together. They have put together a plan for a method of contact tracing using our mobile devices that doesn’t compromise user privacy. Apple released preliminary Cryptography, Bluetooth and API specifications discussing how the system will work.

Apple outlined that the system would be rolled out in two stages. The first would be an API to allow “interoperability between Android and iOS devices using apps from public health authorities”. Presumably these APIs will be available via Entitlements to organisations like the United Kingdom’s NHS Digital, rather than to just any developer.

In fact, the Secretary of State for Health Matt Hancock confirmed that the NHS would integrate the API provided by Apple and Google into their own contact tracing app for iOS and Android.

The second phase would be the broader Bluetooth-based contact tracing outlined in the Cryptography Preliminary that I will discuss today. This will be built into Apple and Google’s respective mobile operating systems, iOS and Android. They describe this as a more “robust” solution than an API alone, one that would allow more people to participate. Finally, they make clear throughout the announcement that this will be a completely opt-in service with privacy as one of the main goals.

You can read the original post by Apple here.

I will attempt to give a simpler explanation of what the spec outlines, so any feedback you have would be greatly appreciated. This post will cover the Cryptography side of the protocol, and I may do a similar post for the Bluetooth and API specs in the coming days.


Cryptography

I shall start by discussing the Cryptography white paper Apple and Google released. The paper begins by defining some external functions used for generating keys and for the data used to generate those keys. I’m reluctant to go over their definitions here as I would be in danger of simply rewriting the white paper. That’s not the aim. The aim is to explain what is outlined in the paper in a more readable way for the average user. For more information about the HKDF, HMAC, CRNG and Truncate functions, have a read of the Cryptography white paper.

Moving on, first is the definition for DayNumber, an unsigned 32-bit integer (uint32_t) for identifying a particular day. It's generated by taking the current Unix Epoch time and dividing by 60 x 60 x 24. More on how this is used later.
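As a rough illustration of the DayNumber calculation described above, here is a minimal Python sketch. The function name `day_number` is my own, and masking the result to 32 bits is just my way of mimicking a uint32_t; neither comes from the white paper.

```python
import time

SECONDS_PER_DAY = 60 * 60 * 24

def day_number(unix_time: int) -> int:
    """Whole days since the Unix epoch, kept within an unsigned 32-bit range."""
    # Integer division: everything after midnight (UTC) is discarded.
    return (unix_time // SECONDS_PER_DAY) & 0xFFFFFFFF

# Day number for right now.
print(day_number(int(time.time())))
```

Every timestamp within the same 24-hour window maps to the same value, which is what lets the number identify a particular day.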

Following {% katexmm %}$DayNumber${% endkatexmm %} is {% katexmm %}$TimeIntervalNumber${% endkatexmm %}. This provides a number for each 10-minute window in a given 24-hour window as defined by {% katexmm %}$DayNumber${% endkatexmm %}. It is stored as an unsigned 8-bit integer (uint8_t) and generated by dividing the number of seconds since the start of the current {% katexmm %}$DayNumber${% endkatexmm %} by 60 x 10, which gives a value between 0 and 143.
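The TimeIntervalNumber is the number of seconds since the start of the current DayNumber, divided by 60 x 10. A minimal Python sketch of that, where the function and constant names are my own:

```python
SECONDS_PER_DAY = 60 * 60 * 24
SECONDS_PER_INTERVAL = 60 * 10

def time_interval_number(unix_time: int) -> int:
    """Index of the 10-minute window within the current day (0 to 143)."""
    seconds_since_start_of_day = unix_time % SECONDS_PER_DAY
    return seconds_since_start_of_day // SECONDS_PER_INTERVAL

# One hour after midnight (UTC) on 13 April 2020 falls in window 6.
print(time_interval_number(1586736000 + 3600))
```

Since a day has 144 ten-minute windows, the result always fits comfortably in a uint8_t.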

There are three keys that will be used in the contact tracing protocol. First of all is the Tracing Key. This is a 32-byte cryptographically random number that is to be kept secret, never leaving the device, and is to be unique to each client. The white paper describes it as being “securely” stored on the device, so it’s possible some other protections are implemented - possibly integration with the Secure Enclave (SEP), but that is not mentioned.

Next is the Daily Tracing Key. This key is generated for every 24-hour window in which Contact Tracing is enabled. The Daily Tracing Key is a 16-byte key derived using the previously defined HKDF function. This key derivation function takes a key, salt, info and outputLength as parameters. To generate the Daily Tracing Key, the secret Tracing Key is passed as the key, nothing is passed as the salt (NULL), the string “CT-DTK” with the DayNumber for the current 24-hour window appended is given as the info, and, finally, 16 is passed as the outputLength.
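To make the parameters above more concrete, here is a sketch of the derivation in Python. It is not taken from the white paper: I'm assuming HKDF as defined in RFC 5869 with SHA-256, and I'm assuming the DayNumber is appended as a little-endian uint32. The function names are my own.

```python
import hashlib
import hmac
import os
import struct
from typing import Optional

def hkdf_sha256(key: bytes, salt: Optional[bytes], info: bytes, output_length: int) -> bytes:
    """HKDF (RFC 5869) with SHA-256: extract, then expand."""
    hash_len = hashlib.sha256().digest_size
    # Extract: a missing salt is treated as a string of zero bytes.
    prk = hmac.new(salt or b"\x00" * hash_len, key, hashlib.sha256).digest()
    # Expand: chain HMAC blocks until there is enough output.
    okm, block, counter = b"", b"", 1
    while len(okm) < output_length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:output_length]

def daily_tracing_key(tracing_key: bytes, day_number: int) -> bytes:
    # Assumption: DayNumber is encoded as a little-endian uint32.
    info = b"CT-DTK" + struct.pack("<I", day_number)
    return hkdf_sha256(tracing_key, None, info, 16)

tracing_key = os.urandom(32)  # the 32-byte secret that stays on the device
print(daily_tracing_key(tracing_key, 18365).hex())
```

Because HKDF is one-way, handing out a Daily Tracing Key reveals nothing about the Tracing Key it came from, or about the keys for any other day.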

Finally is the Rolling Proximity Identifier. This key is described as a “Privacy-preserving identifier” which is used for Bluetooth broadcasts, or “Advertisements” - more on this shortly.

The Rolling Proximity Identifier is generated each time the device’s Bluetooth MAC address changes. The RPI is generated using the Daily Tracing Key and the Time Interval Number, both discussed previously: the two are passed through the HMAC function and the result is shortened with Truncate.

The usage of the Truncate function here results in the RPI being the first 16 bytes of the hash generated by HMAC.
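A hedged sketch of that derivation in Python follows. The “CT-RPI” prefix and the single-byte encoding of the Time Interval Number are how I recall the preliminary defining the HMAC input, and SHA-256 is my assumption for the underlying hash, so check all three against the white paper. The function name is my own.

```python
import hashlib
import hmac

def rolling_proximity_identifier(daily_tracing_key: bytes, time_interval_number: int) -> bytes:
    # Assumption: the HMAC input is "CT-RPI" followed by the interval as one byte.
    message = b"CT-RPI" + bytes([time_interval_number])
    digest = hmac.new(daily_tracing_key, message, hashlib.sha256).digest()
    # Truncate(..., 16): keep only the first 16 bytes of the 32-byte digest.
    return digest[:16]

# Placeholder key purely for illustration.
example_key = bytes(range(16))
print(rolling_proximity_identifier(example_key, 0).hex())
print(rolling_proximity_identifier(example_key, 1).hex())
```

The two printed identifiers look unrelated even though they come from the same key, which is the property that stops an observer linking one broadcast to the next.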


Now that we’ve discussed all of the cryptographic definitions required for the Contact Tracing Protocol, we can look at how the protocol works - again based on the Cryptography Preliminary white paper.

Each device, once enrolled, generates the unique and secret Tracing Key. This is used for the entire time the device is enrolled, and is regenerated if the user re-enrols. After this, a Daily Tracing Key is generated each day and stored on the device. These keys also never leave the device unless the user tests positive for Coronavirus.

The system requires Bluetooth. Throughout the day your device will broadcast Bluetooth Advertisements with the Rolling Proximity Identifier generated for the current broadcasting rotation interval, and surrounding devices also enrolled in the system will store all of these. The RPI carries no identifying information, so there is no possibility of working out who is who. By design the protocol is meant to identify whether you have come into contact with someone infected with SARS-CoV-2, rather than identify who that person was.

When someone is diagnosed with Coronavirus, their Diagnosis Keys, a set containing the Daily Tracing Key and associated DayNumber for each day in the period where the individual is deemed to have been infectious, are uploaded to the Diagnosis Server. This server collects all of the Diagnosis Keys of those who've tested positive and then distributes them to all devices.

Devices will frequently retrieve the list of Diagnosis Keys from the Diagnosis Server. As we already know, the Diagnosis keys are made up of a Daily Tracing Key, referred to as Dtk, and the corresponding Day Number, hereby referred to as Dn.

The device will re-derive the Rolling Proximity Identifier, or RPI, from the Diagnosis Key, and compare it to each of the RPIs collected.


Re-deriving of Rolling Proximity Identifiers

A quick disclaimer: This is obviously just my opinion and analysis. How this will actually work in practice will not be known for a few weeks. When Apple and Google release their Operating System-level implementation of the Bluetooth-based Contact Tracing Protocol I shall attempt another explanation of how this works.

This is basically just a quick overview of how the Rolling Proximity Identifier can be re-derived from the Diagnosis Key, based on the Matching Values from Users Tested Positive section from the Cryptography Preliminary - which describes how the device will determine if it has come into contact with a device of a confirmed user from their Diagnosis Key.

As we have discussed, the Diagnosis Key is a set of the Daily Tracing Keys (and their corresponding Day Number) for all the days that the user was determined to have been infectious. The problem here is how the Diagnosis Key is used to compare with the Rolling Proximity Identifiers collected from Bluetooth Advertisements.

The document states that a device will frequently fetch Diagnosis Keys to verify. Presumably devices will keep a record of the Diagnosis Keys they have already checked, so the verification process is not repeated. However, a collected RPI cannot be deemed to have been a safe interaction until 14 days after the exposure. So my understanding is that the last 14 days’ worth of collected Rolling Proximity Identifiers are checked every time the device collects a new batch of Diagnosis Keys.

The Diagnosis Key, as I’ve mentioned, is a set of both the Daily Tracing Key and the corresponding Day Number. Take the following:
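As a purely illustrative sketch, a batch of Diagnosis Keys could be modelled like this in Python. The `DiagnosisKey` type is my own invention and the key bytes and day numbers are made-up placeholders, not real data or a format from the white paper.

```python
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class DiagnosisKey:
    daily_tracing_key: bytes  # the 16-byte Daily Tracing Key (Dtk)
    day_number: int           # the Day Number (Dn) that key was used on

# Hypothetical batch: one entry per day the user was deemed infectious.
batch: List[DiagnosisKey] = [
    DiagnosisKey(bytes.fromhex("00112233445566778899aabbccddeeff"), 18365),
    DiagnosisKey(bytes.fromhex("ffeeddccbbaa99887766554433221100"), 18366),
]

for entry in batch:
    print(entry.day_number, entry.daily_tracing_key.hex())
```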

Once a device receives this batch of Diagnosis Keys it must, somehow, re-derive the RPI that was advertised by that device at that time. To work out how this is done, we must look again at how the RPI is generated.

The RPI is derived using the Daily Tracing Key and the Time Interval Number. We already have the Daily Tracing Key from the Diagnosis Key, but how do we generate the Time Interval Number?

The Time Interval Number is generated by taking the number of seconds since the start of the Day Number. To get the start of the Day Number, we just reverse its formula by multiplying the Day Number by 60 x 60 x 24. Take Odn as the resulting timestamp for the start of our original Day Number.

The last step to calculating the TIN is to find the number of seconds since the Day Number, now Odn, was calculated. Reminder that the TIN is generated every 10 minutes. For this, we take the timestamp of the current RPI that is being verified, and subtract the Odn. We’re left with a new Seconds Since Start of Day Number value and we can use the formula for Time Interval Number to calculate the original.
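Here is how I would sketch that step in Python, assuming each scanned RPI was stored alongside a Unix timestamp. The function name and the range check are my own additions.

```python
SECONDS_PER_DAY = 60 * 60 * 24
SECONDS_PER_INTERVAL = 60 * 10

def interval_for_scan(scan_timestamp: int, day_number: int) -> int:
    """Time Interval Number that was active when an RPI was scanned."""
    start_of_day = day_number * SECONDS_PER_DAY  # the Day Number formula, reversed
    seconds_since_start = scan_timestamp - start_of_day
    if not 0 <= seconds_since_start < SECONDS_PER_DAY:
        raise ValueError("scan did not happen on the given Day Number")
    return seconds_since_start // SECONDS_PER_INTERVAL

# An RPI scanned 1234 seconds into day 18365 falls in window 2.
print(interval_for_scan(1586736000 + 1234, 18365))
```

The range check matters because a Diagnosis Key only applies to one day, so scans from other days can be skipped straight away.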

Once we have the TIN we can now use the formula to re-derive the Rolling Proximity Identifier, and then compare to the current RPI that is being verified.

Now, if the two RPIs match, the device can alert the user that they have come into contact with someone confirmed to have been diagnosed with Coronavirus. If that user later tests positive as well, the same process applies to them: the days on which they are deemed to have been infectious are recorded, and the corresponding Diagnosis Keys are submitted to the Diagnosis Server for distribution among all other enrolled devices in the Contact Tracing program.

Otherwise, no alert is given and the next RPI is verified.
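The comparison could also be sketched the other way around: rather than working back from each stored timestamp, derive all 144 possible RPIs for each Diagnosis Key and check whether any of them were seen. This is only my interpretation, the HMAC input (“CT-RPI” plus a one-byte interval, with SHA-256) is an assumption to verify against the white paper, and every name and value below is hypothetical.

```python
import hashlib
import hmac

INTERVALS_PER_DAY = 144  # 24 hours of 10-minute windows

def derive_rpi(daily_tracing_key: bytes, interval: int) -> bytes:
    # Assumption: HMAC-SHA256 over "CT-RPI" plus the interval as one byte.
    message = b"CT-RPI" + bytes([interval])
    return hmac.new(daily_tracing_key, message, hashlib.sha256).digest()[:16]

def find_exposures(diagnosis_keys, observed_rpis):
    """Return (day_number, interval) pairs whose re-derived RPI was observed."""
    observed = set(observed_rpis)
    matches = []
    for daily_tracing_key, day_number in diagnosis_keys:
        for interval in range(INTERVALS_PER_DAY):
            if derive_rpi(daily_tracing_key, interval) in observed:
                matches.append((day_number, interval))
    return matches

# Hypothetical data: one identifier from an infected device, one unrelated.
infected_key = bytes(range(16))
seen = [derive_rpi(infected_key, 42), bytes(16)]
print(find_exposures([(infected_key, 18365)], seen))
```

An empty list means no alert is needed. Anything else tells the device on which day, and in which 10-minute window, the contact happened, without saying who it was.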

There are a number of things about this analysis that I’m still questioning, however. The Day Number is calculated, in programming terms, as an integer division: Epoch / (60 x 60 x 24). The problem is that if you do this with an Epoch timestamp and then attempt to reverse it, you will not get the original timestamp back, because the remainder is discarded; you only get the start of that day. This is described here: https://stackoverflow.com/a/41197027
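To show what I mean about the reversal, here is a small Python example using an arbitrary timestamp I picked for illustration.

```python
SECONDS_PER_DAY = 60 * 60 * 24

timestamp = 1586781296  # an arbitrary moment during day 18365

day_number = timestamp // SECONDS_PER_DAY  # integer division drops the remainder
reversed_timestamp = day_number * SECONDS_PER_DAY
lost_seconds = timestamp - reversed_timestamp

print(day_number)          # 18365
print(reversed_timestamp)  # 1586736000, midnight UTC rather than the original
print(lost_seconds)        # 45296 seconds of the day are gone
```

Reversing the formula can only ever give the start of the day, so the time of day has to come from somewhere else, such as a timestamp stored with each scan.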

Because of this, the other calculations are thrown off. I’m unsure how this will work in practice, and it’s also possible I got this all wrong. However, I will go over this again once the APIs and Operating System integration are implemented. For reference, this paragraph from the preliminary is what my description is based on:

"In order to identify any exposures, each client frequently fetches the list of Diagnosis Keys. Since Diagnosis Keys are sets of Daily Tracing Keys with their associated Day Numbers, each of the clients are able to re-derive the sequence of Rolling Proximity Identifiers that were advertised over Bluetooth from the users who tested positive. In order to do so, they use each of the Diagnosis Keys with the function defined to derive the Rolling Proximity Identifier. For each of the derived identifiers, they match it against the sequence they have found through Bluetooth scanning."

Bluetooth

As I’ve already mentioned, Apple and Google released three documents: a Cryptography, a Bluetooth and an API Preliminary. We’ve already gone over the cryptography one, and there’s not much point in covering the API yet as it’s subject to change and no code has actually been released, so all that is left is the Bluetooth paper.

This document outlines some Bluetooth-specific aspects of the Contact Tracing system such as the structure of the Bluetooth payload, some particular definitions and diagrams of how devices communicate with each other using this protocol.

The paper begins by defining a Contact Detection Service, a Bluetooth LE (Low-Energy) service registered with the Bluetooth SIG (Special Interest Group - the Bluetooth standards organisation). It’s described as being “designed to enable proximity sensing of Rolling Proximity Identifier between devices”.

The Contact Detection Service payload is structured as follows:

Contact Detection Service structure based on the Bluetooth Preliminary.

  • Flag: The flag section defines the payload as Low Energy Discoverable Mode. This is done by setting bit 1 of the flag to 1.
  • Service UUID: A 16-bit Service UUID, 0xFD6F, precedes the Service Data section.
  • Service Data: The 16-bit Service UUID paired with the 128-bit Rolling Proximity Identifier.
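To visualise the three sections, here is a Python sketch that assembles them as standard Bluetooth LE advertising structures (length, type, data). The AD type codes 0x01, 0x03 and 0x16 are the standard assigned numbers for Flags, Complete List of 16-bit Service UUIDs and Service Data. I only set bit 1 of the flags here, which is a simplification, and the function names are my own.

```python
def ad_structure(ad_type: int, data: bytes) -> bytes:
    """One advertising structure: length byte, type byte, then the data."""
    return bytes([len(data) + 1, ad_type]) + data

def contact_detection_payload(rpi: bytes) -> bytes:
    if len(rpi) != 16:
        raise ValueError("the Rolling Proximity Identifier must be 16 bytes")
    service_uuid = (0xFD6F).to_bytes(2, "little")  # sent as 6F FD on air
    # Assumption: only bit 1 is set; real devices may set other flag bits too.
    flags = ad_structure(0x01, bytes([1 << 1]))
    uuid_list = ad_structure(0x03, service_uuid)
    service_data = ad_structure(0x16, service_uuid + rpi)
    return flags + uuid_list + service_data

# Placeholder RPI of 16 zero bytes, purely for illustration.
payload = contact_detection_payload(bytes(16))
print(len(payload), payload.hex())
```

With a 16-byte RPI this comes to 27 bytes, which fits within the 31 bytes available to a legacy Bluetooth LE advertisement.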

Device Communication.

The paper defines a few important behavioural properties for devices advertising their Rolling proximity identifier. The first is that the Bluetooth Random Private Address rotation period should be random, being greater than 10 minutes after the last rotation, and no more than 20 minutes after the last rotation.
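The rotation timing could be sketched like this in Python. Using whole seconds and the `secrets` module are my own choices; the paper only gives the bounds.

```python
import secrets

MIN_ROTATION_SECONDS = 10 * 60
MAX_ROTATION_SECONDS = 20 * 60

def next_rotation_delay() -> int:
    """Seconds until the next rotation: more than 10 minutes, at most 20."""
    span = MAX_ROTATION_SECONDS - MIN_ROTATION_SECONDS
    # randbelow(span) is 0..span-1, so the result is 601..1200 inclusive.
    return MIN_ROTATION_SECONDS + 1 + secrets.randbelow(span)

print(next_rotation_delay())
```

Randomising the delay means an observer cannot predict exactly when a device will switch to a new address.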

The second is that the Bluetooth advertiser address and Rolling Proximity Identifier should change synchronously, so that these two identifiers cannot be linked together.

Along with this, behavioural properties for devices scanning for RPIs are also defined. The first is that any Contact Detection Service data a device discovers should be kept on-device and not shared. The second is that scan results should be timestamped, which allows the RPI to be re-derived later on when verifying whether a device has been exposed.

There are other behavioural properties defined, but these are the significant ones.


Summary

Thank you if you've stuck around this long, and apologies for leaving it so long since my last post. If you have any questions or feedback please let me know. You can reach me on Twitter @h3adsh0tzz