This is a deep-dive article meant for people deeply familiar with the Signal code base.

I was puzzled by why some people registering in the Diskuv Communicator Android app (a fork of Signal) were getting the warning "Missing Google Play Services" and then being asked to let the app run in the background. In effect, they were being forced to use the websocket-only delivery option rather than the more robust websocket + Firebase Cloud Messaging (FCM) option.

Our recommendation is to bump up a magic constant from 5 seconds to 30 seconds. I summarize the raw test results and how we arrived at the recommendation.

Some background ...

There is a REST API GET /v1/accounts/{type}/preauth/{token}/{number} in Signal-Server. In Diskuv Communicator we have reworked that API to remove the phone number ... that new API is GET /v1/accounts/{type}/prereg/{accountId}/{token}.

There is one other change in how Diskuv generates the push challenge: it allows the Android app to de-duplicate a push challenge when FCM weirdly decides to send the push challenge to multiple destinations. However, it is not relevant to this deep dive, so I'll skip over it.

The intent of the API (I think!) is to let the server know that the Android app is capable of receiving FCM messages. Another intent is to lean on Google to authenticate that there is a human holding a phone rather than a bot; if there is no acknowledgement of the FCM push challenge message, the server forces the user to do a CAPTCHA. The API implements this in three steps:

  1. Randomly generate a string that is called a "push challenge". This is a very fast operation.
  2. Add the push challenge to a database, alongside the account identifier (i.e. the phone number for Signal). This is pretty fast as well.
  3. Send that push challenge over Firebase Cloud Messaging (FCM) to the Android app. The delivery of that push challenge is now in Google's hands.
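
The three steps above can be sketched roughly as follows. This is only an illustration in Python, not the actual Signal-Server code (which is Java): the names `handle_prereg`, `pending_challenges`, `send_via_fcm` and `sent_messages` are hypothetical, the database is replaced by an in-memory dict, and the FCM send is a stub that merely records what would have been sent.

```python
import secrets

# Hypothetical in-memory stand-ins for the server's database and for FCM.
pending_challenges = {}   # account identifier -> push challenge
sent_messages = []        # (fcm_token, payload) pairs "sent" through the stub


def send_via_fcm(fcm_token, payload):
    """Stub for the real FCM send; it only records the call."""
    sent_messages.append((fcm_token, payload))


def handle_prereg(account_id, fcm_token):
    # Step 1: randomly generate the push challenge (16 random bytes, hex-encoded).
    challenge = secrets.token_hex(16)
    # Step 2: store it alongside the account identifier.
    pending_challenges[account_id] = challenge
    # Step 3: hand it to FCM; from here on, delivery time is outside the server's control.
    send_via_fcm(fcm_token, {"challenge": challenge})
    return challenge
```

Steps 1 and 2 are local work for the server, which is why they are fast; only step 3 depends on a third party.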

The Android app is set to wait at most 5 seconds for all three steps above to complete. If those 5 seconds run out, the user is given a CAPTCHA and the warning "Missing Google Play Services" is foisted on them.
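
To make the effect of that timeout concrete, here is a minimal simulation in Python. It is not the Android app's code: `wait_for_push_challenge` and `simulate` are hypothetical names, a `queue.Queue` stands in for the FCM message handler, and a timer thread stands in for Google's delivery delay. It also models only the delivery wait, not the HTTP request that precedes it.

```python
import queue
import threading


def wait_for_push_challenge(inbox, timeout_seconds):
    """Return the challenge if it arrives in time, else None (the CAPTCHA path)."""
    try:
        return inbox.get(timeout=timeout_seconds)
    except queue.Empty:
        return None


def simulate(delivery_delay, timeout_seconds):
    """Deliver a fake challenge after delivery_delay seconds and wait for it."""
    inbox = queue.Queue()
    timer = threading.Timer(delivery_delay, inbox.put, args=["challenge-abc"])
    timer.daemon = True
    timer.start()
    result = wait_for_push_challenge(inbox, timeout_seconds)
    timer.cancel()  # no-op if the timer already fired
    return result
```

With a delivery delay shorter than the timeout, `simulate` returns the challenge; with a longer delay it returns `None`, which corresponds to the user being sent to the CAPTCHA.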

The 5 seconds seemed like a magic number; perhaps it was right two years ago when the constant was set? So I set up a load test to see what the value should be today. I used Locust to perform the load test, and our raw Locust report is here. In summary:

  • Our Locust scripts created 115 FCM client applications over 20 minutes. Each client application mimics the installation of an Android app, and all but one client application was able to create an FCM registration token.
  • The client applications were in Seattle; the servers were in Northern Virginia.
  • The actual HTTP request to GET /v1/accounts/fcm/prereg/[account_uuid]/[fcm_token] only took on average 186 ms.
  • However, waiting for delivery of the push challenge took on average 4.1 secs or, more importantly, had a P95 of 11 secs and a P100 (maximum) of 21 secs, rounded to 2 significant digits.
  • Somewhere between 60% and 70% of the push challenges make it back within 5 seconds. That means 30% to 40% of the push challenges are not delivered within 5 seconds.
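
For anyone who wants to redo this analysis on their own delivery timings, the two numbers that matter (the fraction delivered within a given timeout, and a percentile) can be computed along these lines. This is a sketch in Python: the function names are my own, the percentile uses the simple nearest-rank method (Locust's own report may compute it differently), and the `samples` list is made-up placeholder data, not the measurements from the load test.

```python
import math


def fraction_within(latencies, timeout_seconds):
    """Fraction of deliveries that arrived within the timeout."""
    return sum(1 for x in latencies if x <= timeout_seconds) / len(latencies)


def percentile_nearest_rank(latencies, p):
    """Nearest-rank percentile: smallest value with at least p% of samples at or below it."""
    ordered = sorted(latencies)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


# Placeholder delivery times in seconds, for illustration only (NOT the measured data).
samples = [1.0, 2.0, 3.0, 4.0, 6.0, 8.0, 10.0, 12.0, 15.0, 20.0]

print(fraction_within(samples, 5))           # share that would beat a 5 second timeout
print(fraction_within(samples, 30))          # share that would beat a 30 second timeout
print(percentile_nearest_rank(samples, 95))  # P95 of the placeholder data
```

The choice of timeout then comes down to picking a value at or above the percentile of users you are willing to send to the CAPTCHA.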

I've changed that magic constant to 30 seconds in Diskuv Communicator. Expect these results to change over time, and the timings may have been sensitive to the geography. But for now, to improve the reliability of the Signal Android app (which I still use as my social networking app!), we suggest bumping up the timeout to 20-30 seconds.

Thanks -- Jonah, Founder