How did Pikmin Bloom delay Android Notifications?
On November 1, 2021, the public version of Pikmin Bloom went out into the world. As of today, we know there are over 2 million users, but what struck me as completely odd was that my phone's notifications simply broke after I installed the application.
I thought I was going crazy. How could one specific application break or delay notifications for my entire phone? I figured I either had a virus or was imagining it, and then I saw social media:
Sure enough, this was a real issue, and it only affected Android. So I tossed the application, because I want my phone working more than I want some Pikmin game.
A week and some days later (November 13, 2021) the change log for v34 came out and included in the change log was:
(Android) We are investigating the issue where the OS’s notifications get delayed.
This was the second time I saw official notice that an investigation had begun. The first was a pretty fast turnaround on November 2, in a forum post about the widely reported issue, but I figured that was generic customer service just "looking into it".
This morning (November 15, 2021) I saw a new release published (v34.1) with only 1 change in the change log.
(Android) Made improvements to the OS notification delay issue. The issue may still persist for some devices and users, we are continuing to investigate.
A patch release with only one change tells me that Niantic is taking this seriously, but I'm still lost on how an application, even a badly coded one, could affect sending and receiving notifications for all my other applications (especially the text-driven ones).
I used to do teardowns of these applications until legal got in the way, so I believe a new application, with a narrow focus on a bug and no leaked assets or security enhancements, is good to go!
So let's diff this.
We can see right out of the gate there are not many changes, so we can easily manually review this.
➜ git:(master) ✗ git commit -a
[master 58173c9a] [34.1] APK Dump
 14 files changed, 351 insertions(+), 273 deletions(-)
If we look back at the files changed list, we can toss the following files for only having a build number / version change:
This leaves only two classes that received changes:
So now I'm more confused about how activity/location managers end up delaying my notifications, so let's take a look. I saw that the LocationManager class had a new constant added:
.field private static final FASTEST_CONTINUOUS_LOCATION_UPDATE_INTERVAL_SEC:I = 0x5
If you are unfamiliar with hex (base 16), hex 5 is still decimal 5. This joined the already existing constant of:
.field private static final CONTINUOUS_LOCATION_UPDATE_INTERVAL_SEC:I = 0x12c
0x12c or 300 decimal, which probably corresponds to a 5 minute loop. The rest of this class has what looks to be additional debug information with newly added strings like:
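If you'd rather not convert hex in your head, here's a minimal sketch that pulls integer constants out of smali field declarations and prints them in decimal. The two declarations are copied from above; the regex and the helper name `parse_int_fields` are my own, and it only handles the simple `NAME:I = 0x...` form shown in this post.

```python
import re

# Smali field declarations copied from the decompiled LocationManager class.
SMALI_FIELDS = """
.field private static final FASTEST_CONTINUOUS_LOCATION_UPDATE_INTERVAL_SEC:I = 0x5
.field private static final CONTINUOUS_LOCATION_UPDATE_INTERVAL_SEC:I = 0x12c
"""

# Matches "<NAME>:I = 0x<hex>", where ":I" is smali's type descriptor for int.
FIELD_RE = re.compile(r"(\w+):I = (0x[0-9a-fA-F]+)")


def parse_int_fields(smali):
    """Return a dict mapping each int field name to its decimal value."""
    return {name: int(value, 16) for name, value in FIELD_RE.findall(smali)}


fields = parse_int_fields(SMALI_FIELDS)
for name, seconds in fields.items():
    # The _SEC suffix suggests seconds, so also show the value in minutes.
    print(f"{name} = {seconds} s ({seconds / 60:g} min)")
```

Run against the two constants, this confirms 5 seconds and 300 seconds (5 minutes).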
const-string v0, "LocationManager requestUpdate"
const-string v1, "LocationManager startOrStopContinuousUpdate start:%b"
const-string v1, "LocationManager stop continuous update"
const-string p1, "LocationManager start continuous update"
These will help track the events as they occur. The next changes we saw were in the LocationCallbackReceiver, which introduced a new count class variable.
.field private static count:I
const-string v1, "LocationManager LocationCallbackreceiver onReceive %d"
This is additionally logged and used to presumably know if the callback is called too many times.
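To picture what a static counter like this does, here's a rough Python sketch. Only the field name `count` and the log format string come from the decompiled code; the method name `on_receive` and the choice to increment before logging are my assumptions, not something I confirmed in the smali.

```python
class LocationCallbackReceiver:
    # Class-level attribute, the closest Python analogue of a Java static field:
    # it is shared by every instance and survives across callbacks.
    count = 0

    @classmethod
    def on_receive(cls):
        # Assumption: the counter is bumped once per callback, then logged.
        cls.count += 1
        return "LocationManager LocationCallbackreceiver onReceive %d" % cls.count


# Simulate three callbacks arriving back to back.
messages = [LocationCallbackReceiver.on_receive() for _ in range(3)]
for message in messages:
    print(message)
```

With a running number in every log line, a burst of callbacks becomes obvious at a glance.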
Now if we pivot to the ActivityRecognitionManager, we can see that the logic for "stopping activity updates" had some changes to when it is fired. It now cleans up existing updates before starting new ones, in addition to cleaning up when ending them.
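Why would stopping before starting matter? Here's a toy model of the difference. Everything in it is hypothetical (the class, the method names, the list standing in for registrations held by the OS); it only illustrates the general stop-before-start pattern, not Niantic's actual code.

```python
class ActivityUpdateRegistry:
    """Toy stand-in for whatever tracks active update registrations."""

    def __init__(self):
        self.pending = []

    def stop(self):
        # Drop every registration we currently hold.
        self.pending.clear()

    def start_naive(self, request_id):
        # Old-style behavior (assumed): just add another registration.
        self.pending.append(request_id)

    def start_clean(self, request_id):
        # New-style behavior: clean up first, then register exactly once.
        self.stop()
        self.pending.append(request_id)


naive = ActivityUpdateRegistry()
clean = ActivityUpdateRegistry()
for request_id in range(3):  # e.g. the app is resumed three times
    naive.start_naive(request_id)
    clean.start_clean(request_id)

print(len(naive.pending), len(clean.pending))
```

The naive version piles up one registration per start, while the clean version never holds more than one.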
I know from enough Niantic apps that these applications barely survive in the background. Combine that with the fact that Pikmin Bloom can optionally live in the background through its connection with Google Fit, and you might get some interesting state issues.
Most applications route their push notifications through Google, so it might make sense that an application hooking directly into Google for the health/activity/fit portion overlaps with some of that logic. If the activity/location managers aren't cleaned up properly, or are checking too often (now capped at one update per 5 seconds at the fastest), it may lead to so many events getting pushed that other events are delayed.
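To make the idea of a "fastest" interval concrete, here's a small sketch of a throttle that drops any update arriving less than 5 seconds after the last accepted one. This is my own illustration of the concept, not how Google's location services implement it internally, and the class name `UpdateThrottle` is made up.

```python
class UpdateThrottle:
    """Accepts an update only if enough time has passed since the last accepted one."""

    def __init__(self, min_interval_sec):
        self.min_interval_sec = min_interval_sec
        self._last_accepted = None

    def accept(self, timestamp_sec):
        too_soon = (
            self._last_accepted is not None
            and timestamp_sec - self._last_accepted < self.min_interval_sec
        )
        if too_soon:
            return False
        self._last_accepted = timestamp_sec
        return True


# 0x5 from FASTEST_CONTINUOUS_LOCATION_UPDATE_INTERVAL_SEC
throttle = UpdateThrottle(min_interval_sec=5)

# Hypothetical arrival times (in seconds) of raw location updates.
arrivals = [0, 1, 2, 5, 6, 11]
accepted = [t for t in arrivals if throttle.accept(t)]
print(accepted)  # [0, 5, 11]
```

Six raw updates turn into three delivered ones, which is the kind of reduction that could keep a noisy app from flooding the system with events.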
Except hold up! While I was writing this post, another Android-only release (34.2) came out!
So let's do this again.
The official change log for 34.2 says:
Thank you for playing Pikmin Bloom!
- further improvements to the OS notification delay issue
➜ git:(master) ✗ git commit -a
[master e524c7dd] [34.2] APK Dump
 9 files changed, 191 insertions(+), 143 deletions(-)
Once again, many of the files in the diff were touched only to pick up a new hash/version as part of the release. Only one file had meaningful changes for this discussion:
This file received many changes since the last release (34.1) and we see the addition of two new properties in this build.
.field private static final GEOFENCE_LOITERING_DELAY_SEC:I = 0x1e
.field private static final GEOFENCE_NOTIFICATION_RESPONSIVENESS_SEC:I = 0x1e
These properties take aim at setting up delays and response time for geo-fences. They are both set to hex 0x1e, or 30 decimal. Let's talk real quick about a game mechanic that is different from, say, Ingress or Pokemon Go, where you directly interact with locations.
In Pikmin Bloom you can run into a flower that requires upwards of a certain number of flowers planted within its range. Is the game using geo-fences around these locations?
Let's look at some pseudo code to see what changes were made. Some logic now runs against the API response (Google Protobuf), looking for a specific flag to determine two values that are then passed to these builder calls:
((com.google.android.gms.location.Geofence$Builder) builder).setNotificationResponsiveness(notificationResponsiveness);
((com.google.android.gms.location.Geofence$Builder) builder).setLoiteringDelay(loiteringDelay);
This is a Google package, so we can easily find the developer docs for these methods. We learn the following:
setLoiteringDelay (int loiteringDelayMs)
Sets the delay between GEOFENCE_TRANSITION_ENTER and GEOFENCE_TRANSITION_DWELLING in milliseconds. For example, if loitering delay is set to 300000 ms (i.e. 5 minutes) the geofence service will send a Geofence.GEOFENCE_TRANSITION_DWELL alert roughly 5 minutes after user enters a geofence if the user stays inside the geofence during this period of time. If the user exits from the geofence in this amount of time, Geofence.GEOFENCE_TRANSITION_DWELL alert won't be sent.
setNotificationResponsiveness (int notificationResponsivenessMs)
Sets the best-effort notification responsiveness of the geofence. Defaults to 0. Setting a big responsiveness value, for example 5 minutes, can save power significantly. However, setting a very small responsiveness value, for example 5 seconds, doesn't necessarily mean you will get notified right after the user enters or exits a geofence: internally, the geofence might adjust the responsiveness value to save power when needed.
So now we understand the units at play here (milliseconds), so let's look back at this logic. Depending on the path taken, you'll get either 2 or 4 seconds back from the service. This is immediately converted into milliseconds and passed onward to the two functions above.
With that, we looked at two patch releases specifically for Android, made some assumptions, and confirmed that logic changes occurred for:
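The unit conversion is worth spelling out, since the constants and server values are in seconds while both builder methods take milliseconds. A quick sketch, where the helper name `sec_to_ms` is mine and the inputs are just the values mentioned in this post:

```python
MS_PER_SEC = 1000


def sec_to_ms(seconds):
    """Convert whole seconds to the milliseconds the Geofence builder expects."""
    return seconds * MS_PER_SEC


# 2 and 4 are the values seen coming back from the service;
# 30 is the 0x1e value of the two new *_SEC constants.
conversions = {seconds: sec_to_ms(seconds) for seconds in (2, 4, 30)}
for seconds, millis in conversions.items():
    print(f"{seconds} s -> {millis} ms")
```

Skip the conversion and a "30" meant as seconds would be read as 30 milliseconds, so getting this step right matters.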
- Location update frequency capped at 1 per 5 seconds at fastest.
- Location update frequency locked at 1 per 5 minutes at slowest.
- Location callback added count detection to toss out already processed events.
- Location Manager class added extra debug logging.
- GeofenceBuilder leveraged loitering delay parameter (2 or 4 sec).
- GeofenceBuilder leveraged notification responsiveness parameter (2 or 4 sec).