There’s a multitude of transactions, processing points and interactions that occur between the source and the destination when making a digital transaction via a mobile network or otherwise. What could go wrong? Actually, quite a lot.
When it comes to supporting clients with multiple integrated systems, the litany when issues occur is always the same: “We see no problem, please check with the other party.” This is frustrating, because your own investigation will often end in exactly the same result: “We see no problem, please check with the other party.” What is often forgotten is that the transaction source (in this example, Breakpoint’s Digital Experience Monitoring solution, Apogee) and the transaction destination (the system servicing Apogee’s request) are not the only two parties involved in the conversation. In this post I’d like to elaborate a bit on one such case.
A not-so-simple request chain
One of the many channels that Apogee supports when performing a user-simulation request is a USSD transaction using a GSM terminal. To explain: Apogee uses the same infrastructure as the end user to initiate, traverse and eventually request any transactions available to the end user. Just like a real user, Apogee would dial a USSD short code (e.g. *123#) and would receive a menu. It would then use regular expressions to check whether the menu received is what it expects and, if so, would request the next menu item using a continuation USSD message (i.e. not a new session, simply a response on an existing session). This would be repeated until the transaction is finally executed. As with the intermediate steps, Apogee would then review the response message (which may even be delivered asynchronously via an alternate bearer such as SMS), and pronounce success or failure based on its configuration.
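To make the menu-by-menu check concrete, here is a minimal sketch in Python. It is not Apogee’s implementation: the function names, the menu texts and the fake session are all assumptions made for illustration, and a real probe would send each input through a modem rather than a lookup table.

```python
import re
from typing import Callable, List, Tuple

# One step = (text to send, regex the reply must match). Values are illustrative.
Step = Tuple[str, str]


def run_ussd_script(send: Callable[[str], str], steps: List[Step]) -> Tuple[bool, int]:
    """Send each input in turn and stop at the first reply that fails its pattern.

    Returns (success, number_of_steps_that_passed).
    """
    for index, (user_input, expected) in enumerate(steps):
        reply = send(user_input)
        if re.search(expected, reply) is None:
            return False, index
    return True, len(steps)


def fake_session() -> Callable[[str], str]:
    """Stand-in for a modem-backed USSD session (hypothetical menu texts)."""
    replies = {
        "*123#": "1. Airtime\n2. Balance",
        "2": "Your balance is R12.50",
    }
    return lambda text: replies.get(text, "Invalid option")


steps = [
    ("*123#", r"1\. Airtime"),          # initial dial opens the session
    ("2", r"balance is R\d+\.\d{2}"),   # continuation on the same session
]
result = run_ussd_script(fake_session(), steps)
```

Returning the index of the failing step, rather than a bare pass or fail, tells you how far into the session the transaction got before it went wrong.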
The above request chain actually encompasses a number of systems. The modem needs to be registered on the GSM network in order to perform the request. The request then needs to go through the GSM network to the USSD gateway, which in turn must understand the short code and send the request via the client’s internal network, usually through multiple firewalls and switches, to the relevant backend system for processing. The response then travels all the way back again.
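One way to picture the chain is as an ordered list of hops, each of which can be checked on its own. The sketch below is only an illustration: the hop names follow the description above, but the health checks are hard-coded placeholders, and how you would actually probe each hop depends on the environment.

```python
from typing import Callable, List, Optional, Tuple

# One hop = (name, check that returns True when the hop is healthy).
Hop = Tuple[str, Callable[[], bool]]


def first_failing_hop(hops: List[Hop]) -> Optional[str]:
    """Walk the chain in request order and name the first unhealthy hop, if any."""
    for name, is_healthy in hops:
        if not is_healthy():
            return name
    return None


# Placeholder checks; here the client's internal network is pretending to fail.
hops = [
    ("modem registered on GSM network", lambda: True),
    ("GSM network", lambda: True),
    ("USSD gateway", lambda: True),
    ("client internal network (firewalls, switches)", lambda: False),
    ("backend system", lambda: True),
]
failing = first_failing_hop(hops)
```

In this example neither the source nor the destination is at fault, yet the transaction still fails.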
Additional points of failure
This discussion is not an in-depth exposé on the components of the GSM network nor its workings. However, for the SIM in the modem to be able to use the GSM network, it must be registered in the Home Location Register, or HLR. (For interest’s sake, if you’re roaming, your information is kept in the Visitor Location Register, or VLR.) The HLR will tell the Mobile Switching Centre (MSC) that you’re allowed to send traffic over the GSM network. Sometimes the SIM will be expired or blocked on the HLR, which interestingly does not stop the SIM from registering but will block any traffic, so dialing the USSD short code will time out right there. That is the first additional potential point of failure.
If all is well on the MSC and HLR part of the network, the request will now be sent to the USSD gateway. The USSD gateway defines a set of short codes as valid for the client network, and for each of these short codes an endpoint is configured, for example a backend system that offers airtime purchases, bill payments or mobile money wallet access. The USSD gateway will usually have a connection pool for inbound connections (usually shared), but not necessarily for outbound connections, as well as other limited resources such as database connections. Should any of these connection pools be full or exhausted, the USSD gateway might not accept the request or be able to send the response. In this example, that is the second additional potential point of failure.
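Pool exhaustion is easy to demonstrate with a toy model. The class below is a hypothetical sketch, not how any particular USSD gateway is built: it simply caps the number of connections in use and refuses new requests once the cap is reached.

```python
import threading


class ConnectionPool:
    """Toy fixed-size pool: a request is refused when every slot is in use."""

    def __init__(self, size: int) -> None:
        self._slots = threading.BoundedSemaphore(size)

    def try_acquire(self) -> bool:
        # Non-blocking: returns False immediately when the pool is exhausted.
        return self._slots.acquire(blocking=False)

    def release(self) -> None:
        self._slots.release()


pool = ConnectionPool(size=2)
accepted = [pool.try_acquire() for _ in range(3)]  # third request finds the pool full
pool.release()                                     # one connection is returned
accepted_after_release = pool.try_acquire()        # a new request now gets in
```

From the caller’s side, a refused request looks like a failure at the gateway even though both the source and the backend are perfectly healthy.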
Once the USSD gateway receives the request, it will look it up in its register of short codes and send the request to the backend application. If it can’t, then the user would commonly get a confusing error such as “External Application Down”; i.e. the backend application external to the USSD gateway, to which it is trying to send the transaction, is not responding. There can be many reasons why this is so. Perhaps the external application actually is down. However, with the mindset of ‘It’s not me, it’s you’, it would be reasonable to expect the backend vendor to have confirmed that their backend system is up and processing transactions. Other possible causes could therefore be network or firewall changes, which can actually block the request or response. (We would hope this is a formal change-controlled environment and such changes would be documented enthusiastically.) This is the third additional potential point of failure.
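The lookup-and-forward step can be sketched as follows. This is an assumed, simplified model: the routing table, the backend functions and the “Unknown short code” message are invented for illustration, and only the “External Application Down” text comes from the scenario above.

```python
from typing import Callable, Dict

# A backend is modelled as a function from the dialled code to a menu text.
Backend = Callable[[str], str]


def route(short_code: str, routes: Dict[str, Backend]) -> str:
    """Look up the short code and forward the request to its backend."""
    backend = routes.get(short_code)
    if backend is None:
        return "Unknown short code"  # hypothetical message
    try:
        return backend(short_code)
    except ConnectionError:
        # The gateway cannot tell a dead backend from a blocked network path.
        return "External Application Down"


def airtime_backend(code: str) -> str:
    return "1. Buy airtime"


def unreachable_backend(code: str) -> str:
    # Simulates a firewall change dropping traffic to a healthy backend.
    raise ConnectionError("no route to host")


routes = {"*123#": airtime_backend, "*456#": unreachable_backend}
ok_reply = route("*123#", routes)
down_reply = route("*456#", routes)
```

The gateway reports the same error whether the backend has crashed or a firewall rule is silently dropping the traffic, which is why the backend vendor can truthfully say their system is up.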
Just like the USSD gateway, the backend application would typically have connection pools for inbound and outbound connections as well as database connection pools. Additionally, and especially in the financial transaction world, there would be connections to hardware security modules (or HSMs) for PIN verification and other cryptographic operations as well. Each of these can be a potential point of failure.
Finally, the response to the request would go all the way back, with each component sending the response and the next receiving it until it reaches the originator, sometimes via a different bearer channel from that of the originating request.
So to summarise, it isn’t always as simple as a source and destination where one of the two must be the culprit in a failed scenario. There are many possible points of failure in the middle, especially when it comes to GSM or mobile transactions, which could potentially be the cause of the failure: be it hardware being overloaded, licenses being overrun, connection pools being exhausted, or the transaction simply being blocked. Keep that in mind the next time you are tempted to respond with “It’s not me, it’s you!”, and perhaps try to find ways of identifying and isolating the issues to intermediate systems. Once all the systems in the path are known, the next step is to make them visible, but that is a subject for another day.