When deploying a Cisco Expressway MRA (Mobile Remote Access) solution, I ran into a weird interoperability issue with AT&T Wireless. The symptom: MRA calls to AT&T Wireless numbers went straight to voicemail without ringing the cell phone at all. The same MRA calls had no problem with other carriers such as Verizon and T-Mobile, or even with AT&T wired phones.
At first glance, this seems to be a carrier issue, and there is not much we can do unless the carrier tells us what's wrong. The demarcation point is the CUBE, and I have no visibility beyond it.
I decided to do some troubleshooting within my scope. I noticed that non-MRA calls didn't seem to have this problem.
If only the MRA calls have the problem, it is unfair to point the finger at the carrier. On the other hand, it only happens with one carrier. It must be an interoperability issue between MRA and that particular carrier.
Both MRA and non-MRA calls go through the same CUBE. I looked at the INVITEs sent from the CUBE to the carrier. They are very similar, except that the MRA calls have "Max-Forwards: 12" in the SIP messages while the non-MRA calls have "Max-Forwards: 69".
I'm not sure that's the root cause of the problem, but it is the only thing that sticks out. According to the Cisco documentation, Expressway has a default Max-Forwards of 15 and CUCM has a default of 70. These values are very close to the 12 and 69 in the CUBE logs.
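If you want to compare the two INVITEs without eyeballing the traces, a few lines of Python can pull the header out. This is only a sketch: the two messages below are trimmed-down, made-up samples (the URIs are placeholders, not my real CUBE logs), and max_forwards is a helper name I picked for illustration.

```python
import re

# Hypothetical, heavily trimmed INVITEs for illustration only. They are NOT
# the real CUBE traces; only the Max-Forwards values come from the post.
MRA_INVITE = (
    "INVITE sip:callee@carrier.example SIP/2.0\r\n"
    "Max-Forwards: 12\r\n"
    "\r\n"
)
NON_MRA_INVITE = (
    "INVITE sip:callee@carrier.example SIP/2.0\r\n"
    "Max-Forwards: 69\r\n"
    "\r\n"
)

# SIP header names are case-insensitive, so match accordingly.
_MAX_FORWARDS = re.compile(
    r"^Max-Forwards\s*:\s*(\d+)\s*$", re.IGNORECASE | re.MULTILINE
)


def max_forwards(sip_message):
    """Return the Max-Forwards value as an int, or None if the header is absent."""
    match = _MAX_FORWARDS.search(sip_message)
    return int(match.group(1)) if match else None


print(max_forwards(MRA_INVITE), max_forwards(NON_MRA_INVITE))  # prints: 12 69
```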
The Max-Forwards header was designed to prevent infinite loops in call routing, similar to the TTL in IP packets or the hop count in routing protocols. The value is decreased by 1 at each hop along the path. If one of the hops is configured with a different Max-Forwards value, the lower value takes precedence. The diagram below explains why the MRA calls have a value of 12 while the non-MRA calls have a value of 69.
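The arithmetic is simple enough to sketch in a few lines. Note that the hop counts here (3 for MRA, 1 for non-MRA) are my assumption, inferred purely from the numbers above (15 - 12 and 70 - 69); the function and constant names are made up for illustration.

```python
def max_forwards_after(initial, forwarding_hops):
    """Value of Max-Forwards after `forwarding_hops` hops, each subtracting 1.

    Clamped at 0 because a request is not forwarded any further once the
    value reaches 0.
    """
    if initial < 0 or forwarding_hops < 0:
        raise ValueError("initial and forwarding_hops must be non-negative")
    return max(initial - forwarding_hops, 0)


EXPRESSWAY_DEFAULT = 15  # default quoted in the post
CUCM_DEFAULT = 70        # default quoted in the post

# Assumed hop counts, inferred from the observed values (not measured).
MRA_HOPS_BEFORE_CARRIER = 3      # 15 - 12
NON_MRA_HOPS_BEFORE_CARRIER = 1  # 70 - 69

print(max_forwards_after(EXPRESSWAY_DEFAULT, MRA_HOPS_BEFORE_CARRIER))  # prints: 12
print(max_forwards_after(CUCM_DEFAULT, NON_MRA_HOPS_BEFORE_CARRIER))    # prints: 69
```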
Without seeing the AT&T Wireless logs, I cannot tell what happened within the cellular network. But imagine there are 12 or more hops in the cellular network before the call reaches the wireless endpoints (cell phones). What would happen?
When the Max-Forwards value decreases to 0 along the way, the request is not forwarded any further (RFC 3261 defines the 483 Too Many Hops response for this case) and the call is dropped. If that happens, the call controller within the cellular network will think the cell phone is unreachable (as when the cell phone is powered off or has no signal). The call controller will send a REFER (redirect) SIP message back to the originator, and the call will be redirected to the cell phone's voicemail. This is exactly what happens when the cell phone is "unreachable". What the caller hears next depends on how far away the voicemail server is:
- If it takes fewer than 12 hops for the call from the CUBE to reach the voicemail server, the call will be established and the caller will hear the voicemail greeting.
- If it takes 12 hops or more for the call from the CUBE to reach the voicemail server, the caller will hear a reorder tone (fast busy) or the carrier's error announcement, because the call will fail for the same reason (Max-Forwards decreased to 0).
In my case, it was the first scenario: the caller heard the voicemail greeting. Again, all of this is just a guess, but an educated one. Is there a way to fix the problem without involving the carrier? Of course.
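The reasoning above can be written down as a tiny decision function. This is a sketch of my guess, not of how the carrier actually works: the hop counts to the phone and to the voicemail server (20 and 8 below) are invented numbers, and predicted_outcome is a hypothetical name.

```python
def survives(max_forwards, hops):
    """True if a request leaving the CUBE with `max_forwards` can cross `hops` hops.

    Follows the rule used in the post: fewer hops than the value survives,
    the same number of hops or more does not.
    """
    return hops < max_forwards


def predicted_outcome(max_forwards, hops_to_phone, hops_to_voicemail):
    """Guess what the caller experiences for a given Max-Forwards at the CUBE."""
    if survives(max_forwards, hops_to_phone):
        return "phone rings"
    if survives(max_forwards, hops_to_voicemail):
        return "voicemail greeting"
    return "reorder tone or carrier announcement"


# Hop counts inside the carrier network are unknown; 20 and 8 are made up.
print(predicted_outcome(12, hops_to_phone=20, hops_to_voicemail=8))  # prints: voicemail greeting
print(predicted_outcome(69, hops_to_phone=20, hops_to_voicemail=8))  # prints: phone rings
```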
The solution is to change the Expressway default value from 15 to 70. It doesn't necessarily have to be 70. It just needs to be large enough that the SIP message survives all the hops before Max-Forwards decreases to 0. Since CUCM has a default value of 70 and it seems to work, I decided to set Expressway to 70 as well. If you are one of those OCD (obsessive-compulsive disorder) people, you may set Expressway to 72. Then both MRA and non-MRA calls will leave the CUBE with the same value of 69, making it "consistent" from the carrier's point of view.
After the change, MRA calls to AT&T Wireless numbers work as expected.
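For the record, here is where 72 comes from. The sketch assumes, as before, that an MRA call loses 3 from its Max-Forwards value before it leaves the CUBE (inferred from 15 - 12, not measured); the function names are made up.

```python
MRA_HOPS_BEFORE_CARRIER = 3  # assumption inferred from 15 - 12


def value_at_cube(expressway_setting, hops=MRA_HOPS_BEFORE_CARRIER):
    """Max-Forwards an MRA call carries when it leaves the CUBE."""
    return expressway_setting - hops


def required_initial(target_at_cube, hops=MRA_HOPS_BEFORE_CARRIER):
    """Expressway setting needed for an MRA call to leave the CUBE at `target_at_cube`."""
    return target_at_cube + hops


print(value_at_cube(70))     # prints: 67
print(required_initial(69))  # prints: 72
```

So with Expressway set to 70, MRA calls leave the CUBE with 67, which is plenty; 72 only matters if you want them to match the 69 of the non-MRA calls.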