Wednesday, February 25, 2009

KISS - Keep It Simple Stupid

Most IT guys would know about KISS rule (Keep It Simple Stupid). However, not too many really understand and utilize it. Let's take a look at some Cisco Unified Communication products and see how we can utilize KISS rule.

The most frequently seen problem description is "it doesn't work".

"It" could mean lots of things. "It" could involved different products from different products/vendors (such as CUCM/IPCC/CUPS from Cisco, MOC from Microsoft, PBX from Avaya, T1 trunks from AT&T, etc.).

In order to simply the problem, we have to narrow down the problem quickly.

For example, if a customer said "My call center agents cannot make phone calls", I would ask "can you make calls from IP phone to IP phone in the same office?". This question could potentially eliminate call center software, voice gateway, PSTN and codec issues. If you didn't ask, you'd have to troubleshoot those items one by one (assuming you know how to troubleshoot those items)

Another example is network issue. All Unified Communication software rely on network connectivity. They wouldn't function properly if network didn't. Sometimes, network issue was not as obvious as you had thought. For example:

1) Windows Firewall service was stopped. But traffic wouldn't pass through until you explicitly open ports on it. (Hard to believe. But it happens)

2) You're not using VPN. But VPN client was running as a service and have firewall option turned on. (works as designed)

3) You claimed there was no firewall in the network, but there's a FWSM(Firewall Switching Module) on the switch.

4) You opened all ports on ASA (Firewall and VPN), CUPC still won't work. That is because one of the ASA bugs prevent large SIP message from passing through.

...

To troubleshoot network issue, you have to:
a) Have visibility on every components in the network
b) Be very good at all network layers (from physical layer to application layer)
c) Know how to use sniffer (such as Wireshark)

The difficult part is: sometimes you wouldn't think it's the network because it's not that obvious. Hence you wouldn't go down that path at all. You have to use KISS rule to find it out.

Example #1:
Customer: "My CUPC doesn't work."
You: "Doesn't work for all users? Or for some users?"
Customer: "For those users working from home."
You: "If those users were in office, would CUPC work for them?"
Customer: "Yes."

Now you know the problem is outside CUPC. Probably on the network (VPN?)

Example #2:
Customer: "My CUPC doesn't work."
You: "Doesn't work for all users? Or for some users?"
Customer: "It works for John but doesn't work for Mary. And they are both in the same office."
You: "On John's computer, can you log into CUPC with Mary's account? See if it works?"
Customer: "Yes, it works."

Now you know the problem is outside CUPC. Probably on Mary's computer (Firewall?)

Some KISS rules for Cisco UC (Unified Communication) products:

#1 If you don't know if it's case sensitive, assume it is.
This becomes a problem when Cisco moved from Windows platform to Linux.

#2 Because of #1, try to use lower case as much as you could.
Some people use capital case for cosmetic purpose. This could potentially cause some problems and it could take weeks to troubleshoot.

#3 Eliminate dependencies as much as possible.

Example A: When you installing CUCM, you have the option to use DNS, NTP, etc. Do NOT use them. If you use them, the installation might fail if those components weren't configured properly. You chance to configure them after install. I can't tell you how many problems are caused by DNS (even after install).

Example B: Don't use same "service account" for different applications. For example, you used the same active directory account for CUCM LDAP integration, CUPC LDAP search and CUPS Calendar. If CUPS admin change the account password (for whatever reason), it breaks CUCM and CUPC.

Example C: Get rid of CUCM subscribers during Windows-to-Linux migration. When you migrate from CCM 3.x/4.x (Windows) to CUCM (Linux), DB replication is always a headache. DB replication would fail if hostname, IP address was changed during migration (or some other changes between Pub and Sub). To avoid those headaches, remove subscribers from the cluster before migration. With a single server (Publisher) in the cluster, your chance of failure is far less than a 10-server cluster. After migration, you may add the subscriber to cluster one by one.

#4 Be a "minimalist"
Sales people tend to sell all the "bells and whistles" to customer. Sure that's the selling point. But as an engineer, if you want to get the job done smoothly, try to start with the minimum.

Example A: Use TCP instead of TLS.
Sure we want the security of TLS. But don't try to run before you can walk. Make sure the product is working before attempting TLS. If it didn't work with TLS, you know where the problem is.

Example B: Use simple passwords.
Sure we want the security of a long, complex passwords (how about 1024-character long?). But for installation and troubleshooting purpose, keep it short and simple (don't use special characters)

Example C: Build a simple test bed.
I've seen some integrators tried to deploy their first CUPS/CUPC installation over the VPN (because they are not onsite). This is a bad idea unless VPN is what you want to test. If something didn't work, you won't know if it's the VPN or CUPC.

Same for the computers. Instead of testing on a computer with bunch of custom-installed software, you'd better test on a clear/fresh-installed computer. Stick with Windows XP. Stay away from Vista, unless you understand what is UAC, Windows Defender (or offender?), and other security "features".

2 comments:

  1. Great article! Made me purchase your book ;)
    I have an odd issue at a client that is using CUPC and and CUPS. I have everything up and running but I am losing presence status when a user tries to use CUPC over a Juniper VPN. The vpn setup is pretty much default and the soft phone still works. Just curious if you have seen this before. Jeremy

    ReplyDelete
  2. CUPC not getting presence over VPN is a very common issue.

    Depends on the length of the contact list, presence information might be sent in very large SIP packets (e.g. as large as 10k). Most VPN device (including Cisco ASA) won't be able to handle such a large packet in older versions.

    You may upgrade the VPN to the latest version. If that didn't fix the problem, we'll need sniffer capture.

    Michael

    ReplyDelete