Carrier Packet Loss/Latency (Identification, Investigation and Escalation)

This document details the actions to take when a Slack alert from SolarWinds indicates that a carrier may be experiencing an issue.


Identification

Alerts for potential carrier-related issues are configured in SolarWinds. When an alert triggers, it is posted in the Slack channel #carrierchannels.

Example alert when the trigger threshold is met:

As it stands, these alerts are set to fire when packet loss to the monitored node exceeds 25%. An alert is indicated by the vertical red indicator line on the left of the notification and by the word "ALERT" in the text. The alert also contains the following data (a rough parsing sketch follows the list):

  1. The name of the carrier to which the node belongs.
  2. The type of node being monitored. For example, the alert above is labeled "BICS - SIP," indicating that the device is a SIP node. Some of our carriers have separate IPs/devices that handle SIP vs. RTP. If the carrier has told us that the monitored node handles both SIP and RTP, the node is simply named and identified as "SIP."
  3. IP address of the node.
  4. Date and time the alert triggered. The times in the notification reflect the local time of the location from which the alert came. For example, the alert above originated from AVOXI's Hong Kong POP, so the time in the alert reflects Hong Kong local time.
  5. The amount of packet loss when the alert triggered. In the example, the packet loss was 40% for this node at the time this alert triggered.
  6. Average response time of the node at the time the alert triggered. The alert also shows the latency variation during this time (jitter). For example, the alert above shows latency varying from 241ms to 243ms, equating to 2ms of jitter.
  7. A link to access the node in Solarwinds.
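
As a rough illustration of the fields above, the sketch below shows how an alert line like the example could be parsed into structured data, with jitter derived from the reported latency range (243ms - 241ms = 2ms). The message format, field names, and IP address here are assumptions modeled on the example alert, not an exact SolarWinds template.

```python
import re

# Hypothetical alert text modeled on the example above; the exact
# SolarWinds/Slack message template may differ.
ALERT_TEXT = ("ALERT: BICS - SIP (203.0.113.10) packet loss 40%, "
              "response time 241ms - 243ms")

def parse_alert(text):
    """Pull the node label, IP, packet loss, and latency range out of an alert line."""
    match = re.search(
        r"ALERT: (?P<node>.+?) \((?P<ip>[\d.]+)\) packet loss (?P<loss>\d+)%, "
        r"response time (?P<lat_min>\d+)ms - (?P<lat_max>\d+)ms",
        text,
    )
    if not match:
        raise ValueError("Unrecognized alert format")
    lat_min, lat_max = int(match["lat_min"]), int(match["lat_max"])
    return {
        "node": match["node"],           # e.g. "BICS - SIP"
        "ip": match["ip"],
        "packet_loss_pct": int(match["loss"]),
        "jitter_ms": lat_max - lat_min,  # 243 - 241 = 2 ms in the example
    }

print(parse_alert(ALERT_TEXT))
```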

Once the node's packet loss returns to 25% or below, a RESET notification is posted to the Slack channel indicating that node connectivity is back within normal, acceptable levels. Note that the example RESET notification is not only labeled as such but also has a green vertical indicator line.
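
In other words, the notification behavior is simple threshold logic: an ALERT fires when measured packet loss crosses above 25%, and a RESET is sent once it drops back to 25% or below. A minimal sketch of that state change is below; the 25% figure comes from this article, and everything else is illustrative.

```python
ALERT_THRESHOLD_PCT = 25  # loss above this triggers an alert; at or below it resets

def notification_for(previous_loss, current_loss):
    """Return 'ALERT', 'RESET', or None for two consecutive packet-loss samples."""
    was_alerting = previous_loss > ALERT_THRESHOLD_PCT
    is_alerting = current_loss > ALERT_THRESHOLD_PCT
    if is_alerting and not was_alerting:
        return "ALERT"  # loss just crossed above 25%
    if was_alerting and not is_alerting:
        return "RESET"  # loss returned to 25% or below
    return None         # no state change, so no notification

print(notification_for(5, 40))   # -> ALERT
print(notification_for(40, 20))  # -> RESET
```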


Investigation

A carrier alert in the Slack channel points to a potential problem that could be causing call quality issues on any calls traversing between AVOXI's telephony network and the carrier for which the alert was received. If a single alert is followed shortly by a RESET message, there is typically no need to investigate. However, if alerts continue to arrive, there is most likely an issue that needs to be investigated.

Receiving an alert only indicates that an issue exists; it does not necessarily tell you where that issue is, so the first thing to confirm is where the problem lies. To investigate, log into the PingPlotter instance for the location from which the alert was received. The links for each PingPlotter instance are below.

HK - http://10.50.2.33:7464

SA - http://10.20.31.25:7464

US - http://10.10.128.33:7464
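
If it helps, the three instances above can be kept in a small lookup so you open (or at least confirm you can reach) the right PingPlotter for the POP named in the alert. The URLs are the internal addresses listed above; the reachability check is a best-effort sketch that assumes the web UI answers plain HTTP from your workstation.

```python
import urllib.error
import urllib.request

# Internal PingPlotter instances from this article, keyed by POP.
PINGPLOTTER_INSTANCES = {
    "HK": "http://10.50.2.33:7464",
    "SA": "http://10.20.31.25:7464",
    "US": "http://10.10.128.33:7464",
}

def is_reachable(url, timeout=5.0):
    """Best-effort check that the PingPlotter web UI answers at all."""
    try:
        urllib.request.urlopen(url, timeout=timeout)
    except urllib.error.HTTPError:
        return True   # the server answered, even if with an error status (e.g. login required)
    except urllib.error.URLError:
        return False  # no route, timeout, connection refused, etc.
    return True

for pop, url in PINGPLOTTER_INSTANCES.items():
    print(pop, url, "reachable:", is_reachable(url))
```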

Upon logging in, you should see a list of carriers in the left panel. There you can select the carrier and the connection you are troubleshooting.

Once you've drilled down into the connection, you can double-click the dots next to each hop to see the history of that connection over a period of time, which is defined in the upper-right corner of each connection bar.

One thing to note: a hop that shows consistent packet loss may be experiencing ICMP packet de-prioritization. This occurs when the router's administrator sets thresholds on how much ICMP traffic the router will respond to before it starts rejecting requests, typically to ensure the router continues to operate optimally without unnecessary requests causing performance problems. In that case, the packet loss shown in the trace is not true packet loss and is not affecting other traffic traversing that router.
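
A quick way to sanity-check this is to compare a hop's loss with the loss reported at the hops beyond it: if an intermediate hop shows loss but every later hop (including the destination) does not, the loss is almost certainly ICMP rate limiting on that router rather than real forwarding loss. Below is a rough sketch of that heuristic using hypothetical per-hop loss figures, not values read from PingPlotter.

```python
def suspect_icmp_deprioritization(hop_loss_pct):
    """
    Given packet-loss percentages per hop (in path order, ending with the
    destination), return the indexes of hops whose loss looks like ICMP
    de-prioritization: the hop reports loss, but every later hop reports none.
    """
    suspects = []
    for i, loss in enumerate(hop_loss_pct[:-1]):   # never flag the destination itself
        downstream = hop_loss_pct[i + 1:]
        if loss > 0 and max(downstream) == 0:
            suspects.append(i)
    return suspects

# Hypothetical trace: hop 3 drops ICMP, but the destination shows no loss,
# so the "loss" at hop 3 is not affecting real traffic.
print(suspect_icmp_deprioritization([0, 0, 0, 35, 0, 0]))  # -> [3]
```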


Escalation


If you are able to determine that a specific hop is, in fact, causing an issue, there are a couple of things to consider.

First, determine whether the problem device is on the carrier's network (typically the last one or two nodes on the graph). Next, confirm whether there is a correlation between the alerts, the PingPlotter graphs, and any calls to that carrier during this time. To determine whether calls are affected, log into the corresponding location's VoIP Monitor and pull CDRs from the time when the problem was occurring. If the cause of the alerts is having an adverse effect on calls, open a Salesforce case on the carrier account and open a ticket with that carrier to perform an investigation, since the problem appears to be on the carrier's network.
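
As a rough illustration of that correlation step, the sketch below filters a CDR export down to calls that involved the carrier and started within a window around the alert, then keeps only the ones showing degraded quality. The CSV column names (start_time, carrier, packet_loss_pct, mos) and the quality thresholds are assumptions for illustration, not the actual VoIP Monitor export schema or an AVOXI policy.

```python
import csv
from datetime import datetime, timedelta

def calls_during_alert(cdr_csv_path, carrier, alert_time, window_minutes=15):
    """
    Return CDR rows for the given carrier that started within +/- window_minutes
    of the alert and show degraded quality. Column names are hypothetical;
    adjust them to whatever your VoIP Monitor export actually contains.
    """
    window = timedelta(minutes=window_minutes)
    affected = []
    with open(cdr_csv_path, newline="") as f:
        for row in csv.DictReader(f):
            if row["carrier"] != carrier:
                continue
            start = datetime.fromisoformat(row["start_time"])
            if abs(start - alert_time) > window:
                continue
            if float(row["packet_loss_pct"]) > 5 or float(row["mos"]) < 3.5:
                affected.append(row)
    return affected

# Example: calls around a 14:32 alert for the carrier named in the Slack message.
# affected = calls_during_alert("cdrs.csv", "BICS", datetime(2022, 2, 8, 14, 32))
```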

Another scenario you may find is that a hop along the route between AVOXI and the carrier is causing the problem. In this scenario, you may still want to reach out to the carrier to see if there is anything they can do. In most cases, there will not be much they can do to fix the issue from a networking standpoint, but if they have multiple locations with other connections, they may be able to move the number in an attempt to route around the problem network/device. If they cannot get around the problem and it continues to be an issue, porting the number may be an option.

Updated: February 8th, 2022
Author: Kevin Robinson
Updated By: Greg Buckalew
KB ID: 1214876
