I have been running into a serious issue with Avaya Voice portal the past several month where Avaya (AVP) will freeze a caller resulting in a call to the customer that is ‘Dead-Air’ then the caller gets hung up on, or sometimes the dead-air lasts long enough where the caller just hangs up.

While investigating this issue, I have come across several interesting.

From one of the Avaya MPP servers that are servicing callers, there is an Apache server with 10 servers being instantiated to service requests. These requests are retrieving VXML content from a remote Tomcat server, and prompt resources from another remote Apache server.

The first thing I have been is monitoring how many sockets the Avaya application is using this command:

netstat -ant | awk '{print $6}' | sort | uniq -c | sort -n; echo httpd processes: [`ps aux | grep httpd | wc -l`]

This is what I have been getting:

      1 established)
      1 Foreign
      1 LAST_ACK
      3 SYN_SENT
      7 LISTEN
    124 TIME_WAIT
    170 CLOSE_WAIT
    227 ESTABLISHED
    httpd processes: [10]

The issue I have with this is the sheer number of CLOSE_WAIT threads.

I was digging and found a reference to others having issues:
https://issues.apache.org/bugzilla/show_bug.cgi?id=50615

I have been tracking this for two months now, and notice there is not a hung CLOSE_WAIT as I have described in another blog of mine XXX. But they do seem to last for quite a while. I have estimated the CLOSE_WAIT’s last for around 30 seconds before going back into TIME_WAIT and this was from a verbal discussion I had with an Avaya engineer.

I started looking at what communication the MPP was having with an individual Tomcat server and found:

$ netstat -an | grep -c "xxx.xxx.xxx.xxx:8080"; netstat -an | grep "xxx.xxx.xxx.xxx:8080"
23
tcp        0      0 xxx.xxx.xxx.xxx:45525         yyy.yyy.yyy.yyy:8080           TIME_WAIT
tcp        0      0 xxx.xxx.xxx.xxx:45186         yyy.yyy.yyy.yyy:8080           TIME_WAIT
tcp        1      0 xxx.xxx.xxx.xxx:45830         yyy.yyy.yyy.yyy:8080           CLOSE_WAIT
tcp        0      0 xxx.xxx.xxx.xxx:45836         yyy.yyy.yyy.yyy:8080           TIME_WAIT
tcp        1      0 xxx.xxx.xxx.xxx:45832         yyy.yyy.yyy.yyy:8080           CLOSE_WAIT
tcp        1      0 xxx.xxx.xxx.xxx:45864         yyy.yyy.yyy.yyy:8080           CLOSE_WAIT
tcp        0      0 xxx.xxx.xxx.xxx:45889         yyy.yyy.yyy.yyy:8080           TIME_WAIT
tcp        1      0 xxx.xxx.xxx.xxx:45918         yyy.yyy.yyy.yyy:8080           CLOSE_WAIT
tcp        1      0 xxx.xxx.xxx.xxx:45932         yyy.yyy.yyy.yyy:8080           CLOSE_WAIT
tcp        1      0 xxx.xxx.xxx.xxx:45930         yyy.yyy.yyy.yyy:8080           CLOSE_WAIT
tcp        1      0 xxx.xxx.xxx.xxx:45967         yyy.yyy.yyy.yyy:8080           CLOSE_WAIT
tcp        1      0 xxx.xxx.xxx.xxx:45989         yyy.yyy.yyy.yyy:8080           CLOSE_WAIT
tcp        0      0 xxx.xxx.xxx.xxx:45997         yyy.yyy.yyy.yyy:8080           TIME_WAIT
tcp        1      0 xxx.xxx.xxx.xxx:46003         yyy.yyy.yyy.yyy:8080           CLOSE_WAIT
tcp        0      0 xxx.xxx.xxx.xxx:46022         yyy.yyy.yyy.yyy:8080           ESTABLISHED
tcp        0      0 xxx.xxx.xxx.xxx:46031         yyy.yyy.yyy.yyy:8080           ESTABLISHED
tcp        0      0 xxx.xxx.xxx.xxx:46028         yyy.yyy.yyy.yyy:8080           ESTABLISHED
tcp        0      0 xxx.xxx.xxx.xxx:46037         yyy.yyy.yyy.yyy:8080           ESTABLISHED
tcp        0      0 xxx.xxx.xxx.xxx:45587         yyy.yyy.yyy.yyy:8080           TIME_WAIT
tcp        1      0 xxx.xxx.xxx.xxx:45743         yyy.yyy.yyy.yyy:8080           CLOSE_WAIT
tcp        1      0 xxx.xxx.xxx.xxx:45758         yyy.yyy.yyy.yyy:8080           CLOSE_WAIT
tcp        0      0 xxx.xxx.xxx.xxx:45766         yyy.yyy.yyy.yyy:8080           TIME_WAIT
tcp        1      0 xxx.xxx.xxx.xxx:45801         yyy.yyy.yyy.yyy:8080           CLOSE_WAIT

Now I ran the inverse from the target Tomcat server to find far less connections, and most where in FIN_WAIT_2 state.

C:\Documents and Settings\MKnuts6173c>netstat -an | grep -c "yyy.yyy.yyy.yyy"
17

C:\Documents and Settings\MKnuts6173c>netstat -an | grep "yyy.yyy.yyy.yyy"
  TCP    yyy.yyy.yyy.yyy:8080      xxx.xxx.xxx.xxx:45801    FIN_WAIT_2
  TCP    yyy.yyy.yyy.yyy:8080      xxx.xxx.xxx.xxx:45930    FIN_WAIT_2
  TCP    yyy.yyy.yyy.yyy:8080      xxx.xxx.xxx.xxx:45932    FIN_WAIT_2
  TCP    yyy.yyy.yyy.yyy:8080      xxx.xxx.xxx.xxx:45967    FIN_WAIT_2
  TCP    yyy.yyy.yyy.yyy:8080      xxx.xxx.xxx.xxx:46022    FIN_WAIT_2
  TCP    yyy.yyy.yyy.yyy:8080      xxx.xxx.xxx.xxx:46028    FIN_WAIT_2
  TCP    yyy.yyy.yyy.yyy:8080      xxx.xxx.xxx.xxx:46037    FIN_WAIT_2
  TCP    yyy.yyy.yyy.yyy:8080      xxx.xxx.xxx.xxx:46056    FIN_WAIT_2
  TCP    yyy.yyy.yyy.yyy:8080      xxx.xxx.xxx.xxx:46067    ESTABLISHED
  TCP    yyy.yyy.yyy.yyy:8080      xxx.xxx.xxx.xxx:46071    FIN_WAIT_2
  TCP    yyy.yyy.yyy.yyy:8080      xxx.xxx.xxx.xxx:46107    ESTABLISHED
  TCP    yyy.yyy.yyy.yyy:8080      xxx.xxx.xxx.xxx:46118    ESTABLISHED
  TCP    yyy.yyy.yyy.yyy:8080      xxx.xxx.xxx.xxx:46121    ESTABLISHED
  TCP    yyy.yyy.yyy.yyy:8080      xxx.xxx.xxx.xxx:46134    ESTABLISHED
  TCP    yyy.yyy.yyy.yyy:8080      xxx.xxx.xxx.xxx:46137    ESTABLISHED
  TCP    yyy.yyy.yyy.yyy:8080      xxx.xxx.xxx.xxx:46145    ESTABLISHED
  TCP    yyy.yyy.yyy.yyy:8090      xxx.xxx.xxx.xxx:46560    FIN_WAIT_2
  TCP    yyy.yyy.yyy.yyy:8090      xxx.xxx.xxx.xxx:46941    FIN_WAIT_2

This appears that Avaya is opening sockets to remote resources, holding the sockets until the entire call is completed, then sending the final ACK

Reference to others having issues: https://issues.apache.org/bugzilla/show_bug.cgi?id=50615

A normal TCP connection in terms of protocol looks as follows:

1. The client sends a SYN to the server.
2. The server responds with a SYN and ACK to the client.
3. The client responds with an ACK to the server.

Connection is now established and data transfer takes place (the steps above are known as a 3 way handshake).

When the server is closing the connection, the following sequence takes place:

4. The server sends a FIN and an ACK to the client.
5. The client sends an ACK to the server.
6. The client sends its own FIN and ACK to the server
7. The server sends and ACK to the client.

(NOTE that if it is the client that closes the connection, then you should replace server with client and client with server in steps 4 – 7 since a close on the connection can be called from both sides).

Connection is now closed and the ports are reset and are available for the next connection. You will see below a Network Monitor trace that hopefully will illustrate the above better.

So, where does the CLOSE_WAIT and FIN_WAIT_2 states come into play in the scenario above? In the scenario above the ports are in this state after step 5.

On the client the port will be in CLOSE_WAIT:

TCP xxx.xxx.xxx.xxx:45801 yyy.yyy.yyy.yyy:8080 CLOSE_WAIT

On the server the port will be in FIN_WAIT_2:

TCP yyy.yyy.yyy.yyy:8090 xxx.xxx.xxx.xxx:46941 FIN_WAIT_2

The client has just sent the first ACK and the server is waiting for the FIN and ACK from the client. This state is normally very short lived, But for some reason on our servers, they are lasting quite a long time.

Conclusion

The Avaya Voice browsers seem to manage Threads in a very messy way and talking with the Avaya engineers, they state there is nothing wrong with their application but cannot seem to explain why this is occurring and also do not seem to actually know how Sockets work. I was not able to find an Avaya engineer that understood (admittedly) how socket worked nor what CLOSE_WAIT and FIN_WAIT_2 even meant.

I am very worried about the MPP’s in production and I feel this issue is directly related to several different call related errors including ‘dead-air’, ‘dropped-calls’ and ‘caller session-reset’ and am still investigating these issues even though I feel I have no hope to fix the issues, just address them and mitigate these issues on Avaya’s behalf. How nice of me right 😉

Mick Knutson

Java, JavaEE, J2EE, WebLogic, WebSphere, JBoss, Tomcat, Oracle, Spring, Maven, Architecture, Design, Mentoring, Instructor and Agile Consulting. http://www.baselogic.com/blog/resume

View all posts

Categories

BLiNC Adsense