I am working on a project where an IVR voice browser makes VXML requests over HTTP. Each request represents a phone caller, and each new caller in our application spawns 6 Akka actor requests for various account look-ups.

The issue I had was that a concurrent new-caller volume of ~20 was backing up the Akka queues and then timing out the Akka requests. This was puzzling, as the dispatcher thread pool was set to a core size of 10, which seemed to spawn 50 Akka threads in total.

Then we used JMeter to simulate new callers, and as we increased the number of callers, we noticed that Akka was not using the new threads.

The JMeter test simulated 120 new callers with a 60-second ramp-up time, looping continuously for a total of 300 seconds. Here is the result of that test, showing only 1 thread used.

120 callers, 1 thread per actor, 64-bit Mac Core Duo (2 cores)
sampler_label # of Samples requests/sec KB/sec
/start/initDivisionalIvr.vxml 891 2.826507629 4.551670977
/welcome/spanishDivisionalIvr_rc.vxml 834 2.839313254 7.22583041
/welcome/postWelcome.vxml 822 2.818176208 16.66339157
/identify/disambiguateAniOrCed.vxml 813 2.801477581 5.709652063
/identify/gatherGetServiceInfoResult.vxml 805 2.697043303 13.15092606
TOTAL 4165 13.21257495 44.4114435

When I looked at the threads being used, I noticed only 1 thread was active, which is why the performance was so slow.

We would fill up each actor's queue at around 50-60 concurrent callers, so the numbers were quite small: not enough for our target traffic, which needs to be in excess of 300 concurrent callers per machine (with 6 look-ups per caller, that is up to 1,800 outstanding actor requests). We could simply add many more machines, but that is not the best approach.

Here is the Akka configuration we had:

    <akka:dispatcher id="appt-dispatcher"
                     type="executor-based-event-driven-work-stealing"
                     name="dispatcher-appointments">
        <akka:thread-pool queue="unbounded-linked-blocking-queue"
                          fairness="true"
                          core-pool-size="4"
                          max-pool-size="4"
                          keep-alive="60000"
                          rejection-policy="caller-runs-policy"/>
    </akka:dispatcher>

    <akka:typed-actor id="appointmentActor"
                      interface="com.comcast.ivr.core.actors.AppointmentActor"
                      implementation="com.comcast.ivr.core.actors.AppointmentActorImpl"
                      timeout="${actor.appointment.timeout}"
                      scope="singleton"
                      depends-on="appointmentServiceClient,applicationProperties"
                      lifecycle="permanent">
        <akka:dispatcher ref="appt-dispatcher"/>
        <property name="appointmentServiceClient" ref="appointmentServiceClient"/>
    </akka:typed-actor>

Maybe I missed this in the documentation, but it seems that even though we tried creating the actor with prototype scope, there was still only 1 instance of the actor, and only 1 thread was used, because an actor is bound to a single mailbox queue. You can see that 1 thread per actor is working while the others are waiting.
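To make that binding concrete, here is a rough plain-Java analogy (my sketch, not Akka code): a single actor instance drains its mailbox one message at a time, so it behaves like a single-threaded executor no matter how many threads the dispatcher pool holds.

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    public class SingleMailboxAnalogy {
        public static void main(String[] args) {
            // One actor == one mailbox: messages are processed one at a time,
            // regardless of how many threads the dispatcher pool holds.
            ExecutorService mailbox = Executors.newSingleThreadExecutor();
            for (int i = 0; i < 6; i++) {
                final int lookup = i;
                mailbox.submit(new Runnable() {
                    public void run() {
                        // All 6 account look-ups for a caller queue up here.
                        System.out.println("processing look-up " + lookup);
                    }
                });
            }
            mailbox.shutdown();
        }
    }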

In order to be able to use multiple threads per typed actor type, we needed to go through the ActorRegistry and create multiple actors per type.

Right now we create the actors individually. We are going to refactor this to be dynamic instead of static (see the sketch after the configuration below), but for our proof of concept, this is what we are going to use:

    <akka:dispatcher id="appt-dispatcher"
                     type="executor-based-event-driven-work-stealing"
                     name="dispatcher-appointments">
        <akka:thread-pool queue="unbounded-linked-blocking-queue"
                          fairness="true"
                          core-pool-size="2"
                          max-pool-size="12"
                          keep-alive="60000"
                          rejection-policy="caller-runs-policy"/>
    </akka:dispatcher>

    <akka:typed-actor id="appointmentActor1"
                      interface="com.comcast.ivr.core.actors.AppointmentActor"
                      implementation="com.comcast.ivr.core.actors.AppointmentActorImpl"
                      timeout="${actor.appointment.timeout}"
                      scope="singleton"
                      depends-on="appointmentServiceClient,applicationProperties"
                      lifecycle="permanent">
        <akka:dispatcher ref="appt-dispatcher"/>
        <property name="appointmentServiceClient" ref="appointmentServiceClient"/>
    </akka:typed-actor>

    <akka:typed-actor id="appointmentActor2"
                      interface="com.comcast.ivr.core.actors.AppointmentActor"
                      implementation="com.comcast.ivr.core.actors.AppointmentActorImpl"
                      timeout="${actor.appointment.timeout}"
                      scope="singleton"
                      depends-on="appointmentServiceClient,applicationProperties"
                      lifecycle="permanent">
        <akka:dispatcher ref="appt-dispatcher"/>
        <property name="appointmentServiceClient" ref="appointmentServiceClient"/>
    </akka:typed-actor>
    ... more actors omitted ...
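As a sketch of the dynamic refactoring we plan (the pool size and bootstrap class here are illustrative assumptions, not our final code), the Akka 1.x TypedActor factory can register a configurable number of instances per type at startup:

    import akka.actor.TypedActor;

    import com.comcast.ivr.core.actors.AppointmentActor;
    import com.comcast.ivr.core.actors.AppointmentActorImpl;

    public class ActorPoolBootstrap {

        // Illustrative pool size; in practice this would come from configuration.
        private static final int APPOINTMENT_POOL_SIZE = 10;

        public static void createAppointmentActors(long timeoutMillis) {
            for (int i = 0; i < APPOINTMENT_POOL_SIZE; i++) {
                // Each newInstance call creates and registers another typed
                // actor, so the registry lookup below finds all of them.
                // Note: the Spring wiring of appointmentServiceClient is
                // omitted in this sketch.
                TypedActor.newInstance(AppointmentActor.class,
                                       AppointmentActorImpl.class,
                                       timeoutMillis);
            }
        }
    }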

This is the initial actor load balancer over the registry:

    import java.util.Random;

    import static akka.actor.Actors.registry;

    public class ActorLoadBalancer {

        private static final Random RANDOM = new Random();

        // TODO: Implement proper load balancing using Routing.loadBalancerActor
        // with a CyclicIterator instead of random selection.
        @SuppressWarnings("unchecked")
        public static <T> T actor(Class<T> targetClass) {
            // Look up every typed actor instance registered for this type
            // and pick one at random.
            Object[] workers = registry().typedActorsFor(targetClass);
            int actorNumber = RANDOM.nextInt(workers.length);
            return (T) workers[actorNumber];
        }
    }
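Until the CyclicIterator-based router is in place, a minimal round-robin variant (my sketch) would spread requests more evenly than random selection:

    import java.util.concurrent.atomic.AtomicInteger;

    import static akka.actor.Actors.registry;

    public class RoundRobinActorBalancer {

        private static final AtomicInteger COUNTER = new AtomicInteger();

        @SuppressWarnings("unchecked")
        public static <T> T actor(Class<T> targetClass) {
            Object[] workers = registry().typedActorsFor(targetClass);
            // Cycle through the registered instances; Math.abs guards against
            // the counter wrapping around to a negative value.
            int index = Math.abs(COUNTER.getAndIncrement() % workers.length);
            return (T) workers[index];
        }
    }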

We started with 4 actors for each type, then ran the same JMeter test with 120 callers, and our numbers were dramatically better (more than five times the throughput):

120 callers, 4 actors per type, 32-bit Windows 2003 Server running JDK 1.6 in client mode, 2 Core Duo CPUs (4 cores total)
sampler_label # of Samples requests/sec KB/sec
/start/initDivisionalIvr.vxml 4692 14.97601348 24.11664671
/welcome/spanishDivisionalIvr_rc.vxml 4663 15.63133776 39.78053341
/welcome/postWelcome.vxml 4662 15.63275434 104.8341055
/identify/disambiguateAniOrCed.vxml 4607 15.65893633 31.91425794
/identify/gatherGetServiceInfoResult.vxml 4607 15.42448298 85.53283018
TOTAL 23231 74.14914092 273.2926494

When I looked at the thread usage, I could see better utilization of the actor threads.

But I noticed there were still some actors that were backing up, so I created 3 of the actor types with 10 instances each, while the others stayed at 4. I arrived at these numbers by trial and error to see how many I could use, and I was able to get another large improvement:

120 callers, up to 10 actors per type, 32-bit Windows 2003 Server running JDK 1.6 in client mode, 2 Core Duo CPUs (4 cores total)
sampler_label # of Samples requests/sec KB/sec
/start/initDivisionalIvr.vxml 6537 21.78194068 35.07560601
/welcome/spanishDivisionalIvr_rc.vxml 6532 21.94147819 55.83934783
/welcome/postWelcome.vxml 6532 21.94199413 146.6590149
/identify/disambiguateAniOrCed.vxml 6443 21.9398909 44.60519867
/identify/gatherGetServiceInfoResult.vxml 6441 21.69716936 119.9142785
TOTAL 32485 106.8501171 393.0813908

Looking at the thread activity from this test run, I noticed that by creating more actors, more of the threads were actually running rather than sitting in a wait state.

Final actor count (46 instances in total):
Broadcast Messages: 10
Outage: 10
Identify: 4
GetServiceInfo: 8
Payment: 4
Appointment: 10

This was the largest actor count, in about any combination I tried, that I could run without 1 of 2 things happening:

1. If I increased all actor counts evenly, the slow actors backed up while the faster actors sat mostly idle, so many threads were not running, waiting for the other actors to free up.

2. I then wanted to increase the number of slow actors to get more throughput, but beyond the exact numbers above, I started getting HTTP transport errors like this:

Caused by: com.sun.xml.internal.ws.client.ClientTransportException: HTTP transport error: java.net.BindException: Address already in use: connect
 at com.sun.xml.internal.ws.transport.http.client.HttpClientTransport.getOutput(Unknown Source)
 at com.sun.xml.internal.ws.transport.http.client.HttpTransportPipe.process(Unknown Source)
 at com.sun.xml.internal.ws.transport.http.client.HttpTransportPipe.processRequest(Unknown Source)
 at com.sun.xml.internal.ws.transport.DeferredTransportPipe.processRequest(Unknown Source)
 at com.sun.xml.internal.ws.api.pipe.Fiber.__doRun(Unknown Source)
 at com.sun.xml.internal.ws.api.pipe.Fiber._doRun(Unknown Source)
 at com.sun.xml.internal.ws.api.pipe.Fiber.doRun(Unknown Source)
 at com.sun.xml.internal.ws.api.pipe.Fiber.runSync(Unknown Source)
 at com.sun.xml.internal.ws.client.Stub.process(Unknown Source)
 at com.sun.xml.internal.ws.client.sei.SEIStub.doProcess(Unknown Source)
 at com.sun.xml.internal.ws.client.sei.SyncMethodHandler.invoke(Unknown Source)
 at com.sun.xml.internal.ws.client.sei.SyncMethodHandler.invoke(Unknown Source)
 at com.sun.xml.internal.ws.client.sei.SEIStub.invoke(Unknown Source)
 at $Proxy138.lookupApplicationConfigurationProperties(Unknown Source)
 at sun.reflect.GeneratedMethodAccessor163.invoke(Unknown Source)

I found this reference to the error: http://java.net/jira/browse/JAX_WS-485
A BindException on connect like this usually means the client has exhausted its ephemeral ports, typically because outbound HTTP connections are not being kept alive and reused. I still need to research whether the Spring-WS implementation (the web service each of these actors is calling) has keep-alive set or not. I know the Akka thread-pool keep-alive is set to 60000 ms, but that governs dispatcher threads, not HTTP connections.
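One mitigation worth trying first (an assumption on my part, since the JAX-WS RI client sits on top of HttpURLConnection) is to pin the JDK's HTTP connection-reuse behavior with system properties before any service stubs are created:

    public class HttpClientSettings {
        public static void main(String[] args) {
            // Ask HttpURLConnection to keep connections alive and reuse them
            // (true is the default, but worth pinning explicitly).
            System.setProperty("http.keepAlive", "true");
            // Raise the number of idle connections kept per destination;
            // the JDK default is 5, and 50 here is purely illustrative.
            System.setProperty("http.maxConnections", "50");
            // ... bootstrap the rest of the application after this point ...
        }
    }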

Some observations

1. When I set core-pool-size="1", each actor spawned only 1 thread, and each thread was used more.

2. When I set core-pool-size="2", each actor spawned 2 threads, and the 2 threads were sometimes used interchangeably, but not always.

Conclusion

All in all, I was able to get a good amount of traffic throughput. I feel there is more I can achieve, but I might be limited by the operations team's choice of Windows 2003 running a 32-bit client-mode JVM.

I plan to do further research to see if netstat -anop tcp shows that connections are indeed kept alive, and if not, look into Spring-WS to see why that is not the case.

I found further reading on testing keep-alive: http://forum.springsource.org/showthread.php?37961-TCPMon-setup-for-Spring-WS&s=e49f481d08f73e3a1c2fab8a11fad5fb

Hope this helps.

Mick Knutson

