diff options
author | Nick Hudson <nick.hudson@isode.com> | 2013-07-23 11:07:56 (GMT) |
---|---|---|
committer | Nick Hudson <nick.hudson@isode.com> | 2013-07-26 10:00:01 (GMT) |
commit | 4f743d361c2e84420d02d9c9d10304b7184b3170 (patch) | |
tree | e3fed647439a0e85e1feb685b54894bd2f0b5f27 /build.xml | |
parent | 1006276e3c78c2098d8efd5dedff0ee87e0e006f (diff) | |
download | stroke-4f743d361c2e84420d02d9c9d10304b7184b3170.zip stroke-4f743d361c2e84420d02d9c9d10304b7184b3170.tar.bz2 |
Re-implement JavaConnection to use Selector
When investigating problems on Solaris, attention focused on the
JavaConnection class, whose implementation appeared to be non-optimal.
The original implementation had a loop which operated on a
non-blocking socket, and looked something like this:
while (!disconnecting) {
while (something to write) {
write data to socket;
if write failed {
sleep(100); // and try again
}
}
try reading data from socket
if (any data was read) {
process data from socket;
}
sleep(100);
}
Because the socket is non-blocking, the reads/writes return straight
away. This means that even when no data is being transferred, the
loop is executing around ten times a second checking for any data to
read/write.
In one case (Solaris client talking to Solaris server on the same VM)
we were consistently able to get into a state where a write fails to
write any data, so that the "something to write" subloop never exits.
This in turn means that the "try reading data" section of the main
loop is never reached.
Investigation failed to uncover why this problem occurs. The
underlying socket appears to be returning EAGAIN (equivalent to
EWOULDBLOCK), suggesting that the write fails because the client's
local buffer is full. This in turn implies that the server isn't
reading data quickly enough, leading to the buffers on the client side
being full up. But this doesn't explain why, once things have got
into this state, they never free up.
At any rate, it was felt that the implementation above is not ideal
because it is relying on a polling mechanism that is not efficient,
rather than being event driven.
So this change re-implements JavaConnection to use a Selector, which
means that the main loop is event-driven. The new implementation
looks like this
while (!disconnected) {
wait for selector
if (disconnected) {
break;
}
if something to write {
try to write data;
}
if something to read {
try to read data;
}
if still something to write {
sleep(100);
post wake event; // so that next wait completes straight away
}
}
Test-information:
Testing appears to show that the problems we saw on Solaris are no
longer seen with this patch (Solaris tests still fail, but later on,
which appears to be due to a separate problem).
Testing shows that this leads to the thread spending much more time
idle, and only being active when data is being read/written (unlike
the original implementation which was looping ten times a second
regardless of whether any data was being read/written).
Testing using MLC seems to show the new implementation works OK.
I was unable to provoke the "write buffer not completely written"
case, so faked it by making the doWrite() method constrain its maximum
write size to 200 bytes. By doing this I verified that the "leftOver"
section of code was working properly (and incidentally fixed a problem
with the the initial implementation of the patch that had been passing
the wrong parameter to System.arrayCopy).
Change-Id: I5a6191567ba7e9afdb9a26febf00eae72b00f6eb
Signed-off-by: Nick Hudson <nick.hudson@isode.com>
Diffstat (limited to 'build.xml')
0 files changed, 0 insertions, 0 deletions