RE: [jade-develop] The Amazing Dying Container, and messages sent in afterMove()


Subject: RE: [jade-develop] The Amazing Dying Container, and messages sent in afterMove()
From: John J. Mikucki (jjm7570@cs.rit.edu)
Date: Thu Aug 01 2002 - 20:55:00 MET DST


Thus spake cefn.hoile@bt.com (cefn.hoile@bt.com):

> I wonder whether you just have a memory overflow, associated with one or
> more threads or VMS. If this is too obvious, ignore me.

I suppose it's possible... but the non-main containers only take up about 26MB,
and the systems hosting them have 130 or so MB free, before touching their swap
files. The main container, while growing from 32 MB to 50 MB (and, if I leave it
an hour or two, over 70 MB!), is hosted on a machine with 4GB of RAM, 3486MB of
which are free. Needless to say, I'm not too worried about it yet. ;)
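One caveat on the numbers above: free RAM on the host is not the same as
headroom inside the VM, because the JVM stops growing its heap at its own
configured maximum (the -Xmx setting) no matter how much memory the machine has
left. A minimal sketch of how a container could log its own heap figures,
assuming nothing about JADE itself (HeapReport is a hypothetical helper name,
not a JADE class):

```java
// Sketch only: HeapReport is a hypothetical helper, not part of JADE.
// It reports what the JVM itself sees, which can differ from what the OS shows.
final class HeapReport {
    private static final long MB = 1024L * 1024L;

    /** Heap currently in use by this JVM, in whole megabytes. */
    static long usedMb() {
        Runtime rt = Runtime.getRuntime();
        return (rt.totalMemory() - rt.freeMemory()) / MB;
    }

    /** Ceiling the JVM will grow its heap to, in whole megabytes. */
    static long maxMb() {
        return Runtime.getRuntime().maxMemory() / MB;
    }

    /** One log line suitable for printing next to timestamped agent events. */
    static String line() {
        return "heap used=" + usedMb() + "MB max=" + maxMb() + "MB";
    }

    public static void main(String[] args) {
        System.out.println(line());
    }
}
```

If the "used" figure creeps toward "max" just before a container falls silent,
the memory theory gains weight; if it stays far below, it probably does not
explain the failure.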

> Of course, the system should be robust to memory overflow in VM or thread.

True.

> http://www.agentcities.org/Challenge02/Proc/Papers/ch02_20_hoile.pdf

I'll go look at it--thanks!

Update: When I find broken containers, sometimes I can kill them from the RMA.
When I can (or when I forcibly kill them from the command line), I get this
message:

MessageManager(Thread[JADE Timer dispatcher,10,JADE time-critical threads]):
Retry Timer expired. Handle # messages.

or this one:

MessageManager(Thread[Deliverer-4,10,JADE Time-critical Threads]): Timer
activated. Period is 20000

I have no idea what this means, but I assume it has to do with JADE's internal
message-delivery system. Since I timestamp all events my agents post, I can see
that when one of these messages appears, my agents suddenly receive their queued
messages and perform a flurry of processing before dying.

Perhaps the platform is waiting for something my agent should do (but
doesn't), and then fails to recover gracefully?

What, if any, message handling must/should my agent do for
non-application-specific messages? I only use two (JADE Mobility and
AgentManagement's QueryAgentsOnLocation), but I could easily see my agents not
following the prescribed protocol for these, as I don't fully understand them.
Essentially, the way it's coded right now, the agent sends the request, ignores
the AGREE, and silently consumes the response and uses it. Could this be causing
the problem? It would seem odd, in that it works for multiple iterations and
then fails... but works on the main container.
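For comparison, here is a rough sketch of what handling every possible reply
in a FIPA-Request exchange could look like on the requesting side, instead of
assuming the next message is always the result. In that protocol the responder
may answer with REFUSE, NOT_UNDERSTOOD or AGREE, and then with FAILURE or
INFORM. This is plain Java with no JADE dependency; Performative, Reply and
RequestTracker are hypothetical names made up for illustration, not JADE
classes.

```java
// Sketch only: a stand-in for the initiator side of a FIPA-Request exchange.
// None of these types come from JADE; they only illustrate the reply handling.
enum Performative { AGREE, REFUSE, NOT_UNDERSTOOD, INFORM, FAILURE }

final class Reply {
    final Performative performative;
    final String content;

    Reply(Performative performative, String content) {
        this.performative = performative;
        this.content = content;
    }
}

final class RequestTracker {
    enum State { WAITING, AGREED, DONE, FAILED }

    private State state = State.WAITING;
    private String result; // stays null unless an INFORM arrives in time

    /** Feed one reply into the tracker and return the resulting state. */
    State handle(Reply reply) {
        if (state == State.DONE || state == State.FAILED) {
            return state; // conversation already finished; ignore stragglers
        }
        switch (reply.performative) {
            case AGREE:
                state = State.AGREED; // responder will act; result comes later
                break;
            case INFORM:
                result = reply.content; // AGREE is optional, so accept this directly
                state = State.DONE;
                break;
            case REFUSE:
            case NOT_UNDERSTOOD:
            case FAILURE:
                state = State.FAILED; // no result will follow; stop waiting
                break;
        }
        return state;
    }

    String result() {
        return result;
    }
}
```

The point of the sketch is only that a REFUSE, FAILURE or NOT_UNDERSTOOD is
noticed rather than mistaken for (or waited on as) the result. Whether the
current shortcut actually causes the silent containers is a separate question
that this does not answer.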

FYI, I start the main container without the -gui option, and then SSH to other
machines to fire up their containers. Lastly, I load a container on my local
machine, and instruct it to start an RMA agent in addition to the ubiquitous
host agents on the net. Then, from the RMA, I fire up a dozen or so agents
which wander, do things, interact, and then suddenly fall silent. :)

John

-- 
Avoision, n:  Avoidance, as practiced in 'Joisey.


