Task Dispatcher

A lightweight Java library for running a set of tasks over a cluster of multicore computers, with the minimum of fuss.

  • Easy to use
  • Intended for coarse grained parallelism
  • Takes advantage of multiple cores on each machine
  • Robust to machine failure
  • Works for me

Hosted on GitHub at https://github.com/ggovan/taskdispatcher/.

Read the JavaDoc.

Provides a lightweight library for distributing a set of jobs over a cluster of multicore computers.

The ClusterStub allows machines to be added to and removed from its pool dynamically, which in turn allows jobs to be reissued when a machine fails. Jobs are issued to machines according to the number of free processing elements each one has.
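The issuing policy described above can be sketched as follows. This is a minimal illustration, not the library's code: `FreeCorePool` and all of its methods are hypothetical names, jobs are represented by plain string ids, and the tie-breaking rule (first machine added wins) is an assumption. It shows jobs going to the machine with the most free cores, and the jobs of a removed machine being put back at the front of the queue for reissue.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical bookkeeping for a pool of machines; not the library's ClusterStub. */
class FreeCorePool {
    /** One machine: how many cores are idle and which jobs it is running. */
    private static final class Machine {
        int freeCores;
        final List<String> running = new ArrayList<>();
        Machine(int cores) { this.freeCores = cores; }
    }

    private final Map<String, Machine> machines = new LinkedHashMap<>();
    private final Deque<String> pending = new ArrayDeque<>();

    /** A machine joins the pool and may immediately pick up waiting jobs. */
    void addMachine(String host, int cores) {
        machines.put(host, new Machine(cores));
        dispatch();
    }

    void submit(String jobId) {
        pending.addLast(jobId);
        dispatch();
    }

    /** A job finished: its core becomes free again. */
    void complete(String host, String jobId) {
        Machine m = machines.get(host);
        if (m != null && m.running.remove(jobId)) {
            m.freeCores++;
            dispatch();
        }
    }

    /** A machine failed or left: requeue its jobs, in order, ahead of other work. */
    void removeMachine(String host) {
        Machine m = machines.remove(host);
        if (m == null) return;
        for (int i = m.running.size() - 1; i >= 0; i--) {
            pending.addFirst(m.running.get(i));
        }
        dispatch();
    }

    /** Issue pending jobs, each to the machine that currently has the most free cores. */
    private void dispatch() {
        while (!pending.isEmpty()) {
            Machine best = null;
            for (Machine m : machines.values()) {
                if (m.freeCores > 0 && (best == null || m.freeCores > best.freeCores)) {
                    best = m;
                }
            }
            if (best == null) return; // every core is busy; jobs stay queued
            best.running.add(pending.removeFirst());
            best.freeCores--;
        }
    }

    List<String> runningOn(String host) {
        Machine m = machines.get(host);
        return m == null ? new ArrayList<>() : new ArrayList<>(m.running);
    }

    int pendingCount() { return pending.size(); }
}
```

Requeueing at the front is a design choice in this sketch: work that was already in flight is retried before work that has not yet started.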

To use this library:

  1. You must first create a class that implements the Job interface.
  2. Then create a TaskDispatcher:

    • The ClusterDispatcher is to be used on a cluster of machines.
    • The ThreadedDispatcher is an alternate dispatcher to be used on a single multicore machine.
  3. Jobs are then added to the TaskDispatcher using AbstractTaskDispatcher.addJob(Job).
  4. Calling AbstractTaskDispatcher.start() will cause the TaskDispatcher to serialise the jobs and issue them to the remote machines.
  5. The jobs will then be executed and, upon completion, serialised and sent back to the dispatching machine.
  6. Only once all the jobs have been completed will the start method return.
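The shape of the steps above can be sketched with self-contained stand-ins. `SketchJob`, `SumJob` and `SketchDispatcher` are hypothetical types written for this illustration; only the method names `addJob` and `start` are taken from this page, and the real Job interface and dispatcher constructors may look different (see the JavaDoc). The stand-in runs jobs on local threads and skips serialisation, so results are read from the same objects that were added, whereas a cluster run would hand back deserialised copies.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical stand-in for the library's Job interface; the real methods may differ. */
interface SketchJob extends Serializable {
    void run();
}

/** Step 1: a job that carries its own input and stores its own result. */
class SumJob implements SketchJob {
    private static final long serialVersionUID = 1L;
    private final int upTo;
    private long result;

    SumJob(int upTo) { this.upTo = upTo; }

    @Override
    public void run() {
        long sum = 0;
        for (int i = 1; i <= upTo; i++) sum += i;
        result = sum;
    }

    long getResult() { return result; }
}

/** Hypothetical single-machine dispatcher that only mirrors addJob and start. */
class SketchDispatcher {
    private final List<SketchJob> jobs = new ArrayList<>();
    private final int threads;

    SketchDispatcher(int threads) { this.threads = threads; }

    /** Step 3: queue a job for the next call to start. */
    void addJob(SketchJob job) { jobs.add(job); }

    /** Steps 4 to 6: run every queued job and return only once all have completed. */
    void start() {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<Void>> calls = new ArrayList<>();
            for (SketchJob job : jobs) {
                calls.add(() -> { job.run(); return null; });
            }
            // invokeAll blocks until every job has finished or failed
            for (Future<Void> done : pool.invokeAll(calls)) {
                done.get(); // surfaces an exception thrown inside a job
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for jobs", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a job failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

Usage follows the numbered steps: create a `SketchDispatcher`, call `addJob` for each `SumJob`, call `start`, then read `getResult` from each job once `start` has returned.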

If the task dispatcher is to be used for a second set of jobs, then call the AbstractTaskDispatcher.newGeneration() method to clear the list of jobs.

Once all the work has been completed, call AbstractTaskDispatcher.end() to close any open sockets and stop any threads.

Other things of note:

  • ssh is used to start the ClusterStubs on remote machines. It should therefore be set up to use keypairs that do not require a password to be entered upon login.
  • Because ssh is used to start the ClusterStubs, a Linux-like environment is assumed on the remote machines. On a cluster of Windows machines, the ClusterStubs can be started by hand, or another tool can be written to automate it.
  • Be careful what you reference from your Jobs, as everything reachable from a Job will be serialised when it is sent across the network. Null out any fields that are not needed.
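The cost of a stray reference can be measured with standard Java serialisation. The sketch below is illustrative only: `HeavyJob`, `LeanJob` and `SerialisedSize` are hypothetical classes, and it assumes jobs travel via `java.io` object serialisation. It compares a job that holds a large table, the same job with the table nulled out, and a variant that marks the field `transient`, which is a standard Java alternative to nulling rather than something this library requires.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

/** Hypothetical job that keeps a reference to a large lookup table. */
class HeavyJob implements Serializable {
    private static final long serialVersionUID = 1L;
    int[] lookupTable = new int[100_000]; // written out along with the job
    int input = 42;
}

/** Same data, but the table is transient and so is skipped by Java serialisation. */
class LeanJob implements Serializable {
    private static final long serialVersionUID = 1L;
    transient int[] lookupTable = new int[100_000]; // null again after deserialisation
    int input = 42;
}

class SerialisedSize {
    /** Number of bytes the object occupies once written by ObjectOutputStream. */
    static int of(Serializable obj) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }

    public static void main(String[] args) {
        HeavyJob heavy = new HeavyJob();
        System.out.println("with table:      " + of(heavy) + " bytes");
        heavy.lookupTable = null; // the approach suggested in the note above
        System.out.println("table nulled:    " + of(heavy) + " bytes");
        System.out.println("transient field: " + of(new LeanJob()) + " bytes");
    }
}
```

A transient field has to be rebuilt on the receiving side if the job needs it, so it suits caches and scratch data rather than inputs.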