Wednesday, July 15, 2009

An Elegant Matching Algorithm In Ruby

Recently, I wanted to sit down a learn ruby (independent of rails), so I grabbed a fairly standard hacking problem and went to town on it. I now love ruby more than ever.

The Problem:
Given a set of people and a set of jobs, where each person may take a different amount of time to do each job, optimally match people to jobs (1:1) to minimize the amount of time it will take to complete all jobs.

Many people will recognize this as a standard matching problem for which the Hungarian Algorithm is a known solution. But for those that have implemented the Hungarian Algorithm (or seen implementations of it), you know well enough to steer clear. It is error prone, and a very specific algorithm. So, I sought to implement a more elegant (and generally applicable) solution using graphs.

I found this article over on topcoder describing max-flow algorithms and the beauty of such. I fell in love and decided that I needed to solve this with max-flow.

The formulation:
Lets convert our problem to a bipartite graph. Let one set of nodes be the people, and the other set of nodes be the jobs. Create an edge from each person to each job, with a weight (NOT capacity!) equal to the time it will take that person to do that job.

In our situation, the capacity for each line is one since only one person can do a job. Flow is ofcourse initialized to zero for each edge (no one is doing any of the jobs). Lastly, we connect every job to a SINK node in the graph.

The philosophy and algorithm: (the important part)
Essentially, we'll be finding shortest paths in the graph, from each person to the SINK (via jobs) iterating through each of the people. On each iteration, we augment the graph with the path.

To recap the topcoder article, augmenting the graph consists of incrementing the flow for each edge in the augmenting path and adjusting edges to represent the new flow/capacity. There is an edge that represents "residual capacity" with capacity == capacity - flow, and there is an edge in the reverse direction that represents "upstream flow" with capacity == flow.

REMEMBER, in our case:
Capacity is ALWAYS 1.
Flow is either 1 or 0.
This is ENTIRELY independent of the cost/value/weight of an edge.

What does that mean to us you say? Well, in our case there are only two situations, a person is assigned to a job (flow == 1), or a person is not assigned ot a job (flow == 0). In the first case, where a person is assigned a job, there is an edge from the job to the person. During the algorithm, this edge essentially represents the path to UNDO the assignment. In the second case, the edge simply represents the
making that assignment.