The EZ Way: April 2009

2009-04-25

Floyd's Algorithm (Ruby)

I've got another algorithm post today. My latest is Floyd's algorithm, again in Ruby. My ultimate goal is to publish a large library of algorithms in Java, Ruby and C. Gonna have to really get moving on that project. I'll have the Java version of this guy shortly.

Floyd's algorithm is very closely related to Warshall's (as I previously presented here and here). Indeed it's sometimes called the Floyd-Warshall algorithm since Floyd apparently got the idea for it from Warshall's algorithm. It's also often called the all-pairs shortest-paths problem which is the descriptive version.

Floyd's algorithm is more commonly seen in the literature (a Google on Warshall is in fact brings this algorithm higher that Warshall's own). The reason for this is that it's useful in more applications. One often wants to know how far it is to something, say a network resource, rather than just whether or not one can get there.

The way Floyd's algorithm works is so. We start with a weight matrix of a directed graph, which is like the adjacency matrix except that it contains "weights" of edges representing distances between nodes. As with Warshall's algorithm one then iterates through intermediate nodes, call them k, to find the distance of the path between each node i,j through k. If it is smaller than the distance already calculated through a previous k, and smaller than the weight (distance) of any edge connected i to j, then the current distance in the distance matrix is replaced with the new, shorter, distance. Note that in my version I initialize the distance matrix with the weight matrix so we always compare the distance paths with, at least, the direct distance between the nodes. We might call this k=0 (or maybe k=-1 would be better given my 0 indexed arrays).

One thing I didn't note in my discussion of Warshall's algorithm is that both these algorithms are examples of dynamic programming. In dynamic programming one builds a final result from intermediate results. I this case each intermediate result is the intermediate node, k. Since we iterate through each k we build up successively all possible paths from all nodes using all possible intermediate nodes thus comparing all paths between all nodes to find the smallest.

OK, so here's the algorithm. Enjoy. (Oh, I included a randomized again, so its easy to generate lots of examples.)

class Floyd #aka all-pairs shortest paths algorithm
 def initialize(weightMatrix=nil,oo=256) #256?
  @weightMatrix=weightMatrix
  @oo=oo #needed to set a weight considered "infinite"
 end
  
 attr_accessor :oo
 
 attr_accessor :weightMatrix
 
 def getDistanceMatrix
  numNodes=@weightMatrix[0].length
  distanceMatrix=Array.new(@weightMatrix)
  
  for k in 0...numNodes
   for i in 0...numNodes
    #Optimization: if no path from i to k, no need to test k to j
    if distanceMatrix[i][k]<@oo then 
     for j in 0...numNodes
      distance = distanceMatrix[i][k] + distanceMatrix[k][j]
      if distance<distanceMatrix[i][j] then
       distanceMatrix[i][j]=distance
      end
     end
    end
   end
  end
  distanceMatrix
 end

 def showDistanceMatrix
  puts "weight"
  @weightMatrix.each{|c| print c.to_s.gsub(@oo.to_s,"oo"),"\n"}
  puts "distance"
  getDistanceMatrix.each{|c| print c.to_s.gsub(@oo.to_s,"oo"),"\n"}
 end

end
  
#tests:
if __FILE__ == $0
 floyd=Floyd.new
 oo=floyd.oo
 floyd.weightMatrix=[[0,oo,3,oo],[2,0,oo,oo],[oo,7,0,1],[6,oo,oo,0]]
 floyd.showDistanceMatrix
 dim=10 #randomize out a big graph
 weightMatrix=Array.new(dim){Array.new(dim)}
 for i in 0...dim do
  for j in 0...dim do
   if i==j then
    weightMatrix[i][j]=0
   elsif rand(dim/2)==0 then
    weightMatrix[i][j]=rand(9)+1
   else
    weightMatrix[i][j]=oo
   end
  end
 end
 floyd.weightMatrix=weightMatrix
 floyd.showDistanceMatrix
end

2009-04-19

Warshall's Algorithm in Ruby

Among my current projects is to learn Ruby. Although my background is almost entirely in statically typed languages I feel that to be a complete programmer I also need to be comfortable in a dynamically typed language. Ruby is definitely interesting. I'm not sure I'm sold on the value of dynamic typing. It seems to deny one the opportunity of letting the language itself help with the programming task, not to mention being much less amenable to static analysis and the such. Still, one might make the case that most of the problems static typing are designed to prevent are one's good programmers shouldn't make very often anyway and that one can simply code faster in statically typed languages since one doesn't need to worry about such minor issues as... knowing the type at design time. Ruby definitely allows one to be very terse with the syntax, do complex things simply and hence program fast. And we all want to program fast, I'm sure.

Anyway this post wasn't really meant to be a discussion of the relative merits of static vs. dynamic typing, or even on why Ruby is cool (although my example below does illustrate the terseness of Ruby relative to Java). It was meant to present my first program in Ruby, a translation of my Java Warshall's algorithm implementation. Actually it's not a direct translation since I added a small optimization. I realized that one doesn't need to bother with the j loop if (i,k)=0. If you can't get from i to k at all you definitely can't get form i to j through k. This needs to be added to the Java implementation, which I'll get around to eventually. Note that this improvement changes the algorithm's best case (a completely unconnected graph) time complexity to Θ(n)=n². Average and worst cases remain Θ(n)=n³.

I don't have a test case for this since I haven't yet learned how to do test cases in Ruby. OK, I'm sure I could rig one up myself, but I don't see the point. Instead I just have a script at the bottom that displays the path matrix for various adjacency matrix. I also added a little random adjacency matrix generator so that one can see more examples of the algorithm at work. One of the interesting parts of that was to choose reasonable edge densities to get a nice number of paths (using n/2 as the randomizer's cutoff seems to work well).

So, here's the algorithm:


class Warshall
 def initialize(adjacencyMatrix)
  @adjacencyMatrix=adjacencyMatrix
 end
 
 def getPathMatrix
  numNodes=@adjacencyMatrix[0].length
  pathMatrix=Array.new(@adjacencyMatrix)
  for k in 0...numNodes
   for i in 0...numNodes
    #Optimization: if no path from i to k, no need to test k to j
    if pathMatrix[i][k]==1 then 
     for j in 0...numNodes
      if pathMatrix[k][j]==1 then
       pathMatrix[i][j]=1
      end
     end
    end
   end
  end
  return pathMatrix
 end

 def showPathMatrix
  puts "adjacency"
  @adjacencyMatrix.each{|c| print c,"\n"}
  puts "path"
  getPathMatrix.each{|c| print c,"\n"}
 end 

end
#tests:
if __FILE__ == $0
 adjMatrix = [[0,1],[1,0]]
 Warshall.new(adjMatrix).showPathMatrix
 adjMatrix = [[0,1,0],[0,0,1],[0,0,0]]
 Warshall.new(adjMatrix).showPathMatrix
 adjMatrix = [[0,0,0,1],[1,0,1,1],[1,0,0,1],[1,0,1,0]]
 Warshall.new(adjMatrix).showPathMatrix
 dim=10 #randomize out a big graph
 adjMatrix=Array.new(dim){Array.new(dim)}
 for i in 0...dim do
  for j in 0...dim do
   if rand(dim/2)==0 then
    adjMatrix[i][j]=1
   else
    adjMatrix[i][j]=0
   end
  end
 end
 Warshall.new(adjMatrix).showPathMatrix
end #tests

2009-04-10

Warshall's Algorithm in Java

For fun (I have a nerdy definition of fun, I admit) I decided the other day, hanging out at Peet's Coffee, to bang out Warshall's algorithm in Java. Ah, the life of an unemployed programmer.

Before I present it, I suppose I should explain a little bit about what Warshall's algorithm does, in case there some readers who have never heard of it (like me less that a year ago). I'm not going to go deep, it being a well established and easily Googleable subject. For instance, this page does a better job than I'm likely to do in any case.

Here it is in a nutshell. Given a directed graph find out which nodes have paths between them. This is also known as the transitive closure of the graph for those who prefer the more impressive sounding mathematical terminology. For Warshall's algorithm one starts with the adjacency matrix of the graph, which is a matrix in which all nodes with direct connections (edges) between them are identified with a 1, and all other pairs of nodes with a 0. Warshall's algorithm generates from this data a new matrix with a 1 for all nodes that are connected by a path, 0 for all others. By "path" I mean that by traveling from node to adjacent node (those with an edge between them) one can arrive at the other node in a pair of nodes. Since this is a directed graph one must follow the direction of the edge in question. Think of it as a game of catch where each player only throws to certain other players. The adjacency matrix would identify which players throw to specific other players; a 1 row 1 column 2 if player 1 throws to player 2, a 0 otherwise. It's quite possible, based on this scenario, that some players never get the ball thrown to them. A fair PE teacher could implement Warshall's algorithm to identify that situation. If only more PE teachers understood algorithms the world would be a more equitable place. (One interesting side note, row 1 column 1 will always be 0 in the adjacency matrix since a player never throws the ball to himself, but it could be 1 in the path matrix since hopefully the player can get the ball back... or maybe not!)

The way Warshall's algorithm works is like this. Enumerating through each node numbered by k, starting with k=0 (i.e. node #0), decide which node pairs i,j have paths through node k. This can be determined by evaluating wither or not a path from i to k and from k to j already exists on the path matrix, which was initialized as the adjacency matrix. If it does, then the path from i to j can be added to the path matrix. Our paths will thus grow organically as we iterate through k. When we done we will have every path identified in our matrix since every path with every intermediate node will have been identified.

Anyway, now that I've probably confused everybody (Google "directed graph" to become un-befuddled) here's the Java code. I've also included a test case class which I think helps to understand it. So, without further ado:

package com.blogspot.ezetter.algorithms;

public class Warshall {
 public byte[][] getPathMatrix(byte [][] adjacencyMatrix) { 
  int numberOfNodes=adjacencyMatrix[0].length;
  byte [][] pathMatrix = new byte[numberOfNodes][numberOfNodes];
  System.arraycopy(adjacencyMatrix,0,pathMatrix,0,numberOfNodes);
  for(int k=0;k<numberOfNodes; k++) {
   for(int i=0;i<numberOfNodes; i++) {
    if (pathMatrix[i][k]==0){
     continue;
    }
    for(int j=0;j<numberOfNodes; j++) {
     if(pathMatrix[i][k]==1 &&
      pathMatrix[k][j]==1) {
       pathMatrix[i][j]=1;
     }
    }
   }
  }
  return pathMatrix;
 }
}

And the test case:

package com.blogspot.ezetter.algorithms;
import junit.framework.TestCase;

public class WarshallTests extends TestCase {
 private byte[][] adjacencyMatrix = 
  {{0,0,0,1},{1,0,1,1},{1,0,0,1},{1,0,1,0}};
 private byte[][] expectedPathMatrix = 
  {{1,0,1,1},{1,0,1,1},{1,0,1,1},{1,0,1,1}};
 private byte[][] adjacencyMatrix2 = 
  {{0,1,0},{0,0,1},{0,0,0}};
 private byte[][] expectedPathMatrix2 = 
  {{0,1,1},{0,0,1},{0,0,0}};
 
 public void testGetPathMatrix() {
  byte[][] pathMatrix = new Warshall().getPathMatrix(adjacencyMatrix);
  for(int i=0;i<adjacencyMatrix.length;i++) {
   for(int j=0;j<adjacencyMatrix.length;j++) {
    assertEquals("Path matrix at i=" + i + ", j=" + j + ":",
     expectedPathMatrix[i][j],pathMatrix[i][j]);
   }
  }   
  pathMatrix = new Warshall().getPathMatrix(adjacencyMatrix2);
  for(int i=0;i<adjacencyMatrix2.length;i++) {
   for(int j=0;j<adjacencyMatrix2.length;j++) {
    assertEquals("Path matrix at i=" + i + ", j=" + j + ":",
     expectedPathMatrix2[i][j],pathMatrix[i][j]);
   }
  }   
 }

2009-04-08

Spring MVC vs Struts Part 2

This Part 2 of my experience switching from Struts to Spring MVC. Part 1 is here.

Now, before I get attacked by a Spring expert I am not claiming that this as a "best practices" post. I'm just describing my experience with Spring MVC and the solution I came up with. I would truly welcome a Spring veteran to come along and explain the right way to do it.

What I wanted was to retrieve certain information from the model and inject it into the view whether one was coming to the controller from the form submission or from any other source, like a link. At first I tried using the formBackingObject since it is always called but since one doesn't have access to the ModelAndView there that doesn't work well. I tried using handleRequest which is also always called but since I was therefore overriding the method from a superclass, which calls onSubmit, I was masking that functionality and causing my onSubmit to be uncalled. I then tried using showForm, which is called to display the form. This ended up being my final, relatively satisfying, solution. It did leave one problem. I wanted to return to the same form after submission but that proved tricky to do from onSubmit. My solution was to call showForm from onSubmit and then return the ModelAndView returned by showForm.

This has been working pretty well, although not entirely aesthetically satisfying. I wish it was simpler but as with many things, once you figure it out, it's not too bad. If you doubt it's complex check out the workflow in the Javadoc. One has to consider both the workflow in the class and all it's superclasses. On the plus side this give one a great deal of control over how a request is handled.

But when my headache reaches a certain threshold, I long for Struts.

Note: I will post the actual source soon, in a more detailed description of this project.

2009-04-06

Spring MVC vs Struts

I mentioned a few posts ago that I recently switched from Struts 2 to Spring MVC as my web development platform for a current project. I also mentioned that so far I don't regret my decision although I have some reservations. What I like about Spring MVC is that 1. it's more closely integrated with Spring, and Spring is most assuredly a worthy invention and 2. it does have in certain respects more flexibility than Struts. I'll probably have more to say on those strengths later but for now the point is that that's also Spring MVCs weakness. With flexibility often comes complexity and the Spring MVC controller model is definitely complex relative to Struts'.

Struts has a very tight and simple coupling between the view and the controller. In Webwork (the predecessor to Struts 2) it was extremely simple. You registered your controllers in xwork.xml and referenced them in the actions of your forms and your controller's execute would be automatically called by Webwork on each form submit. A get request directly from a URL is handled in exactly the same way. In Struts 2 it's similar, although a little more flexible. In the struts.xml file one can configure specific methods of the controller to be called, so you aren't just stuck with execute. That was a nice addition.

In Spring MVC it's more complex. First of all Spring MVC mirrors the HTTPServlet doGet/doPost methods more closely as opposed to Struts higher level of abstraction. But Spring adds more methods to handle different cases like onSubmit for form sumission, showForm to display the view and formBackingObject to link the model to the view. These are all chained together depending on the way the controller is entered, say a URL get vs. a form post. Contrast this with the simple description for Struts I gave above. Just one method called, configurable in the struts.xml, no matter how the form is called.

Where I ran into trouble was to be able to have identical things happen whether a user came to the controller via a link or a form submission. I want to check if the user is logged in and I want to retrieve certain information from the model whether this is the initial display of the view or a form submission. The problem is, which methods to use? There are so many to choose from and some don't get called in certain circumstances. Do I use doPost, handleRequest, formBackingObject, showForm or some combination thereof?

Since this post is getting a little too long I think my solution is going to have to be in a "to be continued".

So, stay tuned.

2009-04-05

Halting "Problem" Redux

A while back I posted as an April fools joke: a Java version of the Turing Halting Problem. I recently learned that for Turing halting wasn't a "problem" at all, quite the opposite. The term was rather added by a later commentator (who's name I will post as soon as I look it up) who wanted to express the theorem in modern day computing terms, where infinite loops are generally considered bad things. For Turing, who's purpose was to determine the limits of decidability, that was no problem and in fact programs that didn't halt were considered "good" programs. If an algorithm could keep indefinitely chugging away at a problem and could give an answer, assuming infinite time, then said problem is "decidable". Computing pi, the square root of 2, 1/3... those are all decidable problems. The reason we can't get an exact answer is simply not having enough time (i.e. not having infinite time) to do so. It's not any fundamental problem with the calculation. The halting problem, on the other hand, is inherently "undecidable" since it embodies a paradox.

When Turing first posited the Universal Machine it was in order to construct an example of an undecidable problem; a Universal Machine can't determine whether or not it will halt. His purpose at the time was not to invent the computer. That use of the formulation was applied later (just a little) by many commentators, including Turing himself. So the invention of the Universal Turing Machine, of which all computers and programming languages are examples, was like so many other great discoveries an accident.

An accident that changed everything.

_{*Actually Turing's presentation was even more esoteric than that. He postulated that every machine could be designated a number that described it (not so far fetched since real computer programs are stored as binary numbers: computer programs are ultimately just numbers). So Turing was proving that no number can exist that determines if that number is itself "satisfactory", i.e. whether or not it can be computed. Remember, this was a mathematical, not a practical, problem. As fate has it, math problems have a tendency to become practical.}

2009-04-04

The ORM Paradigm

I recently gave a presentation for a job interview on the value of Object Relational Mapping tools and, specifically, JPA. For the already enlightened it does contain technical information on using JPA which might still be of interest, I've decided JPA is the best ORM implementation yet.

Say to your programming environment: "be the database".

The presentation can be found at this link.

2009-04-03

Proof of Euclid's GCD Algorithm

In an advanced algorithms course I recently took the first algorithm covered, as I imagine is the case in many such courses, was Euclid's greatest common divisor method. Euclid's gcd is often considered the earliest known algorithm. Does that make Euclid the first programmer? (Note: Euclid's method, invented over 2000 years ago, is still very efficient even by today's standards and is enormously more powerful then the method you probably learned in middle school.) Euclid's gcd is based on the following observation:

gcd(m,n) = gcd(n, m mod n)

This equality is repetitively applied until a final condition, gcd(m,0), is reached for which case m is taken to be the gcd of m, n.

Now, although we weren't asked to prove this for my course, at the time I was inspired to attempt it. After lengthy pondering I wasn't able and eventually gave up disappointed. Recently I revisited the problem and was able to see the solution pretty quickly. I guess my math skills are improving (or so I like to think). Below is my proof. You will have to take my word for it that I didn't Google it (I didn't, I swear.)

Starting with the final condition, which is pretty trivial, if m mod n = 0 then by definition m is divisible by n so n is the gcd. Next we need to show that we get to that final condition, i.e. that we don't get stuck at m mod n != 0. Again, this is pretty simple. m mod n < n necessarily. Since we keep feeding m mod n into the procedure as the new n, n gets smaller each iteration. Since m mod 1 = 0 for all m, we must reach 0 eventually.

The hard part for me before was solving the general equality, gcd(m,n) = gcd(n, m mod n). I now find this pretty simple too and am even a bit embarrassed that I didn't come up with it before. Observe that if m and n are divisible by x then m-n is also divisible by x: (m-n)/x =(m/x)-(n/x). Likewise, m-an is also divisible by x for any a. Since the definition of m mod n is m-an for the largest a where m-an>0 it follows that any common divisor of m and n is also a common divisor of m mod n. So if x is the gcd for m and n, it is also the gcd of m mod n (since m mod n is necessarily smaller than both m and n).

QED

If any reader sees a flaw in my reasoning, please reply. I really want to be sure of this.

2009-04-02

Static Analysis Tools

I'm evaluating static analysis tools for my Master's program at Cal Lutheran University. I'm finding the subject intriguing. Static analysis programs test code for potential flaws and are contrasted with dynamic testing tools like JUnit. They check code for deviations from established best practices, good design standards and possible serious issues. In my experience so far these tools generally do a good job. At the minimum they are useful for learning. One might not even always agree with the results but they are often interesting from an academic standpoint. It's especially fun to run them on popular open source projects. It's therapeutic for the ego to realize how pervasive certain bad practices are. By that token, running them on one's own projects might help one avoid such embarrassments.

The best of breed in the Java space are: FindBugs, PMD and Checkstyle. Each of the tools have different focuses, Checkstyle being oriented towards formatting conventions for instance (although also evaluating for more serious issues). One interesting thing is that they all seem to find different problems so they work well in conjunction. They each have IDE integration, specifically Eclipse plugins. Also they each have build tool integration with ANT. I think a development environment can benefit greatly by using both these features: use them in the IDE for real time preemption and then integrate into the automatic builds to ensure clean code across the team (I dub the concept Bug Brother). Few could argue the benefits of clean code for bug avoidance and maintenance. Best practices are in many cases well established across the industry but even good programmers can find it difficult to keep them all straight and to maintain the discipline to apply them. Tools like these can keep us honest. Let's use them.

Here's a sample Ant build.xml snippet from my project (the one I hope to share more on soon):


 <taskdef name="findbugs" 
          classpath="${findbugs.home}/lib/findbugs.jar" 
          classname="edu.umd.cs.findbugs.anttask.FindBugsTask" />

    <target name="findbugs" depends="build">
       <findbugs home="${findbugs.home}" output="xml"
                     outputFile="bcel-fb.xml">
            <sourcePath path="${src.dir}" />
            <class location="${build.dir}" />
            <auxClasspath path="${web.dir}/WEB-INF/lib/spring.jar" />
            <auxClasspath path="${web.dir}/WEB-INF/lib/spring-webmvc.jar" />
            <auxClasspath path="${web.dir}/WEB-INF/lib/persistence.jar" />
            <auxClasspath path="${web.dir}/WEB-INF/lib/commons-logging.jar" />
            <auxClasspath path="${web.dir}/WEB-INF/lib/servlet-api.jar" />
       </findbugs>
    </target>
 
<taskdef resource="checkstyletask.properties"
         classpath="${checkstyle.home}/checkstyle-all-5.0-beta2.jar"/>
   
<target name="checkstyle"
        description="Generates a report of code convention violations.">

  <checkstyle config="sun_checks.xml"
              failureProperty="checkstyle.failure"
              failOnViolation="false">
    <formatter type="xml" tofile="checkstyle_report.xml"/>
    <fileset dir="${src.dir}" includes="**/*.java"/>
  </checkstyle>
</target>
 
<path id="pmd.classpath">
    <pathelement location="${build}"/>
    <fileset dir="${pmd.home}/lib/">
        <include name="*.jar"/>
    </fileset>
</path>

<taskdef name="pmd" classname="net.sourceforge.pmd.ant.PMDTask" 
classpathref="pmd.classpath"/>

<target name="pmd">
      <pmd>
  <ruleset>basic,imports,unusedcode,strings</ruleset>
          <formatter type="net.sourceforge.pmd.renderers.HTMLRenderer"
 toFile="pmdout.html"/>
          <fileset dir="src">
              <include name="**/*.java"/>
          </fileset>
      </pmd>
</target>
   
<target name="analysis" depends="build,findbugs,checkstyle,pmd"/>

My April Fools Program

I meant to post this yesterday but, oldest excuse in the book, got busy. Anyway, not wanting to waste the opportunity and not willing to wait a year I'm posting it now. It is an example using a new feature of Java reflection. Here goes:

package april.fools;

public class Thp {
 
 Thp(Object input) {
  while(Class.doesHalt(this.getClass(),input));
 }
 
 public static void main(String [] args) {
  try {
   new Thp(Class.forName("april.fools.Thp"));
  } catch (ClassNotFoundException e) {
   e.printStackTrace();
  }
 }
}

I hope you appreciate it.

The EZ Way