I've been reading a great deal lately on testing and development methodologies. Certainly all the rage these days are such buzzwords as Test Driven Development (TDD) and agile/RAD programming. Exactly what these terms mean is open to varied interpretation, and I think there are few development shops that implement them precisely as any given description (except, perhaps, their own) presents it. Anyway, in reading blogs, books, Wikipedia articles, etc., I have tried to assimilate, filter and aggregate in order to come up with my own development methodology, admitting that I can't claim to be an expert in the field given that many others have spent a great deal more time thinking about the subject. Still, I feel entitled to an opinion, so here goes.
First of all, I want to say that I do in principle agree with the one-week SDLC cycle emphasized in Agile and RAD methods. It's my experience that the waterfall SDLC concept is generally not effective. Insisting on a set of rigorous and complete requirements up front is, I feel, a near guarantee of project failure. Requirements change constantly, at least on every project I've been on. More to the point, we learn about the requirements in the process of trying to implement them. Things that sounded like great ideas up front just don't look so good as implementations. So a continuous analyze/prototype/evaluate cycle makes sense: get ideas, see if they work, and if they do, make those the requirements. I think this is efficient and effective, and given the popularity of Agile methods so, apparently, do many others.
I do think that it is important to ensure that the rapid Agile cycles don't become excuses for cutting corners, though. One thing, for instance, that I keep reading in descriptions of Agile methods is that they "involve less documentation". I don't buy that. Why, just because we have quick cycles, do we get away with less documentation? This just sounds like a ploy by developers, famous for not wanting to document, to get away with it. I feel that every week the team should be spending time keeping running documentation up to date. If they can design and program as they go, they can document as they go.
Basically the way I see it is that in Agile one shouldn't be abandoning the steps in traditional Waterfall SDLC. One should just be compressing them all into a single week. So planning/analysis/design/coding/testing/implementation/maintenance (well, replace implementation with prototyping, which may eventually become implementation) gets done every week. No cutting corners in any of these steps should be allowed. The only difference is that the project is broken into bite-sized pieces that can be accomplished in weekly cycles. I think this is doable and practical.
The "less documentation" paradigm is especially odd given the emphasis on testing in Agile methods, specifically unit testing. Testing is another area notoriously neglected by software developers and Agile testing methods were specifically proposed to address this deficiency. Still, I'm concerned that simply declaring that every method, and ideally every logical input type to the method, needs to be tested does not guarantee good, effective tests. It's easy to toss together a quick test for a method and accomplish little more than not testing at all.
So, here's my proposal. I think that there should be some kind of guideline such that for every unit of time spent programming, the thing most of us programmers enjoy doing most and would do all day if we were allowed, there should be some proportional unit of time devoted to the less popular activities, testing and documentation. I'm going to, as an initial estimate, propose 2:2:1. So for every 2 units (say 2 hours) of programming there should be an equal 2 units of testing and 1 unit of documentation. One might object that such rigid rules are unreasonable, that every day is different and requires adjusting the time spent on different tasks; a project might have complex programming requirements but not need as much testing and documentation. I don't buy that. If a programming task was complex then it logically requires proportional testing to validate that complexity and proportional documentation to understand it. That seems reasonable to me so I'm sticking to my guns on that.
The proportions themselves I will leave negotiable since I would need to try to put them into practice and reevaluate. They seem reasonable to me on the surface but I believe in empirical evidence so I'm keeping an open mind. The testing requirement I feel pretty confident in, since it seems very natural that the complexity of the testing can't be less than the complexity of the coding. In fact, maybe the testing time should be higher. Testing, let's face it, is hard, involved work. I don't think that developers should spend the same amount of time documenting as on these other two tasks, though. The primary job of a developer is not writing documentation, although documentation is a critical aspect of the job. Besides, even 2:1 between coding and documentation is much more than is typical (e.g. 0). A team could get very nice documents out of such an effort, I think. I doubt it should be much less, though. While not the primary role of the developer, we don't want to de-emphasize the importance of the task.
So, there's my proposal in a nutshell. I hope to think, and write, more about the subject as time goes by. Maybe I'll completely change my mind. These are admittedly initial thoughts. I definitely plan to focus on this area more in this blog, though. After all, algorithms are fun, but this is what really matters in real life software development.
2009-05-06
2009-05-05
Horspool's Algorithm
Today, another algorithm. Admittedly this one is not likely to be of much use to many programmers as an actual implementation. Horspool's algorithm, the subject of today's post, is nothing more than a pretty efficient way to search a string for a pattern. Obviously unless your project is to write a new programming language API then you already have such facilities ready at hand. In fact I'd be willing to bet that whatever API you're using has an algorithm similar to this to do the job.
The point here is that the purpose of these exercises is not to reinvent the wheel. It is to understand how things work, under the hood so to speak. I, for one, like that. I feel that it is the sign of a good programmer to have that sort of curiosity, to not just know how to do things but how things work. For my own selfish reasons, I find that I never really understand something until I write about it. So, acknowledging that this post might not have a huge audience, at least I'll get something out of it.
Anyway, about Horspool's algorithm. Obviously the simplest way to do a string pattern search is just to start at the beginning of both string and pattern and do a character by character comparison. Then, if a mismatch is found in any character, shift the pattern one position forward in the string and do the comparison again. Really, this is probably efficient enough for most purposes, should a programmer actually need to implement such an algorithm for some reason. For large searches through many large strings something better might be worth it, though.
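For concreteness, here's a quick sketch of that simple approach in Java (my illustration, since this blog's algorithm library targets Java too; the class and method names are made up):

```java
public class NaiveSearch {
    // Return the index of the first occurrence of pattern in str, or -1.
    public static int naiveSearch(String str, String pattern) {
        // Try every alignment of the pattern within the string
        for (int i = 0; i + pattern.length() <= str.length(); i++) {
            int j = 0;
            // Compare character by character from the start of the pattern
            while (j < pattern.length() && str.charAt(i + j) == pattern.charAt(j)) {
                j++;
            }
            if (j == pattern.length()) {
                return i; // every character matched
            }
            // Mismatch: shift the pattern one position forward and retry
        }
        return -1; // no match
    }

    public static void main(String[] args) {
        System.out.println(naiveSearch("ericzetterbaum", "zett")); // prints 4
    }
}
```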
Obviously (hopefully) I'm not going to be so foolish as to post such a trivial algorithm as the simple version described above, so there is an optimization, Horspool's that is. (One day I'll have an algorithm named after me.) First, Horspool's starts by aligning the string and pattern as described above. Then it does its comparison starting at the end of the pattern instead of the start. Now, the key insight is that if a mismatch is found the pattern can often be shifted by more than a single character. For instance, if the final character, the first checked, does not match and also is not contained anywhere in the pattern, then the pattern can be shifted forward in the string by its whole length. If, on the other hand, there is a mismatch but the character does exist elsewhere in the pattern, then the pattern can be shifted forward just far enough to align that pattern character with the string character just checked. Furthermore, this is true of any character mismatch anywhere in the string: the exact same shifting rule applies.
The algorithm continues in this way until either all characters are found to match or the start position of the pattern within the string is such that the end of the pattern would extend beyond the end of the string. In that case a "no match" indicator is returned (-1 in my version).
Note that an algorithm like this would not be worth it in code that just does infrequent searches on small strings. The simple algorithm is plenty good enough for that. In fact the computation of the shift table might erase any potential efficiency gain. For large comparisons, though, say a grep on a set of large files, this algorithm could make a substantial difference. Note that the shift table is calculated only once and so can be amortized across multiple searches. Of course, as I mentioned before, in the real world we already have grep at our disposal, but I'm sure whatever greppers one uses have an implementation similar to this (or maybe even more efficient versions, which do exist, although those are quite a bit more complex and might be considered not worth it even for API vendors; evidence of this is that Horspool published his algorithm after the more efficient, but similar, Boyer-Moore algorithm, apparently seeing a need for a simplification).
Anyway, here is my Ruby implementation. I tried to include a good many comments to make the logic easy to follow. Hopefully I succeeded. Enjoy.
class Horspool
  # Returns the index of the first occurrence of pattern in str, or -1.
  def findIn(str, pattern)
    @pattern = pattern
    calcShift # build the shift table
    i = 0                 # position of the pattern's start within str
    j = pattern.size - 1  # position within pattern, scanned right to left
    shift = pattern.size  # initialize shift to the maximum possible
    while i + pattern.size - 1 < str.size # pattern doesn't extend past end of str
      # Look up the precomputed shift for the str character at position i+j,
      # minus the distance from the end of the pattern ((pattern.size-1)-j).
      # If that's less than the previously computed shift, make it the current shift.
      newShift = @shiftTable[str[i + j, 1]] - ((pattern.size - 1) - j)
      newShift = 1 if newShift < 1 # never shift less than one position (guards against an infinite loop)
      shift = newShift if newShift < shift
      if str[i + j, 1] == pattern[j, 1] # str/pattern characters match
        j -= 1 # moving backward through pattern
        return i if j < 0 # made it all the way through pattern, must match
      else
        i += shift # start comparing at the new position in str
        j = pattern.size - 1 # reset j to end of pattern
        shift = pattern.size
      end
    end
    return -1 # no match
  end

  private

  def calcShift
    # Characters not in the pattern shift the whole pattern length; a hash
    # default handles this for any character, not just lowercase letters.
    @shiftTable = Hash.new(@pattern.size)
    # Characters in the pattern shift their distance from the end, minus 1
    # (the final pattern character is deliberately excluded)
    for i in 0...@pattern.size - 1
      @shiftTable[@pattern[i, 1]] = (@pattern.size - i) - 1
    end
  end
end

if __FILE__ == $0
  print Horspool.new.findIn("ericzetterbaum", "zett")
end
Labels:
algorithms,
Horspool's Algorithm,
programming,
Ruby
2009-05-03
Java/Tomcat Web Services
I've decided I want to have my Java algorithms library available as a web services API. This post will describe how I set about to accomplish that.
First of all, I must give credit where it is due. Most of the information on how to do this for Tomcat (my container of choice right now) came from this document by Peter Yeung. Mostly the steps laid out in that document were sufficient for my project, although there were a few gotchas that took me some time to work out. I think he was using an older version of the Java Web Services Developer Pack. It looks like there have been a few changes, including the introduction of bugs!
So, to start, I downloaded the above-mentioned JWSDP. This definitely makes the job of creating the web service infrastructure easier. It has utilities to generate the WSDL and a bunch of SOAP utility classes, stuff for serialization and such. Who would want to deal with all that by hand? Not me! (Indeed, one could create highly functional web services without ever knowing WSDL using this and other similar utilities out there.)
The actual programming steps are then quite simple. First, one needs to create an interface to define the functionality of the web service. For my Warshall algorithm, it looks like this:
package algorithms.server;

import java.rmi.Remote;
import java.rmi.RemoteException;

public interface Warshall extends Remote {
    public PathMatrix getPathMatrix(byte[][] adjacencyMatrix)
            throws RemoteException;
}
Then I created a WarshallImpl class that implements that interface (I won't bother posting it here since it's just a POJO, basically the Warshall.java I posted earlier, but I will make the whole project available to download.) Also I created the PathMatrix class rather than just returning the matrix as an array since I want to be able to send a status back to the client, like if it made an unreasonable request.
Now, you need two XML files. One is a JaxRPC web service description file that the JWSDP deploy utility will use to get its necessary information. It should be named jaxrpc-ri.xml. It looks like this:
<?xml version="1.0" encoding="UTF-8"?>
<!-- configuration file for JWSDP wsdeploy tool -->
<webServices xmlns="http://java.sun.com/xml/ns/jax-rpc/ri/dd" version="1.0"
targetNamespaceBase="http://com.test/wsdl"
typeNamespaceBase="http://com.test/types"
>
<!-- the endpoint name becomes the service name in the WSDL -->
<endpoint name="warshallService"
interface="algorithms.Warshall"
implementation="algorithms.WarshallImpl"/>
<endpointMapping endpointName="warshallService" urlPattern="/warshallService"/>
</webServices>
Then you need a deployment descriptor. This was the first gotcha from Peter Yeung's instructions. He said to make an empty web.xml, i.e. just web-app tags with no content. This doesn't work. The utility will generate some kind of null pointer exception. After some Googling, I came up with this solution:
<?xml version="1.0" encoding="UTF-8"?>
<web-app>
<session-config>
<session-timeout>60</session-timeout>
</session-config>
</web-app>
Put both the web.xml and jaxrpc-ri.xml in a config directory under your project root, and put the Java source in a src directory. Also make a build directory, a dist directory and a web directory. One might choose to simplify this by keeping the configuration files in the web/WEB-INF directory, and building straight to web/WEB-INF/classes. Or, just make a nice Ant script to automate it all. I'm in the process of doing the latter, which I'll include with my project when I upload it. This will make it all very easy. In the meantime, after building the project I have to copy the config and build contents into the web/WEB-INF and web/WEB-INF/classes directories.
Now we get to the magic parts. First, I made a war file from the web directory:
jar -cvf build/myWEB-INF.war -C web/ .
Then I ran
%JWSDP_HOME%/jaxrpc/bin/wsdeploy.bat -verbose -o dist/algorithms.war build/myWEB-INF.war

Here %JWSDP_HOME% is the install directory of JWSDP. This makes a new war file that is the web service app itself, which can then be deployed to Tomcat. Just drop this war file in the Tomcat webapps folder and the web service is deployed. That's it, done. Oh, do make sure you put the needed .jar files from the JWSDP distribution into the Tomcat lib. These are in %JWSDP_HOME%\jaxrpc\lib, %JWSDP_HOME%\saaj\lib, %JWSDP_HOME%\jwsdp-shared\lib and %JWSDP_HOME%\jaxp\lib\endorsed.
OK, so now we have a functional web service. We can check that it works like so: http://localhost:8082/algorithms/warshallService?WSDL. That shows us our WSDL, which we didn't have to bother to write.
Now, how about a client? JWSDP also contains an automatic client maker. Nice! It generates all the client stub classes for us. All that's needed is another simple XML configuration file to give it instructions. For my client, it looks like this:
<?xml version="1.0"?>
<configuration xmlns="http://java.sun.com/xml/ns/jax-rpc/ri/config">
<!-- WSDL URL and generated package name -->
<wsdl location="http://localhost:8082/algorithms/warshallService?WSDL" packageName="clientStub"></wsdl>
</configuration>
I named it /config/wscompile_config.xml under the project root.
Then I ran:
%JWSDP_HOME%/jaxrpc/bin/wscompile -d build -gen:client -keep config/wscompile_config.xml
That generates all the client stub classes. Finally, make a jar of it:
jar -cvf lib/wsstub.jar -C build clientStub/
OK, now we're good to go... mostly. This was the other big gotcha. To actually use this we will need the necessary jars from the JWSDP distribution. The ones we need are from jwsdp-shared\lib, fastinfoset\lib and jaxrpc\lib. Now I come to the biggest gotcha of this project: you also need the saaj jars, but the ones included with the JWSDP distribution DIDN'T WORK. It seems there is some kind of incompatibility with some of the other jars. I kept getting a class cast exception, specifically:
java.lang.ClassCastException: com.sun.xml.internal.messaging.saaj.soap.ver1_1.Message1_1Impl cannot be cast to com.sun.xml.messaging.saaj.soap.MessageImpl
This one drove me crazy. Finally I solved it by downloading the lib files directly from the SAAJ (SOAP with Attachments API for Java) project, https://saaj.dev.java.net/. That indeed solved the problem. And I was done with my client. My source, the part I had to write myself, looks like this:
package algorithms.client;

import java.rmi.RemoteException;
import javax.xml.rpc.ServiceException;
import clientStub.*;

public class WarshallClient {
    public static void main(String[] args) {
        WarshallService warshallService = new WarshallService_Impl();
        Warshall warshall = null;
        try {
            warshall = warshallService.getWarshallPort();
        } catch (ServiceException e) {
            e.printStackTrace();
            return; // no port obtained, nothing more to do
        }
        try {
            PathMatrix pm = warshall.getPathMatrix(
                new byte[][]{{0,0,0,1},{1,0,1,1},{1,0,0,1},{1,0,1,0}});
            System.out.println();
            System.out.printf("Status=%s\n", pm.getStatus());
            System.out.print("Path matrix:");
            for (int i = 0; i < pm.getPathMatrix().length; i++) {
                System.out.println();
                for (int j = 0; j < pm.getPathMatrix().length; j++) {
                    System.out.print(pm.getPathMatrix()[i][j] + ",");
                }
            }
        } catch (RemoteException e) {
            e.printStackTrace();
        }
    }
}
WarshallService, WarshallService_Impl and Warshall were all generated automatically by the wscompile utility. Warshall.java is just a stub for the actual web service. The rest is just to test that it's actually working. And it does! Now, making web services is about as easy as making the POJOs. Nice! Now I can add my algorithms API to the cloud! I'm sure thousands of programmers are waiting anxiously. :)
Anyway, I'll post the whole source as soon as I have a chance to clean it up a little. In the meantime, I hope my experience proves useful to others.
Labels:
java,
programming,
tomcat,
web services
2009-05-01
Java Interview Questions
Here are a few of the more notable Java programming questions I've had at interviews lately.
Q: How do sets and lists differ in handling multiple nulls?
A: In a Set there will be at most a single null, while in a List there will be as many nulls as were inserted. This is because sets consolidate equal elements into a single entry (no duplicates) while lists allow multiples of the same value. BTW, this comes from the mathematical definitions of sets and lists.
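To see this concretely, a quick sketch (class and method names invented for illustration):

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class NullDemo {
    public static int nullsInSet() {
        Set<String> set = new HashSet<String>();
        set.add(null); // first null is stored
        set.add(null); // duplicate null is consolidated away
        return set.size();
    }

    public static int nullsInList() {
        List<String> list = new ArrayList<String>();
        list.add(null);
        list.add(null); // both nulls are kept
        return list.size();
    }

    public static void main(String[] args) {
        System.out.println(nullsInSet());  // prints 1
        System.out.println(nullsInList()); // prints 2
    }
}
```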
Q: Given the following: Set s = new HashSet<String>(); what is the behavior of this set? For instance what happens if I add, say, an Integer?
A: The Set will contain the Integer. No compiler error (only an unchecked warning) or runtime error will occur. Whether or not a collection is checked generically depends on the declared type, not the type of the expression assigned to it.
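A minimal sketch of that behavior (the class name is made up; note the raw declared type):

```java
import java.util.HashSet;
import java.util.Set;

public class RawTypeDemo {
    @SuppressWarnings("unchecked")
    public static Set buildSet() {
        Set s = new HashSet<String>(); // raw declared type, generic assignment
        s.add(Integer.valueOf(42));    // compiles (unchecked warning only) and runs fine
        s.add("hello");
        return s;
    }

    public static void main(String[] args) {
        System.out.println(buildSet().size()); // prints 2
    }
}
```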
Q: In an HttpSession, if one adds a HashMap as a session attribute in a request, then in another request retrieves the HashMap and adds another entry: is it necessary to set the attribute again for yet another request to retrieve the new HashMap entry?
A: Strictly speaking, within a single JVM the subsequent request will see the new entry even without re-setting the attribute, since the session holds a reference to the same HashMap object. However, the container is only notified of a change through setAttribute, so in a distributed or replicated session environment the modification may never be propagated to other nodes. The safe practice, then: if one modifies an object stored in a session attribute, re-set the attribute with the object.
Q: Are there memory leaks in Java?
A: Not in the traditional sense as they are understood in C programming, for instance. By that I mean that one doesn't need to worry about explicitly freeing memory after allocating it; the garbage collector takes care of that. There are programming habits that lead to effective memory leakage, however. For example, one might have a private instance variable that is initialized in, say, a constructor. That variable might not be referenced anywhere else in the class, but as long as the application retains a reference to the object itself the variable's contents can't be garbage collected. Now, this is probably really an example of the bad programming technique of using an instance variable where a local variable will do but, still, it illustrates how allocated memory can stay allocated even though it is no longer used. In a more complex example it might not be as obvious as in this simplified version.
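A minimal sketch of the pattern just described (class and field names are invented for illustration):

```java
public class LeakyWorker {
    // Initialized once in the constructor and never read again, but it
    // stays reachable (and therefore uncollectable) for as long as any
    // LeakyWorker instance is reachable.
    private byte[] scratch;

    public LeakyWorker() {
        scratch = new byte[1024 * 1024]; // 1 MB held for the object's whole lifetime
    }

    public int doWork() {
        // scratch is never used here; a local variable would have sufficed
        // and would have been collectable as soon as the method returned.
        return 42;
    }

    public static void main(String[] args) {
        System.out.println(new LeakyWorker().doWork()); // prints 42
    }
}
```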
Labels:
java,
programming
2009-04-25
Floyd's Algorithm (Ruby)
I've got another algorithm post today. My latest is Floyd's algorithm, again in Ruby. My ultimate goal is to publish a large library of algorithms in Java, Ruby and C. Gonna have to really get moving on that project. I'll have the Java version of this guy shortly.
Floyd's algorithm is very closely related to Warshall's (as I previously presented here and here). Indeed it's sometimes called the Floyd-Warshall algorithm since Floyd apparently got the idea for it from Warshall's algorithm. It's also often called the all-pairs shortest-paths problem which is the descriptive version.
Floyd's algorithm is more commonly seen in the literature (a Google search on Warshall in fact ranks this algorithm higher than Warshall's own). The reason for this is that it's useful in more applications. One often wants to know how far it is to something, say a network resource, rather than just whether or not one can get there.
The way Floyd's algorithm works is like this. We start with a weight matrix of a directed graph, which is like the adjacency matrix except that it contains "weights" of edges representing distances between nodes. As with Warshall's algorithm one then iterates through intermediate nodes, call them k, to find the distance of the path between each node pair i,j through k. If that distance is smaller than the distance already calculated through a previous k, and smaller than the weight (distance) of any edge connecting i to j, then the current entry in the distance matrix is replaced with the new, shorter, distance. Note that in my version I initialize the distance matrix with the weight matrix so we always compare the path distances with, at least, the direct distance between the nodes. We might call this k=0 (or maybe k=-1 would be better given my 0-indexed arrays).
One thing I didn't note in my discussion of Warshall's algorithm is that both these algorithms are examples of dynamic programming. In dynamic programming one builds a final result from intermediate results. In this case each intermediate result corresponds to an intermediate node, k. Since we iterate through each k we build up successively all possible paths from all nodes using all possible intermediate nodes, thus comparing all paths between all nodes to find the smallest.
OK, so here's the algorithm. Enjoy. (Oh, I included a randomizer again, so it's easy to generate lots of examples.)
class Floyd #aka all-pairs shortest paths algorithm
  def initialize(weightMatrix=nil,oo=256)
    @weightMatrix=weightMatrix
    @oo=oo #needed to set a weight considered "infinite"
  end
  attr_accessor :oo
  attr_accessor :weightMatrix
  def getDistanceMatrix
    numNodes=@weightMatrix[0].length
    distanceMatrix=@weightMatrix.map{|row| row.dup} #deep copy, so @weightMatrix isn't modified
    for k in 0...numNodes
      for i in 0...numNodes
        #Optimization: if no path from i to k, no need to test k to j
        if distanceMatrix[i][k]<@oo then
          for j in 0...numNodes
            distance = distanceMatrix[i][k] + distanceMatrix[k][j]
            if distance<distanceMatrix[i][j] then
              distanceMatrix[i][j]=distance
            end
          end
        end
      end
    end
    distanceMatrix
  end
  def showDistanceMatrix
    puts "weight"
    @weightMatrix.each{|c| print c.to_s.gsub(@oo.to_s,"oo"),"\n"}
    puts "distance"
    getDistanceMatrix.each{|c| print c.to_s.gsub(@oo.to_s,"oo"),"\n"}
  end
end
#tests:
if __FILE__ == $0
  floyd=Floyd.new
  oo=floyd.oo
  floyd.weightMatrix=[[0,oo,3,oo],[2,0,oo,oo],[oo,7,0,1],[6,oo,oo,0]]
  floyd.showDistanceMatrix
  dim=10 #randomize out a big graph
  weightMatrix=Array.new(dim){Array.new(dim)}
  for i in 0...dim do
    for j in 0...dim do
      if i==j then
        weightMatrix[i][j]=0
      elsif rand(dim/2)==0 then
        weightMatrix[i][j]=rand(9)+1
      else
        weightMatrix[i][j]=oo
      end
    end
  end
  floyd.weightMatrix=weightMatrix
  floyd.showDistanceMatrix
end
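As a preview of the promised Java version, here is a minimal sketch of the same algorithm translated from the Ruby above (class and method names are my own; it keeps the same deep-copy and skip-row details):

```java
public class Floyd {
    public static final int OO = 256; // stands in for an "infinite" distance

    // Returns the all-pairs shortest-path distances. The distance matrix is
    // initialized from the weight matrix so direct edges are always
    // candidates (the "k=0" case).
    public static int[][] distanceMatrix(int[][] weight) {
        int n = weight.length;
        int[][] dist = new int[n][n];
        for (int i = 0; i < n; i++) {
            dist[i] = weight[i].clone(); // copy each row; leaves weight untouched
        }
        for (int k = 0; k < n; k++) {
            for (int i = 0; i < n; i++) {
                if (dist[i][k] >= OO) {
                    continue; // no path from i to k: no need to test k to j
                }
                for (int j = 0; j < n; j++) {
                    int viaK = dist[i][k] + dist[k][j];
                    if (viaK < dist[i][j]) {
                        dist[i][j] = viaK;
                    }
                }
            }
        }
        return dist;
    }
}
```

Running it on the small weight matrix from the Ruby test script gives the same distances the Ruby version prints.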
Labels:
algorithms,
Floyd's Algorithm,
programming,
Ruby,
Warshall's algorithm
2009-04-19
Warshall's Algorithm in Ruby
Among my current projects is to learn Ruby. Although my background is almost entirely in statically typed languages I feel that to be a complete programmer I also need to be comfortable in a dynamically typed language. Ruby is definitely interesting. I'm not sure I'm sold on the value of dynamic typing. It seems to deny one the opportunity of letting the language itself help with the programming task, not to mention being much less amenable to static analysis and such. Still, one might make the case that most of the problems static typing is designed to prevent are ones good programmers shouldn't make very often anyway, and that one can simply code faster in dynamically typed languages since one doesn't need to worry about such minor issues as... knowing the type at design time. Ruby definitely allows one to be very terse with the syntax, do complex things simply and hence program fast. And we all want to program fast, I'm sure.
Anyway this post wasn't really meant to be a discussion of the relative merits of static vs. dynamic typing, or even on why Ruby is cool (although my example below does illustrate the terseness of Ruby relative to Java). It was meant to present my first program in Ruby, a translation of my Java Warshall's algorithm implementation. Actually it's not a direct translation since I added a small optimization. I realized that one doesn't need to bother with the j loop if (i,k)=0. If you can't get from i to k at all you definitely can't get from i to j through k. This needs to be added to the Java implementation, which I'll get around to eventually. Note that this improvement changes the algorithm's best case (a completely unconnected graph) time complexity to Θ(n²). Average and worst cases remain Θ(n³).
I don't have a test case for this since I haven't yet learned how to do test cases in Ruby. OK, I'm sure I could rig one up myself, but I don't see the point. Instead I just have a script at the bottom that displays the path matrix for various adjacency matrices. I also added a little random adjacency matrix generator so that one can see more examples of the algorithm at work. One of the interesting parts of that was to choose reasonable edge densities to get a nice number of paths (using n/2 as the randomizer's cutoff seems to work well).
So, here's the algorithm:
class Warshall
  def initialize(adjacencyMatrix)
    @adjacencyMatrix=adjacencyMatrix
  end
  def getPathMatrix
    numNodes=@adjacencyMatrix[0].length
    pathMatrix=@adjacencyMatrix.map{|row| row.dup} #deep copy, so @adjacencyMatrix isn't modified
    for k in 0...numNodes
      for i in 0...numNodes
        #Optimization: if no path from i to k, no need to test k to j
        if pathMatrix[i][k]==1 then
          for j in 0...numNodes
            if pathMatrix[k][j]==1 then
              pathMatrix[i][j]=1
            end
          end
        end
      end
    end
    return pathMatrix
  end
  def showPathMatrix
    puts "adjacency"
    @adjacencyMatrix.each{|c| print c,"\n"}
    puts "path"
    getPathMatrix.each{|c| print c,"\n"}
  end
end
#tests:
if __FILE__ == $0
  adjMatrix = [[0,1],[1,0]]
  Warshall.new(adjMatrix).showPathMatrix
  adjMatrix = [[0,1,0],[0,0,1],[0,0,0]]
  Warshall.new(adjMatrix).showPathMatrix
  adjMatrix = [[0,0,0,1],[1,0,1,1],[1,0,0,1],[1,0,1,0]]
  Warshall.new(adjMatrix).showPathMatrix
  dim=10 #randomize out a big graph
  adjMatrix=Array.new(dim){Array.new(dim)}
  for i in 0...dim do
    for j in 0...dim do
      if rand(dim/2)==0 then
        adjMatrix[i][j]=1
      else
        adjMatrix[i][j]=0
      end
    end
  end
  Warshall.new(adjMatrix).showPathMatrix
end #tests
Labels:
algorithms,
programming,
Ruby,
Warshall's algorithm
2009-04-10
Warshall's Algorithm in Java
For fun (I have a nerdy definition of fun, I admit) I decided the other day, hanging out at Peet's Coffee, to bang out Warshall's algorithm in Java. Ah, the life of an unemployed programmer.
Before I present it, I suppose I should explain a little bit about what Warshall's algorithm does, in case there are some readers who have never heard of it (like me less than a year ago). I'm not going to go deep, it being a well established and easily Googleable subject. For instance, this page does a better job than I'm likely to do in any case.
Here it is in a nutshell. Given a directed graph, find out which nodes have paths between them. This is also known as the transitive closure of the graph for those who prefer the more impressive sounding mathematical terminology. For Warshall's algorithm one starts with the adjacency matrix of the graph, which is a matrix in which all nodes with direct connections (edges) between them are identified with a 1, and all other pairs of nodes with a 0. Warshall's algorithm generates from this data a new matrix with a 1 for all nodes that are connected by a path, 0 for all others. By "path" I mean that by traveling from node to adjacent node (those with an edge between them) one can arrive at the other node in a pair of nodes. Since this is a directed graph one must follow the direction of the edge in question. Think of it as a game of catch where each player only throws to certain other players. The adjacency matrix would identify which players throw to specific other players; a 1 in row 1, column 2 if player 1 throws to player 2, a 0 otherwise. It's quite possible, based on this scenario, that some players never get the ball thrown to them. A fair PE teacher could implement Warshall's algorithm to identify that situation. If only more PE teachers understood algorithms the world would be a more equitable place. (One interesting side note: row 1, column 1 will always be 0 in the adjacency matrix since a player never throws the ball to himself, but it could be 1 in the path matrix since hopefully the player can get the ball back... or maybe not!)
The way Warshall's algorithm works is like this. Enumerating through each node numbered by k, starting with k=0 (i.e. node #0), decide which node pairs i,j have paths through node k. This can be determined by evaluating whether or not a path from i to k and from k to j already exists in the path matrix, which was initialized as the adjacency matrix. If it does, then the path from i to j can be added to the path matrix. Our paths will thus grow organically as we iterate through k. When we're done we will have every path identified in our matrix since every path through every intermediate node will have been identified.
Anyway, now that I've probably confused everybody (Google "directed graph" to become un-befuddled) here's the Java code. I've also included a test case class which I think helps to understand it. So, without further ado:
package com.blogspot.ezetter.algorithms;
public class Warshall {
    public byte[][] getPathMatrix(byte[][] adjacencyMatrix) {
        int numberOfNodes = adjacencyMatrix[0].length;
        byte[][] pathMatrix = new byte[numberOfNodes][numberOfNodes];
        // Copy row by row: copying only the outer array would alias the rows
        // and mutate the caller's adjacencyMatrix.
        for (int i = 0; i < numberOfNodes; i++) {
            System.arraycopy(adjacencyMatrix[i], 0, pathMatrix[i], 0, numberOfNodes);
        }
        for (int k = 0; k < numberOfNodes; k++) {
            for (int i = 0; i < numberOfNodes; i++) {
                if (pathMatrix[i][k] == 0) {
                    continue; // no path from i to k: no need to test k to j
                }
                for (int j = 0; j < numberOfNodes; j++) {
                    if (pathMatrix[k][j] == 1) {
                        pathMatrix[i][j] = 1;
                    }
                }
            }
        }
        return pathMatrix;
    }
}

And the test case:
package com.blogspot.ezetter.algorithms;
import junit.framework.TestCase;

public class WarshallTests extends TestCase {
    private byte[][] adjacencyMatrix =
        {{0,0,0,1},{1,0,1,1},{1,0,0,1},{1,0,1,0}};
    private byte[][] expectedPathMatrix =
        {{1,0,1,1},{1,0,1,1},{1,0,1,1},{1,0,1,1}};
    private byte[][] adjacencyMatrix2 =
        {{0,1,0},{0,0,1},{0,0,0}};
    private byte[][] expectedPathMatrix2 =
        {{0,1,1},{0,0,1},{0,0,0}};

    public void testGetPathMatrix() {
        byte[][] pathMatrix = new Warshall().getPathMatrix(adjacencyMatrix);
        for (int i = 0; i < adjacencyMatrix.length; i++) {
            for (int j = 0; j < adjacencyMatrix.length; j++) {
                assertEquals("Path matrix at i=" + i + ", j=" + j + ":",
                    expectedPathMatrix[i][j], pathMatrix[i][j]);
            }
        }
        pathMatrix = new Warshall().getPathMatrix(adjacencyMatrix2);
        for (int i = 0; i < adjacencyMatrix2.length; i++) {
            for (int j = 0; j < adjacencyMatrix2.length; j++) {
                assertEquals("Path matrix at i=" + i + ", j=" + j + ":",
                    expectedPathMatrix2[i][j], pathMatrix[i][j]);
            }
        }
    }
}
Labels:
algorithms,
java,
programming,
Warshall's algorithm
2009-04-08
Spring MVC vs Struts Part 2
This is Part 2 of my experience switching from Struts to Spring MVC. Part 1 is here.
Now, before I get attacked by a Spring expert, I am not claiming that this is a "best practices" post. I'm just describing my experience with Spring MVC and the solution I came up with. I would truly welcome a Spring veteran to come along and explain the right way to do it.
What I wanted was to retrieve certain information from the model and inject it into the view whether one was coming to the controller from the form submission or from any other source, like a link. At first I tried using formBackingObject since it is always called, but since one doesn't have access to the ModelAndView there, that doesn't work well. I tried using handleRequest, which is also always called, but since I was thereby overriding the superclass method that calls onSubmit, I was masking that functionality and causing my onSubmit never to be called. I then tried using showForm, which is called to display the form. This ended up being my final, relatively satisfying, solution. It did leave one problem: I wanted to return to the same form after submission, but that proved tricky to do from onSubmit. My solution was to call showForm from onSubmit and then return the ModelAndView returned by showForm.
This has been working pretty well, although it's not entirely aesthetically satisfying. I wish it were simpler but, as with many things, once you figure it out it's not too bad. If you doubt it's complex, check out the workflow in the Javadoc. One has to consider both the workflow in the class and in all its superclasses. On the plus side this gives one a great deal of control over how a request is handled.
But when my headache reaches a certain threshold, I long for Struts.
Note: I will post the actual source soon, in a more detailed description of this project.
Labels:
java,
programming,
spring mvc,
struts
2009-04-06
Spring MVC vs Struts
I mentioned a few posts ago that I recently switched from Struts 2 to Spring MVC as my web development platform for a current project. I also mentioned that so far I don't regret my decision, although I have some reservations. What I like about Spring MVC is that 1. it's more closely integrated with Spring, and Spring is most assuredly a worthy invention, and 2. it does have in certain respects more flexibility than Struts. I'll probably have more to say on those strengths later, but for now the point is that that's also Spring MVC's weakness. With flexibility often comes complexity, and the Spring MVC controller model is definitely complex relative to Struts'.
Struts has a very tight and simple coupling between the view and the controller. In Webwork (the predecessor to Struts 2) it was extremely simple. You registered your controllers in xwork.xml and referenced them in the actions of your forms and your controller's execute would be automatically called by Webwork on each form submit. A get request directly from a URL is handled in exactly the same way. In Struts 2 it's similar, although a little more flexible. In the struts.xml file one can configure specific methods of the controller to be called, so you aren't just stuck with execute. That was a nice addition.
In Spring MVC it's more complex. First of all, Spring MVC mirrors the HttpServlet doGet/doPost methods more closely, as opposed to Struts' higher level of abstraction. But Spring adds more methods to handle different cases, like onSubmit for form submission, showForm to display the view and formBackingObject to link the model to the view. These are all chained together depending on the way the controller is entered, say a URL GET vs. a form POST. Contrast this with the simple description for Struts I gave above: just one method called, configurable in struts.xml, no matter how the form is called.
Where I ran into trouble was to be able to have identical things happen whether a user came to the controller via a link or a form submission. I want to check if the user is logged in and I want to retrieve certain information from the model whether this is the initial display of the view or a form submission. The problem is, which methods to use? There are so many to choose from and some don't get called in certain circumstances. Do I use doPost, handleRequest, formBackingObject, showForm or some combination thereof?
Since this post is getting a little too long I think my solution is going to have to be in a "to be continued".
So, stay tuned.
Labels:
java,
programming,
spring mvc,
struts
2009-04-05
Halting "Problem" Redux
A while back I posted as an April fools joke: a Java version of the Turing Halting Problem. I recently learned that for Turing halting wasn't a "problem" at all, quite the opposite. The term was rather added by a later commentator (whose name I will post as soon as I look it up) who wanted to express the theorem in modern-day computing terms, where infinite loops are generally considered bad things. For Turing, whose purpose was to determine the limits of decidability, that was no problem, and in fact programs that didn't halt were considered "good" programs. If an algorithm could keep indefinitely chugging away at a problem and could give an answer, assuming infinite time, then said problem is "decidable". Computing pi, the square root of 2, 1/3... those are all decidable problems. The reason we can't get an exact answer is simply not having enough time (i.e. not having infinite time) to do so. It's not any fundamental problem with the calculation. The halting problem, on the other hand, is inherently "undecidable" since it embodies a paradox.
When Turing first posited the Universal Machine it was in order to construct an example of an undecidable problem; a Universal Machine can't determine whether or not it will halt. His purpose at the time was not to invent the computer. That use of the formulation came just a little later, applied by many commentators, including Turing himself. So the invention of the Universal Turing Machine, of which all computers and programming languages are examples, was like so many other great discoveries an accident.
An accident that changed everything.
*Actually Turing's presentation was even more esoteric than that. He postulated that every machine could be designated a number that described it (not so far fetched since real computer programs are stored as binary numbers: computer programs are ultimately just numbers). So Turing was proving that no number can exist that determines if that number is itself "satisfactory", i.e. whether or not it can be computed. Remember, this was a mathematical, not a practical, problem. As fate would have it, math problems have a tendency to become practical.
2009-04-04
The ORM Paradigm
I recently gave a presentation for a job interview on the value of Object Relational Mapping tools and, specifically, JPA. For the already enlightened it might still be of interest, since it contains technical information on using JPA. I've decided JPA is the best ORM implementation yet.
Say to your programming environment: "be the database".
The presentation can be found at this link.
Labels:
database,
java,
jpa,
orm,
programming
2009-04-03
Proof of Euclid's GCD Algorithm
In an advanced algorithms course I recently took, the first algorithm covered, as I imagine is the case in many such courses, was Euclid's greatest common divisor method. Euclid's gcd is often considered the earliest known algorithm. Does that make Euclid the first programmer? (Note: Euclid's method, invented over 2000 years ago, is still very efficient even by today's standards and is enormously more powerful than the method you probably learned in middle school.) Euclid's gcd is based on the following observation:
gcd(m,n) = gcd(n, m mod n)
This equality is repetitively applied until a final condition, gcd(m,0), is reached, in which case m is taken to be the gcd of m, n.
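The recurrence can be turned directly into a short iterative method. A sketch of my own in Java (when n reaches 0, the current m is the answer):

```java
public class Euclid {
    // Repeatedly apply gcd(m, n) = gcd(n, m mod n) until n reaches 0.
    public static int gcd(int m, int n) {
        while (n != 0) {
            int r = m % n; // m mod n becomes the new n
            m = n;
            n = r;
        }
        return m; // the final condition gcd(m, 0) = m
    }

    public static void main(String[] args) {
        System.out.println(gcd(48, 18)); // prints "6"
        System.out.println(gcd(17, 5));  // prints "1"
    }
}
```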
Now, although we weren't asked to prove this for my course, at the time I was inspired to attempt it. After lengthy pondering I wasn't able and eventually gave up disappointed. Recently I revisited the problem and was able to see the solution pretty quickly. I guess my math skills are improving (or so I like to think). Below is my proof. You will have to take my word for it that I didn't Google it (I didn't, I swear.)
Starting with the final condition, which is pretty trivial, if m mod n = 0 then by definition m is divisible by n so n is the gcd. Next we need to show that we get to that final condition, i.e. that we don't get stuck at m mod n != 0. Again, this is pretty simple. m mod n < n necessarily. Since we keep feeding m mod n into the procedure as the new n, n gets smaller each iteration. Since m mod 1 = 0 for all m, we must reach 0 eventually.
The hard part for me before was solving the general equality, gcd(m,n) = gcd(n, m mod n). I now find this pretty simple too and am even a bit embarrassed that I didn't come up with it before. Observe that if m and n are divisible by x then m-n is also divisible by x: (m-n)/x = (m/x)-(n/x). Likewise, m-an is also divisible by x for any a. Since the definition of m mod n is m-an for the largest a where m-an ≥ 0, it follows that any common divisor of m and n is also a common divisor of m mod n. Conversely, since m = an + (m mod n), any common divisor of n and m mod n also divides m. The pairs (m,n) and (n, m mod n) therefore share exactly the same common divisors, so their greatest common divisors must be equal.
QED
If any reader sees a flaw in my reasoning, please reply. I really want to be sure of this.
Labels:
algorithms,
Euclid,
gcd,
greatest common divisor
2009-04-02
Static Analysis Tools
I'm evaluating static analysis tools for my Master's program at Cal Lutheran University. I'm finding the subject intriguing. Static analysis programs test code for potential flaws, in contrast with dynamic testing tools like JUnit. They check code for deviations from established best practices, good design standards and possible serious issues. In my experience so far these tools generally do a good job. At a minimum they are useful for learning. One might not always agree with the results, but they are often interesting from an academic standpoint. It's especially fun to run them on popular open source projects. It's therapeutic for the ego to realize how pervasive certain bad practices are. By the same token, running them on one's own projects might help one avoid such embarrassments.
The best of breed in the Java space are FindBugs, PMD and Checkstyle. Each of the tools has a different focus, Checkstyle, for instance, being oriented towards formatting conventions (although it also evaluates for more serious issues). One interesting thing is that they all seem to find different problems, so they work well in conjunction. They each have IDE integration, specifically Eclipse plugins. They also each integrate with the Ant build tool. I think a development environment can benefit greatly by using both these features: use them in the IDE for real-time preemption and then integrate them into the automatic builds to ensure clean code across the team (I dub the concept Bug Brother). Few would dispute the benefits of clean code for bug avoidance and maintenance. Best practices are in many cases well established across the industry, but even good programmers can find it difficult to keep them all straight and to maintain the discipline to apply them. Tools like these can keep us honest. Let's use them.
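To give a feel for what these tools catch, here's a contrived class of my own (the bug-pattern names in the comments are FindBugs detector names) containing the kind of smells all three have something to say about:

```java
public class Smells {
    // ES_COMPARING_STRINGS_WITH_EQ: == compares references, not content,
    // so two equal-but-distinct String objects come back "different".
    static boolean sameTitle(String a, String b) {
        return a == b; // should be a.equals(b)
    }

    // RV_RETURN_VALUE_IGNORED: String is immutable, so the result of
    // toUpperCase() is silently discarded and s is returned unchanged.
    static String shout(String s) {
        s.toUpperCase(); // should be s = s.toUpperCase()
        return s;
    }
}
```

Both bugs compile cleanly and run without exceptions, which is exactly why a static analyzer earns its keep here.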
Here's a sample Ant build.xml snippet from my project (the one I hope to share more on soon):
<taskdef name="findbugs"
         classpath="${findbugs.home}/lib/findbugs.jar"
         classname="edu.umd.cs.findbugs.anttask.FindBugsTask"/>

<target name="findbugs" depends="build">
  <findbugs home="${findbugs.home}" output="xml"
            outputFile="bcel-fb.xml">
    <sourcePath path="${src.dir}"/>
    <class location="${build.dir}"/>
    <auxClasspath path="${web.dir}/WEB-INF/lib/spring.jar"/>
    <auxClasspath path="${web.dir}/WEB-INF/lib/spring-webmvc.jar"/>
    <auxClasspath path="${web.dir}/WEB-INF/lib/persistence.jar"/>
    <auxClasspath path="${web.dir}/WEB-INF/lib/commons-logging.jar"/>
    <auxClasspath path="${web.dir}/WEB-INF/lib/servlet-api.jar"/>
  </findbugs>
</target>

<taskdef resource="checkstyletask.properties"
         classpath="${checkstyle.home}/checkstyle-all-5.0-beta2.jar"/>

<target name="checkstyle"
        description="Generates a report of code convention violations.">
  <checkstyle config="sun_checks.xml"
              failureProperty="checkstyle.failure"
              failOnViolation="false">
    <formatter type="xml" tofile="checkstyle_report.xml"/>
    <fileset dir="${src.dir}" includes="**/*.java"/>
  </checkstyle>
</target>

<path id="pmd.classpath">
  <pathelement location="${build}"/>
  <fileset dir="${pmd.home}/lib/">
    <include name="*.jar"/>
  </fileset>
</path>

<taskdef name="pmd" classname="net.sourceforge.pmd.ant.PMDTask"
         classpathref="pmd.classpath"/>

<target name="pmd">
  <pmd>
    <ruleset>basic,imports,unusedcode,strings</ruleset>
    <formatter type="net.sourceforge.pmd.renderers.HTMLRenderer"
               toFile="pmdout.html"/>
    <fileset dir="src">
      <include name="**/*.java"/>
    </fileset>
  </pmd>
</target>

<target name="analysis" depends="build,findbugs,checkstyle,pmd"/>
Labels:
java,
programming,
static analysis
My April Fools Program
I meant to post this yesterday but, oldest excuse in the book, got busy. Anyway, not wanting to waste the opportunity and not willing to wait a year I'm posting it now. It is an example using a new feature of Java reflection. Here goes:
package april.fools;

public class Thp {
    Thp(Object input) {
        while (Class.doesHalt(this.getClass(), input));
    }

    public static void main(String[] args) {
        try {
            new Thp(Class.forName("april.fools.Thp"));
        } catch (ClassNotFoundException e) {
            e.printStackTrace();
        }
    }
}

I hope you appreciate it.
Labels:
algorithms,
java,
programming,
turing halting problem
2009-03-31
EDVAC, von Neumann, Turing
The machine on the right is EDVAC, the first computer ever designed on von Neumann's architecture.* I put this image here as a reference point for the rest of my blog. The speed with which that machine has evolved into what you're using right now, these gadgets that have infiltrated pretty much every aspect of our lives, is simply astounding. The most amazing thing is that the basic architecture hasn't changed much. You are using, essentially, a von Neumann machine right now. Sure, you might have quad cores, fancy pipelines and other newfangled stuff in your CPU (which, admittedly, are modifications of the VNA, possibly making your computer slightly non-von Neumann) but beyond those details it's mostly a von Neumann machine. It's that big hunk of metal on the right, just (much) faster and smaller.
The main concept behind a von Neumann architecture is the stored program, the idea that data and instructions can be treated equivalently. Therefore programs can write, read and execute other programs by treating them as data. One thereby gets a kind of generality such that anything that can be made into instructions (really, anything that you could "instruct" somebody to do) can be done by such a machine by reading in those instructions as data and running them using its internal (von Neumann) architecture designed to do exactly that.
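To make the data/instruction equivalence concrete, here's a toy stored-program machine in Java. This is entirely my own illustration (the opcodes, names and layout are all made up): one int array holds both the program and its data, and the loop simply treats whatever cell the program counter points at as an instruction.

```java
public class ToyVNM {
    // Made-up opcodes: 0 = HALT, 1 = LOAD immediate, 2 = ADD immediate,
    // 3 = STORE to address. Instructions and data share one memory array.
    static int run(int[] mem) {
        int acc = 0; // accumulator
        int pc = 0;  // program counter
        while (true) {
            switch (mem[pc]) {
                case 0: return acc;                             // HALT
                case 1: acc = mem[pc + 1]; pc += 2; break;      // LOAD
                case 2: acc += mem[pc + 1]; pc += 2; break;     // ADD
                case 3: mem[mem[pc + 1]] = acc; pc += 2; break; // STORE:
                    // can overwrite cells the pc hasn't reached yet,
                    // i.e. the program can rewrite itself
                default: throw new IllegalStateException("bad opcode " + mem[pc]);
            }
        }
    }
}
```

For instance, run(new int[]{1, 40, 2, 2, 0}) loads 40, adds 2 and halts with 42. A STORE aimed at an upcoming instruction cell is self-modifying code, which is exactly the data/instruction blurring described above.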
In fact von Neumann wasn't the first to come up with this idea. The credit for the invention really goes to Alan Turing, a man to whom we owe a debt of gratitude for many reasons (including playing a huge role in helping defeat the Nazis through his code breaking work). His invention, purely a thought experiment until the 1940s, is now called the Universal Turing Machine. It introduced the concept of the stored program, that is, the data/instruction equivalence, in mathematical terms. So the von Neumann architecture can be considered the first architectural implementation of a Universal Turing Machine.
We modern programmers have become so accustomed to our routine that we don't even stop to marvel. We write coded instructions as simple character data. That data is read by another program, itself made with similar data constructs, which either interprets our code or compiles it into another instruction set, ultimately reducing our own instructions to a different set of (this time binary) instructions that a von Neumann machine can interpret and follow. This was Turing's idea in a nutshell, his purely mathematical concept, that millions of programmers now implement daily to do real work.
So what I'm trying to impress upon the reader is this: all this stuff we take for granted, these online social networks, the Wii, your iPod, whatever, they all have their source with these guys and some other intellectual giants who came up with these ideas over a short period a little more than 50 years ago. It's an amazing story, and they are some fascinating characters to boot.
The question still to be answered is, how far will this VNA/UTM stuff take us? These men thought it would lead to genuinely intelligent machines with all the capacities of human thought. Whether this will happen is still unknown (I have opinions on this that I'll share later). It has certainly taken longer than they thought it would. Then again I don't think they envisioned ATMs, online shopping, Twittering, DVRs, iPods, WoW, netbooks, texting, blogs... yikes.
* Although not the first built. Due to budgetary constraints that distinction goes to a British team that built the Manchester Mark I in 1949 [1]. The EDVAC itself became operational in 1951.
Labels:
Alan Turing,
computer history,
EDVAC,
von Neumann
2009-03-30
A Project Log (plog)
In the course of these postings I intend to update readers on the progress of a project I am undertaking. The gist of my project is to create a web application to facilitate user-creatable online courses. I intend it to become a synthesis of an online university, a blog and a social networking site. As such it will contain functionality for creating course materials in the form of blog-type postings, designing quizzes (both multiple-choice and free-form) and common social networking features similar to those found on LinkedIn, Twitter, etc. I think it's an ambitious project and I don't know where it will end up, but it will at least be a good learning experience. I plan to release it under the GPL, and since I will be sharing my step-by-step design experiences it should be easily implementable by anybody who wishes to steal my idea (giving me deserved credit, no doubt).
Anyway, I have already started on the project, but in the course of posting about it I will back up and start from the beginning, thereby giving the full design cycle. It will be a Java web application using Spring MVC as the framework and JPA for data access. I started the project on Struts 2, with which I'm already familiar, but decided to switch to Spring since I wanted to learn it. So far I haven't regretted it, although there are things I like about Struts that I miss in Spring MVC (more later) and it has been a steep learning curve. But, hey, learning is what it's all about.
So, in subsequent posts I will lay out my basic design and then move through implementation details as I encounter (encountered) them. Hopefully it will be an interesting experience for the reader, as it has been for me.
Labels:
java,
programming,
spring mvc
2009-03-29
Cilly C
I'm mostly a Java programmer with a smattering of other languages thrown in. I've recently come to the position that any programmer worthy of the title should be at least comfortable in C/C++. I have long forgotten most of my C so I am now making the effort to relearn it.
I will here relate my experience with my first C program in this curriculum. Since I despise Hello World programs I decided, instead, to write a program to compute and display a Fibonacci sequence. BTW, I love Fibonacci numbers, but that's for another post.
The basic program was simple enough. For those who don't know, a Fibonacci sequence consists of numbers each of which is the sum of the previous two in the sequence, with a starting condition of 0, 1. So one just needs a simple loop to perform that calculation.
Where it got interesting is when I decided I wanted my sequence to be 50 numbers long. One of the properties of Fibonacci numbers is that they get really big really fast (they were originally invented to model rabbit population explosions). So a little before 50 one gets an integer overflow and nasty two's complement artifacts.
Now, being a Java programmer, I said "no big deal, I'll just use a long." How naive. Silly me, I had forgotten that C doesn't give any guarantees about the size of primitives across implementations. gcc, at least on my machine, treats a long as a 32-bit integer, just like an int, so I still got exactly the same behavior.
Digging through the gcc compiler documentation I found the solution. There's a data type long long int. I love it. At least those C compiler writers have a sense of humor. Anyway it works great... unless of course you want 100 Fibonacci numbers.
Here's my final program.
#include <stdio.h>

int main(void) {
    int i = 0;
    long long int fib1 = 0LL;
    long long int fib2 = 1LL;
    long long int temp = 0LL;
    printf("%lld\n", fib1);
    printf("%lld\n", fib2);
    for (i = 2; i < 93; i++) { // 93 is overflow
        temp = fib2;
        fib2 += fib1;
        fib1 = temp;
        printf("%lld\n", fib2);
    }
    return 0;
}
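As an aside, back in Java-land the overflow ceiling disappears entirely once you reach for BigInteger. Here's a quick sketch (not part of the C exercise, and the class name is my own):

```java
import java.math.BigInteger;

public class BigFib {
    // Arbitrary-precision Fibonacci: fib(0) = 0, fib(1) = 1, and no
    // overflow at the 93rd number, or the 930th.
    static BigInteger fib(int n) {
        BigInteger a = BigInteger.ZERO; // current number
        BigInteger b = BigInteger.ONE;  // next number
        for (int i = 0; i < n; i++) {
            BigInteger t = b;
            b = a.add(b); // same loop body as the C version, minus the ceiling
            a = t;
        }
        return a;
    }
}
```

The price, of course, is that every addition allocates objects instead of executing a single machine instruction, which is the kind of trade-off C programmers love to point out.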
Labels:
c programming,
programming
Inaugural Post
This is the inaugural post of my new personal and professional blog. I plan to focus primarily on information technology and computing issues. From the perspective of a professional blog I will describe my own projects as well as my views on trends in information technology. I will also examine computing from a historical perspective, speculate on the future and make observations of a more academic nature in the field of Computer Science.
I am passionate about information technology and Computer Science. The power of these machines and the ways they have permeated our lives is phenomenal. I want to track how this came about, and so there will be a number of postings of a historical nature, examining the giants of the field and their ideas and creations. I think most people have little idea just how remarkable a journey this has been. Also, speculating on where this will all end up can only give one pause. It's my firm belief that we have barely scratched the surface of what information processing and computing machines, aka Turing Machines, are capable of.
Finally I'm an actively practicing technologist. From that perspective this will be a professional blog. I am working on a number of projects and I would like to describe them for those who are interested and who might find my experiences useful in their own professional endeavors. I work mostly as a Java and Oracle programmer although I am lately branching out into new fields. More on that will follow.
Welcome. I hope you find something interesting or useful in these pages.
Labels:
Eric Zetterbaum,
programming