2009-05-06

Thoughts on SDLC

I've been reading a great deal lately on testing and development methodologies. Certainly all the rage these days are such buzzwords as Test Driven Development (TDD) and agile/RAD programming. Exactly what these terms mean have varied interpretations and I think there are few development shops that implement them precisely as any given description (except, perhaps, their own) presents it. Anyway, in reading blogs, books, Wikipedia posts, etc. I have tried to assimilate, filter and aggregate in order to come up with my own development methodology, admitting that I can't claim to be an expert in the field given that many others have spent a great deal more time thinking about the subject. Still, I feel entitled to an opinion so here goes.

First of all I want to say that I do in principle agree with the one week SDLC cycle emphasized in Agile and RAD methods. It's my experience that the waterfall SDLC concept is generally not effective. Insisting on a set of rigorous and complete requirements up front I feel is a near guarantee of project failure. Requirements change constantly, at least on every project I've been on. More to the point, we learn about the requirements in the process of trying to implement them. Things we thought up front sounded like great ideas just don't look so good as implementations. So a continuous analyze/prototype/evaluate cycle makes sense. Get ideas, see if they work and if they do make those the requirements. I think this is efficient and effective, and given the popularity of Agile methods so, apparently, do many others.

I do think that it is important to ensure that the rapid Agile cycles don't become excuses for cutting corners, though. One thing, for instance, that I keep reading in descriptions of Agile methods is "involves less documentation". I don't buy that. Why, just because we have quick cycles, do we get away with less documentation? This just sounds like a ploy by developers, famous for not wanting to document to get away with it. I feel that every week the team should be spending time keeping running documentation up to date. If they can design and program as they go, they can document as they go.

Basically the way I see it is that in Agile, one shouldn't be abandoning the steps in traditional Waterfall SDLC. One should just be compressing them all into a single week. So, planning/analysis/design/coding/testing/implementation/maintenance (well, replace implementation with prototyping, that may become implementation) gets done every week. No cutting corners in any of these steps should be allowed. The only difference is that the project is broken into bite sized steps that can be accomplished in weekly cycles. I think this is doable and practical.

The "less documentation" paradigm is especially odd given the emphasis on testing in Agile methods, specifically unit testing. Testing is another area notoriously neglected by software developers and Agile testing methods were specifically proposed to address this deficiency. Still, I'm concerned that simply declaring that every method, and ideally every logical input type to the method, needs to be tested does not guarantee good, effective tests. It's easy to toss together a quick test for a method and accomplish little more than not testing at all.

So, here's my proposal. I think that there should be some kind of guidelines such that for every unit of time spent programming, the thing must of us programmers enjoy doing most and would do all day if we were allowed, there should be some proportional unit of time devoted to these less popular activities, testing and documentation. I'm going to, as an initial estimate, propose 2:2:1. So for every 2 units (say 2 hours) of programming there should be an equal 2 units of testing and 1 unit of documentation. One might make the objection that such rigid rules are unreasonable, that every day is different and requires adjustments to the times on different tasks. So projects might have complex programming requirements but not need as much testing and documentation. I don't buy that. If a programming task was complex then it logically requires proportional testing to validate that complexity and proportional documentation to understand it. That seems reasonable to me so I'm sticking to my guns on that.

The proportions themselves I will leave negotiable since I would need to try to put it into practice and reevaluate. They seem reasonable to me on the surface but I believe in empirical evidence so I'm keeping an open mind. The testing requirement I feel pretty confident in since it seems very natural that the complexity of the testing can't be less than the complexity of the coding. In fact, maybe the testing time should be higher. Testing, let's face it, is hard, involved work. I don't think that developers should spend the same amount of time documenting as on these other two tasks, though. The primary job of a developer is not writing documentation, although it is a critical aspect of their job. Besides, even 2:1 between coding and documentation is much more than is typical (e.g. 0). A team could get very nice documents out of such an effort, I think. I doubt it should be much less, though. While not the primary role of the developer, we don't want to de-emphasize the importance of the task.

So, there's my proposal in a nutshell. I hope to think, and write, more about the subject as time goes by. Maybe I'll completely change my mind. These are admittedly initial thoughts. I definitely plan to focus on this area more in this blog, though. After all, algorithms are fun, but this is what really matters in real life software development.

2009-05-05

Horspool's Algorithm

Today, another algorithm. Admittedly this one is not likely to be of much use to many programmers as an actual implementation. Horspool's algorithm, the subject of today's post, is nothing more than a pretty efficient way to search a string for a pattern. Obviously unless your project is to write a new programming language API then you already have such facilities ready at hand. In fact I'd be willing to bet that whatever API you're using has an algorithm similar to this to do the job.

The point here is that the purpose of these exercises are not to reinvent the wheel. They are to understand how things work, under the hood so to speak. I, for one, like that. I feel that it is the sign of a good programmer to have that sort of curiosity, to not just know how to do things but how things work. For my own selfish reasons, I find that I never really understand something until I write about it. So, acknowledging that this post might not have a huge audience, at least I'll get something out of it.

Anyway, about Horspool's algorithm. Obviously the simplest way to do a string pattern search is just to start at the beginning of both string and pattern and do a character by character comparison. Then, if a mismatch is found in any character, shift the pattern one position forward in the string and do the comparison again. Really, this is probably efficient enough for most purposes, should a programmer actually need to implement such an algorithm for some reason. For large searches through many large strings something better might be worth it, though.

Obviously (hopefully) I'm not going to be so foolish as to post such a trivial algorithm as the simple version described above, so there is an optimization, Horspool's that is. (One day I'll have an algorithm named after me.) Firstly, Horspool's starts by aligning the string and pattern as described above. Then it does it's comparison starting at the end of the pattern instead of the start. Now, the key insight is that if a mismatch is found the pattern can be shifted more than a single character. For instance if the final character, the first checked, does not match and also is not contained anywhere in the pattern then the pattern can be shifted the whole length of the pattern forward in the string. If, on the other hand, there is a mismatch but the character does exist elsewhere in the pattern then the pattern can be shifted forward the distance necessary to align that character in the pattern with the string character just checked. Furthermore this is true of any character mismatch anywhere in the string. The exact same shifting rule applies.

The algorithm will continue in this way until either all characters are found to match or the start position of the pattern within the string is such that the end of the pattern would be beyond the end of the string. in that case a "no match" indicator is returned (-1 in my version).

Note that an algorithm like this would not be worth it in code that just does low frequency single character comparisons. The simple algorithm is plenty good enough for that. In fact the computation of the shift table might erase any potential efficiency gain. For large comparisons, though, say a grep on a set of large files this algorithm could make a substantial difference. Note that the shift table is calculated only once and so can be amortized across multiple searches. Of course, as I mentioned before, in the real world we already have grep at our disposal but I'm sure whatever greppers one uses have a similar implementation to this (or maybe even more efficient versions, which exist, although those are quite a bit more complex and might be considered not worth it even for api vendors. Evidence of this is that Horpool published his algorithm after the more efficient, but similar, version. Apparently he saw a need for a simplification).

Anyway, here is my Ruby implementation. I tried to include a good many comments to make the logic easy to follow. Hopefully I succeeded. Enjoy.
class Horspool
def findIn(str, pattern)
@shiftTable=Hash.new(nil) #will contain shift amounts for all valid chars
@pattern=pattern
calcShift #buld the shift table
i=0 #pos in str
j=pattern.size-1 #pos in pattern
shift=pattern.size #initialize shift to maximum posible
while i+pattern.size-1<str.size #Pattern doesn't extend past end of str
#Use shift table lookup to shift the precomputed amount minus dist from
#end of pattern (which is (pattern.size-1)-j), if that's less then
#previously computed shift make it the current shift
newShift=@shiftTable[str[i+j,1]]-((pattern.size-1)-j)
(newShift<shift)?shift=newShift:nil
if str[i+j]==pattern[j] #str/pattern char match
j-=1 #moving backward through pattern
if j<0 #Made it all the way through pattern, must match
return i
end
else
i+=shift #Start comparing at new post in str
j=pattern.size-1 #reset j to end of pattern
shift=pattern.size
end
end
return -1 #No match
end

private

def calcShift
#Precompute to shift whole pattern length for chars not in pattern
for i in 'a'..'z'
@shiftTable[i]=@pattern.size
end
#Shift distance from end of pattern-1 for chars in pattern
for i in 0... @pattern.size-1
@shiftTable[@pattern[i,1]]=(@pattern.size-i)-1
end
end
end

if __FILE__ == $0
print Horspool.new.findIn("ericzetterbaum","zett")
end

2009-05-03

Java/Tomcat Web Services

I've decided I want to have my Java algorithms library available as a web services API. This post will describe how I set about to accomplish that.

First of all, I must give credit where it is due. Most of the information on how to do this for Tomcat (my container of choice right now) came from this document by Peter Yeung. Mostly the steps laid out in that document were sufficient for my project, although there were a few gotchas that took me some time to work out. I think he was using an older version of the Java Web Services Developer Pack. It looks like there have been a few changes, including the introduction of bugs!

So, to start, I downloaded the above mentioned JWSDP as noted above. This definitely makes the job of creating the web service infrastructure easier. It has utilities to generate the WSDL and a bunch of SOAP utility classes, stuff for serialization and such. Who would want to deal with all that? Not me! (Indeed, one could create highly functional web services without ever knowing WSDL using this an other similar utilities out there.)

The actual programming steps are then quite simple. First, one needs to create an interface to define the functionality of web service. For my Warshall algorithm, it looks like this:

package algorithms.server;
import java.rmi.Remote;
import java.rmi.RemoteException;

public interface Warshall extends Remote{
public PathMatrix getPathMatrix(byte[][] adjacencyMatrix)
throws RemoteException;
}


Then I created a WarshallImpl class that implements that interface (I won't bother posting it here since it's just a POJO, basically the Warshall.java I posted earlier, but I will make the whole project available to download.) Also I created the PathMatrix class rather than just returning the matrix as an array since I want to be able to send a status back to the client, like if it made an unreasonable request.

Now, you need two XML files. One is a JaxRPC web service description file that the JWSDP deploy utility will use to get its necessary information. It should be named jaxrpc-ri.xml. It looks like this:
<?xml version="1.0" encoding="UTF-8"?>
<!-- configuration file for JWSDP wsdeploy tool -->
<webServices xmlns="http://java.sun.com/xml/ns/jax-rpc/ri/dd" version="1.0"
targetNamespaceBase="http://com.test/wsdl"
typeNamespaceBase="http://com.test/types"
>
<!-- the endpoint name becomes the service name in the WSDL -->
<endpoint name="warshallService"
interface="algorithms.Warshall"
implementation="algorithms.WarshallImpl"/>
<endpointMapping endpointName="warshallService" urlPattern="/warshallService"/>
</webServices>


Then you need a deployment descriptor. This was the first gotcha from Peter Yeung's instructions. He said to make an empty web.xml, i.e. just web-app tags with no content. This doesn't work. The utility will generate some kind of null pointer exception. After some Googling, I came up with this solution:
<?xml version="1.0" encoding="UTF-8"?>
<web-app>
<session-config>
<session-timeout>60</session-timeout>
</session-config>
</web-app>


Put both the web.xml and raxrpc-ri.xml in a config directory under your project root, and put the java source in a src directory. Also make a build directory, a dist directory and a web directory. One might choose to simplify this by keeping the configuration files in the web/WEB-INF directory, and building straight to to web/WEB-INF/classes. Or, just make a nice ant script to automate it all. I'm in the process of doing the latter, which I'll include with my project when I upload it. This will make it all very easy. In the meantime, after building the project I have to copy the conf and build contents into the web/WEB-INF and web/WEB-INF/classes directories.

Now we get to the magic parts. First, made a war file from the web directory:

jar -cvf build/myWEB-INF.war -C web/ .

Then I ran
%JWSDP_HOME%/jaxrpc/bin/wsdeploy.bat -verbose -o dist/algorithms.war build/myWEB-INF.war assuming that %JWSDP_HOME% is the install directory of JWSDP. This makes a new war file that is the web service app itself, that can then be deployed to Tomcat. Just dump this war file in the Tomcat webapps folder and the web service is deployed. That's it, done. Oh, do make sure you put the needed .jar files from the JWSDP distribution into the Tomcat lib. This is %JWSDP_HOME%\jaxrpc\lib, %JWSDP_HOME%\saaj\lib, %JWSDP_HOME%\jwsdp-shared\lib and %JWSDP_HOME%\jaxp\lib\endorsed.

OK, so now we have a functional web service. We can check that it works like so: http://localhost:8082/algorithms/warshallService?WSDL. That shows us our WSDL, which we didn't have to bother to write.

Now, how about a client? JWSDP also contains an automatic client maker. Nice! It generates all the client stub classes for us. All that's needed is another simple XML configuration file to give it instructions. For my client, it looks like this:

<?xml version="1.0"?>
<configuration xmlns="http://java.sun.com/xml/ns/jax-rpc/ri/config">
<!-- WSDL URL and generated package name -->
<wsdl location="http://localhost:8082/algorithms/warshallService?WSDL" packageName="clientStub"></wsdl>
</configuration>

I named it /config/wscompile_config.xml under the project root.

Then I ran:

%JWSDP_HOME%/jaxrpc/bin/wscompile -d build -gen:client -keep config/wscompile_config.xml

That generates all the client stub classes. Finally, make a jar of it:

jar -cvf lib/wsstub.jar -C build clientStub/

OK, now we're good to go... mostly. This was the other big gotcha. To actually use this we will need the necessary jars from the JWSDP distribution. The one's we need are from: jwsdp-shared\lib, \fastinfoset\lib and \jaxrpc\lib. Now I come to the biggest gotcha of this project You also need the saaj jars, but the ones included with the JWSDP dist DIDN'T WORK. It seems like there is some kind of incompatibility with some of the other jars, or maybe something. But I kept getting a class cast exception, specifically:

java.lang.ClassCastException: com.sun.xml.internal.messaging.saaj.soap.ver1_1.Message1_1Impl cannot be cast to com.sun.xml.messaging.saaj.soap.MessageImpl

This one drove me crazy. Finally I solved it by downloading the lib files directly from the SAAJ (SOAP with Attachments API for Java) project, https://saaj.dev.java.net/. That indeed solved the problem. And I was done with my client. My source, the part I had to write myself, looks like this:
package algorithms.client;
import java.rmi.RemoteException;
import java.util.Random;

import javax.xml.rpc.ServiceException;

import clientStub.*;

public class WarshallClient {
public static void main (String[] args) {
WarshallService warshallService = new WarshallService_Impl();
Warshall warshall = null;
try {
warshall = warshallService.getWarshallPort();
} catch (ServiceException e) {
e.printStackTrace();
}
try {
PathMatrix pm = warshall.getPathMatrix(new byte[][]{{0,0,0,1},{1,0,1,1},{1,0,0,1},{1,0,1,0}});
System.out.println();
System.out.printf("Status=%s\n",pm.getStatus());
System.out.print("Path matrix:");
for(int i=0;i<pm.getPathMatrix().length;i++) {
System.out.println();
for(int j=0;j<pm.getPathMatrix().length;j++) {
System.out.print(pm.getPathMatrix()[i][j] + ",");
}
}
} catch (RemoteException e) {
e.printStackTrace();
}
}
}

WarshallService, WarshallService_Impl and Warshall were all generated automatically by the wscompile utiltiy. Warshall.java is just a stub for the actual web service. The rest is just to test that it's actually working. And it does! Now, making web services is about as easy as making the POJOs. Nice! Now I can add my algorithms API to the cloud! I'm sure thousands of programmers are waiting anxiously. :)

Anyway, I'll post the whole source as soon as I have a chance to clean it up a little. In the meantime, I hope my experience proves useful to others.

2009-05-01

Java Interview Questions

Here are a few of the more notable Java programming questions I've had at interviews lately.

Q: How do sets and lists differ in handling multiple nulls?

A: In a Set, there will have be single null, while in a List there will be as many nulls as were inserted. This is because sets consolidate equal inputs into a single entry (no duplicates) while lists allow multiples of the same value. BTW, this comes from the mathematical definition of sets and lists.

Q: Given the following: Set s = new HashSet<String>(); what is the behavior of this set? For instance what happens if I add, say, an Integer?

A: The Set will contain the Integer. No compiler or runtime errors will occur. Whether or not a collection is a generic depends on the declaration type, not the assignment type.

Q: In an HttpSession, if one adds a HashMap as a session attribute in a request, then in another request retrieves the HashMap and adds another entry: is it necessary to set the attribute again for yet another request to retrieve the new HashMap entry?

A: Yes. If you don't then the HashMap will appear to the subsequent request as if you haven't updated the HashMap. In other words, if one modifies the object in a session attribute, the attribute needs to be re-set with the object.

Q: Are there memory leaks in Java?

A: Not in the traditional sense as they are understood in C programming, for instance. By that I mean that one doesn't need to worry about explicitly freeing memory after allocating it. The garbage collector takes care of that. There are programming habits that lead to effective memory leakage, however. For example one might have a private instance variable that is initialized in, say, a constructor. That variable might not be referenced anywhere else in the class but as long as an application retains a reference to the class object itself the variable can't be garbage collected. Now this is probably really an example of the bad programming technique of using instance variables when a local variable will do but, still, it illustrates how allocated memory can stay allocated even though it is not reachable. In a more complex example it might not be as obvious as in this simplified version.