Monday, October 1, 2007

WebSpider Review

The package I reviewed is from Laura Matsuo. The following are the results of my review:

1. Installation Review


I downloaded the zip file from Laura's blog, and the installation went very smoothly with only a few modifications, such as renaming the files and folders. I noticed that the zip file comes with three text files: a.txt, b.txt, and c.txt. They seem to be logging output files, probably left over from intermediate testing. There are also two jar files already included in the package: webspider.jar and webspider-lauramat.jar. I think the author forgot to delete these jar files before distributing the package. Nonetheless, the jar file can be created correctly using the "ant jar" command.


For the QA tool testing, I was a little surprised that the verify target did not pass. Therefore, I ran the tools one by one, and here are the results:
JUnit: Successful, but a lot of logging information is printed out, which made the output in the command prompt a little messy.
Checkstyle: Successful.
PMD: One rule violation: "assertTrue(true) or similar statements are unnecessary". This comes from the test of the main method. As Prof. Johnson said, the main method is evil, and I think the author did this in order to achieve 100% coverage in Emma.
FindBugs: Successful.
Emma: Successful with 100% coverage to all classes, methods, blocks and lines.
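The PMD warning above is probably caused by a pattern like the following. This is only my reconstruction, since I am guessing at the shape of the test; the class and method names are invented, not Laura's actual code. Invoking main() from a test earns Emma coverage for it, and the trailing trivial assertion is exactly what PMD flags as unnecessary.

```java
// Hypothetical reconstruction of the pattern PMD complains about.
public class MainCoverageExample {
  // Stand-in for the application's entry point.
  public static void main(String[] args) {
    System.out.println("spider would run here with " + args.length + " args");
  }

  // JUnit-3-style test body: calling main() covers it in Emma;
  // the trivial assertion at the end is the PMD violation.
  public static void testMain() {
    MainCoverageExample.main(new String[0]);
    assert true : "assertTrue(true) equivalent -- unnecessary, per PMD";
  }
}
```

The assertion adds nothing, which is PMD's point: the test already succeeds if main() returns without throwing.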


For the program execution, I invoked the command "java -jar webspider-lauramat.jar -totallinks http://www.hackystat.org 100", and the result was 1553 links. Regardless of whether this result is correct, the program seemed to be functioning correctly. However, I found one thing that is a little inconsistent: the arguments are passed and checked as an array, but their order does not affect the program, because each argument is accepted as long as it is valid, wherever it appears. This is not consistent with the usual way of invoking such a program, which I consider a bad thing.
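The order-insensitive behavior I observed could come from parsing along these lines. This is a sketch of my guess at the mechanism, not the actual implementation: each argument is classified by what it looks like rather than by its position, so both orderings below validate.

```java
// Hypothetical sketch of order-insensitive argument handling.
public class ArgSketch {
  static String command;
  static String url;
  static int pages;

  static boolean parse(String[] args) {
    command = null;
    url = null;
    pages = -1;
    for (String a : args) {
      if (a.startsWith("-")) {
        command = a;                   // e.g. -totallinks
      } else if (a.startsWith("http://")) {
        url = a;                       // the start page
      } else if (a.matches("\\d+")) {
        pages = Integer.parseInt(a);   // number of pages to crawl
      }
    }
    return command != null && url != null && pages > 0;
  }

  public static void main(String[] args) {
    // Both orderings pass validation, which is the inconsistency noted above.
    System.out.println(parse(new String[] {"-totallinks", "http://www.hackystat.org", "100"}));
    System.out.println(parse(new String[] {"http://www.hackystat.org", "100", "-totallinks"}));
  }
}
```

Positional checking (args[0] must be the command, args[1] the URL, and so on) would reject the scrambled ordering and match the documented usage.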

2. Code Format and Convention Review


The source code is very nicely written and documented. I did not find any apparent violations of the coding conventions. Also, the methods are highly modularized, so it was actually quite easy to understand the code.

3. Test Case Review

None of the test methods contain the "@Test" annotation. I am not sure whether the annotation is needed when the test class extends the TestCase class, since an error occurred when I tried to add the @Test annotation back.
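The annotation is in fact not needed: @Test belongs to JUnit 4 (org.junit.Test), while a class extending junit.framework.TestCase is run JUnit-3 style, where tests are discovered by the "test" name prefix via reflection. If only a JUnit 3.8.x jar is on the classpath, org.junit.Test does not exist at all, which would explain the error. The sketch below imitates the convention-based discovery in plain Java; it is my illustration, not JUnit's real code.

```java
import java.lang.reflect.Method;

// Plain-Java sketch of how JUnit 3 finds tests in a TestCase subclass:
// by the "test" name prefix, not by annotations.
public class Junit3DiscoverySketch {
  // Stand-in for a class extending junit.framework.TestCase.
  public static class SpiderTest {
    public void testTotalLinks() { /* would exercise the spider */ }
    public void helperMethod()   { /* ignored: no "test" prefix */ }
  }

  public static int countDiscoveredTests(Class<?> c) {
    int found = 0;
    for (Method m : c.getMethods()) {
      if (m.getName().startsWith("test")) {
        found++;  // JUnit 3 would run this method
      }
    }
    return found;
  }

  public static void main(String[] args) {
    System.out.println(countDiscoveredTests(SpiderTest.class));  // 1
  }
}
```

So the naming convention alone is enough for the tests to run, and dropping @Test is correct for this style of test class.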

Black box perspective: All the public methods are tested, directly or indirectly. The two major tasks are tested through the website http://www2.hawaii.edu/~lauramat/myfavorites/. For the total links task and the most popular page task, the test cases are correct when compared to the expected results. The test cases also include boundary inputs, such as an invalid URL, an invalid page, and a page with no links. However, the test suite does not include cases such as an invalid number of pages to crawl (a negative number, a very large number) or a page with a link pointing to itself.
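The missing page-count boundary cases could be covered by tests along these lines. The method name validatePageCount is invented for illustration; Laura's actual API surely differs, so this only shows the checks such tests would target.

```java
// Hypothetical sketch of the boundary checks the missing test cases
// would exercise; validatePageCount is an invented name.
public class BoundarySketch {
  static boolean validatePageCount(String arg) {
    try {
      int n = Integer.parseInt(arg);
      // Reject zero/negative counts and an implausibly large crawl.
      return n > 0 && n <= 1000000;
    } catch (NumberFormatException e) {
      return false;  // non-numeric or too large to fit in an int
    }
  }

  public static void main(String[] args) {
    // The boundary inputs the review says are untested:
    System.out.println(validatePageCount("-5"));          // negative
    System.out.println(validatePageCount("2147483648"));  // overflows int
    System.out.println(validatePageCount("100"));         // normal case
  }
}
```

A self-linking page would need a small fixture page whose only link points back to itself, to confirm the crawler terminates rather than looping.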

White box perspective: The author did a very good job on this one. The Emma summary is as below:
class: 100% (2/2)
method: 100% (26/26)
block: 100% (648/648)
line: 100% (136/136)
Every single line of code has been tested. Although this does not guarantee high-quality code, it is still an outstanding sign of the completeness of the testing.

Break the buggah: I used Prof. Johnson's MySpace page to run the total links task. The program seemed to choke on the JavaScript content, though it still returned a result of 0 links found in the first 100 pages. I also tried omitting the number-of-pages argument, and the program went on to count the total number of links on the first page instead of reporting the missing argument.
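A stricter entry check would report the omitted argument instead of silently defaulting to one page. This is a sketch under my own assumptions about the expected usage, not the package's actual code:

```java
// Sketch of a stricter argument-count check than the observed behavior,
// which silently crawled only the first page when the count was omitted.
public class StrictArgsSketch {
  public static String check(String[] args) {
    if (args.length < 3) {
      return "usage: java -jar webspider.jar -totallinks <url> <pages>";
    }
    return "ok";
  }

  public static void main(String[] args) {
    System.out.println(check(new String[] {"-totallinks", "http://www.hackystat.org"}));
    System.out.println(check(new String[] {"-totallinks", "http://www.hackystat.org", "100"}));
  }
}
```

Failing fast with a usage message is usually kinder to the user than guessing a default the documentation never mentions.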

4. Summary and Lessons Learned


From reviewing Laura's code, I learned that nicely written code is very self-explanatory. I actually enjoyed reading her code while doing this review. I am pretty sure that my reviewer is going to have a hard time reading through my code, since it is quite messy and poorly commented. I also learned that when we distribute a package, we should always check its contents to make sure there are no missing or extra files.


On the other hand, I noticed that testing can be very tricky. When I wrote my test cases, not all the exception cases were included, and I did not have the black box and white box perspectives in mind. From now on I should pay more attention to writing test cases from these different perspectives, because they complement each other.
