Monday, October 22, 2007

MyISERN 1.2

Milestone release
User Guide
Developer Guide
Release Notes

This assignment turned out to be very repetitive, and also very demanding in the details. Although we finished all the tasks, we did not complete them in a very elegant or efficient way.

The command line interface for entering data was more complicated than I expected. By complicated, I mean the tons of if and else clauses and the several infinite input loops inside almost every function. The backbone of the program is relatively simple, but only under the assumption that the user does not enter erroneous data and knows what to enter. A lot of effort went into error checking. However, there are still some minor bugs in the program. We did not notice these bugs until we did the data entry at the very end, when we did not know most of the data. If the data is complete and correct, the program works perfectly fine. However, if the data is incomplete, the program reports a bunch of errors because of the schema requirements. Although the errors are all caught and reported by the error checking, fixing them was a harsh and tedious job.
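Looking back, much of that if/else nesting could be folded into one small helper that keeps asking until the input is valid. The sketch below is only an illustration, not our actual code: the class InputPrompt, the method readIntInRange, and the year range are hypothetical. It takes the Scanner and PrintStream as parameters instead of using System.in and System.out directly, which also makes it possible to test it with canned input.

```java
import java.io.PrintStream;
import java.util.Scanner;

/** Hypothetical console helper; not taken from the MyISERN code. */
final class InputPrompt {
  private InputPrompt() {
  }

  /**
   * Prompts repeatedly until a line parses as an integer in [min, max] and returns it.
   * Throws IllegalStateException if the input runs out before a valid value is entered.
   */
  static int readIntInRange(Scanner in, PrintStream out, String prompt, int min, int max) {
    while (true) {
      out.print(prompt);
      if (!in.hasNextLine()) {
        // No more input: fail loudly instead of looping forever.
        throw new IllegalStateException("Input ended before a valid value was entered");
      }
      String line = in.nextLine().trim();
      try {
        int value = Integer.parseInt(line);
        if (value >= min && value <= max) {
          return value;
        }
        out.println("Please enter a number between " + min + " and " + max + ".");
      } catch (NumberFormatException e) {
        out.println("\"" + line + "\" is not a number.");
      }
    }
  }
}
```

For example, readIntInRange(new Scanner(System.in), System.out, "Year: ", 1900, 2100) would keep asking until a valid year is typed, and a test can feed it a Scanner built from a String instead.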

What made this assignment very repetitive and boring was the data entry. The information was very often incomplete, and we ended up having to guess. Also, because we assigned different data to different group members (for example, one person entered the researchers and another entered the organizations), the program had a serious problem when we merged the data together. The different names that different group members chose for organizations and researchers produced many invalid link errors, even though they essentially referred to the same organization or researcher.
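One way to catch these mismatches early is to compare every referenced name against the set of defined names after normalizing both. The sketch below is hypothetical (LinkChecker and its methods are not part of our code), and the normalization rule of trimming, collapsing whitespace, and ignoring case is my own assumption.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical link checker: reports referenced names with no matching definition. */
final class LinkChecker {
  private LinkChecker() {
  }

  /** Trims, collapses inner whitespace, and lower-cases so cosmetic differences still match. */
  static String normalize(String name) {
    return name.trim().replaceAll("\\s+", " ").toLowerCase(Locale.ENGLISH);
  }

  /** Returns the references, in their original spelling, that match no defined name. */
  static List<String> findInvalidLinks(List<String> definedNames, List<String> references) {
    Set<String> known = new HashSet<>();
    for (String name : definedNames) {
      known.add(normalize(name));
    }
    List<String> invalid = new ArrayList<>();
    for (String ref : references) {
      if (!known.contains(normalize(ref))) {
        invalid.add(ref);
      }
    }
    return invalid;
  }
}
```

Normalization only forgives cosmetic differences such as extra spaces or capitalization; an abbreviation like "Univ. of Hawaii" is still reported, which is why agreeing on names before splitting up the data entry matters more.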

This assignment once again showed the importance of face-to-face group work. At the last minute, when we tried to figure out the data errors, we were using instant messengers, which was extremely inefficient. Also, we still did not plan ahead well enough. Although we took our time to sketch out the structure of the program when we met, many things still changed during the actual implementation. Our failure to foresee these changes caused us quite a bit of trouble.

In terms of improvements to the system, there is a lot that can be done. The first is the Emma coverage. Our tests did not cover most of the methods in the editing class, which handles much of the user input. Although Professor Johnson said it is not really possible to write detailed test cases for it in one week, I am still unsatisfied with the low code coverage. Second, stricter and more accurate error checking should be implemented, since the current system still misses some erroneous input.

Overall, this was another lengthy and hard assignment. Well, I think the good thing is that I am starting to get used to it.

Use Cases

Use Case Wiki Pages

This was not a very difficult assignment, since we had already gathered the information we needed to create the use cases.

We decided to use a use case template to keep our use cases consistent. This was easy: after several Google searches, we found some templates that might fit our needs, and we modified them to match our use cases. At first we tried to fill out the use cases using tables, but we soon found out that it is almost impossible to get a nicely formatted table in the Google project wiki pages. The only way to preserve line breaks is to use a code block, which shows some odd coloring when the content is not actual code. Also, some line breaks are somehow still not preserved. However, we figured this was the best we could do in the Google project wiki pages.

In this assignment, I gained a better understanding of use cases. When writing a use case, it is necessary to consider both the user and the system. Also, a clear flow of how the process goes is very helpful. Although a use case is better written as a scenario, it is very easy to get wordy, so a balance between detailed description and concise story is important. Lastly, use-case-driven design seems a reasonable approach to developing software. After all, the users are the ones who determine whether the software satisfies their needs, and knowing how they will use the software avoids a lot of unnecessary mistakes.

Wednesday, October 17, 2007

MyISERN 1.1 Review

This time we were able to choose the author to review. Therefore, I chose one of the graduate student groups to see how they do things differently from us. The author I picked was Andrew Wong from the Silver group.

Installation Review:

The source code downloaded successfully. The Google project page also includes wiki pages for the user guide, the developer guide, and the release notes. Following the instructions in the user guide, I created the jar file successfully. The verify task also ran successfully, which implies that JUnit, Checkstyle, PMD, and FindBugs all passed. However, when I tried to execute some of the commands that generate the tables, I saw a chaotic, jumbled display instead of a nicely organized table. It took me a while to figure this out: the author lays out the display by column positions, and the total width is much greater than the normal width of a command prompt window. I resized my command prompt window to 800 columns, and a nicely structured table finally showed up. The information is mostly listed horizontally, which requires quite a wide window to display correctly.

I would recommend that the author mention this issue in the user guide, since normal users would not resize their command prompt window to check the display; they would simply conclude that the program is not working.
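Another option would be to keep the table readable in a normal console by giving each column a fixed width and truncating values that do not fit, so that a whole row stays within a typical 80-column window. This is a hypothetical sketch, not Andrew's printer: TableFormatter, the " | " separator, and the "~" truncation marker are my own choices.

```java
/** Hypothetical table formatter that keeps every row at a fixed total width. */
final class TableFormatter {
  private TableFormatter() {
  }

  /** Pads or truncates text to exactly width characters (assumes width >= 1). */
  static String fit(String text, int width) {
    if (text.length() > width) {
      // Cut the value and mark the cut with "~" so the column keeps its width.
      return text.substring(0, width - 1) + "~";
    }
    return String.format("%-" + width + "s", text);
  }

  /** Fits each cell to its column width and joins the cells with " | ". */
  static String formatRow(String[] cells, int[] widths) {
    StringBuilder row = new StringBuilder();
    for (int i = 0; i < cells.length; i++) {
      if (i > 0) {
        row.append(" | ");
      }
      row.append(fit(cells[i], widths[i]));
    }
    return row.toString();
  }
}
```

Choosing the widths so that their sum plus the separators stays under 80 keeps every row on one line, at the cost of cutting off long names.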

Code format and conventions review:

There are not many code violations to be found. However, in most of the files the private data members are not documented (for example, lines 46-48 in MyIsernXmlLoader.java). Other than that, the code is well written and easy to read.

Test case review:
Black box perspective:
The program does print out the information correctly when there is enough room to display it. However, duplicates and invalid links in the data files are not reported. I would expect to see, for example, at least an error message saying that there is a duplicate researcher in my researchers XML file, so that I can go ahead and fix it. Other than that, the program seems to hold up pretty well from the black box perspective.
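A check like the one I expected could be as simple as counting IDs and reporting any that occur more than once. The sketch below is hypothetical: DuplicateReport, and the idea of using researcher names as the IDs, are my assumptions, not code from the reviewed program.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical duplicate check for IDs read from a data file. */
final class DuplicateReport {
  private DuplicateReport() {
  }

  /** Returns each ID that appears more than once, in order of first appearance. */
  static List<String> findDuplicates(List<String> ids) {
    Map<String, Integer> counts = new LinkedHashMap<>();
    for (String id : ids) {
      counts.merge(id, 1, Integer::sum);
    }
    List<String> duplicates = new ArrayList<>();
    for (Map.Entry<String, Integer> entry : counts.entrySet()) {
      if (entry.getValue() > 1) {
        duplicates.add(entry.getKey());
      }
    }
    return duplicates;
  }
}
```

The program could then print a line such as "Duplicate researcher: " followed by each returned name before printing the table.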

White box perspective: The Emma report is as follows:
Emma Coverage summary
class: 100% (1/1)
method: 97% (9/10)
block: 100% (497/508)
line: 100% (104/108)
Besides one method that has not been covered, all the other code is tested. I looked into the Emma report and found that the untested method involves one of the enum classes, which Andrew had mentioned in the discussion. I personally do not know much about enum types, but it seems the untested method is generated for the enum automatically rather than written in the program itself. Given that the Java development team is presumably responsible for testing such generated code, I consider that the author did an excellent job on the white box testing.
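For context, the Java compiler adds two static methods to every enum: values() and valueOf(String). Because Emma measures coverage on the compiled bytecode, these generated methods most likely count as methods of the program even though nobody wrote them, which would explain one uncovered method. If full method coverage is wanted, a test only needs to call both. The enum below is a hypothetical stand-in, not the one in Andrew's code.

```java
/** Hypothetical enum standing in for the one in the reviewed program. */
enum DescribeType {
  RESEARCHERS, ORGANIZATIONS, COLLABORATIONS
}

final class EnumCoverageDemo {
  private EnumCoverageDemo() {
  }

  /** Calls both compiler-generated methods so a coverage tool sees them executed. */
  static int exerciseGeneratedMethods() {
    int count = DescribeType.values().length;                   // generated values()
    DescribeType parsed = DescribeType.valueOf("RESEARCHERS");  // generated valueOf(String)
    return parsed == DescribeType.RESEARCHERS ? count : -1;
  }
}
```

Note that valueOf throws IllegalArgumentException for an unknown name, so a test can cover the error path as well.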

Break da buggah: There is not really a way to "break" the program, but there are some ways to make it malfunction, or at least not function as expected. The first is the one mentioned before: running the program in a normal-sized command prompt window, where it does not print the information correctly. Second, the program does not print an error message for most invalid command calls. For instance, if I invoke the “-describe -all” task with something other than “researchers”, “organizations”, or “collaborations”, the program simply does nothing.
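A small validation step before any work is done would turn this silent failure into a helpful message. The sketch below assumes the command has the form -describe -all followed by a type; that argument layout, the DescribeArgs class, and the message wording are my own guesses, not the program's actual parser.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

/** Hypothetical argument check for "-describe -all <type>". */
final class DescribeArgs {
  static final List<String> VALID_TYPES =
      Arrays.asList("researchers", "organizations", "collaborations");
  static final String USAGE = "Usage: -describe -all <researchers|organizations|collaborations>";

  private DescribeArgs() {
  }

  /** Returns an error message for the user, or an empty Optional if the arguments are valid. */
  static Optional<String> checkDescribeAll(String[] args) {
    if (args.length != 3 || !"-describe".equals(args[0]) || !"-all".equals(args[1])) {
      return Optional.of(USAGE);
    }
    if (!VALID_TYPES.contains(args[2])) {
      return Optional.of("Unknown type \"" + args[2] + "\". Expected one of: "
          + String.join(", ", VALID_TYPES));
    }
    return Optional.empty();
  }
}
```

The main method would print the message and stop whenever the Optional is non-empty, instead of doing nothing.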

Summary and Lessons Learned:
Overall, the program is pretty well written. One thing I really like is the modularization of the program. The author breaks the program down into a loader, a printer, a query engine, a checker, and a command line parser. Considering that this MyISERN project will last for two semesters, it is definitely necessary to factor the program into separate components that do not interfere with each other when one of them changes. I have been thinking about modularizing my whole program, and now the Silver group has given me a very good example of how to do it. Despite the tremendous amount of work needed to modularize my current chubby program, I think it is worth the time and effort.
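The idea behind this kind of modularization can be sketched with interfaces: each component only sees the others through a small contract, so one can be rewritten without touching the rest. The Loader and Printer interfaces and the App class below are hypothetical and far smaller than the Silver group's real components.

```java
import java.util.List;

/** Hypothetical contract for whatever reads the XML data. */
interface Loader {
  List<String> loadResearcherNames();
}

/** Hypothetical contract for whatever turns data into display text. */
interface Printer {
  String render(List<String> names);
}

/** Wires the components together; it depends only on the interfaces. */
final class App {
  private final Loader loader;
  private final Printer printer;

  App(Loader loader, Printer printer) {
    this.loader = loader;
    this.printer = printer;
  }

  String run() {
    return printer.render(loader.loadResearcherNames());
  }
}
```

In a test, the loader can be a lambda that returns fixed data, for example new App(() -> Arrays.asList("Jane Doe"), names -> String.join(", ", names)), so the printer can be checked without any XML files.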

Monday, October 15, 2007

MyISERN-1.1

Milestone release
User Guide
Developer Guide
Release Notes

This is another very challenging assignment. Although I managed to accomplish all the tasks, I strongly feel that there are many improvements that could be made.

The hardest part of this assignment was the unclear requirements. Although the requirements are posted on the course page, many of them are not clearly specified, especially the one that asks for link verification. Even though Professor Johnson answered it in the discussion, it still left me quite confused. It seems it could be done in several ways, each with its pros and cons. I think this is what happens in real life too: customers do not always give clear requirements for what they want, and more often, the customers themselves do not know what they want.

Another hard part of this assignment was the cooperation within the group. We started working on different parts of the assignment, and very soon we ran into conflicts while trying to commit our own files. Although we worked on different code, conflicts still appeared somehow. I think commit conflicts are more frequent in student group projects, since students tend to work around the same time and commit their code at the same time. Anyhow, we solved the problem by merging, or by everyone updating to the same version.

The last hard part of this assignment was test-based design. Since our code is supposed to be tested with JUnit, I tried to avoid void methods as much as I could. However, the requirements of this assignment still call for printing results to standard output. I could have made the methods return some values, such as true or false, or even a list or a map, but I really did not see the purpose of doing that; it simply goes against the nature of the requirement. If the requirement is to print something out, then the methods should print it without returning anything. After a little debate, we stuck with the void methods.
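In hindsight, there is a middle ground between returning values and printing: keep the methods void, but have them print to a PrintStream that is passed in rather than to System.out directly. Production code passes System.out, and a test passes a stream backed by a ByteArrayOutputStream and checks the text. The class below is a hypothetical sketch, not our actual printer.

```java
import java.io.PrintStream;
import java.util.List;

/** Hypothetical printer: the methods stay void, but the destination is injected. */
final class ResearcherPrinter {
  private final PrintStream out;

  /** Pass System.out in production, or a buffer-backed stream in tests. */
  ResearcherPrinter(PrintStream out) {
    this.out = out;
  }

  /** Prints a header with the count, then one indented line per name. */
  void printNames(List<String> names) {
    out.println("Researchers (" + names.size() + ")");
    for (String name : names) {
      out.println("  " + name);
    }
  }
}
```

This keeps the requirement of printing to standard output intact while letting JUnit compare the printed text against what is expected.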

For the next assignment, I would first make sure what the framework of the program will look like, and then list all the classes, methods, and variables to be implemented. This way, everyone can work on different parts while still heading toward the same goal. Besides that, I would try to find a balance between design and testing, and avoid void methods, since they cannot be tested easily. One thing I will definitely do is organize the issues better. In this assignment, our issue management was very poor, and I did not figure out how to edit an issue until the very end. It was simply a mess. In my opinion, the red team did very well on this. They divided the issues according to the requirements, so every commit always has a related issue. This also avoids creating redundant issues.

Overall, this assignment was not easy, but as usual, I learned a great deal from it.

Tuesday, October 9, 2007

MyISERN Review

I was assigned to review Jung Kim’s code this time, but by the time I was writing this review, I could not find the MyISERN entry in his blog. Therefore, I went to his team’s Google project page to download the source code.

Installation Review:

The source code downloaded successfully. There is no wiki page for a user guide or anything similar, but that was not required in this assignment. I had no problem at all importing the project into Eclipse. I invoked the verify task from the command prompt, and it ran successfully, which implies that the Checkstyle, PMD, FindBugs, and JUnit checks all passed.

Since the first MyISERN assignment is relatively simple, there is not much code in the program, and I could easily understand it.

For the program execution, I invoked “ant jar”, and it created a jar without any problem. Then I ran the jar file with the “java -jar” command. The tables were printed out correctly. Overall, the program does what it is supposed to do, and it functions correctly.

Code format and conventions review:

There are a few violations in the code:


File | Lines | Violation | Comment
MyIsernXmlLoader | 152, 169, 196, * | EJS #7 | Unnecessary blank line within a method
MyIsernXmlLoader | 142, 159, 194, * | EJS #9 | Pick more meaningful variable name or give it a comment
MyIsernXmlLoader | 191 | EJS #27 | Name for collection should be plural


Test case review:

Black box perspective: The test does print out the table as expected. Other than that, the numbers of researchers, organizations, and collaborations are also tested, but those are given by the initial files. However, since printing the tables is the only objective of the assignment and all the data are given, there is not much black box testing that can be done.

White box perspective: The Emma report is as follows:
Emma Coverage summary
class: 100% (1/1)
method: 90% (9/10)
block: 98% (497/508)
line: 96% (104/108)
Although the coverage is not 100%, our experience with code coverage tells us that this does not mean the testing is insufficient. By inspecting the test cases, I found that the missing coverage comes from the main method, which has not been tested. However, since the main method is “evil”, and JUnit is almost useless when it comes to testing standard output, I consider that the author has done a fairly good job on the testing. There is also one line of error checking code that is not covered. As mentioned above, the data for the assignment is given, so erroneous input cannot be produced and the error checking is never executed.
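For what it is worth, standard output can still be checked from a test by temporarily redirecting System.out with System.setOut and restoring it afterwards. The helper below is a hypothetical sketch, not Jung's code; a test could pass it a lambda such as () -> SomeMain.main(new String[0]), where SomeMain stands for whatever class holds the main method (assuming main declares no checked exceptions).

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;

/** Hypothetical test helper that records what a piece of code prints. */
final class OutputCapture {
  private OutputCapture() {
  }

  /** Runs action with System.out redirected to a buffer and returns the printed text. */
  static String capture(Runnable action) {
    PrintStream original = System.out;
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    System.setOut(new PrintStream(buffer, true));
    try {
      action.run();
    } finally {
      // Always restore the real console, even if the action throws.
      System.setOut(original);
    }
    return buffer.toString();
  }
}
```

The try/finally matters: without it, a failing test would leave System.out pointing at the buffer and silence every later test.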

Break da buggah: I could break the program by running it outside its original folder, because the XML data file paths are defined relative to the original folder. However, this seems a little too extreme to me, because I think only a programmer who knows the internal structure of the program would know that this could cause it to fail.
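One common way to make this less fragile is to let the user say where the data lives, falling back to the current directory only when nothing is given. In the sketch below, the DataFiles class and the myisern.data.dir system property are hypothetical names of my own; the program would then be started with something like java -Dmyisern.data.dir=<folder> -jar <jar file>.

```java
import java.io.File;

/** Hypothetical resolver for the XML data files. */
final class DataFiles {
  /** Hypothetical property name; the reviewed program does not define it. */
  static final String DATA_DIR_PROPERTY = "myisern.data.dir";

  private DataFiles() {
  }

  /**
   * Resolves fileName against, in order of preference: an explicit directory,
   * the system property, or the current working directory (the original behavior).
   */
  static File resolve(String fileName, String explicitDir) {
    String dir = explicitDir != null ? explicitDir : System.getProperty(DATA_DIR_PROPERTY, ".");
    return new File(dir, fileName);
  }
}
```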

Summary and Lessons Learned:
Overall, the program is pretty well written. It functions correctly and safely. One lesson I learned is that even when the data is given, there is no guarantee that the program will not go wrong. Furthermore, things tend to change a lot in real life, and errors usually emerge during those changes. Another lesson I learned is the importance of documentation: not only are meaningful comments in the code necessary, but a user guide or manual is also very helpful.

Monday, October 8, 2007

MyISERN-1.0

The source code can be downloaded from here.
Our Google project page is here.

The objective of this assignment is to get us started on building the social network application for ISERN, and to start using SVN and Google project hosting. In this assignment, I learned about JAXB, XML Schema, and the group software development process, and I became more familiar with SVN, project hosting, and XML.

JAXB: This API is very powerful when paired with XML Schema. It took me a couple of hours to figure out how the JAXB Marshaller class works. JAXB basically provides a bridge between Java and XML: Java classes can be generated from data defined by an XML Schema, and XML documents can then be converted to and from objects of those classes almost automatically.
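To show what JAXB automates, here is the same kind of XML-to-object mapping written by hand. This is not JAXB: it uses the JDK's built-in DOM parser (javax.xml.parsers), partly because JAXB was removed from the JDK in Java 11. The element names Researchers, Researcher, and Name, and the Researcher class, are hypothetical and not taken from the MyISERN schema.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

/** Hypothetical data class, similar in spirit to what XJC generates from a schema. */
final class Researcher {
  final String name;

  Researcher(String name) {
    this.name = name;
  }
}

/** Hand-written XML-to-object mapping; this is the step JAXB unmarshalling automates. */
final class ManualUnmarshaller {
  private ManualUnmarshaller() {
  }

  /** Builds one Researcher per <Researcher> element, using the text of its first <Name> child. */
  static List<Researcher> parse(String xml)
      throws ParserConfigurationException, SAXException, IOException {
    DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
    Document document = builder.parse(new InputSource(new StringReader(xml)));
    NodeList nodes = document.getElementsByTagName("Researcher");
    List<Researcher> researchers = new ArrayList<>();
    for (int i = 0; i < nodes.getLength(); i++) {
      Element element = (Element) nodes.item(i);
      // Assumes every <Researcher> has a <Name>; schema validation would guarantee that.
      String name = element.getElementsByTagName("Name").item(0).getTextContent();
      researchers.add(new Researcher(name));
    }
    return researchers;
  }
}
```

With JAXB, the Researcher class would be generated by XJC and the loop above would be replaced by a single unmarshal call.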

XML Schema: As far as I understand right now, this is a way to define the structure of data in XML. It might seem a little complicated at first, but it really makes sense once it is converted to Java classes with the XJC compiler. Since XML is becoming an increasingly popular standard for holding data, I think it is very useful to learn about and get familiar with Schema.
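Besides generating classes, a schema can also be used on its own to check a data file before loading it, which is where errors such as a missing required element show up. The sketch below uses the JDK's javax.xml.validation API; the tiny inline schema (a Researcher with a required Name and an optional Email) is hypothetical and much simpler than the real MyISERN schema.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Optional;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

/** Hypothetical schema check run before loading a data file. */
final class SchemaCheck {
  /** Hypothetical schema: a Researcher needs a Name and may have an Email. */
  static final String XSD =
      "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='Researcher'><xs:complexType><xs:sequence>"
      + "<xs:element name='Name' type='xs:string'/>"
      + "<xs:element name='Email' type='xs:string' minOccurs='0'/>"
      + "</xs:sequence></xs:complexType></xs:element>"
      + "</xs:schema>";

  private SchemaCheck() {
  }

  /** Returns the first validation error, or an empty Optional if the XML matches the schema. */
  static Optional<String> findError(String xml) throws SAXException, IOException {
    SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
    Schema schema = factory.newSchema(new StreamSource(new StringReader(XSD)));
    Validator validator = schema.newValidator();
    try {
      validator.validate(new StreamSource(new StringReader(xml)));
      return Optional.empty();
    } catch (SAXException e) {
      // With no custom error handler, the validator reports the first violation this way.
      return Optional.of(e.getMessage());
    }
  }
}
```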

SVN: It works very nicely for team development. By having a master copy of the working code, we can share code and make progress without meeting face to face. I personally keep the source code I commit and update in a separate folder from my Eclipse workspace folder. I think this turned out to be a good choice, because some of my classmates told me they had problems when they committed or updated the code directly from the Eclipse workspace folder.

Google project hosting: This turned out to be a much better service than I expected. It helps us organize the project in a consistent manner. However, I do not really understand the purpose of the wiki pages functionality; they seem to be just text files with a .wiki extension. Also, the naming restrictions on these wiki pages are quite odd: spaces and some special characters are not allowed. Nonetheless, Google project hosting is satisfactory for me.

Group software development: I was lucky enough to have a good group for this assignment. We met twice for long sessions at school, which let us finish most parts of the assignment. We also tried pair programming during these meetings, and it was very helpful. Many errors and potential bugs, which might have taken several minutes to discover if we had been writing the code individually, were caught right as we were writing the code. I do think it improves the speed of producing high quality code. As for the Scrum approach, since we were pair programming, many tasks were accomplished together, but we still tried our best to have everyone work on everything.

JUnit: Once again, I feel a little powerless when it comes to writing JUnit test cases for void methods, especially ones that call System.out.println(). Since the objective of this assignment is to print out tables, which naturally calls for void methods, the test cases are more or less impractical. They seem to exist just to fulfill the Emma coverage. We tried returning a Boolean from these print methods for error checking, but all the data in this assignment is given by the XML files, so we cannot even produce an error case in the tests, which means the error-handling code is never executed.

Thursday, October 4, 2007

CM Practice

Project Homepage
Discussion Group
SVN Group

Compared to the WebSpider assignment, this was a relatively easy assignment, and I was able to accomplish all three tasks. Although the tasks were simple, I still encountered some difficulties.

The first problem for me was authentication in TortoiseSVN. Thanks to Prof. Johnson's help, I figured out that the protocol prefix of the URL (HTTPS versus HTTP) was what tripped me up. HTTPS establishes an encrypted connection between the client and the server, while plain HTTP is unencrypted, and Google project hosting only lets project members commit (write) through the HTTPS URL; the HTTP URL gives read-only access. Although I had heard about this concept before, this was the first time I ran into a problem caused by it.

The second problem was the mailing list setting for my project. The PDF file from the class website was unclear and different from what I saw on the website, which left me a little confused. Fortunately, with help from my classmates, I was able to solve the problem. I also committed twice to SVN to make sure that a notification is sent to the mailing list whenever I commit. It would be much better if Google integrated this functionality into its project hosting page.

From this assignment, I learned how to host a project using Google project hosting, and I now understand the mechanism of how SVN works. I had used SVN in a previous class, but there was a lot of trouble because the instructor did not teach us anything about it, and terms like trunk, commit, and configuration really confused me. This time I actually realized the advantages of using SVN, which I believe will be a great help for my group project in the rest of the semester.

Monday, October 1, 2007

WebSpider Review

The package I reviewed is from Laura Matsuo. The following is my result for the reviews:

1. Installation Review


I downloaded the zip file from Laura's blog, and the installation went very smoothly with only a few modifications, such as renaming the files and folders. I noticed that the zip file comes with three text files: a.txt, b.txt, and c.txt. They seem to be logging output files, which I think were only for intermediate testing purposes. Also, two jar files are already included in the package: webspider.jar and webspider-lauramat.jar. I think the author forgot to delete these jar files before distributing the package. Nonetheless, the jar file can be correctly created with the ant jar command.


For the QA tools, I was a little surprised that the verify task did not pass. Therefore, I ran the tools one by one, and here are the results:
JUnit: Successful, but a lot of logging information is printed out, which makes the output in the command prompt a little messy.
Checkstyle: Successful.
PMD: 1 rule violation, "assertTrue(true) or similar statements are unnecessary". This comes from the main method test. As Prof. Johnson said, the main method is evil, and I think the author did this in order to reach 100% coverage in Emma.
FindBugs: Successful.
Emma: Successful with 100% coverage to all classes, methods, blocks and lines.


For the program execution, I invoked the command "java -jar webspider-lauramat.jar -totallinks http://www.hackystat.org 100", and the result was 1553 links. Regardless of whether this result is correct, the program seemed to function correctly. However, I found one thing that is a little inconsistent: the arguments are passed in as an array, but their order does not affect the program, because each argument is checked in such a way that as long as it is valid, it is accepted. This is not consistent with the normal way of invoking this program, and I tend to consider it a bad thing.
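A stricter parser would read the arguments by position and reject anything missing or out of place, which would also catch a missing page count. The sketch below is hypothetical: the SpiderArgs class, the "-mostpopular" flag name, and the upper limit on pages are my own assumptions; only "-totallinks", a URL, and a page count, in that order, come from the command I actually ran.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical strict parser for: task, URL, page count, in that order. */
final class SpiderArgs {
  /** "-mostpopular" is a hypothetical flag name for the most popular page task. */
  static final List<String> TASKS = Arrays.asList("-totallinks", "-mostpopular");
  /** Hypothetical upper bound that rejects absurdly large crawls. */
  static final int MAX_PAGES = 10000;
  static final String USAGE = "Usage: java -jar webspider.jar <-totallinks|-mostpopular> <url> <pages>";

  final String task;
  final String startUrl;
  final int maxPages;

  private SpiderArgs(String task, String startUrl, int maxPages) {
    this.task = task;
    this.startUrl = startUrl;
    this.maxPages = maxPages;
  }

  /** Parses the arguments by position; throws IllegalArgumentException describing any problem. */
  static SpiderArgs parse(String[] args) {
    if (args.length != 3) {
      throw new IllegalArgumentException("Expected 3 arguments but got " + args.length + ". " + USAGE);
    }
    if (!TASKS.contains(args[0])) {
      throw new IllegalArgumentException("Unknown task \"" + args[0] + "\". " + USAGE);
    }
    if (!args[1].startsWith("http://") && !args[1].startsWith("https://")) {
      throw new IllegalArgumentException("The second argument must be an http(s) URL. " + USAGE);
    }
    int pages;
    try {
      pages = Integer.parseInt(args[2]);
    } catch (NumberFormatException e) {
      throw new IllegalArgumentException("The page count must be an integer. " + USAGE);
    }
    if (pages < 1 || pages > MAX_PAGES) {
      throw new IllegalArgumentException("The page count must be between 1 and " + MAX_PAGES + ".");
    }
    return new SpiderArgs(args[0], args[1], pages);
  }
}
```

The main method would catch the IllegalArgumentException, print its message, and exit, so a missing or misplaced argument produces a usage message instead of a silent fallback.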

2. Code Format and Convention Review


The source code is very nicely written and documented. I did not find any apparent coding convention violations. Also, the methods are highly modularized, so it was actually quite easy to understand the code.

3. Test Case Review

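None of the test methods has the "@Test" annotation. At first I was not sure whether the annotation is needed when the test class extends the TestCase class, since I got an error when I tried to add the @Test annotation back. As far as I know, though, a JUnit 3 style class that extends TestCase runs every public method whose name starts with "test", while @Test is a JUnit 4 annotation, so it is not needed here.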

Black box perspective: All the public methods are tested directly or indirectly. The two major tasks are tested through the website http://www2.hawaii.edu/~lauramat/myfavorites/. For the total links task and the most popular page task, the test cases check the results against the expected values. The test cases also include boundary inputs, such as an invalid URL, an invalid page, and a page with no links. However, there are no test cases for an invalid number of pages to crawl (a negative or very large number), or for a page with a link pointing to itself.
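A page that links to itself is a classic source of endless loops in a crawler, and the usual guard is a set of visited pages. The sketch below is hypothetical, not Laura's code: the link graph is an in-memory Map standing in for real HTTP fetching, and the counting rule (count every outgoing link, but visit each page at most once) is my own assumption.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical breadth-first link counter over an in-memory link graph. */
final class LinkCounter {
  private LinkCounter() {
  }

  /** Counts outgoing links on up to maxPages distinct pages reachable from start. */
  static int totalLinks(Map<String, List<String>> graph, String start, int maxPages) {
    Set<String> visited = new HashSet<>();
    Deque<String> queue = new ArrayDeque<>();
    queue.add(start);
    int links = 0;
    while (!queue.isEmpty() && visited.size() < maxPages) {
      String page = queue.poll();
      if (!visited.add(page)) {
        continue; // already crawled, e.g. reached again through a self-link
      }
      List<String> outgoing = graph.getOrDefault(page, Collections.emptyList());
      links += outgoing.size();
      for (String target : outgoing) {
        if (!visited.contains(target)) {
          queue.add(target);
        }
      }
    }
    return links;
  }
}
```

A test with a page whose links include itself would then check that the crawl finishes and counts the self-link once.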

White box perspective: The author did a very good job on this one. The Emma summary is as below:
class: 100% (2/2)
method: 100% (26/26)
block: 100% (648/648)
line: 100% (136/136)
Every single line of code has been tested. Although this does not guarantee high quality code, it is still an outstanding sign of the completeness of the testing.

Break the buggah: I used Prof. Johnson's myspace website to run the total links task, and the program seemed to crash because of the JavaScript content, even though it still returned a result of 0 links found in the first 100 pages. I also tried omitting the number of pages argument, and the program went on to find the total number of links on the first page instead of reporting the missing argument.

4. Summary and Lessons Learned


From reviewing Laura's code, I learned that nicely written code is very self-explanatory. I actually enjoyed reading her code while doing this review. I am pretty sure my reviewer is going to have a hard time reading through my code, since it is quite messy and poorly commented. I also learned that when we distribute a package, we should always check its contents to make sure there are no missing or extra files.


On the other hand, I noticed that testing can be very tricky. When I wrote my test cases, not all the exception cases were included, and I did not have the black box and white box perspectives in mind. From now on, I should pay more attention to writing test cases from both perspectives, because they complement each other.