Static Analysis with PMD
by Tom Copeland, 02/12/2003
What is PMD?
PMD is a utility for finding problems in Java code. PMD does this using static analysis; that is, analyzing the source code without actually running the program. PMD comes with a number of ready-to-run rules that you can run on your own source code to find unused variables, unnecessary object creation, empty catch blocks, and so forth. You can also write your own rules to enforce coding practices specific to your organization. For example, if you're doing EJB programming, you could write a PMD rule that would flag any creation of Thread or Socket objects. If you're feeling generous, you can donate that rule back to PMD for anyone to use.
Background
PMD was initially written in support of Cougaar, a Defense Advanced Research Projects Agency (DARPA) project billed as "An Open Source Agent Architecture for Large-scale, Distributed Multi-Agent Systems." DARPA agreed that the utility could be open sourced, and since its release on SourceForge, it has been downloaded over 14,000 times and has garnered over 130,000 page views. More importantly, though, numerous PMD rule suggestions and IDE plugins have been written by open source developers and contributed back to the core PMD project.
Installing and Running
You can download PMD in either a binary release or with all of the source code; both are available in .zip files on the PMD web site. Assuming you've downloaded the latest PMD binary release, unzip the archive to any directory. Then it's up to you how to use it--if you simply want to run PMD on a directory of Java source files, you can run it from a command line like this (the command should be all on one line):
C:\data\pmd\pmd>java -jar lib\pmd-1.02.jar c:\j2sdk1.4.1_01\src\java\util
text rulesets/unusedcode.xml
c:\j2sdk1.4.1_01\src\java\util\AbstractMap.java 650
Avoid unused local variables such as 'v'
c:\j2sdk1.4.1_01\src\java\util\Date.java 438
Avoid unused local variables such as 'millis'
// etc, etc, remaining errors skipped
You can also run PMD using Ant, Maven, or an assortment of Integrated Development Environments (IDEs) including jEdit, NetBeans, Eclipse, Emacs, IntelliJ IDEA, and JBuilder.
Built-in Rules
So what rules come with PMD? Well, here are some examples:
- Unused code is always bad:
  public class Foo {
      // an unused instance variable
      private List bar = new ArrayList(500);
  }
- Why are we returning a concrete class here when an interface--i.e., List--would do just as well?
  public ArrayList getList() {
      return new ArrayList();
  }
- Nothing's being done inside the if success block ... this could be rewritten for clarity:
  public void doSomething(int y) {
      if (y >= 2) {
      } else {
          System.out.println("Less than two");
      }
  }
- Why are we creating a new String object? Just use String x = "x"; instead.
  String x = new String("x");
There are many other rules, but you get the idea. Static analysis rules can catch the things that would make an experienced programmer say "Hmm, that's not good."
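For comparison, here is one way the flagged snippets above might be rewritten. This is only a sketch: the class name FooCleaned and the method x() are made up for illustration, the methods are static purely so the example is easy to call, and the generic type parameter is an assumption that the original snippets do not use.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical class; it collects possible "after" versions of the snippets above.
final class FooCleaned {

    // The unused 'bar' field has simply been deleted.

    // Return the List interface; callers need not know it is an ArrayList.
    static List<String> getList() {
        return new ArrayList<>();
    }

    // Invert the test so that no branch is left empty.
    static void doSomething(int y) {
        if (y < 2) {
            System.out.println("Less than two");
        }
    }

    // A string literal avoids the extra object created by new String("x").
    static String x() {
        return "x";
    }
}
```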
How it Works: JavaCC/JJTree
At the heart of PMD is the JavaCC parser generator, which PMD uses in conjunction with an Extended Backus-Naur Form (EBNF) grammar and JJTree to parse Java source code into an Abstract Syntax Tree (AST). That was a big sentence with a lot of acronyms, so let's break it down into smaller pieces.
Java source code is, at the end of the day, just plain old text. As your compiler will tell you, however, that plain text has to be structured in a certain way in order to be valid Java code. That structure can be expressed in a syntactic metalanguage called EBNF and is usually referred to as a "grammar." JavaCC reads the grammar and generates a parser that can be used to parse programs written in the Java programming language.
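To give a feel for what "a parser generated from a grammar" means, here is a tiny hand-written stand-in. It is not JavaCC output and does not use any JavaCC API. The one-rule grammar, the class name NameParser, and its methods are all assumptions made for illustration. The code splits text into tokens and then checks them against a single EBNF production.

```java
import java.util.ArrayList;
import java.util.List;

// Hand-written stand-in for a generated parser. Grammar (EBNF):
//   Name ::= Identifier ( "." Identifier )*
final class NameParser {

    // Split the input into identifier tokens and "." tokens.
    static List<String> tokenize(String source) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < source.length()) {
            char c = source.charAt(i);
            if (c == '.') {
                tokens.add(".");
                i++;
            } else if (Character.isJavaIdentifierStart(c)) {
                int start = i;
                while (i < source.length()
                        && Character.isJavaIdentifierPart(source.charAt(i))) {
                    i++;
                }
                tokens.add(source.substring(start, i));
            } else {
                throw new IllegalArgumentException(
                        "Unexpected character '" + c + "' at index " + i);
            }
        }
        return tokens;
    }

    // Return true if the token sequence matches the Name production.
    static boolean isName(List<String> tokens) {
        if (tokens.isEmpty() || tokens.get(0).equals(".")) {
            return false;
        }
        int i = 1;
        while (i < tokens.size()) {
            // Each repetition must be "." followed by an identifier.
            if (!tokens.get(i).equals(".")
                    || i + 1 >= tokens.size()
                    || tokens.get(i + 1).equals(".")) {
                return false;
            }
            i += 2;
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> tokens = tokenize("System.out.println");
        System.out.println(tokens + " valid=" + isName(tokens));
    }
}
```

A real Java grammar has hundreds of productions, which is why generating the parser from the grammar is preferable to writing it by hand.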
There's another layer, though. JJTree, an add-on to JavaCC, enhances the JavaCC-generated parser by decorating it with an Abstract Syntax Tree (AST)--a semantic layer on top of the stream of Java tokens. So instead of getting a sequence of tokens like System, ., out, ., and println, JJTree serves up a tree-like hierarchy of objects. Here's a simple code snippet and the corresponding AST:
Source Code
public class Foo {
    public void bar() {
        System.out.println("hello world");
    }
}
Abstract Syntax Tree
CompilationUnit
 TypeDeclaration
  ClassDeclaration
   UnmodifiedClassDeclaration
    ClassBody
     ClassBodyDeclaration
      MethodDeclaration
       ResultType
       MethodDeclarator
        FormalParameters
       Block
        BlockStatement
         Statement
          StatementExpression
           PrimaryExpression
            PrimaryPrefix
             Name
            PrimarySuffix
             Arguments
              ArgumentList
               Expression
                PrimaryExpression
                 PrimaryPrefix
                  Literal
Your code can traverse this tree using the Visitor pattern--the object tree generated by JJTree supports this.
How it Works: Rule Coding
Generally, a PMD rule is a Visitor that traverses the AST looking for a particular pattern of objects that indicates a problem. This can be as simple as checking for occurrences of new Thread, or as complex as determining whether or not a class is correctly overriding both equals and hashCode.
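To make the Visitor idea concrete before looking at real PMD code, here is a minimal, self-contained sketch. None of these types are JJTree or PMD classes. VisitorSketch, Node, Block, Allocation, and ThreadCreationRule are hypothetical names invented for this example, and the tree is built by hand rather than parsed from source.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical types for illustration; these are not JJTree or PMD classes.
final class VisitorSketch {

    interface Visitor {
        void visit(Block node);
        void visit(Allocation node);
    }

    abstract static class Node {
        final int line;
        Node(int line) { this.line = line; }
        // Double dispatch: each node type calls the matching visit() overload.
        abstract void accept(Visitor visitor);
    }

    static final class Block extends Node {
        final List<Node> children;
        Block(int line, Node... children) {
            super(line);
            this.children = Arrays.asList(children);
        }
        @Override void accept(Visitor visitor) { visitor.visit(this); }
    }

    // Represents an expression such as "new Thread()".
    static final class Allocation extends Node {
        final String typeName;
        Allocation(int line, String typeName) {
            super(line);
            this.typeName = typeName;
        }
        @Override void accept(Visitor visitor) { visitor.visit(this); }
    }

    // A "rule" is a visitor that records the line of each "new Thread".
    static final class ThreadCreationRule implements Visitor {
        final List<Integer> violationLines = new ArrayList<>();

        @Override public void visit(Block node) {
            for (Node child : node.children) {
                child.accept(this); // keep walking down the tree
            }
        }

        @Override public void visit(Allocation node) {
            if ("Thread".equals(node.typeName)) {
                violationLines.add(node.line);
            }
        }
    }

    public static void main(String[] args) {
        Node tree = new Block(1,
                new Allocation(2, "Thread"),
                new Block(3, new Allocation(4, "ArrayList"), new Allocation(5, "Thread")));
        ThreadCreationRule rule = new ThreadCreationRule();
        tree.accept(rule);
        System.out.println("new Thread found on lines " + rule.violationLines);
    }
}
```

The rule only implements the visit methods it cares about plus the traversal; the nodes themselves know nothing about what is being checked.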
Here's a simple PMD rule that checks for empty if statements:
// Extend AbstractRule to enable the Visitor pattern
public class EmptyIfStmtRule extends AbstractRule implements Rule {
    // This method gets called when there's a Block in the source code
    public Object visit(ASTBlock node, Object data) {
        // If the parent node is an If statement and there isn't anything
        // inside the block
        if ((node.jjtGetParent().jjtGetParent() instanceof ASTIfStatement)
                && node.jjtGetNumChildren() == 0) {
            // then there's a problem, so add a RuleViolation to the Report
            RuleContext ctx = (RuleContext) data;
            ctx.getReport().addRuleViolation(createRuleViolation(ctx,
                    node.getBeginLine()));
        }
        // Now move on to the next node in the tree
        return super.visit(node, data);
    }
}
The code can be a bit obscure, but the concepts are, for the most part, straightforward:
- Extend the AbstractRule base class.
- Put in a "hook" so you'll get a callback when a node you are interested in is encountered. In the example above, we want to be notified for each ASTBlock, so we declared the method visit(ASTBlock node, Object data).
- Once we get a callback, poke around to see if we find the problem for which we're looking. In the example, we are looking for if statements with empty bodies, so we look up the tree to ensure we are in an ASTIfStatement, and then down the tree to see if there are any child nodes.
- Note that we can do this another way--we can register for a callback when we hit an ASTIfStatement and then look down the tree to check for an empty block. Which way you do this is up to you; if you run into performance problems, consider rewriting your rule.
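The two approaches in the list above can be compared side by side in a small sketch. It deliberately avoids the PMD API: EmptyIfSketch and its nested Node, IfStatement, Statement, and Block classes are hypothetical stand-ins for the generated AST classes, and the condition expression that a real if statement node would also hold is left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical miniature AST; stand-ins for, not copies of, PMD's node classes.
final class EmptyIfSketch {

    static class Node {
        Node parent;
        final List<Node> children = new ArrayList<>();

        // Attach a child and return this node so calls can be chained.
        Node add(Node child) {
            child.parent = this;
            children.add(child);
            return this;
        }
    }

    static final class IfStatement extends Node { }
    static final class Statement extends Node { }
    static final class Block extends Node { }

    // Approach 1: called for each Block; look up two levels for an IfStatement.
    static boolean isEmptyIfBody(Block block) {
        Node statement = block.parent;
        return block.children.isEmpty()
                && statement != null
                && statement.parent instanceof IfStatement;
    }

    // Approach 2: called for each IfStatement; look down for an empty Block.
    static boolean hasEmptyBody(IfStatement ifStatement) {
        for (Node statement : ifStatement.children) {
            for (Node child : statement.children) {
                if (child instanceof Block && child.children.isEmpty()) {
                    return true;
                }
            }
        }
        return false;
    }
}
```

Both methods give the same answer for the same tree. The difference is how often each is called: the first runs for every block in the file, the second only for if statements.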
How it Works: Rule Configuration
Once you've written a rule, you need to put it inside a PMD ruleset--which is, naturally, a set of rules. A PMD ruleset is defined in an XML file, and looks like this:
<rule name="EmptyIfStmt"
      message="Avoid empty 'if' statements"
      class="net.sourceforge.pmd.rules.EmptyIfStmtRule">
  <description>
  Empty If Statement finds instances where a condition is checked but
  nothing is done about it.
  </description>
  <priority>3</priority>
  <example>
<![CDATA[
  if (absValue < 1) {
      // not good
  }
]]>
  </example>
</rule>
As you can see, the rule configuration file contains a lot of information. This is primarily so that an Integrated Development Environment (IDE) can show a complete description of your rule--code samples, description, etc.
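As an illustration of how a tool might consume this metadata, the following sketch reads a shortened copy of the rule element with the JDK's standard DOM parser. It is not how PMD itself loads rulesets. The class RuleXmlSketch and its methods are hypothetical, and the description text is abbreviated.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper; shows reading rule metadata with the standard DOM API.
final class RuleXmlSketch {

    // Shortened copy of the rule definition shown above.
    static final String RULE_XML =
            "<rule name=\"EmptyIfStmt\"\n"
          + "      message=\"Avoid empty 'if' statements\"\n"
          + "      class=\"net.sourceforge.pmd.rules.EmptyIfStmtRule\">\n"
          + "  <description>Finds empty 'if' bodies.</description>\n"
          + "  <priority>3</priority>\n"
          + "</rule>";

    // Parse the XML text and return its root <rule> element.
    static Element parseRule(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return doc.getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse rule XML", e);
        }
    }

    // Return the trimmed text of the first child element with the given tag.
    static String childText(Element rule, String tag) {
        return rule.getElementsByTagName(tag).item(0).getTextContent().trim();
    }

    public static void main(String[] args) {
        Element rule = parseRule(RULE_XML);
        System.out.println(rule.getAttribute("name")
                + " (priority " + childText(rule, "priority") + "): "
                + rule.getAttribute("message"));
    }
}
```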
To run the new rule, put both the ruleset XML file and the code in your CLASSPATH and run PMD to see the results.
The best way to learn how to write a custom rule is to find a source code problem you want to catch, look at some of the many rules that come with PMD to see how it's done, and then experiment with your own rule. Be sure to post to the PMD forums if you have problems; your feedback can help us improve the documentation and examples so that others will find PMD easier to use.
Flotsam and Jetsam
The PMD project has some nifty features that, although not germane to static analysis, deserve a mention. For example:
- The PMD web site is generated by Jakarta's Maven project, which puts together a nicely-linked set of web pages including cross-referenced source code, Javadocs, a team member listing, and a change log.
- We use JUnit to test PMD--currently, we've got over 400 JUnit tests. These provide a comfortable safety net for use in day-to-day PMD development.
- We've set up a "PMD Scoreboard" to which SourceForge users can add projects to be checked for unused code several times a day. Over sixty projects have been added to the list, and the colors are quite lovely. The same report is also run for Jakarta projects on a different site.
Conclusion
We've discussed static code analysis and how problems can be found without needing to compile and run the code. We've had a quick tour of EBNF grammars and JavaCC, as well as a brief discussion of an Abstract Syntax Tree. We've seen how PMD uses all of that to check source code, and we've seen how to write a custom rule to implement rules specific to your project. There's much more information at the PMD web site; give it a try!
Credits
Thanks to David Dixon-Peugh and Dave Craine for proofreading this article. Thanks also to the many contributors without whom PMD would be a much less useful utility.
References
- PMD home page
- PMD Scoreboard (SourceForge)
- PMD Scoreboard (Jakarta)
- "How to write a PMD rule"
- DARPA home page
- Cougaar home page
- JUnit home page
- Maven home page
- JavaCC/JJTree home page