JavaCC 21 now has assertions!

JavaCC 21 now has assertions. There are actually two kinds, distinguished by how the condition is written:

  • The assertion condition is expressed in Java code.
  • The assertion condition is a lookahead expansion.

The first kind of condition looks like this:

ASSERT {someCondition()}

Optionally, the assertion can have a message, as in:

ASSERT {x!=y} : "At this point, x and y cannot be the same!"

Note that this kind of assertion is only applied when parsing, not inside a lookahead routine, unless you append a # character, which indicates that it also applies when scanning ahead, as in:

ASSERT {x != y}#
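To make the effect of the trailing # concrete, here is a minimal hand-written Java sketch. It is not what JavaCC 21 actually generates: the class SemanticAssertSketch, its scanning flag, the check method, and the unchecked ParseException are all names assumed for illustration.

```java
import java.util.function.BooleanSupplier;

// Hypothetical stand-in for generated parser code; every name here is an assumption.
class SemanticAssertSketch {
    /** Illustrative unchecked exception, thrown only in regular parsing. */
    static class ParseException extends RuntimeException {
        ParseException(String message) { super(message); }
    }

    /** True while a lookahead routine is running. */
    boolean scanning = false;

    /**
     * Models ASSERT {cond} (appliesInScan == false)
     * and ASSERT {cond}# (appliesInScan == true).
     * Returns false only to signal that a lookahead fails.
     */
    boolean check(BooleanSupplier cond, boolean appliesInScan, String message) {
        if (scanning) {
            // Without '#', the condition is not even evaluated while scanning ahead.
            return !appliesInScan || cond.getAsBoolean();
        }
        if (!cond.getAsBoolean()) {
            throw new ParseException(message);
        }
        return true;
    }
}
```

In this model, a false condition without # is simply ignored during lookahead, while the same condition with # makes the lookahead fail.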

The other kind of assertion looks like:

ASSERT ("foo" "bar")  : "Expecting \"foo\" followed by \"bar\" here."

The assertion fails if the next two tokens are not "foo" followed by "bar".

The expansion must be within parentheses and is optionally prefixed by a ~, meaning that the condition is negated.

ASSERT ~(",") : "A comma cannot occur here!"

Semantics, parsing vs. scanahead

One notable aspect of assertions in JavaCC 21 is that an assertion failing has a completely different meaning, depending on whether you are in a regular parsing mode or in a lookahead routine. This is analogous to the semantics of the FAIL statement.

If you hit a FAIL instruction in regular parsing, this means that a ParseException is thrown and processing is aborted. (N.B. If the experimental fault-tolerant parsing is turned on, the error recovery machinery does get invoked and the parser will try to resync.) However, if you reach a FAIL directive in a lookahead, that just means that the lookahead routine fails.

The same applies to an assertion failing. If you are in regular parsing, this is really a parsing failure. If you are in a lookahead routine, it just means that the lookahead fails.
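The two meanings of a failed assertion can be sketched in plain Java. This is only a hand-written model of the behavior described above, assuming a list of string tokens; the class AssertionSketch, its methods, and its unchecked ParseException are hypothetical and do not reflect the code JavaCC 21 generates.

```java
import java.util.List;

// Hypothetical model of ASSERT ("foo" "bar") : message; not JavaCC 21 output.
class AssertionSketch {
    /** Illustrative unchecked exception, thrown only in regular parsing. */
    static class ParseException extends RuntimeException {
        ParseException(String message) { super(message); }
    }

    private final List<String> tokens;
    private int pos = 0;              // index of the next unconsumed token
    private boolean scanning = false; // true while inside a lookahead routine

    AssertionSketch(List<String> tokens) { this.tokens = tokens; }

    /** Checks, without consuming anything, that the next tokens match expected. */
    private boolean nextTokensAre(String... expected) {
        for (int i = 0; i < expected.length; i++) {
            int index = pos + i;
            if (index >= tokens.size() || !tokens.get(index).equals(expected[i])) {
                return false;
            }
        }
        return true;
    }

    /** Parsing mode: a failure throws. Lookahead mode: a failure returns false. */
    boolean assertFooBar(String message) {
        if (nextTokensAre("foo", "bar")) return true;
        if (scanning) return false;
        throw new ParseException(message);
    }

    /** Runs the same assertion as part of a lookahead; it never throws. */
    boolean scanAhead(String message) {
        boolean saved = scanning;
        scanning = true;
        try {
            return assertFooBar(message);
        } finally {
            scanning = saved;
        }
    }
}
```

The same check is shared by both paths; only the reaction to failure differs, which is the point of the comparison with FAIL.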

But JavaCC 21 already had assertions!

A little secret is that the above-described feature is really a sort of syntactic sugar. The following:

ASSERT {someCondition()}

is really a shorthand for:

[
    SCAN {!someCondition()} => FAIL
]

and similarly:

ASSERT (Foobar) : "Expecting a Foobar here"

could already be expressed this way:

(
    SCAN Foobar => {}
    |
    FAIL "Expecting a Foobar here"
)

I think most people would prefer to write it the previous way. In fact, now that assertions can be expressed in such a terse manner, I would anticipate people peppering their code with ASSERT statements to verify their beliefs about how the parser they're developing works.

Well, one feature that Java assertions have that is not in JavaCC 21 is the ability to enable and disable them with a command-line switch. Frankly, I don't know what the point of disabling assertions is. I guess some people like to live dangerously! Okay, I understand that checking the various assertions has some run-time execution penalty, but it is usually pretty low. I mean, if an assertion failing really means that there is some problem in your code, then further processing really should abort. Though, granted, that might depend on whether the app was a game or an online banking app. Or if it was the software that controls some life-support machinery. (And in that case, further depending on whether there is any backup system in place!)

So, okay, one's parameters can vary. As things stand, I don't see much value in being able to disable the assertions. I just tend to think that if you assert that something is true at some point, and it isn't, you should be alerted to it as soon as possible.

Some other odds and ends

There is now a tree-building annotation #scan that indicates that a grammar production is used exclusively in lookahead. What this means is that if you have:

 SomeStatement : 
    SCAN SomeStatementLookahead =>
    .... 
;

so there is another production called SomeStatementLookahead that only occurs in a lookahead, but is never used in actual parsing, you can annotate that production with #scan and no parsing code is generated for it, only lookahead.

SomeStatementLookahead#scan :
    ....
;

Another little point about productions annotated with #scan is that there is no need to append a # after a semantic condition to indicate that it applies in a lookahead. Any Java code snippet in such a production automatically applies in a lookahead.

Also, as with #void, no Node subclass is generated. The difference is that with #scan no parsing code is generated for the production either.

Something that has been there for a few months at least is the ability to write something like:

<IDENTIFIER> ("(" Parameter ("," Parameter)*)#MethodCall(+1)

What the above means is that the tree-building machinery builds a node that includes all the nodes placed on the node stack for the expansion just prior, plus one. In the above spot, the node built would also include the IDENTIFIER just before the opening parenthesis. This has been present for some time and is used in internal development, but I don't recall ever documenting it anywhere, even in a blog post, so I mention it here. One of these days we'll have a proper manual!
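The "plus one" arithmetic can be illustrated with a small Java model of a node stack. This is a sketch under assumptions, not the actual tree-building code: NodeStackSketch, Node, and close(name, mark, extra) are hypothetical names, and the sketch assumes that each token ends up on the stack as its own node.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical model of a tree-building node stack; all names are illustrative.
class NodeStackSketch {
    static class Node {
        final String name;
        final List<Node> children = new ArrayList<>();
        Node(String name) { this.name = name; }
    }

    private final Deque<Node> stack = new ArrayDeque<>();

    void push(Node node) { stack.push(node); }
    int size() { return stack.size(); }

    /**
     * Closes a node scope that was opened when the stack held mark nodes.
     * The new node adopts everything pushed since then, plus extra older
     * nodes sitting just below the mark (the "+1" in #MethodCall(+1)).
     */
    Node close(String name, int mark, int extra) {
        int count = (stack.size() - mark) + extra;
        Node parent = new Node(name);
        for (int i = 0; i < count; i++) {
            // Pop newest first, insert at the front to keep source order.
            parent.children.add(0, stack.pop());
        }
        stack.push(parent);
        return parent;
    }
}
```

For instance, if IDENTIFIER is pushed, the mark is taken, and two more nodes are pushed for the parenthesized expansion, then close("MethodCall", mark, 1) yields a node with three children whose first child is the IDENTIFIER.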