New Feature: FAIL Statement

The FAIL construct is dead simple. Here is an example:

A.

(
   "foo"
   |
   "bar"
   |
   FAIL "Was expecting \"foo\" or \"bar\" here!"
)

At first I thought the above was just syntactic sugar, since you can, of course, already write:

B.

(
    "foo"
    |
    "bar"
    |
    {throw new ParseException("Was expecting \"foo\" or \"bar\" here!");}
)

But it isn’t just syntactic sugar, and I’ll get to that further down. There are several reasons why the first construct above is preferable, and I shall list them here in order of increasing importance. (Save the best for last.)

The FAIL construct is language-neutral

Later on, when JavaCC generates parsers in languages other than Java, example A should continue to work unchanged if you want to generate your parser in Python or PHP or C# or JavaScript or whatever.

By the way, a similar consideration also applies to the new Lookbehind construct. If you use that in spots where you previously had some ad hoc Java code in a "semantic lookahead", your grammar ought to work unchanged when the capability finally exists to generate code in other languages.

Granted, this is hypothetical, since the tool currently generates only Java code. Moreover, it is entirely possible that the vast majority of people reading this do not care, because they do not anticipate ever needing to generate code in any other language.

But, this does lead naturally to a related consideration:

Using the FAIL construct (example A rather than B above) gives JavaCC developers extra degrees of freedom to evolve the tool.

Here is what I mean. Suppose that, at some point, I add a new constructor to ParseException with an extra parameter (or more than one) so that error messages are more informative. If you use A above instead of B, you would typically get the benefit of that improvement for free: regenerating the parser from A would most likely emit code that uses the newer API, with no changes to your existing grammar.
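To make that concrete, here is a minimal Java sketch. Everything in it is hypothetical: SketchParseException is a stand-in for the generated ParseException, and the (message, line, column) constructor is just one guess at what a richer constructor could look like. It only illustrates the point that a hand-written throw is frozen at the old API, while code generated from a FAIL statement is free to move to the new one.

```java
/** Hypothetical illustration only; none of this is JavaCC's actual API. */
final class EvolutionSketch {

    /** Stand-in for the generated ParseException class. */
    static final class SketchParseException extends Exception {
        /** The constructor that exists today in this sketch. */
        SketchParseException(String message) {
            super(message);
        }

        /** An assumed future constructor that also records a location. */
        SketchParseException(String message, int line, int column) {
            super(message + " (line " + line + ", column " + column + ")");
        }
    }

    /** What a hand-written block like example B keeps doing after the tool evolves. */
    static SketchParseException handWritten() {
        return new SketchParseException("Was expecting \"foo\" or \"bar\" here!");
    }

    /** What a regenerated parser could emit for the FAIL statement in example A. */
    static SketchParseException regenerated(int line, int column) {
        return new SketchParseException("Was expecting \"foo\" or \"bar\" here!", line, column);
    }

    public static void main(String[] args) {
        System.out.println(handWritten().getMessage());
        System.out.println(regenerated(3, 7).getMessage());
    }
}
```

The grammar author changes nothing in either case; the difference is only in which of the two methods the tool is able to upgrade on its own.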

Or, here is another example. At some later point, logging is likely to be far more fleshed out (it is in the back of my mind, but has not really hit the top of my TODO list yet). In that case, using construct A above, without making any changes to your grammars you could configure different levels of logging for these FAIL conditions.

There is a more general point worth making here:

It will typically be the case that, if you have a choice between using a construct built into JavaCC and dropping down into Java code to get the same result, you will be far better off using the construct built into JavaCC.

This is precisely for the reasons given above. So, when I say that using the FAIL statement will be better than directly putting in a line of Java code in braces that instantiates and throws the exception, it is really just a specific case of a more general rule of thumb.

That, too, is somewhat theoretical, since it still rests on hypotheticals. As I said, I was saving the best for last, so here is the most important reason that you should prefer A to B above.

Example B doesn’t really work!

Well, it works in a sense, but it isn’t quite right: the semantics are simply wrong. Suppose you have a production like the following somewhere in a grammar:

Foobar : "foo" | "bar" | {throw new ParseException("expecting foo or bar here");};

And elsewhere, you use Foobar in a lookahead:

SCAN Foobar => Baz

(i.e. LOOKAHEAD(Foobar()) Baz() in the legacy syntax)

Here is the rub: the above syntactic lookahead always succeeds!

This is because the Java code block where the exception is thrown is considered by internal JavaCC machinery to be an expansion that always succeeds!

This, by the way, was one problem with using so-called JAVACODE productions for error handling. (In JavaCC grammars in the wild they were almost exclusively used for error handling.) You’d see something out there like:

void Foobar() :
{}
{
    Foo()
    |
    Bar()
    |
    ErrorHandler()
}

And elsewhere, maybe ErrorHandler would be defined as a JAVACODE production, something like:

JAVACODE void ErrorHandler() {some utterly kludgy java code here}

(Without using the JAVACODE construct, you would write the above as simply {ErrorHandler();} where the ErrorHandler is just a Java method you define normally. So the JAVACODE construct never had any real reason to exist.)

The problem is that any syntactic lookahead of the Foobar production would always succeed. Well, that’s still not as big a problem as the fact that the lookahead would be ignored in situations where it would be reasonable to expect that it would be taken into account. (See here.)

Granted, the way to deal with the above situation, I suppose, would be to have a way to mark whether the Java code you reach is to be considered a success or a failure in a lookahead. (That still might be a feature well worth adding, it occurs to me...)

In any case, perhaps needless to say, if you write the above as:

Foobar : Foo | Bar | FAIL "Oops!";

then the lookahead, like:

SCAN Foobar => ....

would be effectively the same as:

SCAN Foo | Bar => ....

In other words, if it reaches a FAIL statement inside a lookahead routine, it treats that as a failure, NOT as a success. (Different people may perceive things differently, but I reckon that most people would agree that your parser blowing up with an exception being thrown does not correspond to their intuitive notion of what constitutes "success".)
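The difference between the two behaviors can be sketched with a toy model in Java. This is not JavaCC’s internal machinery, and the names (LookaheadSketch, Alternative, scanChoice) are made up for illustration. The model assumes only what is described above: a lookahead scan cannot execute an embedded Java block, so it counts the block as a success, whereas a FAIL statement counts as a failure.

```java
import java.util.Arrays;
import java.util.List;

/** Toy model of a lookahead scan over a choice; not JavaCC's real internals. */
final class LookaheadSketch {

    /** One alternative inside a choice, as seen by the lookahead scan. */
    interface Alternative {
        /** Returns true if scanning this alternative against the next token succeeds. */
        boolean scan(String nextToken);
    }

    /** A string literal such as "foo": succeeds only if the next token matches. */
    static Alternative literal(String image) {
        return nextToken -> image.equals(nextToken);
    }

    /** An embedded Java block: the scan does not run it, so it counts as a success. */
    static Alternative codeBlock() {
        return nextToken -> true;
    }

    /** A FAIL statement: reaching it during a scan counts as a failure. */
    static Alternative fail() {
        return nextToken -> false;
    }

    /** A choice succeeds in lookahead if any one of its alternatives does. */
    static boolean scanChoice(List<Alternative> alternatives, String nextToken) {
        for (Alternative alternative : alternatives) {
            if (alternative.scan(nextToken)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<Alternative> withCodeBlock =
                Arrays.asList(literal("foo"), literal("bar"), codeBlock());
        List<Alternative> withFail =
                Arrays.asList(literal("foo"), literal("bar"), fail());

        System.out.println(scanChoice(withCodeBlock, "baz")); // prints "true"
        System.out.println(scanChoice(withFail, "baz"));      // prints "false"
        System.out.println(scanChoice(withFail, "foo"));      // prints "true"
    }
}
```

With the code block as the last alternative, the scan reports success even for a token that is neither "foo" nor "bar"; with FAIL in its place, the scan behaves like a scan over "foo" | "bar" alone.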

I suppose the above says it all. I would add, as a little addendum, that this feature, particularly in conjunction with negative lookahead, provides a rough-and-ready way of writing an assertion. You could have something like:

[
    SCAN ~Foobar => FAIL "We are supposed to have a Foobar here!"
]
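In plain Java terms, the construct above behaves roughly like the following sketch. It is a hand-written model under stated assumptions, not code that JavaCC generates: AssertAheadSketch, SketchParseException and assertAhead are hypothetical names, and the predicate stands in for the lookahead routine for Foobar.

```java
import java.util.function.Predicate;

/** Toy model of [ SCAN ~Foobar => FAIL "..." ]; not generated parser code. */
final class AssertAheadSketch {

    /** Hypothetical stand-in for the generated ParseException class. */
    static final class SketchParseException extends RuntimeException {
        SketchParseException(String message) {
            super(message);
        }
    }

    /**
     * scanFoobar models the lookahead routine for Foobar. The "~" negates its
     * result, so the FAIL branch is entered exactly when Foobar is NOT ahead.
     */
    static void assertAhead(Predicate<String> scanFoobar, String nextToken, String message) {
        boolean negatedScanSucceeds = !scanFoobar.test(nextToken);
        if (negatedScanSucceeds) {
            throw new SketchParseException(message);
        }
        // Otherwise the optional [...] block is skipped and parsing continues.
    }

    public static void main(String[] args) {
        Predicate<String> scanFoobar = token -> token.equals("foo") || token.equals("bar");
        String message = "We are supposed to have a Foobar here!";

        assertAhead(scanFoobar, "foo", message); // Foobar is ahead: nothing happens

        try {
            assertAhead(scanFoobar, "baz", message);
        } catch (SketchParseException e) {
            System.out.println(e.getMessage()); // prints the FAIL message
        }
    }
}
```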

Well, I’ll close the post here. If you have any comments, for example if you think that all of this is a terrible idea and you want everything to work exactly as it always did (hitting a block that throws a ParseException constitutes "success"), then use the Discourse forum to say so. (Hey, everybody needs a good laugh now and then!)

Free entertainment aside, if you have any ideas about how to enhance this new feature or other new features, I would be quite interested in that. Regardless, please do sign up on the discussion forum.
