About to rip out JAVACODE productions. Does anybody object to this?

You know, I bet there are some long-term users of JavaCC who aren't even aware that JavaCC has this thing called a JAVACODE production. (If you were unaware of this, you weren't missing much!)

Basically, the idea is that you can define something that is, to all intents and purposes, just a plain old Java method (POJM?), but that is somehow treated as if it were a grammatical production.

The main purpose of it seems to be to support really horrible kludges. Actually, as best I can see, relatively few JavaCC grammars in the wild have any JAVACODE productions, and I myself never wrote one. But when the feature is used, what you typically see is something like:

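JAVACODE void scan_to_matching_brace()
{
   // Skip tokens until the next one is a closing brace (or end of input)
   Token tok = getToken(1);
   while (tok.kind != RBRACE && tok.kind != EOF) {
      getNextToken();
      tok = getToken(1);
   }
}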

Of course, this is not a grammatical production in any real sense. It's just a hack. Basically, when your parser ends up in code like this, it has gone completely off the rails, and you're using some sort of extremely crude, bloody-minded approach to try to get back on them: in this case, scanning forward to the next closing brace and seeing whether you can trick your parsing machinery into carrying on from there. (Good luck with that!)

The thing is that I am trying to deal with these situations (fault-tolerant parsing, broadly speaking) in a much more systematic way, and this JAVACODE production "feature" (using the word "feature" generously) is actually just getting in my way.

It's a funny thing, because JAVACODE productions are like a big hole in the overall logical system, sort of like null in Java, which is a big hole in the type system. It's quite amazing how much of the JavaCC code internally is devoted to handling the possibility of JAVACODE productions, all this extra code for handling the screwy cases. So, basically, my intention is to do for JAVACODE productions in JavaCC what Kotlin seems to have done for Java nulls: just ban them.
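One way to see the "hole": a real production is grammar, so a parser generator can work out which tokens it can start with (its FIRST set) and use that to choose between alternatives. A JAVACODE production is arbitrary Java, so there is nothing to analyze. The following is a hand-written illustration, not JavaCC's actual generated code; the token kinds and the FIRST_FOO/FIRST_BAR sets are hypothetical. It sketches how a recursive-descent parser might pick an alternative in Foo() | Bar() | Baz() when Baz() is an opaque method.

```java
import java.util.Set;

// Hypothetical hand-written sketch, not JavaCC's generated code: choosing an
// alternative in  Foo() | Bar() | Baz()  when Baz() is an opaque Java method.
class ChoiceSketch {
    // Hypothetical token kinds, for illustration only
    static final int IDENTIFIER = 1, NUMBER = 2, LBRACE = 3;

    // For real productions, FIRST sets can be computed from the grammar
    static final Set<Integer> FIRST_FOO = Set.of(IDENTIFIER);
    static final Set<Integer> FIRST_BAR = Set.of(NUMBER);
    // Baz() is arbitrary Java code: there is no grammar to analyze, so it has
    // no FIRST set, and lookahead cannot tell when Baz() is the right choice.

    static String choose(int nextKind) {
        if (FIRST_FOO.contains(nextKind)) return "Foo";
        if (FIRST_BAR.contains(nextKind)) return "Bar";
        // The opaque alternative can only serve as a catch-all fallback,
        // taken for any token the real productions do not claim.
        return "Baz";
    }

    public static void main(String[] args) {
        System.out.println(choose(IDENTIFIER)); // Foo
        System.out.println(choose(NUMBER));     // Bar
        System.out.println(choose(LBRACE));     // Baz
    }
}
```

Every piece of analysis that relies on knowing what an alternative can start with (choice conflicts, lookahead, error messages listing expected tokens) needs a special case for the opaque alternative, which is the kind of extra code described above.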

Well, I'm almost certainly going to do this, but I thought I'd just ask people whether this is something anybody objects to.

Oh, in other matters, I got rid of LexicalException a while ago. Now, all exceptions that the parser throws are ParseExceptions. The way that works is that lexically invalid input creates a special kind of Token called InvalidInput. Your parsing machinery doesn't know how to deal with it, of course, so it ends up throwing a ParseException, just as it would with any other unexpected Token type. I'm wondering whether this is a great discovery: invalid input is actually a Token, just as zero is actually a number.

I didn't even ask anybody about this. I have been aware for some time that this distinction between ParseException and LexicalException was never worth the candle.
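The InvalidInput idea can be sketched in a few lines of plain Java. This is a hypothetical toy, not JavaCC 21's actual classes: the Kind enum, INVALID_INPUT, Token record, ParseException and the tiny Sum grammar are all assumptions made for illustration. The lexer never throws; characters it cannot match become an INVALID_INPUT token, and the parser reports that token exactly as it would any other unexpected token.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch (names are illustrative, not JavaCC 21's real API):
// the lexer turns unmatched input into a token instead of throwing.
class InvalidInputSketch {
    enum Kind { NUMBER, PLUS, INVALID_INPUT, EOF }

    record Token(Kind kind, String image) {}

    static class ParseException extends Exception {
        ParseException(String message) { super(message); }
    }

    static boolean startsKnownToken(char c) {
        return Character.isWhitespace(c) || Character.isDigit(c) || c == '+';
    }

    static List<Token> tokenize(String input) {
        List<Token> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            int start = i;
            if (Character.isWhitespace(c)) {
                i++;
            } else if (Character.isDigit(c)) {
                while (i < input.length() && Character.isDigit(input.charAt(i))) i++;
                tokens.add(new Token(Kind.NUMBER, input.substring(start, i)));
            } else if (c == '+') {
                i++;
                tokens.add(new Token(Kind.PLUS, "+"));
            } else {
                // Unmatched characters: emit one INVALID_INPUT token, no exception
                while (i < input.length() && !startsKnownToken(input.charAt(i))) i++;
                tokens.add(new Token(Kind.INVALID_INPUT, input.substring(start, i)));
            }
        }
        tokens.add(new Token(Kind.EOF, ""));
        return tokens;
    }

    static class Parser {
        private final List<Token> tokens;
        private int pos = 0;

        Parser(String input) { this.tokens = tokenize(input); }

        private Token expect(Kind kind) throws ParseException {
            Token t = tokens.get(pos);
            if (t.kind() != kind) {
                // INVALID_INPUT gets no special treatment: it is just unexpected
                throw new ParseException("Expected " + kind + " but found "
                        + t.kind() + " \"" + t.image() + "\"");
            }
            pos++;
            return t;
        }

        // Sum ::= NUMBER ("+" NUMBER)* EOF
        int parseSum() throws ParseException {
            int total = Integer.parseInt(expect(Kind.NUMBER).image());
            while (tokens.get(pos).kind() == Kind.PLUS) {
                pos++;
                total += Integer.parseInt(expect(Kind.NUMBER).image());
            }
            expect(Kind.EOF);
            return total;
        }
    }

    public static void main(String[] args) throws ParseException {
        System.out.println(new Parser("1 + 2 + 3").parseSum()); // 6
        try {
            new Parser("1 + $$ 2").parseSum();
        } catch (ParseException e) {
            // Expected NUMBER but found INVALID_INPUT "$$"
            System.out.println(e.getMessage());
        }
    }
}
```

The point of the design is that there is only one error path: the parser needs no separate lexical-error handling, because bad input reaches it as an ordinary (if unwelcome) token.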

On getting rid of these JAVACODE productions, for some reason, I thought I should ask people. Would anybody miss them?

 

Notable Replies

  1. Hi. Although I do not have a real-world grammar that uses JAVACODE productions, I would miss them, because I expect I would use them in the rare cases where the grammar syntax is difficult to express in JavaCC. I can imagine examples where I would like to branch to another lexer/parser (Flex, ANTLR, …) for very specific parts of the grammar.

  2. In fact, I do have a real-world grammar (a Uniface grammar) that uses JAVACODE productions in try / catch (ParseException) { jcp(); } blocks to handle the parse exception, perform error recovery, and resume parsing.

  3. Hi Marc, thanks for the comment. One thing to be clear about is that a JAVACODE production is simply a Java method pretending to be a grammatical production. In pretty much every JavaCC grammar in the wild that I find using JAVACODE, it is not really being used as a grammatical production. From what you are describing, that is your case as well.

    Or more precisely, if you have something like:

    Foo()
    |
    Bar()
    |
    Baz()

    and Baz() is declared as a JAVACODE production, then yes, you are using that feature, and my having ripped out JAVACODE productions could be a problem for you. However, if what you are doing is something like:

    try {
       Foo() | Bar()
    }
    catch (ParseException pe) {
       Baz();
    }

    then you’re really just using Baz() as a plain Java method anyway. So, if you had written:

    JAVACODE void Baz() {
       // blah blah
    }

    then in JavaCC 21 you would need to change that to:

    INJECT(PARSER_CLASS) {}
    {
       void Baz() {
          // blah blah
       }
    }

    In the latter case, you are just “injecting” the Java method Baz() into your parser code. (That could also be done with PARSER_BEGIN… PARSER_END in the older syntax, which still works, but that’s a minor detail.) The point is that most usage out there is just people using JAVACODE to declare a Java method, not to declare a Java method that is treated as a grammatical production.

    Come to think of it, that is a point that I did not make very clear in the page you are replying to… But you see, the whole thing is actually just confusing, which is already a reason to get rid of it!

  4. Hi,
    Be aware that a JAVACODE production can become a node in JTB (I do not remember whether the same is true in JJTree).
    So I do not want to inject node-handling code when I would only have to define a grammar fragment and let the tool generate the node-handling code.
    In the try/catch (TCF) example, the JAVACODE production called in the catch block can be used to build a specific (even artificial) node to rebuild the failed part or node (like a missing “)”).
    Be aware also that a generated parser can be subclassed to customize its behavior: in our Uniface example, the JAVACODE production is empty in the grammar but is overridden in different subclasses tailored to specific cases.

  5. Actually, I don’t know offhand whether JJTree builds nodes for JAVACODE productions. I’d have to look into that! The thing is that, even if it does, real-world usage of that is extremely rare.

    In general, you do have to understand that my goal moving forward is not 100.0% backward compatibility with the legacy tool. My goal is that most people can migrate with minimal bother. (Maybe rather than “most people”, it would be more precise to say “most typical usage scenarios”.)

    Besides, Marc, backward compatibility cannot really be one’s big selling point, you know, because there is nothing more backward compatible with the older version than the older version itself!

    The selling point has to be new features!
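The replies turn on the difference between a production and a method that is merely called from a catch block. Here is a hand-written Java sketch of the second pattern, loosely modeled on the try/catch recovery and subclass-override usage described in the replies. It is not generated JavaCC code: BaseParser, statement, recover and SkippingParser are hypothetical names, and tokens are plain strings to keep it short. The parser catches ParseException around each statement and calls a recovery hook that is an ordinary method, empty in the base class and overridden in a subclass.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: error recovery via a plain (overridable) Java method
// called from a catch block, with no JAVACODE-style production involved.
class RecoverySketch {
    static class ParseException extends Exception {
        ParseException(String msg) { super(msg); }
    }

    // Grammar: StatementList ::= ( NAME ";" )*   where NAME is [a-z]+
    static class BaseParser {
        protected final List<String> tokens;
        protected int pos = 0;
        final List<String> parsed = new ArrayList<>();

        BaseParser(List<String> tokens) { this.tokens = tokens; }

        String peek() { return pos < tokens.size() ? tokens.get(pos) : "<EOF>"; }

        void statement() throws ParseException {
            String name = peek();
            if (!name.matches("[a-z]+")) throw new ParseException("bad statement start: " + name);
            pos++;
            if (!peek().equals(";")) throw new ParseException("missing ';' after " + name);
            pos++;
            parsed.add(name);
        }

        void statementList() {
            while (pos < tokens.size()) {
                int before = pos;
                try {
                    statement();
                } catch (ParseException pe) {
                    recover(pe);
                    // Guarantee progress so a do-nothing hook cannot loop forever
                    if (pos == before) pos++;
                }
            }
        }

        // Recovery hook: an ordinary method, empty by default
        protected void recover(ParseException pe) {}
    }

    // Subclass overriding the hook: skip to just past the next ';'
    static class SkippingParser extends BaseParser {
        SkippingParser(List<String> tokens) { super(tokens); }

        @Override
        protected void recover(ParseException pe) {
            while (pos < tokens.size() && !tokens.get(pos).equals(";")) pos++;
            if (pos < tokens.size()) pos++; // consume the ';'
        }
    }

    public static void main(String[] args) {
        List<String> input = List.of("a", ";", "1", "c", ";", "d", ";");
        BaseParser base = new BaseParser(input);
        base.statementList();
        SkippingParser skipping = new SkippingParser(input);
        skipping.statementList();
        System.out.println(base.parsed);     // [a, c, d]
        System.out.println(skipping.parsed); // [a, d]
    }
}
```

With the empty hook, the base parser resynchronizes one token at a time and so picks up "c" from the broken statement "1 c ;"; the subclass discards that whole statement instead. Either way, recover is just a method, which is the author's point, while the override shows the subclassing pattern described in reply 4. The forced `pos++` is a design choice to keep any hook, including an empty one, from stalling the loop.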

Continue the discussion at parsers.org
