The Dreaded “Code too large” Problem is a Thing of the Past

"It’s too big! It doesn’t fit!" The above does not refer to any particular pornographic feature film, but rather, to a longstanding problem in JavaCC: if you write a very big, complex lexical grammar, the generated XXXTokenManager would fail to compile, with the compiler reporting the error: "Code too large". Well, this has now been …

Moving Towards a Maximally Correct Reference Java Grammar

Now that the ability to generate fault-tolerant parsers is coming along so well, I have been thinking about what to do with the Java grammar included in JavaCC21. I decided that the best course was to do the incremental work to make it maximally correct. Ideally, it will serve as a reference implementation …

A Glimpse of the Promised Land: Fault-tolerant parsing

For some time, it has been a goal of JavaCC 21 to provide the ability to generate fault-tolerant parsers. I started working on the problem about a year ago. However, I had not put in a comprehensive solution until now for several reasons. Basically these: The codebase, though already significantly refactored and cleaned up, was …

Is this Parsing theory just Bullshit: Part Deux

A Little Parlor Game

As I pointed out earlier here, in this parser space, there is a great tendency to express concepts (that, once understood, are actually quite simple) in a very abstruse, obfuscated manner. Recently, I was musing about a little conceptual experiment with some real comic potential. Imagine if basic Mathematics …

Context-Sensitive Tokenizing, Part Deux: Lexical States

(To get some prerequisite understanding of this topic, it might be a good idea to read this earlier blog post on context-sensitive tokenization from three months ago.)

The Lay of the Land

There are two quite useful ideas that have been in JavaCC from the very beginning: lookahead (particularly syntactic lookahead) and lexical states. Syntactic lookahead …

Tree Building Redux: Nailing another Dmitry Dmitriyevich problem

Greetings, comrades! My name is Vladimir Vladimirovich Vladimirov! Hey, whassup, Vlad! As a follow-on to my blog post of a couple of days ago, there were a couple of t’s that needed crossing and an ‘i’ or two that needed a dot. Let’s see… First of all, I misspoke a little bit in that post. …

Tastes just like home-made! (Some more tree building enhancements)

Before getting into what the minor enhancements to tree building are, I guess I should write a quick synopsis of the current state of affairs. When you have TREE_BUILDING_ENABLED set to true (this is the default in JavaCC21), the tree building machinery will build a Node if the production results in the creation of more …
