lookahead

Nested Lookahead Redux

(The TLDR: Nested lookahead was always broken in legacy JavaCC. This is finally fully addressed (as of 11/26/2022) in JavaCC 21. However, since this fix has a lot of potential to break existing code, for now, the fix is only in effect if you put LEGACY_GLITCHY_LOOKAHEAD=false at the top of your grammar(s). Meanwhile, you should …

Context-Sensitive Tokenizing, Part Deux: Lexical States

(To get some prerequisite understanding of this topic, it might be a good idea to read this earlier blog post on context-sensitive tokenization from three months ago.) The Lay of the Land. There are two quite useful ideas that have been in JavaCC from the very beginning: lookahead (particularly syntactic lookahead) and lexical states. Syntactic lookahead …

A Bug’s Life

Thomas Hobbes famously said that in humankind’s natural state, a man’s life tends to be "nasty, brutish, and short". I suppose the natural corollary of this is that in technologically advanced societies, life is comparatively "pleasant, peaceful, and long". Of those three things, it is the last one that can be measured most objectively; we see, …

“You can’t get there from here!” — The Problem of Context-Sensitive Tokenization

(N.B. Note added 13 June 2021: This article is useful in terms of understanding how to add token hooks to code. However, in terms of solving the specific problem outlined, the article is obsolete. See here for the updated solution.) Since I picked up my work on the JavaCC codebase at the end of 2019, …

New Feature: The =>|| delimiter stands for “scan up to here”

Revisiting LOOKAHEAD Redux. Not so long ago, I had a sort of eureka moment when I realized that the legacy LOOKAHEAD construct was fundamentally half-baked or broken. Not only that, but I started narrowing in on how to definitively address the issue! Well, let’s get concrete. Suppose we have a production that looks kind of like this: FooBar : …