Tips and Tricks

All sorts of things you always wanted to know about tokenization but were afraid to ask (Part I)

Let’s consider the multiline comment in Java (and C/C++/C#, among others), which, as you surely know, looks like this:

    /*
     * Comment text.
     */

This is an interesting construct. Paradoxically, it is extremely simple to describe in natural, human language (English or whatever), but shockingly difficult to express in CongoCC. Or, that …
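Written procedurally, the natural-language rule ("it starts with /* and ends at the first */") really does fit in a few lines. The following is a hand-written Java sketch of that rule; the class `MultilineComment` and the method `findCommentEnd` are hypothetical names, and this is not how CongoCC or its generated lexers handle comments.

```java
/**
 * A hand-written sketch (hypothetical, not CongoCC-generated code) of the rule
 * "a comment starts with slash-star and ends at the first star-slash".
 */
public class MultilineComment {

    /**
     * Returns the index just past the closing delimiter of a comment that
     * begins at {@code start}, or -1 if no comment opens there or it is
     * never closed.
     */
    public static int findCommentEnd(String input, int start) {
        // The comment must open with "/*" at the given offset.
        if (!input.startsWith("/*", start)) {
            return -1;
        }
        // Look for the first "*/" after the opening delimiter. Searching from
        // start + 2 means that "/*/" is not mistaken for a complete comment.
        int close = input.indexOf("*/", start + 2);
        return close == -1 ? -1 : close + 2;
    }

    public static void main(String[] args) {
        String src = "/*\n * Comment text.\n */ int x;";
        int end = findCommentEnd(src, 0);
        System.out.println(src.substring(0, end)); // the whole comment
        System.out.println(src.substring(end));    // " int x;"
    }
}
```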

The TERMINATING_STRING setting, a new (and quite minor!) feature

Some days ago, I added a new setting. If, at the top of your grammar, you write TERMINATING_STRING="some string"; this means that the input you’re parsing is guaranteed to end with that string. If the file already ends with that string, the setting does nothing. Otherwise, it tacks that string onto the end. In actual …
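The behaviour described above amounts to a simple check-then-append. Here is a minimal Java sketch of it; `TerminatingString` and `ensureTerminated` are hypothetical names for illustration, not CongoCC's actual implementation of the setting.

```java
/**
 * A minimal sketch (hypothetical helper, not CongoCC's actual code) of the
 * TERMINATING_STRING behaviour: leave the input alone if it already ends with
 * the terminating string, otherwise append the string to the end.
 */
public class TerminatingString {

    /** Returns the input, guaranteed to end with {@code terminator}. */
    public static String ensureTerminated(String input, String terminator) {
        // Already terminated: nothing to do.
        if (input.endsWith(terminator)) {
            return input;
        }
        // Otherwise tack the terminating string onto the end.
        return input + terminator;
    }

    public static void main(String[] args) {
        // Using a newline as the terminating string, for example:
        System.out.println(ensureTerminated("int x = 1;", "\n").equals("int x = 1;\n"));   // true
        System.out.println(ensureTerminated("int x = 1;\n", "\n").equals("int x = 1;\n")); // true
    }
}
```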

Tastes just like home-made! (Some more tree building enhancements)

Before getting into what the minor enhancements to tree building are, I guess I should write a quick synopsis of the current state of affairs. When you have TREE_BUILDING_ENABLED set to true (this is the default in JavaCC21), the tree-building machinery will build a Node if the production results in the creation of more …

“You can’t get there from here!” — The Problem of Context-Sensitive Tokenization

(N.B. Note added 13 June 2021: This article is useful in terms of understanding how to add token hooks to code. However, in terms of solving the specific problem outlined, the article is obsolete. See here for the updated solution.) Since I picked up my work on the JavaCC codebase at the end of 2019, …