All sorts of things you always wanted to know about tokenization but were afraid to ask (Part I)

Let’s consider the multiline comment in Java (and C/C++/C#, among others) which, as you surely know, looks like this: /* * Comment text. */ This is an interesting construct. Paradoxically, it is extremely simple to describe in natural, human language (English or whatever), but shockingly difficult to express in CongoCC. Or, that …
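To make the "simple to describe" half of that claim concrete, here is a minimal hand-written sketch in plain Java. It is not CongoCC grammar syntax and not how the post itself solves the problem. The class and method names are hypothetical, chosen only for illustration. The rule it encodes is the usual one: the comment starts with /* and runs up to and including the first */ that follows.

```java
public class CommentScanSketch {
    /**
     * Hypothetical helper: if a multiline comment starts at 'start',
     * returns the index just past its closing delimiter.
     * Returns -1 if no comment starts there or if it is never closed.
     */
    static int endOfMultilineComment(String input, int start) {
        if (!input.startsWith("/*", start)) {
            return -1; // no opening delimiter at this position
        }
        // Search from start + 2 so the '*' of the opener cannot be
        // reused as part of the closer (so "/*/" is not a full comment).
        int close = input.indexOf("*/", start + 2);
        return close < 0 ? -1 : close + 2;
    }
}
```

Note the one subtle point: the search for the closing delimiter has to begin after the opening one, otherwise the three characters /*/ would wrongly count as a complete comment.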

The TERMINATING_STRING setting, a new (and quite minor!) feature

Some days ago, I added a new setting. If, at the top of your grammar, you write: TERMINATING_STRING="some string"; this means that the input you’re parsing is guaranteed to end with that string. If the file ends with that string already, then it does nothing. Otherwise, it tacks that string onto the end. In actual …
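The behaviour described above amounts to a one-line check. The following is only a sketch of that logic in plain Java, under the assumption that the whole input is available as a string. The class and method names are hypothetical and are not part of the tool's actual API.

```java
public class TerminatingStringSketch {
    /**
     * Hypothetical illustration of the described semantics:
     * leave the input alone if it already ends with the terminator,
     * otherwise append the terminator.
     */
    static String ensureTerminated(String input, String terminator) {
        return input.endsWith(terminator) ? input : input + terminator;
    }
}
```

For example, with a newline as the terminating string, an input that already ends in a newline comes back unchanged, while one that does not gets exactly one newline appended.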

NFA stands for “Nondeterministic finite automaton”. (Does that make you nervous?)

The main goal of the JavaCC 21 project, soon to be renamed CongoCC, is to develop a very capable, practical, usable parser generator tool. However, there is another overarching goal, perhaps not so much a goal as a philosophy. That can be summed up in a single word: Demystification. You see, somehow or …

I shall tell you my plans but then I shall have to kill you, Mr. Bond Van Bruggen

There is a scene that repeats itself in various James Bond films, usually towards the end. The archvillain has captured Bond. However, instead of just killing him, he has to do it in some very creative, slow manner, using some sort of Rube Goldberg contraption. Meanwhile, he tells Bond gleefully what his plans are for …
