[ragel-users] Incremental parsing with ragel

Discussion:

Karel Sedláček

2012-11-10 16:02:08 UTC

Ideally, I'd like to be able to write a parser that fits this sort of
prototype:

/* feed data in blocks and let ragel call the actions to deal with it */
void parse_x(parser_t *p, void *buf, size_t buf_len);
/* now we need an OOB way to tell ragel about real EOF */
void parse_x_eof(parser_t *p);

The need here is that all of the parsing work I am doing deals with
non-blocking I/O, and it is not at all feasible to buffer "all" of the
protocol data and then feed it to ragel. Neither is it acceptable to
do something like the getchar() loop I have seen elsewhere. Is it
possible in Ragel for the parser to encounter an EOB (end of buffer)
state and serialize itself somewhere to be re-entered with more data
later?

k

Adrian Thurston

2012-11-10 17:33:20 UTC

You came to the right place. This is what ragel was designed for.

-Adrian

Karel Sedláček

2012-11-10 17:59:09 UTC

Thanks, but could you maybe give me a 10-liner that exemplifies this
kind of behavior inside another application? Let's say I have my own
event loop set up to feed Ragel buffers. Where do I (or does Ragel)
allocate and store its state? How do I get the pair of functions
described above that are parameterized over the state and consume
incremental new input? Can I have Ragel pass me a bit of userdata as
well, where I can keep the parse state and the thing receiving the
events? Keeping it simple, how about a parser that has an in-action
that tells a C function got_null(void *userdata, size_t start_addr)
whenever it encounters a 0 byte. Prototypes below:

void got_null(void *userdata, size_t start_addr);
int parse_null(ragel_state_t *st, void *buf, size_t buf_len, void *userdata);
int parse_null_eof(ragel_state_t *st, void *userdata);

_______________________________________________
ragel-users mailing list
ragel-users at complang.org
http://www.complang.org/mailman/listinfo/ragel-users

Michael Conrad

2012-11-11 06:09:48 UTC

Well, maybe not a 10-liner, but I think I have a single function that
might make it clear.

Ragel generates code that uses special named variables. You can either
map these to code (as I did with 'stack' and 'eof') or use local
variables (like with 'p' and 'pe'). Or, mix and match. You get to
persist them however you like.

As you can see, ragel's engine (as I'm using it) is nothing more than a
generated while loop state machine. The state is entirely stored in the
variables [p, pe, cs] and I give it an rvalue for 'eof'. I'm also using
the [stack, top] for recursion ability. You can preserve these state
variables however you like.

bool wiki_scanner_scanMore(wiki_scanner_t *scanner) {
    const char
        *p= scanner->curPos,
        *pe= scanner->bufferLimit;
    int cs= scanner->curState;

    log_pos(start); //macro that prints trace info for debugging

    %%{
        variable stack scanner->stateStack;
        variable top scanner->stateStackPos;
        variable eof (scanner->eof? pe : NULL);
        prepush {
            if (scanner->stateStackPos >= scanner->stateStackLen) {
                if (!scanner_enlarge_stateStack(scanner)) {
                    fprintf(stderr, "ltw_scanner: recursion limit: %d levels deep; scan may emit incorrect results.\n", scanner->stateStackPos);
                    scanner->stateStackPos--;
                }
            }
        }
        write exec;
    }%%

    assert(p == pe);

    scanner->curPos= p;
    scanner->curState= cs;
    return true;
}

Hope that helps.
-Mike
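
For readers without Ragel at hand, the save/restore pattern described above can be sketched in plain C. This is a hand-written stand-in for the loop that "write exec" generates, not Ragel output, and every name in it (word_scanner_t, word_scanner_feed, the ST_ constants) is made up for illustration. It counts whitespace-separated words, keeps only cs between calls, and treats a word that is still open at a buffer boundary as complete only when the caller signals real EOF, which is the role the 'eof' variable plays in the function above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names throughout; hand-written, not generated by Ragel. */
enum { ST_BETWEEN = 0, ST_IN_WORD = 1 };

typedef struct {
    int cs;        /* current state: the only machine state kept between calls */
    size_t words;  /* number of completed words seen so far */
} word_scanner_t;

static void word_scanner_init(word_scanner_t *s) {
    assert(s != NULL);
    s->cs = ST_BETWEEN;
    s->words = 0;
}

/* Consume one buffer. Pass is_eof != 0 only with the final buffer (which may
 * be empty), mirroring "variable eof (scanner->eof? pe : NULL)". */
static void word_scanner_feed(word_scanner_t *s, const char *buf, size_t len,
                              int is_eof) {
    const char *p = buf;
    const char *pe = (len != 0) ? buf + len : buf;  /* one past the last byte */
    int cs;

    assert(s != NULL);
    cs = s->cs;                    /* restore state into a local */

    for (; p != pe; p++) {
        int is_space = (*p == ' ' || *p == '\n' || *p == '\t');
        if (cs == ST_IN_WORD && is_space) {
            s->words++;            /* leaving a word: it is complete */
            cs = ST_BETWEEN;
        } else if (cs == ST_BETWEEN && !is_space) {
            cs = ST_IN_WORD;
        }
    }

    /* A word still open at a buffer boundary only completes at real EOF. */
    if (is_eof && cs == ST_IN_WORD) {
        s->words++;
        cs = ST_BETWEEN;
    }

    s->cs = cs;                    /* save state for the next call */
}
```

Feeding "hel", "lo wor" and "ld" as three separate buffers leaves one completed word and an open one; a final call with an empty buffer and is_eof set closes the second word. An event loop would call word_scanner_feed once per read and make the EOF call when the connection closes.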

Karel Sedláček

2012-11-11 13:57:42 UTC

Hey Mike, thanks for the example. I ended up playing with a simplified
version of the Concurrent example, to get the null-byte detection I
described, and examining the generated output. I came to the same
conclusion as you regarding {p, pe, cs}, but it's always good to hear
it from a second source. The result is something like this:

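typedef struct {
    /* called with the caller's userdata and the position of each 0 byte */
    void (*on_null)(void *ud, size_t pos);
} par_cb_t;

typedef struct {
    int cs;  /* Ragel's current state, kept between calls */
} par_t;

void par_exec(par_t *par, void *buf, size_t buf_len, par_cb_t *cb, void *ud) {
    int cs = par->cs;
    /* Ragel dereferences p, so it needs a character type, not void * */
    const char *p = buf;
    /* pe must point one past the last byte of the buffer */
    const char *pe = NULL == buf ? NULL : p + buf_len;

    %% write exec;

    par->cs = cs;
}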

In order to get the userdata stuff I wanted, the caller populates a
callback structure, and all the actions call their associated callback
in this structure with ud. This could pretty easily be done without
the callback structure by just passing a single function pointer that
takes both a userdata and an event type, since all the events have the
same prototype. That works if you're doing what I'm doing: notifying
the buffer index at the beginning and end of machines that represent a
particular type of entity.

k
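
The single-function-pointer variant mentioned above might look roughly like the following. It is a sketch under assumptions, not the code from this thread: the scanning loop is written by hand in place of "%% write exec;", an "entity" is assumed to be a run of non-zero bytes ended by a 0 byte, and the names par_event_t, par_event_fn, event_log_t and log_event are invented for the example. The offset field is also an addition, so that reported positions are absolute stream offsets rather than indexes into the current buffer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names; hand-written stand-in for Ragel-generated code. */
typedef enum { EV_ENTITY_BEGIN, EV_ENTITY_END } par_event_t;

/* One callback for every event: userdata, event type, absolute offset. */
typedef void (*par_event_fn)(void *ud, par_event_t ev, size_t pos);

enum { ST_GAP = 0, ST_ENTITY = 1 };

typedef struct {
    int cs;         /* machine state carried across buffers */
    size_t offset;  /* bytes consumed by earlier calls */
} par_t;

static void par_init(par_t *par) {
    assert(par != NULL);
    par->cs = ST_GAP;
    par->offset = 0;
}

/* Assumed grammar: an entity is a run of non-zero bytes; a 0 byte ends it. */
static void par_exec(par_t *par, const void *buf, size_t buf_len,
                     par_event_fn on_event, void *ud) {
    const unsigned char *base = buf;
    int cs = par->cs;
    size_t i;

    for (i = 0; i < buf_len; i++) {
        size_t pos = par->offset + i;
        if (cs == ST_GAP && base[i] != 0) {
            on_event(ud, EV_ENTITY_BEGIN, pos);
            cs = ST_ENTITY;
        } else if (cs == ST_ENTITY && base[i] == 0) {
            on_event(ud, EV_ENTITY_END, pos);
            cs = ST_GAP;
        }
    }

    par->cs = cs;
    par->offset += buf_len;
}

/* Example receiver: records events into a small log kept in userdata. */
typedef struct {
    size_t n;
    par_event_t ev[8];
    size_t pos[8];
} event_log_t;

static void log_event(void *ud, par_event_t ev, size_t pos) {
    event_log_t *rec = ud;
    if (rec->n < 8) {
        rec->ev[rec->n] = ev;
        rec->pos[rec->n] = pos;
        rec->n++;
    }
}
```

With this shape the caller passes log_event and a pointer to an event_log_t on every par_exec call. An entity that starts in one buffer and ends in the next is still reported as one BEGIN/END pair, because cs and offset live in par_t rather than in locals.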
