Michael Lachmann
2012-11-03 22:34:30 UTC
Hi,
I'm starting to learn ragel because I'd like to write a very fast
parser for a fairly simple file format.
I'd like to learn some of the tricks for improving the performance of
the resulting program, so here are a few questions:
1. Is there a good sample program in terms of performance? I
downloaded awkemu - is that a good example?
2. Often, one can use **, or one can find a terminating character.
For example, awkemu has:
line = ( blineElements** '\n' )
I think here just * would have been enough, because there is the
terminating \n - is that right? Does it matter?
Should ** be avoided if possible?
3. Is there a disadvantage of using the lex-like scanner with
|*
pat =>
pat =>
etc., vs just specifying the full machine?
4. Is there a disadvantage of using intersection? For example, I think
the above line handling can be written as:
line = something & [^\n]* '\n'
where something doesn't care about handling end-of-line. Is it just as
fast as writing expressions that also handle end-of-line?
5. awkemu uses the following:
--
/* Find the last newline by searching backwards. This is where
* we will stop processing on this iteration. */
p = buf;
pe = buf + have + len - 1;
while ( *pe != '\n' && pe >= buf )
    pe--;
pe += 1;
/* fprintf( stderr, "running on: %i\n", pe - p ); */
%% write exec;
/* How much is still in the buffer. */
have = data + len - pe;
if ( have > 0 )
    memmove( buf, pe, have );
--
Is the initial backward scan to find the last end-of-line necessary? It
seems to run part of the file through two parsers.
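To make question 5 concrete, here is a minimal sketch of what I understand
the backward scan to do, pulled out as a standalone C function. The name
complete_prefix_len is my own and is not from awkemu. It checks the bound
before reading the byte, which I am assuming is the intended behaviour.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of awkemu: return the number of bytes in
 * buf[0..n) up to and including the last '\n', or 0 if there is no
 * newline. The bound is tested before the byte is read, so an empty
 * buffer or one without a newline is never read out of range. */
static size_t complete_prefix_len( const char *buf, size_t n )
{
	assert( buf != NULL );
	while ( n > 0 && buf[n - 1] != '\n' )
		n--;
	return n;
}
```

With this, pe would be buf + complete_prefix_len( buf, have + len ), and
whatever follows pe is the partial line that gets moved to the front of the
buffer for the next iteration.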
Thanks!
Michael