Solomon Gibbs
2012-12-04 01:22:20 UTC
I have two states; one is a specific instance of the other, more
general, state. I believe that the right way to avoid entering both
states simultaneously is to implement lookahead with k>1, but I can't
find any examples of how to do this.
The Ragel user's guide says:
In both the use of fhold and fexec the user must be cautious of
combining the resulting machine with another in such a way that the
transition on which the current position is adjusted is not combined
with a transition from the other machine.
I'm not entirely sure what this means, except perhaps "don't try to
read past the end of the current expression".
My machine looks like this:
seglen16 = any{2} >{ swab(p, &len, 2); len = len - 2; };
action check {len--}
buffer = (any when check)* %when !check @{ printf("[%d]:%d\n", len, *p); };
# JPEG Markers
mk_app0 = 0xFF 0xE0;
mk_appx = 0xFF (0xE0..0xEF);
marker = 0xFF ^0x00;
nonmarker = !marker - zlen;
# JPEG APP Segments
seg_app0_jfif = mk_app0 seglen16 "JFIF" 0x00 buffer @{ printf("jfif app0\n"); };
seg_appx_unk = mk_appx nonmarker* @{ printf("unknown app content\n"); };
seg_app = (seg_app0_jfif | seg_app1_exif | seg_appx_unk);
# Main Machine
expr = (mk_soi @lerr(bad) nonmarker* seg_app* nonmarker* mk_eoi);
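For reference, the `seglen16` action relies on two JPEG facts. First, the segment length is stored big-endian. Second, the length counts its own two bytes, which is why the machine subtracts 2. The C sketch below is a hypothetical helper and not part of the machine above. It decodes the field by shifting bytes instead of calling `swab()`. That matters if `len` is an `int`, because `swab()` then fills only two of its bytes. The result is only correct on a little-endian host whose upper bytes of `len` are already zero.

```c
/* Hypothetical helper (not from the original machine): decode the
 * two-byte length field that follows an APPn marker.
 * p points at the first length byte. */
int seg_payload_len(const unsigned char *p)
{
    /* JPEG stores the length big-endian (high byte first), so build
     * the value by shifting; this is independent of host byte order. */
    int len = (p[0] << 8) | p[1];

    /* The stored value includes the two length bytes themselves,
     * so the payload still to be read is two bytes shorter. */
    return len - 2;
}
```

As an example, a typical JFIF APP0 segment with no embedded thumbnail has length bytes `00 10` (16). That leaves 14 payload bytes: "JFIF\0" followed by 9 bytes of version, density, and thumbnail-size fields.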
I want to tokenize a JPEG header, skipping unknown segments and
handling well-known segments like JFIF. The JPEG application segment
app0 starts with 0xFFE0. If app0 contains JFIF data, the app0 marker
will be followed by a two-byte length and the string "JFIF\0". This
means I need 7 bytes of lookahead when identifying application
segments.
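To make the 7-byte figure concrete, here is a plain C sketch of the decision the lookahead has to make. It is hand-written classification code, not Ragel output, and `classify_app` and `enum app_kind` are hypothetical names. It assumes `p` points at the 0xFF of a marker and `pe` is one past the end of the buffered data. Only APP0 needs the extra bytes: 2 for the length plus 5 for "JFIF\0". Until those bytes are available, the specific (JFIF) case and the general (unknown APP) case cannot be told apart.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical result type for the sketch below. */
enum app_kind { APP_NEED_MORE, APP_JFIF, APP_OTHER, APP_NOT_APP };

/* Hypothetical classifier: p points at the 0xFF of a marker,
 * pe is one past the last available byte. */
enum app_kind classify_app(const unsigned char *p, const unsigned char *pe)
{
    size_t avail = (size_t)(pe - p);

    if (avail < 2)
        return APP_NEED_MORE;
    if (p[0] != 0xFF || p[1] < 0xE0 || p[1] > 0xEF)
        return APP_NOT_APP;            /* not an APPn marker at all */
    if (p[1] != 0xE0)
        return APP_OTHER;              /* APP1..APP15: no JFIF check needed */

    /* APP0: deciding needs 7 more bytes, the 2-byte length plus "JFIF\0". */
    if (avail < 2 + 7)
        return APP_NEED_MORE;

    /* p[2..3] is the length; p[4..8] must be "JFIF" and its NUL. The
     * literal "JFIF" is 5 bytes long, so memcmp also checks the NUL. */
    if (memcmp(p + 4, "JFIF", 5) == 0)
        return APP_JFIF;
    return APP_OTHER;
}
```

Returning `APP_NEED_MORE` rather than guessing keeps the specific and general cases from both being entered while the deciding bytes are still missing.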
Thanks for any pointers.