Discussion:
[ragel-users] When does Ragel mark a state as 'Final'?
Solomon Gibbs
2013-07-25 04:31:19 UTC
Permalink
Hello,

I'm not sure I understand what Ragel considers a "final" state. IIRC
the User's Guide says that states that are final before machine
simplification remain final thereafter.

When exactly is a state final, and how does one recognize this?

I'm using the state machine syntax to implement a string finder --
find ASCII strings with length greater than n, and print them. This
means implementing a maximum length matcher, as below.

Despite the fact that the dot output shows no final states, the EOF
transitions behave differently depending on which flavor of {$%@}eof
is used. I do not understand why this should be. For example, in the
"has_string" state below, using %eof instead of @eof causes both the
"commit_nonstring_eof" and "commit_string_eof" actions to be called
from one of the generated/synthetic states terminating the matching
state.

(State graphics for this machine are are available via
http://stackoverflow.com/questions/17848941/ragel-final-states-and-eof)

action commit_string { }

action commit_string_eof { }

action commit_nonstring_eof { }

action set_mark { }

action reset {
/* Force the machine back into state 1. This happens after
* an incomplete match when some graphical characters are
* consumed, but not enough for use to keep the string. */
fgoto start;
}

# Matching classes union to 0x00 .. 0xFF
graphic = (0x09 | 0x20 .. 0x7E);
non_graphic = (0x00 .. 0x08 | 0x0A .. 0x1F | 0x7F .. 0xFF);

collector = (

start: (
# Set the mark if we have a graphic character,
# otherwise go to non_graphic state and consume input
graphic @set_mark -> has_glyph |
non_graphic -> no_glyph
) $eof(commit_nonstring_eof),

no_glyph: (
# Consume input until a graphic character is encountered
non_graphic -> no_glyph |
graphic @set_mark -> has_glyph
) $eof(commit_nonstring_eof),

has_glyph: (
# We already matched one graphic character to get here
# from start or no_glyph. Try to match N-1 before allowing
# the string to be committed. If we don't get to N-1,
# drop back to the start state
graphic{3} $lerr(reset) -> has_string
) @eof(commit_nonstring_eof),

has_string: (
# Already consumed our quota of N graphic characters;
# consume input until we run out of graphic characters
# then reset the machine. All exiting edges should commit
# the string. We diferentiate between exiting on a non-graphic
# input that shouldn't be added to the string and exiting
# on a (graphic) EOF that should be added.
graphic* non_graphic -> start
) %from(commit_string) @eof(commit_string_eof)
#) %from(commit_string) %eof(commit_string_eof) // bad

); #$debug;

main := (collector)+;Hello,

I'm not sure I understand what Ragel considers a "final" state. IIRC
the User's Guide says that states that are final before machine
simplification remain final thereafter.

When exactly is a state final, and how does one recognize this?

I'm using the state machine syntax to implement a string finder --
find ASCII strings with length greater than n, and print them. This
means implementing a maximum length matcher, as below.

Despite the fact that the dot output shows no final states, the EOF
transitions behave differently depending on which flavor of {$%@}eof
is used. I do not understand why this should be. For example, in the
"has_string" state below, using %eof instead of @eof causes both the
"commit_nonstring_eof" and "commit_string_eof" actions to be called
from one of the generated/synthetic states terminating the matching
state.

(State graphics for this machine are are available via
http://stackoverflow.com/questions/17848941/ragel-final-states-and-eof)

action commit_string { }

action commit_string_eof { }

action commit_nonstring_eof { }

action set_mark { }

action reset {
/* Force the machine back into state 1. This happens after
* an incomplete match when some graphical characters are
* consumed, but not enough for use to keep the string. */
fgoto start;
}

# Matching classes union to 0x00 .. 0xFF
graphic = (0x09 | 0x20 .. 0x7E);
non_graphic = (0x00 .. 0x08 | 0x0A .. 0x1F | 0x7F .. 0xFF);

collector = (

start: (
# Set the mark if we have a graphic character,
# otherwise go to non_graphic state and consume input
graphic @set_mark -> has_glyph |
non_graphic -> no_glyph
) $eof(commit_nonstring_eof),

no_glyph: (
# Consume input until a graphic character is encountered
non_graphic -> no_glyph |
graphic @set_mark -> has_glyph
) $eof(commit_nonstring_eof),

has_glyph: (
# We already matched one graphic character to get here
# from start or no_glyph. Try to match N-1 before allowing
# the string to be committed. If we don't get to N-1,
# drop back to the start state
graphic{3} $lerr(reset) -> has_string
) @eof(commit_nonstring_eof),

has_string: (
# Already consumed our quota of N graphic characters;
# consume input until we run out of graphic characters
# then reset the machine. All exiting edges should commit
# the string. We diferentiate between exiting on a non-graphic
# input that shouldn't be added to the string and exiting
# on a (graphic) EOF that should be added.
graphic* non_graphic -> start
) %from(commit_string) @eof(commit_string_eof)
#) %from(commit_string) %eof(commit_string_eof) // bad

); #$debug;

main := (collector)+;
Adrian Thurston
2013-11-24 15:29:59 UTC
Permalink
Every sub-expression has a set of final states. An FSM operation may add
or remove final-state status as it builds new machines. So your main
machine may not have any final states, but they were present as the
machine was built up, and so you see variations in how the eof embedding
operators affect the result.

Arian
Post by Solomon Gibbs
Hello,
I'm not sure I understand what Ragel considers a "final" state. IIRC
the User's Guide says that states that are final before machine
simplification remain final thereafter.
When exactly is a state final, and how does one recognize this?
I'm using the state machine syntax to implement a string finder --
find ASCII strings with length greater than n, and print them. This
means implementing a maximum length matcher, as below.
Despite the fact that the dot output shows no final states, the EOF
is used. I do not understand why this should be. For example, in the
"commit_nonstring_eof" and "commit_string_eof" actions to be called
from one of the generated/synthetic states terminating the matching
state.
(State graphics for this machine are are available via
http://stackoverflow.com/questions/17848941/ragel-final-states-and-eof)
action commit_string { }
action commit_string_eof { }
action commit_nonstring_eof { }
action set_mark { }
action reset {
/* Force the machine back into state 1. This happens after
* an incomplete match when some graphical characters are
* consumed, but not enough for use to keep the string. */
fgoto start;
}
# Matching classes union to 0x00 .. 0xFF
graphic = (0x09 | 0x20 .. 0x7E);
non_graphic = (0x00 .. 0x08 | 0x0A .. 0x1F | 0x7F .. 0xFF);
collector = (
start: (
# Set the mark if we have a graphic character,
# otherwise go to non_graphic state and consume input
non_graphic -> no_glyph
) $eof(commit_nonstring_eof),
no_glyph: (
# Consume input until a graphic character is encountered
non_graphic -> no_glyph |
) $eof(commit_nonstring_eof),
has_glyph: (
# We already matched one graphic character to get here
# from start or no_glyph. Try to match N-1 before allowing
# the string to be committed. If we don't get to N-1,
# drop back to the start state
graphic{3} $lerr(reset) -> has_string
has_string: (
# Already consumed our quota of N graphic characters;
# consume input until we run out of graphic characters
# then reset the machine. All exiting edges should commit
# the string. We diferentiate between exiting on a non-graphic
# input that shouldn't be added to the string and exiting
# on a (graphic) EOF that should be added.
graphic* non_graphic -> start
#) %from(commit_string) %eof(commit_string_eof) // bad
); #$debug;
main := (collector)+;Hello,
I'm not sure I understand what Ragel considers a "final" state. IIRC
the User's Guide says that states that are final before machine
simplification remain final thereafter.
When exactly is a state final, and how does one recognize this?
I'm using the state machine syntax to implement a string finder --
find ASCII strings with length greater than n, and print them. This
means implementing a maximum length matcher, as below.
Despite the fact that the dot output shows no final states, the EOF
is used. I do not understand why this should be. For example, in the
"commit_nonstring_eof" and "commit_string_eof" actions to be called
from one of the generated/synthetic states terminating the matching
state.
(State graphics for this machine are are available via
http://stackoverflow.com/questions/17848941/ragel-final-states-and-eof)
action commit_string { }
action commit_string_eof { }
action commit_nonstring_eof { }
action set_mark { }
action reset {
/* Force the machine back into state 1. This happens after
* an incomplete match when some graphical characters are
* consumed, but not enough for use to keep the string. */
fgoto start;
}
# Matching classes union to 0x00 .. 0xFF
graphic = (0x09 | 0x20 .. 0x7E);
non_graphic = (0x00 .. 0x08 | 0x0A .. 0x1F | 0x7F .. 0xFF);
collector = (
start: (
# Set the mark if we have a graphic character,
# otherwise go to non_graphic state and consume input
non_graphic -> no_glyph
) $eof(commit_nonstring_eof),
no_glyph: (
# Consume input until a graphic character is encountered
non_graphic -> no_glyph |
) $eof(commit_nonstring_eof),
has_glyph: (
# We already matched one graphic character to get here
# from start or no_glyph. Try to match N-1 before allowing
# the string to be committed. If we don't get to N-1,
# drop back to the start state
graphic{3} $lerr(reset) -> has_string
has_string: (
# Already consumed our quota of N graphic characters;
# consume input until we run out of graphic characters
# then reset the machine. All exiting edges should commit
# the string. We diferentiate between exiting on a non-graphic
# input that shouldn't be added to the string and exiting
# on a (graphic) EOF that should be added.
graphic* non_graphic -> start
#) %from(commit_string) %eof(commit_string_eof) // bad
); #$debug;
main := (collector)+;
_______________________________________________
ragel-users mailing list
ragel-users at complang.org
http://www.complang.org/mailman/listinfo/r
Loading...