This library includes both Ruby and C versions of StringScanner. Since the two classes are completely different, please read this whole page before using them.
StringScanner is a Ruby extension for fast scanning.
Since Ruby's Regexp class cannot perform sub-string matches (\A always anchors at the start of the whole String, not at a given offset), scanning a sub-string requires first making a new String. For example:
    p " I_want_to_match_this_word but can't".index( /\A\w+/, 1 )
This code will display "nil". Another way to match it is like this:
    str = " word word word"
    while str.size > 0 do
      if /\A[ \t]+/ === str then
        str = $'
      elsif /\A\w+/ === str then
        str = $'
      end
    end
But this method has a big performance problem: $' makes a new string EVERY time. So, in the above example, all these strings are created:
    " word word word"
    "word word word"
    " word word"
    "word word"
    " word"
    "word"
    ""
This results in a heavy load. If the length of 'str' is 50KB, nearly 50KB ** 2 / 5 = 500MB of memory is allocated in total.
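The cost can be made concrete by adding up the sizes of the remainders that $' produces. The following is only a rough sketch: 'post_match_bytes' is a hypothetical helper written for this illustration, and it counts characters in the copied remainders rather than measuring real allocator usage.

```ruby
# Hypothetical helper (not part of this library): total size of all
# remainder strings created by the $' loop shown above.
def post_match_bytes(str)
  total = 0
  while str.size > 0
    if /\A[ \t]+/ === str || /\A\w+/ === str
      str = $'            # a brand-new String holding the remainder
      total += str.size
    else
      break               # unknown character: stop instead of looping forever
    end
  end
  total
end

puts post_match_bytes(" word word word")   # prints 42
```

The input is only 15 characters long, yet 42 characters of remainder strings are copied. The total grows roughly with the square of the input length.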
StringScanner resolves this problem.
StringScanner has a C string and a pointer to it. When scanning, StringScanner
will only increment the pointer, so no new strings are created.
As a result, speed will increase and memory usage will decrease.
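To watch the pointer move, the sketch below records the scan position after every match. It assumes the strscan library bundled with current Ruby (loaded with require 'strscan') and its 'pos' and 'eos?' methods, which may differ from the version described here. 'scan_positions' is a hypothetical helper, not part of the library.

```ruby
require 'strscan'

# Hypothetical helper: returns the scan pointer position after each match.
def scan_positions(str)
  s = StringScanner.new(str)
  positions = [s.pos]
  until s.eos?
    # skip only advances the pointer; it returns nil when nothing matches
    break unless s.skip(/[ \t]+/) || s.skip(/\w+/)
    positions << s.pos
  end
  positions
end

p scan_positions(" word word word")   # prints [0, 1, 5, 6, 10, 11, 15]
```

Only an integer changes between steps. The string being scanned is never replaced by a shorter copy.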
Here are two short examples of scanning routines.
The first one is easy to write but performs quite poorly. The second is still
easy to write, but is FAST thanks to the code in the StringScanner class.
First example:
    ATOM  = /\A\w+/
    SPACE = /\A[ \t]+/
    while str.size > 0 do
      if ATOM === str then
        str = $'
        return $&
      elsif SPACE === str then
        str = $'
        return $&
      end
    end
Second example:
    ATOM  = /\A\w+/
    SPACE = /\A[ \t]+/
    s = StringScanner.new( str )
    while s.rest? do
      if tmp = s.scan( ATOM ) then
        return tmp
      elsif tmp = s.scan( SPACE ) then
        return tmp
      end
    end
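The second example returns from inside its loop, so it is presumably the body of a method. As a self-contained variant, the sketch below collects every token into an array instead. It assumes the strscan library bundled with current Ruby, where 'scan' matches at the scan pointer, so the patterns are written without \A. 'tokenize' is a hypothetical name chosen for this illustration.

```ruby
require 'strscan'

# Hypothetical helper: splits str into atoms and runs of blanks.
def tokenize(str)
  atom  = /\w+/
  space = /[ \t]+/
  s = StringScanner.new(str)
  tokens = []
  while s.rest?
    if tmp = s.scan(atom)
      tokens << tmp
    elsif tmp = s.scan(space)
      tokens << tmp
    else
      # neither pattern matched: fail loudly rather than loop forever
      raise ArgumentError, "unexpected character at index #{s.pos}"
    end
  end
  tokens
end

p tokenize(" word word word")
# prints [" ", "word", " ", "word", " ", "word"]
```

The else branch is an addition of this sketch. Without it, input containing a character that neither pattern accepts would never advance the scan pointer.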
The usage of StringScanner is simple.
First, create a StringScanner object. Next, call the 'scan' method. It returns
the matched string and at the same time advances its internally maintained
"scan pointer". This is implemented using a pointer to char (char*).
The 'skip' method is similar to 'scan', but returns the length of the matched
string.
    s = StringScanner.new( "abcdefg" )   # scan pointer is on 'a', index 0
    puts s.scan( /a/ )                   # returns 'a'. scan pointer is on 'b', index 1
    puts s.skip( /bc/ )                  # returns 2. scan pointer is on 'd', index 3
After calling 'scan' or 'skip', the previous "scan pointer" is preserved in the StringScanner object. So, str[ prev pointer...current pointer ] (the current pointer itself excluded) is the "matched string" (the string returned from 'scan'), and we can get it by calling the 'matched' method. Here's an example:
    puts s.matched       # returns 'bc'. scan pointer doesn't move
    puts s.scan( /a/ )   # returns nil. again, scan pointer doesn't move.
    puts s.matched       # returns 'bc'.
It is also possible to put the scan pointer back to its previous position. This can be accomplished by using the 'unscan' method. However, 'unscan' can only undo one 'scan' because the StringScanner object can only preserve one "previous pointer" at a time.
    puts s.scan( /de/ )    # returns 'de'. scan pointer is on 'f', index 5
    s.unscan               # scan pointer is on 'd', index 3
    puts s.scan( /def/ )   # returns 'def'. scan pointer is on 'g', index 6
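The successful steps of this walkthrough can be replayed in one run, pairing each return value with the scan pointer position. This sketch assumes the strscan library bundled with current Ruby and its 'pos' method. It deliberately leaves out the failed 'scan' step, because current versions may treat 'matched' after a failed match differently from the behavior described above. 'replay_session' is a hypothetical helper.

```ruby
require 'strscan'

# Hypothetical helper: replays the walkthrough and returns
# [result, scan pointer position] pairs.
def replay_session
  s = StringScanner.new("abcdefg")
  log = []
  log << [s.scan(/a/),   s.pos]   # ["a", 1]
  log << [s.skip(/bc/),  s.pos]   # [2, 3]
  log << [s.matched,     s.pos]   # ["bc", 3]  matched does not move the pointer
  log << [s.scan(/de/),  s.pos]   # ["de", 5]
  s.unscan                        # back to the position before the last scan
  log << ["unscan",      s.pos]   # ["unscan", 3]
  log << [s.scan(/def/), s.pos]   # ["def", 6]
  log
end

replay_session.each { |entry| p entry }
```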
For more details, see the reference manual. But of course the source code is the most important documentation, I think :-)
The Ruby version of StringScanner (StringScanner_R) resembles the C version, but has these requirements:
This is troublesome, but there's no resolution to this problem.
If you only want to use the C version, simply put this in your code:
    StringScanner.must_C_version
Copyright (c) 1999-2001 Minero Aoki <aamine@loveruby.net>