Class: HTMLTokenizer
In: lib/openid/yadis/htmltokenizer.rb
Parent: Object
A class to tokenize HTML.
Example:
  page = "<HTML>
  <HEAD>
  <TITLE>This is the title</TITLE>
  </HEAD>
  <!-- Here comes the <a href=\"missing.link\">blah</a>
  comment body
  -->
  <BODY>
    <H1>This is the header</H1>
    <P>
      This is the paragraph, it contains
      <a href=\"link.html\">links</a>,
      <img src=\"blah.gif\" optional alt='images are really cool'>. Ok, here
      is some more text and
      <A href=\"http://another.link.com/\" target=\"_blank\">another link</A>.
    </P>
  </body>
  </HTML>
  "
  toke = HTMLTokenizer.new(page)

  assert("<h1>" == toke.getTag("h1", "h2", "h3").to_s.downcase)
  assert(HTMLTag.new("<a href=\"link.html\">") == toke.getTag("IMG", "A"))
  assert("links" == toke.getTrimmedText)
  assert(toke.getTag("IMG", "A").attr_hash['optional'])
  assert("_blank" == toke.getTag("IMG", "A").attr_hash['target'])
Attributes:
  page [R] (read-only)
getTag: Get the next tag from the specified set of desired tags. For example:
  foo = toke.getTag("h1", "h2", "h3")
returns the next header tag encountered.
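The class example passes lowercase names such as "h1" and still matches <H1>, so tag names are compared without regard to case. That lookup can be sketched without the library. This is a minimal illustration, not the HTMLTokenizer implementation: the next_tag helper and its regular expression are assumptions made for this example, and it ignores comments, closing tags, and attribute values that contain ">".

```ruby
# Hypothetical helper (not part of HTMLTokenizer): returns the first opening
# tag in +html+ at or after +pos+ whose name is in +names+, compared
# case-insensitively, or nil if no such tag is found.
def next_tag(html, pos, *names)
  wanted = names.map(&:downcase)
  # Walk over opening tags such as <h1> or <a href="x">, one at a time.
  while (m = /<([a-zA-Z][a-zA-Z0-9]*)\b[^>]*>/.match(html, pos))
    return m[0] if wanted.include?(m[1].downcase)
    pos = m.end(0) # resume scanning after the tag that did not match
  end
  nil
end

html = '<TITLE>t</TITLE> <H1>Header</H1> <a href="link.html">links</a>'
puts next_tag(html, 0, "h1", "h2", "h3").downcase
```

Returning nil when nothing matches is a choice made for this sketch only.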
getText: Get all the text between the current position and the next tag or, if a tag is specified, that specific later tag.
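The two modes described here (stop at the next tag, or stop at a named later tag) can be illustrated with a small standalone sketch. The text_until helper is hypothetical and is not the library's code. In particular, dropping the intermediate tags when a stop tag is named is an assumption of this example.

```ruby
# Hypothetical helper (not part of HTMLTokenizer).
# With no +until_tag+: returns the text from +pos+ up to the next "<".
# With +until_tag+: returns the text from +pos+ up to the next opening tag
# of that name, with any intermediate tags removed (an assumption here).
def text_until(html, pos, until_tag = nil)
  if until_tag.nil?
    stop = html.index("<", pos) || html.length
    html[pos...stop]
  else
    pattern = /<#{Regexp.escape(until_tag)}\b[^>]*>/i
    stop = html.index(pattern, pos) || html.length
    html[pos...stop].gsub(/<[^>]*>/, "")
  end
end

html = "intro <b>bold</b> tail <P>para"
puts text_until(html, 0).inspect
puts text_until(html, 0, "p").inspect
```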
getTrimmedText: Like getText, but strips leading and trailing whitespace and squeezes each run of whitespace into a single space.
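The whitespace normalization described here can be written in a line of plain Ruby. The squeeze_whitespace helper below is a hypothetical name used for illustration and is not a method of HTMLTokenizer.

```ruby
# Hypothetical helper (not part of HTMLTokenizer): trims leading and trailing
# whitespace and collapses each internal run of whitespace into one space.
def squeeze_whitespace(text)
  text.strip.gsub(/\s+/, " ")
end

raw = "\n   This is   the paragraph,\n\t it contains  "
puts squeeze_whitespace(raw)
# prints "This is the paragraph, it contains"
```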