Module sgml
:- use_module(library(sgml)).
Predicates for parsing HTML and XML documents.
Currently, two predicates are provided:
load_html(+Source, -Es, +Options)
load_xml(+Source, -Es, +Options)
These predicates parse HTML and XML documents, respectively.
Source must be one of:
a list of characters with the document contents
stream(S), specifying a stream S from which to read the content
file(Name), where Name is a list of characters specifying a file name.
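The following sketch shows the file(Name) and stream(S) source forms side by side. xml_from_file/2 and xml_from_stream/2 are hypothetical helpers, not part of library(sgml). It assumes the file name is given as an atom and omits error handling.

```prolog
:- use_module(library(sgml)).

% Hypothetical helpers (not part of library(sgml)).
% File is an atom naming the file; error handling is omitted.

% Source = file(Name), where Name must be a list of characters.
xml_from_file(File, Es) :-
    atom_chars(File, Name),
    load_xml(file(Name), Es, []).

% Source = stream(S): the caller opens and closes the stream itself.
% In this sketch, S stays open if load_xml/3 fails or throws.
xml_from_stream(File, Es) :-
    open(File, read, S),
    load_xml(stream(S), Es, []),
    close(S).
```

With stream(S), the caller controls where the content comes from, so the same call also works for streams that are not backed by a file.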
Es is unified with the abstract syntax tree of the parsed document, represented as a list of elements, each of which has one of the following forms:
a list of characters, representing text
element(Name, Attrs, Children), where:
  Name, an atom, is the name of the tag;
  Attrs is a list of Key=Value pairs, where Key is an atom and Value is a list of characters;
  Children is a list of elements as specified here.
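Because the tree consists only of character lists and element/3 terms, it can be processed with ordinary recursion. The sketch below uses two hypothetical predicates, text_content/2 and element_attribute/3 (neither is part of library(sgml)): the first concatenates all text nodes in document order, and the second looks up an attribute through the Key=Value pairs.

```prolog
:- use_module(library(sgml)).
:- use_module(library(lists)).

% Hypothetical predicates (not part of library(sgml)).

% text_content(+Nodes, -Cs): Cs is the concatenation of all text
% nodes in Nodes, in document order.
text_content([], []).
text_content([N|Ns], Cs) :-
    node_text(N, Cs0),
    text_content(Ns, Cs1),
    append(Cs0, Cs1, Cs).

% An element contributes the text of its children.
node_text(element(_, _, Children), Cs) :- !,
    text_content(Children, Cs).
% Any other node is a text node, i.e. already a list of characters.
node_text(Text, Text).

% element_attribute(+Element, ?Key, ?Value): Key=Value is one of the
% attributes of Element.
element_attribute(element(_, Attrs, _), Key, Value) :-
    member(Key=Value, Attrs).
```

For example, ?- load_xml("<p>Hello, <b>world</b>!</p>", Es, []), text_content(Es, Cs). collects the text of both the p element and its nested b element.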
Currently, Options are ignored. In the future, more options may be provided to control parsing.
Example:
?- load_html("<html><head><title>Hello!</title></head></html>", Es, []).
Yielding:
Es = [element(html,[],
              [element(head,[],
                       [element(title,[],
                                ["Hello!"])]),
               element(body,[],[])])].
Note that the HTML parser adds the missing body element.
library(xpath) makes it convenient to reason about parsed documents. For example, to fetch the title of the document above, we can use:
?- load_html("<html><head><title>Hello!</title></head></html>", Es, []),
   xpath(Es, //title(text), T).
Yielding T = "Hello!".
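Since xpath/3 is an ordinary relation, all matches of a pattern can be collected with findall/3. The sketch below assumes that xpath/3 enumerates every matching //li element on backtracking. list_items/2 is a hypothetical name, not part of library(xpath).

```prolog
:- use_module(library(sgml)).
:- use_module(library(xpath)).

% Hypothetical predicate: Items is the list of texts of all <li>
% elements in Es, relying on xpath/3 yielding each match on backtracking.
list_items(Es, Items) :-
    findall(T, xpath(Es, //li(text), T), Items).
```

For example: ?- load_html("<ul><li>a</li><li>b</li></ul>", Es, []), list_items(Es, Items).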
Use http_open/3 from library(http/http_open) to read answers from web servers via streams, which can then be parsed by passing stream(S) as the Source.
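As a sketch of combining the two libraries, the hypothetical predicate url_title/2 below opens a URL with http_open/3, parses the response stream with load_html/3, and extracts the title with xpath/3. It assumes URL is a list of characters and that the server returns HTML; error handling is omitted.

```prolog
:- use_module(library(sgml)).
:- use_module(library(xpath)).
:- use_module(library(http/http_open)).

% Hypothetical predicate (not part of any library): fetch the page at
% URL, parse it from the response stream, and extract its <title> text.
url_title(URL, Title) :-
    http_open(URL, S, []),
    load_html(stream(S), Es, []),
    close(S),
    xpath(Es, //title(text), Title).
```

For example: ?- url_title("https://example.com", T). (this requires network access).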