Characteristics
XML handling has two major phases.
- Parse the XML document and build data (a tree of elements)
  - xmerl_scan (the whole element tree is processed at once)
  - SAX type parser
    - xmerl_eventp
    - erlsom_sax (developed by a third party; not bundled with Erlang/OTP)
- Access (traverse) elements within the data
  - XPATH (xmerl_xpath)
  - XSLT (xmerl_xs)
  - callback (hook) function from a SAX type parser
  - hand-made logic
    - traverse the tree
    - extract tuples from the element tree (list) with list comprehensions
So an approach to XML processing is characterized by its combination of parse method and access method.
Matrix of the methods:
parse method | access method | samples on the Web | by
xmerl_scan | xmerl_xpath | Parsing Atom with Erlang | Sam Ruby
xmerl_scan | xmerl_xpath (with useful MACRO) | XML processing in Erlang | Torbjörn Törnkvist
xmerl_scan | hand-made (traverse tree by lists:foldl) | Return Erlang Data from XML | Muharem Hrnjadovic
xmerl_scan | hand-made (use list comprehension) | | Hakan Mattson
xmerl_eventp | callback (hook) function | | Torbjörn Törnkvist
erlsom_sax | callback function | | Willem de Jong
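The list-comprehension row in the matrix has no worked example on this page, so here is a minimal sketch of that access method. The module name `example_lc` and the inline XML string are assumptions made for illustration; the document is parsed with `xmerl_scan:string/1` so the sketch runs without a file.

```erlang
-module(example_lc).
-export([go/0]).
-include_lib("xmerl/include/xmerl.hrl").

go() ->
    %% Hypothetical inline copy of the sample data, without whitespace.
    Xml = "<Envelope><Title>envelope title</Title>"
          "<InnerEnv><IDNUM>403276</IDNUM>"
          "<ItemName>Name String</ItemName>"
          "<Pages>0</Pages></InnerEnv></Envelope>",
    {Root, _Rest} = xmerl_scan:string(Xml),
    %% Generators with record patterns silently skip nodes that do not
    %% match, so the Title element and any text nodes are filtered out.
    [{Name, Value}
     || #xmlElement{name = 'InnerEnv', content = Children}
            <- Root#xmlElement.content,
        #xmlElement{name = Name,
                    content = [#xmlText{value = Value} | _]}
            <- Children].
```

Calling `example_lc:go()` should yield the three `{Name, Value}` pairs found under InnerEnv. Because a non-matching pattern in a generator acts as a filter, no explicit guard is needed to skip unwanted nodes.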
Operation example
example-1 : xmerl_scan + xpath
If you know exactly which elements you need and the source XML file is not too large, parse with xmerl_scan and access with xmerl_xpath.
Note: as of Erlang/OTP R13B01, XPath 1.0 is supported.
This example is inspired by Torbjörn Törnkvist's code.
sample xml data ("e.xml")
<Envelope><Title>envelope title</Title>
<InnerEnv>
<IDNUM>403276</IDNUM>
<ItemName>Name String</ItemName>
<Pages>0</Pages>
</InnerEnv>
</Envelope>
code
-module(example1).
-export([go/1]).
-include_lib("xmerl/include/xmerl.hrl").

%% ?Val(X): X must be a one-element list holding an #xmlElement{}.
%% Returns {Name, Text}, where Text is the element's first text node.
-define(Val(X),
        (fun() ->
             [#xmlElement{name = N,
                          content = [#xmlText{value = V}|_]}] = X,
             {N,V}
         end)()).

go(File) ->
    %% Parse the whole file into an element tree.
    {Xml, _} = xmerl_scan:file(File),
    [
     ?Val(xmerl_xpath:string("/Envelope/Title", Xml)),
     ?Val(xmerl_xpath:string("//IDNUM", Xml)),
     ?Val(xmerl_xpath:string("//ItemName", Xml)),
     ?Val(xmerl_xpath:string("//Pages", Xml))
    ].
results
1> example1:go("e.xml").
[{'Title',"envelope title"},
{'IDNUM',"403276"},
{'ItemName',"Name String"},
{'Pages',"0"}]
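The `?Val` macro above fails with a badmatch when a path matches nothing or matches several elements. As a hedged variant, the sketch below asks XPath for the text node directly with `text()` and returns `not_found` for a missing element. The module name `example_xpath_text`, the helper `text_of/2`, and the path `//Missing` are hypothetical names introduced for illustration.

```erlang
-module(example_xpath_text).
-export([go/0, text_of/2]).
-include_lib("xmerl/include/xmerl.hrl").

go() ->
    %% Hypothetical inline copy of the sample data.
    Xml = "<Envelope><Title>envelope title</Title>"
          "<InnerEnv><IDNUM>403276</IDNUM>"
          "<Pages>0</Pages></InnerEnv></Envelope>",
    {Doc, _Rest} = xmerl_scan:string(Xml),
    [text_of("/Envelope/Title", Doc),
     text_of("//Pages", Doc),
     text_of("//Missing", Doc)].

%% Return {ok, Text} for the first text node under Path,
%% or not_found when the path selects no text node.
text_of(Path, Doc) ->
    case xmerl_xpath:string(Path ++ "/text()", Doc) of
        [#xmlText{value = V} | _] -> {ok, V};
        [] -> not_found
    end.
```

This trades the macro's strictness for tolerance: a missing element becomes a value the caller can test instead of a crash.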
example-2 : xmerl_scan + traverse element tree by lists:foldl
If you want to translate the whole XML document into another data structure, you need to traverse the whole tree, for example with the lists:foldl function.
This example is inspired by Muharem Hrnjadovic's code.
sample xml data ("e.xml")
<Envelope><Title>envelope title</Title>
<InnerEnv>
<IDNUM>403276</IDNUM>
<ItemName>Name String</ItemName>
<Pages>0</Pages>
</InnerEnv>
</Envelope>
code
-module(example2).
-export([go/1]).
-include_lib("xmerl/include/xmerl.hrl").

go(File) ->
    {R, _} = xmerl_scan:file(File),
    io:format("~p~n", [lists:reverse(traverse(R, []))]).

%% Elements: fold over the children. Text nodes: select by parent chain.
traverse(R, L) when is_record(R, xmlElement) ->
    lists:foldl(fun traverse/2, L, R#xmlElement.content);
traverse(#xmlText{parents=[{'Title',_},_], value=V}, L) ->
    [{title, V}|L];
traverse(#xmlText{parents=[{'IDNUM',_},_,_], value=V}, L) ->
    [{idnum, V}|L];
traverse(#xmlText{parents=[{'ItemName',_},_,_], value=V}, L) ->
    [{itemname, V}|L];
traverse(#xmlText{parents=[{'Pages',_},_,_], value=V}, L) ->
    [{pages, V}|L];
traverse(_R, L) ->
    L.
results
2> example2:go("e.xml").
[{title,"envelope title"},
{idnum,"403276"},
{itemname,"Name String"},
{pages,"0"}]
example-3 : SAX type parsing operation
If computing resources are limited, the whole XML document cannot be processed at once. In that case, use a SAX type parser, whose callback functions process each element as soon as it has been parsed.
A well-known SAX type parser is erlsom_sax, and Willem de Jong has posted code based on it.
See http://blog.tornkvist.org/blog.yaws?id=1193209275268448
Here is code based on the xmerl_sax_parser library, inspired by the code above.
sample xml data ("e.xml")
<Envelope>
<Title>envelope title</Title>
<InnerEnv>
<IDNUM>403276</IDNUM>
<ItemName>Name String</ItemName>
<Pages>0</Pages>
</InnerEnv>
</Envelope>
code
-module(example3).
-export([go/1]).

go(File) ->
    %% State is {Stack of open tags, accumulated {Tag, Text} pairs}.
    Options = [{event_fun, fun eventfun/3}, {event_state, {[], []}}],
    case xmerl_sax_parser:file(File, Options) of
        {ok, {_Stack, Acc}, _Rest} -> lists:reverse(Acc);  % {ok, EventState, Rest}
        Other -> Other
    end.

eventfun({ignorableWhitespace, _}, _Location, State) ->
    State;
eventfun({startElement, _, Tag, _, _}, _Location, {Stack, Acc}) ->
    {[Tag | Stack], Acc};
eventfun({characters, Value}, _Location, {[Tag | _] = Stack, Acc}) ->
    {Stack, [{Tag, Value} | Acc]};
eventfun({endElement, _, _, _}, _Location, {[_ | Rest], Acc}) ->
    {Rest, Acc};
eventfun(_Event, _Location, State) ->
    State.
results
6> example3:go("e.xml").
[{"Title","envelope title"},
{"IDNUM","403276"},
{"ItemName","Name String"},
{"Pages","0"}]
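Because the event function only sees one event at a time, the state it carries can stay small even for a large document. As a hedged illustration, the sketch below counts how often each element name occurs and never stores element content. The module name `example_sax_count` and the inline binary are assumptions; `xmerl_sax_parser:stream/2` is used so the sketch runs without a file, and `maps:update_with/4` requires OTP 19 or later.

```erlang
-module(example_sax_count).
-export([go/0]).

go() ->
    %% Hypothetical, shortened copy of the sample data as a binary.
    Xml = list_to_binary("<Envelope><Title>envelope title</Title>"
                         "<InnerEnv><IDNUM>403276</IDNUM></InnerEnv>"
                         "</Envelope>"),
    Options = [{event_fun, fun count/3}, {event_state, #{}}],
    {ok, Counts, _Rest} = xmerl_sax_parser:stream(Xml, Options),
    Counts.

%% Increment the counter for each start tag; ignore all other events.
count({startElement, _Uri, Name, _QName, _Attrs}, _Location, Counts) ->
    maps:update_with(Name, fun(N) -> N + 1 end, 1, Counts);
count(_Event, _Location, Counts) ->
    Counts.
```

The returned map is keyed by element name as a string, matching the string tags seen in the results above.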