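Ruby mainly handles XML with REXML, a library that ships with Ruby (as a bundled gem since Ruby 3.0). It offers two parsing approaches: tree parsing and stream parsing.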
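The rest of this article uses tree parsing. For comparison, here is a minimal stream-parsing sketch.

REXML::Document.parse_stream scans the input and calls methods on a listener object instead of building a tree. REXML::StreamListener supplies empty default callbacks, so the listener only needs to override tag_start. The class name TagCollector is hypothetical. The XML is a trimmed inline stand-in for games.xml.

```ruby
require 'rexml/document'
require 'rexml/streamlistener'

# Hypothetical listener: records the name of every start tag it is handed.
# REXML calls tag_start/tag_end/text while scanning; no tree is built.
class TagCollector
  include REXML::StreamListener

  attr_reader :tags

  def initialize
    @tags = []
  end

  def tag_start(name, _attrs)
    @tags << name
  end
end

xml = '<collection id="test"><game title="PUBG"><platform>PC</platform></game></collection>'
listener = TagCollector.new
REXML::Document.parse_stream(xml, listener)
p listener.tags  # => ["collection", "game", "platform"]
```

Stream parsing keeps memory use flat for large files. The trade-off is that you must track any context, such as the current parent element, yourself.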

Sample XML, games.xml:
<?xml version="1.0" encoding="utf-8"?>
<collection id="test">
<game title="The Legend of Zelda: Breath of the Wild">
<company>
<name>Nintendo</name>
<location>Japanese</location>
</company>
<platform>NS</platform>
<platform>WiiU</platform>
<pubdate>2017</pubdate>
</game>
<game title="PUBG">
<company>
<name>PUBG Corporation</name>
<location>Korea</location>
</company>
<platform>PC</platform>
<platform>Xbox one</platform>
<pubdate>2017</pubdate>
</game>
</collection>
Opening the file in Ruby with the tree parsing API:
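require 'rexml/document'

include REXML

# Read games.xml and build the whole tree in memory;
# the block form of File.open closes the file afterwards
doc = File.open("games.xml") { |file| Document.new(file) }

puts doc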

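The argument to Document.new must be a Document, an IO, or a String. In the official documentation, "Document" here means REXML::Document.

However, the test below shows that passing an existing document produces an empty new doc. This has nothing to do with Ruby's here documents. According to the REXML API documentation, passing a Document clones only its context and element attributes, not its child nodes. To copy the whole tree, pass the serialized XML instead, for example Document.new(doc.to_s).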

irb(main):001:0> require 'rexml/document'
=> true
irb(main):002:0> include REXML
=> Object
irb(main):003:0> doc = Document.new(File.new("games.xml"))
=> <UNDEFINED> ... </>

irb(main):004:0> puts doc
<?xml version='1.0' encoding='UTF-8'?>
<collection id='test'>
<game title='The Legend of Zelda: Breath of the Wild'>
<company>
<name>Nintendo</name>
<location>Japanese</location>
</company>
<platform>NS</platform>
<platform>WiiU</platform>
<pubdate>2017</pubdate>
</game>
<game title='PUBG'>
<company>
<name>PUBG Corporation</name>
<location>Korea</location>
</company>
<platform>PC</platform>
<platform>Xbox one</platform>
<pubdate>2017</pubdate>
</game>
</collection>
=> nil

irb(main):005:0> doc2 = Document.new(doc)
=> </>

irb(main):006:0> puts doc2
=> nil
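The sketch below reproduces the behaviour above and shows one workaround: serialize the document with to_s, then parse it again. The inline XML is a trimmed stand-in for games.xml.

```ruby
require 'rexml/document'

xml = '<collection id="test"><game title="PUBG"/></collection>'
doc = REXML::Document.new(xml)

# Passing a Document clones its context and attributes, but no child nodes
shallow = REXML::Document.new(doc)
p shallow.root  # => nil

# Re-parsing the serialized XML gives an independent, complete copy
copy = REXML::Document.new(doc.to_s)
p copy.root.name                                  # => "collection"
p copy.root.elements["game"].attributes["title"]  # => "PUBG"

# Changing the copy leaves the original untouched
copy.root.delete_attribute("id")
p doc.root.attributes["id"]                       # => "test"
```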
Accessing elements and attributes

To make debugging easier, use irb to access the elements:

$ irb
>> require 'rexml/document'
=> true
>> include REXML
=> Object
>> doc = Document.new(File.new("games.xml"))
=> <UNDEFINED> ... </>

>> root = doc.root
=> <collection id='test'> ... </>

>> root.class
=> REXML::Element

>> root.attributes['id']
=> "test"

>> puts root.elements[1].elements["company"]
<company>
<name>Nintendo</name>
<location>Japanese</location>
</company>
=> nil

>> puts root.elements["game[1]/company"]
<company>
<name>Nintendo</name>
<location>Japanese</location>
</company>
=> nil

>> puts root.elements["game[@title='PUBG']"]
<game title='PUBG'>
<company>
<name>PUBG Corporation</name>
<location>Korea</location>
</company>
<platform>PC</platform>
<platform>Xbox one</platform>
<pubdate>2017</pubdate>
</game>
=> nil

>> root.each_element('//company') {|company| puts company}
<company>
<name>Nintendo</name>
<location>Japanese</location>
</company>
<company>
<name>PUBG Corporation</name>
<location>Korea</location>
</company>
=> [<company> ... </>, <company> ... </>]

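First, Document.new creates a Document object. Document#root returns the document's root element (an Element object), or nil if the document has no child elements.

Element supports adding, deleting, and checking attributes and child elements, adding text, iterating over child elements, and more; see its API documentation for details.

Element#elements returns an Elements object, which provides child-element filtering and XPath search for its Element. Its main method is [], whose argument can be either a child-element index or an XPath expression. Like XPath positions, these indices start at 1, not 0.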
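The access patterns described above can be collected into one runnable sketch. It covers the integer index, an XPath string, Elements#to_a for all matches, and each_element. The XML is a trimmed inline copy of games.xml so the script does not depend on the file.

```ruby
require 'rexml/document'

# Trimmed copy of games.xml, inlined so the sketch runs on its own
xml = <<~XML
  <collection id="test">
    <game title="The Legend of Zelda: Breath of the Wild">
      <platform>NS</platform>
      <platform>WiiU</platform>
    </game>
    <game title="PUBG">
      <platform>PC</platform>
    </game>
  </collection>
XML

root = REXML::Document.new(xml).root

# Integer index: 1-based, counts only child elements (whitespace text is skipped)
first = root.elements[1]
p first.attributes["title"]

# XPath string: game[2] selects the second <game> child
p root.elements["game[2]"].attributes["title"]

# Elements#to_a runs an XPath query and returns every match as an Array
platforms = first.elements.to_a("platform").map(&:text)
p platforms  # => ["NS", "WiiU"]

# each_element iterates over the children matching an XPath
titles = []
root.each_element("game") { |game| titles << game.attributes["title"] }
p titles
```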

Creating and inserting elements and attributes

Create an empty document, then add elements and attributes:

irb(main):001:0> require 'rexml/document'
=> true
irb(main):002:0> include REXML
=> Object
irb(main):003:0> doc2 = Document.new
=> <UNDEFINED/>

irb(main):004:0> doc2.class
=> REXML::Document

irb(main):005:0> doc2.add_element("collection", {"id"=> "test"})
=> <collection id='test'/>

irb(main):006:0> doc2.root.add_element("game")
=> <game/>

irb(main):007:0> game = doc2.root.elements[1]
=> <game/>

irb(main):008:0> company = Element.new("company")
=> <company/>

irb(main):009:0> company.add_element("name")
=> <name/>

irb(main):010:0> company.elements["name"].text = "Riot Games"
=> "Riot Games"

irb(main):011:0> company.add_element("location")
=> <location/>

irb(main):012:0> company.elements["location"].text = "America"
=> "America"

irb(main):013:0> game.elements << company
=> <company> ... </>

irb(main):014:0> platform = Element.new("platform")
=> <platform/>

irb(main):015:0> platform.text = "Microsoft Windows"
=> "Microsoft Windows"

irb(main):016:0> game.elements << platform
=> <platform> ... </>

irb(main):018:0> game.elements << Element.new("pubdate")
=> <pubdate/>

irb(main):020:0> game.elements["pubdate"].text = "2011"
=> "2011"

irb(main):021:0> game.add_attribute("title", "League of Legends")
=> "League of Legends"

irb(main):022:0> puts doc2
<collection id='test'><game title='League of Legends'><company><name>Riot Games</name><location>America</location></company><platform>Microsoft Windows</platform><pubdate>2011</pubdate></game></collection>
=> nil

irb(main):030:0> doc2.write(:indent => 2)
<collection id='test'>
<game title='League of Legends'>
<company>
<name>
Riot Games
</name>
<location>
America
</location>
</company>
<platform>
Microsoft Windows
</platform>
<pubdate>
2011
</pubdate>
</game>
</collection>
=> [<?xml ... ?>, <collection id='test'> ... </>]

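Element#add_attribute adds an attribute: the first argument is the key and the second is the value.

New elements can be created with either Element.new or Element#add_element. An element created with Element.new must then be attached to its parent with <<, which is an alias of Elements#add.

Element#text= sets or replaces an element's text. Finally, Document#write outputs the document.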
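The irb steps above can also be written as a standalone script. This sketch rebuilds the same document by chaining the return value of add_element. It then prints the result with REXML::Formatters::Pretty instead of Document#write.

Setting compact = true is an optional choice not used in the transcript. It keeps text-only elements such as <name> on one line, instead of putting the text on its own line as in the output above.

```ruby
require 'rexml/document'
require 'rexml/formatters/pretty'

doc = REXML::Document.new

# add_element returns the new child, so calls can be chained
collection = doc.add_element("collection", { "id" => "test" })
game = collection.add_element("game", { "title" => "League of Legends" })

company = game.add_element("company")
company.add_element("name").text = "Riot Games"
company.add_element("location").text = "America"

game.add_element("platform").text = "Microsoft Windows"
game.add_element("pubdate").text = "2011"

# Pretty formatter with 2-space indentation; compact mode keeps
# text-only elements such as <name> on a single line
formatter = REXML::Formatters::Pretty.new(2)
formatter.compact = true
output = String.new
formatter.write(doc, output)
puts output
```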

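To insert an element at a specific position, use Element#insert_before or Element#insert_after (both inherited from REXML::Parent). The first argument locates an existing child, either as an XPath expression or as the node itself. The second argument is the node to insert.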

irb(main):031:0> platform2 = Element.new("platform")
=> <platform/>

irb(main):032:0> platform2.add_text("MacOS X")
=> <platform> ... </>

irb(main):034:0> doc2.root.insert_before("//pubdate", platform2)
=> <collection id='test'> ... </>

irb(main):035:0> doc2.write(:indent => 2)
<collection id='test'>
<game title='League of Legends'>
<company>
<name>
Riot Games
</name>
<location>
America
</location>
</company>
<platform>
Microsoft Windows
</platform>
<platform>
MacOS X
</platform>
<pubdate>
2011
</pubdate>
</game>
</collection>
=> [<?xml ... ?>, <collection id='test'> ... </>]
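Here is a small sketch of both insertion methods and both ways of locating the reference child: an XPath string and an existing node. The value "Linux" is a hypothetical platform added only to show insert_after.

```ruby
require 'rexml/document'

doc = REXML::Document.new(
  "<game><platform>Microsoft Windows</platform><pubdate>2011</pubdate></game>"
)
game = doc.root

# Reference child given as an XPath string: insert before the first <pubdate>
mac = REXML::Element.new("platform")
mac.add_text("MacOS X")
game.insert_before("pubdate", mac)

# Reference child given as an existing node: insert right after it
# ("Linux" is a hypothetical value used only for illustration)
linux = REXML::Element.new("platform")
linux.text = "Linux"
game.insert_after(mac, linux)

platforms = game.elements.to_a("platform").map(&:text)
p platforms  # => ["Microsoft Windows", "MacOS X", "Linux"]
```

An XPath string is resolved relative to the element you call the method on. The transcript's "//pubdate" searches the whole document instead.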
Deleting elements and attributes

Attributes and elements are deleted with Element#delete_attribute and Element#delete_element, respectively.

irb(main):040:0> doc2.root.delete_attribute("id")
=> <collection> ... </>

irb(main):042:0> doc2.write(:indent => 2)
<collection>
<game title='League of Legends'>
<company>
<name>
Riot Games
</name>
<location>
America
</location>
</company>
<platform>
Microsoft Windows
</platform>
<platform>
MacOS X
</platform>
<pubdate>
2011
</pubdate>
</game>
</collection>
=> [<?xml ... ?>, <collection> ... </>]

irb(main):044:0> doc2.delete_element("//pubdate")
=> <pubdate> ... </>

irb(main):045:0> doc2.write(:indent => 2)
<collection>
<game title='League of Legends'>
<company>
<name>
Riot Games
</name>
<location>
America
</location>
</company>
<platform>
Microsoft Windows
</platform>
<platform>
MacOS X
</platform>
</game>
</collection>
=> [<?xml ... ?>, <collection> ... </>]

irb(main):046:0> doc2.root.delete_element(1)
=> <game title='League of Legends'> ... </>

irb(main):047:0> doc2.write(:indent => 2)
<collection/>
=> [<?xml ... ?>, <collection/>]

irb(main):048:0> puts doc2
<collection/>
=> nil

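delete_attribute removes an attribute; its argument is the attribute's key.

delete_element removes an element and accepts the same arguments as Elements#[]: a child-element index (starting at 1) or an XPath expression. It also accepts an Element object. When an XPath expression matches several elements, only the first match is removed.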
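Because delete_element removes only the first match, removing every matching element needs Elements#delete_all. This sketch uses a trimmed inline copy of one game from games.xml.

```ruby
require 'rexml/document'

# Trimmed copy of one <game> from games.xml, inlined for a self-contained run
doc = REXML::Document.new(<<~XML)
  <collection id="test">
    <game title="PUBG">
      <platform>PC</platform>
      <platform>Xbox one</platform>
      <pubdate>2017</pubdate>
    </game>
  </collection>
XML

root = doc.root
root.delete_attribute("id")        # argument is the attribute key

game = root.elements["game"]
game.delete_element("pubdate")     # XPath argument: removes the first match only

# Elements#delete_all removes every match and returns the removed elements
removed = game.elements.delete_all("platform")
p removed.map(&:text)  # => ["PC", "Xbox one"]
p game.elements.size   # => 0
```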