The schema.xml file contains all of the details about which fields your documents can contain, and how those fields should be dealt with when adding documents to the index, or when querying those fields.
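schema.xml lives in the solr/conf/ directory. It plays a role similar to a database table definition: it declares the data types of the data added to the index, and consists mainly of types, fields, and a few other default settings.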
Data Types
The <types> section allows you to define a list of <fieldType> declarations you wish to use in your schema, along with the underlying Solr class that should be used for each type, as well as the default options you want for fields that use that type.
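Inside the types node you define FieldType child nodes, each with parameters such as name, class, and positionIncrementGap.
- name: the name of this FieldType.
- class: the Solr class that defines the behavior of this type. Field type classes such as solr.TextField belong to the org.apache.solr.schema package; the tokenizer and filter factories used by analyzers belong to org.apache.solr.analysis.
XML code: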
    <fieldType ... positionIncrementGap="0"/>
    <fieldType ... positionIncrementGap="0"/>
    <fieldType ... positionIncrementGap="0"/>
    <fieldType ... positionIncrementGap="0"/>
    ...
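As a rough illustration of what such declarations usually look like, the sketch below defines one string type and one numeric type. The type names (string, int) and the classes solr.StrField and solr.TrieIntField are assumptions borrowed from the example schema shipped with Solr 1.4 and 3.x, not values recovered from the listing above, so check which classes your Solr version provides.

```xml
<types>
  <!-- Untokenized string type: the whole value is indexed as a single term -->
  <fieldType name="string" class="solr.StrField"
             sortMissingLast="true" omitNorms="true"/>

  <!-- Numeric type; precisionStep="0" indexes no additional lower-precision terms -->
  <fieldType name="int" class="solr.TrieIntField" precisionStep="0"
             omitNorms="true" positionIncrementGap="0"/>
</types>
```

Fields refer to these types by their name attribute, so renaming a type means updating every field that uses it.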
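Where necessary, a fieldType also defines the analyzer to apply to data of this type at index time and at query time, including tokenization and filtering, as follows:
XML code: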
    <filter ...
            ignoreCase="true"
            words="stopwords.txt"
            enablePositionIncrements="true"
            />
    <filter ... generateNumberParts="1" catenateWords="1" catenateNumbers="1"
            catenateAll="0" splitOnCaseChange="1"/>
    <filter ... protected="protwords.txt"/>
    <filter ... expand="true"/>
    <filter ...
            ignoreCase="true"
            words="stopwords.txt"
            enablePositionIncrements="true"
            />
    <filter ... generateNumberParts="1" catenateWords="0" catenateNumbers="0"
            catenateAll="0" splitOnCaseChange="1"/>
    <filter ... protected="protwords.txt"/>
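A minimal sketch of a text type with separate index-time and query-time analyzers might look like the following. The attribute values follow the fragments above, but the type name, the positionIncrementGap value, the tokenizer, the filter class names, and synonyms.txt are assumptions based on the stock Solr 1.4/3.x example schema, so treat this as an illustration rather than the original listing.

```xml
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <!-- Applied to field values when documents are added to the index -->
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- Drop the common words listed in stopwords.txt -->
    <filter class="solr.StopFilterFactory"
            ignoreCase="true"
            words="stopwords.txt"
            enablePositionIncrements="true"/>
    <!-- Split on case changes and delimiters; also index the joined forms -->
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1"
            catenateWords="1" catenateNumbers="1"
            catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Stem words, except those listed in protwords.txt -->
    <filter class="solr.SnowballPorterFilterFactory"
            language="English" protected="protwords.txt"/>
  </analyzer>

  <!-- Applied to the user's query terms -->
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- Expand query terms with their synonyms -->
    <filter class="solr.SynonymFilterFactory"
            synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory"
            ignoreCase="true"
            words="stopwords.txt"
            enablePositionIncrements="true"/>
    <!-- No catenation at query time: the joined forms are already in the index -->
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1"
            catenateWords="0" catenateNumbers="0"
            catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory"
            language="English" protected="protwords.txt"/>
  </analyzer>
</fieldType>
```

The two chains differ on purpose: synonyms are expanded only at query time here, and catenation happens only at index time, so each side does its share of the work once.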
Fields
The <fields> section is where you list the individual <field> declarations you wish to use in your documents. Each <field> has a name that you will use to reference it when adding documents or executing searches, and an associated type which identifies the name of the fieldType you wish to use for this field. Various field options apply to a field. These can be set in the field type declarations, and can also be overridden in an individual field's declaration.
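Inside the fields node you define the concrete fields (comparable to database columns). Each field takes the following attributes:
- name: the field name
- type: one of the FieldTypes defined earlier
- indexed: whether the field is indexed
- stored: whether the field's value is stored (set this to false whenever the value does not need to be stored)
- multiValued: whether the field can hold multiple values (set this to true for any field that may receive more than one value, to avoid errors at index time)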
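The attributes above can be sketched as follows. All field names here (id, title, body, tag) are hypothetical, and the type names string and text are assumed to be declared in the types section.

```xml
<fields>
  <!-- Hypothetical fields; names and types are illustrative only -->
  <field name="id" type="string" indexed="true" stored="true" required="true"/>
  <field name="title" type="text" indexed="true" stored="true"/>

  <!-- Searchable but never returned in results, so stored="false" saves space -->
  <field name="body" type="text" indexed="true" stored="false"/>

  <!-- A document may carry several tags, hence multiValued="true" -->
  <field name="tag" type="string" indexed="true" stored="true" multiValued="true"/>
</fields>
```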
Copy Fields
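It is advisable to create one catch-all copy field and copy every full-text field into it, so that a single field can serve unified searches.
The copy settings are written as copyField declarations inside the schema element:
    <schema ... xmlns:xi="http://www.w3.org/2001/XInclude">
      ...
    </schema>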
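Field t is the article title, field a is the article abstract, and field ta is the combination of title and abstract. When adding a document to the index you only need to supply content for t and a; Solr indexes ta automatically.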
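The t, a, and ta arrangement described above could be declared roughly like this. The field names come from the text; the text type and the indexed, stored, and multiValued settings are assumptions chosen for illustration.

```xml
<fields>
  <field name="t" type="text" indexed="true" stored="true"/>
  <field name="a" type="text" indexed="true" stored="true"/>

  <!-- Receives a copy of both t and a, so it must be multiValued.
       stored="false" because the original values are already stored in t and a. -->
  <field name="ta" type="text" indexed="true" stored="false" multiValued="true"/>
</fields>

<!-- Copy the raw value of each source field into ta before analysis -->
<copyField source="t" dest="ta"/>
<copyField source="a" dest="ta"/>
```

A query against ta then matches text from either the title or the abstract without the client having to search both fields.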
Dynamic fields
One of the powerful features of Lucene is that you don't have to pre-define every field when you first create your index. Even though Solr provides strong datatyping for fields, it still preserves that flexibility using "Dynamic Fields". Using <dynamicField> declarations, you can create field rules that Solr will use to understand what datatype should be used whenever it is given a field name that is not explicitly defined, but matches a prefix or suffix used in a dynamicField.
For example, a dynamic field declaration whose name pattern is "*_i" tells Solr that whenever it sees a field name ending in "_i" which is not an explicitly defined field, it should dynamically create an integer field with that name.
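Such a rule might be written as below. The type name int is an assumption and must match an integer fieldType declared in your types section; the indexed and stored values are illustrative.

```xml
<!-- Any field whose name ends in _i and is not explicitly declared
     is treated as an integer field -->
<dynamicField name="*_i" type="int" indexed="true" stored="true"/>
```

With this rule in place, a document containing a hypothetical field such as count_i is accepted without any matching field declaration.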
The Unique Key Field
The <uniqueKey> declaration can be used to inform Solr that there is a field in your index which should be unique for all documents. If a document is added that contains the same value for this field as an existing document, the old document will be deleted. It is not mandatory for a schema to have a uniqueKey field.
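A minimal sketch, assuming a hypothetical string field named id serves as the key:

```xml
<fields>
  <!-- The key field itself is an ordinary field declaration -->
  <field name="id" type="string" indexed="true" stored="true" required="true"/>
</fields>

<!-- Names the field whose value must be unique across all documents -->
<uniqueKey>id</uniqueKey>
```

Re-adding a document with an existing id therefore behaves as an update: the older document is replaced by the new one.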
Information from the comments in the schema.xml file:
1. To improve performance, the following measures can be taken: