Chinese search based on sphinx

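About Chinese search: if you are thinking of doing it with plain sphinx, don't, because sphinx itself does not support Chinese search. Coreseek does publish patch files for sphinx, but so far the latest version they target is only 0.9.8. I would advise against that route anyway. I tried applying the patches myself and it turned out not to work, because older sphinx versions do not support the Chinese-related configuration options. coreseek is essentially an upgraded sphinx; put plainly, it is sphinx plus mmseg. mmseg is the Chinese word segmentation tool, and coreseek is what lets sphinx index Chinese text. Next, let's set up coreseek.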



coreseek 3.2.14 download: http://www.coreseek.cn/uploads/csft/3.2/coreseek-3.2.14.tar.gz


# tar -zxvf coreseek-3.2.14.tar.gz   // extract


# cd coreseek-3.2.14       // enter the source directory


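// install mmseg: start
# cd mmseg-3.2.14       // enter mmseg and install it first (the Chinese word segmentation component)
# ./configure --prefix=/usr/local/mmseg   // configure. Problem 1: this fails with "config.status: error: cannot find input file: src/Makefile.in"
// fix: run the following in order
# yum -y install autoconf automake libtool
# aclocal
# libtoolize --force
# automake --add-missing
# autoconf
# autoheader
# ./configure --prefix=/usr/local/mmseg   // the prefix must match the mmseg paths passed to coreseek's configure and used in charset_dictpath
# make
# make install
// install mmseg: end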

// install coreseek: start
# cd /usr/local/src/coreseek-3.2.14/csft-3.2.14/
# ./configure --prefix=/usr/local/coreseek --with-mmseg-libs=/usr/local/mmseg/lib --with-mmseg-includes=/usr/local/mmseg/include/mmseg
# make
# make install
// install coreseek: end
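
If coreseek's configure cannot find mmseg, the usual cause is that the two `--with-mmseg-*` paths do not exist under the prefix mmseg was installed to. A minimal sketch of a pre-flight check, assuming the `/usr/local/mmseg` prefix from the configure line above (the function name `check_mmseg_paths` is hypothetical, not part of coreseek):

```shell
#!/bin/sh
# Sketch: verify the directories handed to coreseek's ./configure exist.
# The default prefix is an assumption; pass the prefix you gave mmseg.

check_mmseg_paths() {
    prefix=${1:-/usr/local/mmseg}
    status=0
    # These are the two paths used by --with-mmseg-libs / --with-mmseg-includes.
    for dir in "$prefix/lib" "$prefix/include/mmseg"; do
        if [ -d "$dir" ]; then
            echo "ok: $dir"
        else
            echo "missing: $dir"
            status=1
        fi
    done
    return "$status"
}
```

Calling `check_mmseg_paths /usr/local/mmseg` before running configure prints one line per directory and returns non-zero if either is missing.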


Next comes the configuration file. For what each option means, see the official sphinx documentation.

# cd /usr/local/coreseek/etc
# cp sphinx.conf.dist sphinx.conf
Below is my sphinx configuration file:

#
# Sphinx configuration file sample
#
# WARNING! While this sample file mentions all available options,
# it contains (very) short helper descriptions only. Please refer to
# doc/sphinx.html for details.
#

#############################################################################
## data source definition
#############################################################################

source product
{
# data source type. mandatory, no default value
# known types are mysql, pgsql, mssql, xmlpipe, xmlpipe2, odbc
type = mysql

#####################################################################
## SQL settings (for 'mysql' and 'pgsql' types)
#####################################################################

# some straightforward parameters for SQL source types
sql_host = localhost
sql_user = root
sql_pass =********
sql_db = sphinx
sql_port = 3306 # optional, default is 3306

# UNIX socket name
# optional, default is empty (reuse client library defaults)
# usually '/var/lib/mysql/mysql.sock' on Linux
# usually '/tmp/mysql.sock' on FreeBSD
# the value below is the FreeBSD default; on Linux use /var/lib/mysql/mysql.sock
sql_sock = /tmp/mysql.sock


# MySQL specific client connection flags
# optional, default is 0
#
mysql_connect_flags = 32 # enable compression

# MySQL specific SSL certificate settings
# optional, defaults are empty
#
# mysql_ssl_cert = /etc/ssl/client-cert.pem
# mysql_ssl_key = /etc/ssl/client-key.pem
# mysql_ssl_ca = /etc/ssl/cacert.pem

# MS SQL specific Windows authentication mode flag
# MUST be in sync with charset_type index-level setting
# optional, default is 0
#
# mssql_winauth = 1 # use currently logged on user credentials


# MS SQL specific Unicode indexing flag
# optional, default is 0 (request SBCS data)
#
# mssql_unicode = 1 # request Unicode data from server


# ODBC specific DSN (data source name)
# mandatory for odbc source type, no default value
#
# odbc_dsn = DBQ=C:\data;DefaultDir=C:\data;Driver={Microsoft Text Driver (*.txt; *.csv)};
# sql_query = SELECT id, data FROM documents.csv


# pre-query, executed before the main fetch query
# multi-value, optional, default is empty list of queries
#
sql_query_pre = SET NAMES utf8
sql_query_pre = SET SESSION query_cache_type=OFF


# main document fetch query
# mandatory, integer document ID field MUST be the first selected column
#and product.ID=product_info.product_id
#sql_query = \
#SELECT product.ID,Purchase, Product_Name,Member_price,content_info\
#FROM product, product_info where product.ID>=$start and product.ID<=$end and product.ID=product_info.product_id
# set up near-real-time indexing: record the current max ID for the delta source
sql_query_pre = REPLACE INTO sph_counter SELECT 1, MAX(pid) FROM pre_cosmetics_product

sql_query = \
SELECT pid, cname \
FROM pre_cosmetics_product WHERE pid>=$start AND pid<=$end
# joined/payload field fetch query
# joined fields let you avoid (slow) JOIN and GROUP_CONCAT
# payload fields let you attach custom per-keyword values (eg. for ranking)
#
# syntax is FIELD-NAME 'from'  ( 'query' | 'payload-query' ); QUERY
# joined field QUERY should return 2 columns (docid, text)
# payload field QUERY should return 3 columns (docid, keyword, weight)
#
# REQUIRES that query results are in ascending document ID order!
# multi-value, optional, default is empty list of queries
#
# sql_joined_field = tags from query; SELECT docid, CONCAT('tag',tagid) FROM tags ORDER BY docid ASC
# sql_joined_field = wtags from payload-query; SELECT docid, tag, tagweight FROM tags ORDER BY docid ASC


# file based field declaration
#
# content of this field is treated as a file name
# and the file gets loaded and indexed in place of a field
#
# max file size is limited by max_file_field_buffer indexer setting
# file IO errors are non-fatal and get reported as warnings
#
# sql_file_field = content_file_path


# range query setup, query that must return min and max ID values
# optional, default is empty
#
# sql_query will need to reference $start and $end boundaries
# if using ranged query:
#
# sql_query = \
# SELECT doc.id, doc.id AS group, doc.title, doc.data \
# FROM documents doc \
# WHERE id>=$start AND id<=$end
# enable ranged queries; this helps avoid MyISAM table locking problems
sql_query_range = SELECT MIN(pid),MAX(pid) FROM pre_cosmetics_product


# range query step
# optional, default is 1024
#
# sql_range_step = 1000


# unsigned integer attribute declaration
# multi-value (an arbitrary number of attributes is allowed), optional
# optional bit size can be specified, default is 32
#
# sql_attr_uint = author_id
# sql_attr_uint = forum_id:9 # 9 bits for forum_id
#sql_attr_uint = Purchase

# boolean attribute declaration
# multi-value (an arbitrary number of attributes is allowed), optional
# equivalent to sql_attr_uint with 1-bit size
#
# sql_attr_bool = is_deleted


# bigint attribute declaration
# multi-value (an arbitrary number of attributes is allowed), optional
# declares a signed (unlike uint!) 64-bit attribute
#
# sql_attr_bigint = my_bigint_id


# UNIX timestamp attribute declaration
# multi-value (an arbitrary number of attributes is allowed), optional
# similar to integer, but can also be used in date functions
#
# sql_attr_timestamp = posted_ts
# sql_attr_timestamp = last_edited_ts
#sql_attr_timestamp = date_added

# string ordinal attribute declaration
# multi-value (an arbitrary number of attributes is allowed), optional
# sorts strings (bytewise), and stores their indexes in the sorted list
# sorting by this attr is equivalent to sorting by the original strings
#
# sql_attr_str2ordinal = author_name


# floating point attribute declaration
# multi-value (an arbitrary number of attributes is allowed), optional
# values are stored in single precision, 32-bit IEEE 754 format
#
# sql_attr_float = lat_radians
#sql_attr_float = Member_price


# multi-valued attribute (MVA) attribute declaration
# multi-value (an arbitrary number of attributes is allowed), optional
# MVA values are variable length lists of unsigned 32-bit integers
#
# syntax is ATTR-TYPE ATTR-NAME 'from' SOURCE-TYPE [;QUERY] [;RANGE-QUERY]
# ATTR-TYPE is 'uint' or 'timestamp'
# SOURCE-TYPE is 'field', 'query', or 'ranged-query'
# QUERY is SQL query used to fetch all ( docid, attrvalue ) pairs
# RANGE-QUERY is SQL query used to fetch min and max ID values, similar to 'sql_query_range'
#
# sql_attr_multi = uint tag from query; SELECT id, tag FROM tags
# sql_attr_multi = uint tag from ranged-query; \
# SELECT id, tag FROM tags WHERE id>=$start AND id<=$end; \
# SELECT MIN(id), MAX(id) FROM tags


# string attribute declaration
# multi-value (an arbitrary number of these is allowed), optional
# lets you store and retrieve strings
#
# sql_attr_string = stitle


# wordcount attribute declaration
# multi-value (an arbitrary number of these is allowed), optional
# lets you count the words at indexing time
#
# sql_attr_str2wordcount = stitle


# combined field plus attribute declaration (from a single column)
# stores column as an attribute, but also indexes it as a full-text field
#
# sql_field_string = author
# sql_field_str2wordcount = title


# post-query, executed on sql_query completion
# optional, default is empty
#
# sql_query_post =


# post-index-query, executed on successful indexing completion
# optional, default is empty
# $maxid expands to max document ID actually fetched from DB
#
# sql_query_post_index = REPLACE INTO counters ( id, val ) \
# VALUES ( 'max_indexed_id', $maxid )


# ranged query throttling, in milliseconds
# optional, default is 0 which means no delay
# enforces given delay before each query step
sql_ranged_throttle = 0

# document info query, ONLY for CLI search (ie. testing and debugging)
# optional, default is empty
# must contain $id macro and must fetch the document by that id
sql_query_info = SELECT * FROM pre_cosmetics_product WHERE pid=$id

# kill-list query, fetches the document IDs for kill-list
# k-list will suppress matches from preceding indexes in the same query
# optional, default is empty
#
# sql_query_killlist = SELECT id FROM documents WHERE edited>=@last_reindex


# columns to unpack on indexer side when indexing
# multi-value, optional, default is empty list
#
# unpack_zlib = zlib_column
# unpack_mysqlcompress = compressed_column
# unpack_mysqlcompress = compressed_column_2


# maximum unpacked length allowed in MySQL COMPRESS() unpacker
# optional, default is 16M
#
# unpack_mysqlcompress_maxsize = 16M


#####################################################################
## xmlpipe2 settings
#####################################################################

# type = xmlpipe

# shell command to invoke xmlpipe stream producer
# mandatory
#
# xmlpipe_command = cat /usr/local/sphinx/var/test.xml

# xmlpipe2 field declaration
# multi-value, optional, default is empty
#
# xmlpipe_field = subject
# xmlpipe_field = content


# xmlpipe2 attribute declaration
# multi-value, optional, default is empty
# all xmlpipe_attr_XXX options are fully similar to sql_attr_XXX
#
# xmlpipe_attr_timestamp = published
# xmlpipe_attr_uint = author_id


# perform UTF-8 validation, and filter out incorrect codes
# avoids XML parser choking on non-UTF-8 documents
# optional, default is 0
#
# xmlpipe_fixup_utf8 = 1
}


# inherited source example
#
# all the parameters are copied from the parent source,
# and may then be overridden in this source definition
#source src1throttled : src1
#{
# sql_ranged_throttle = 100
#}

#######################################
# delta (incremental) data source for product
########################################

source delta : product
{
sql_host = localhost
sql_user = root
sql_pass = ********
sql_db = sphinx
sql_port = 3306 # optional, default is 3306
sql_sock = /tmp/mysql.sock
mysql_connect_flags = 32 # enable compression
sql_query_pre = SET NAMES utf8
sql_query_pre = SET SESSION query_cache_type=OFF
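# redeclaring sql_query_pre here replaces the list inherited from 'product',
# so the REPLACE INTO sph_counter query does not run when indexing the delta

sql_query = \
SELECT pid, cname \
FROM pre_cosmetics_product WHERE pid>=$start AND pid<=$end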



sql_query_range = SELECT (SELECT max_doc_id FROM sph_counter WHERE counter_id=1), MAX(pid) FROM pre_cosmetics_product
#sql_attr_uint = Purchase
#sql_attr_float = Member_price

sql_ranged_throttle = 0

sql_query_info = SELECT * FROM pre_cosmetics_product WHERE pid=$id
}


#############################################################################
## index definition
#############################################################################

# local index example
#
# this is an index which is stored locally in the filesystem
#
# all indexing-time options (such as morphology and charsets)
# are configured per local index
index product
{
# index type
# optional, default is 'plain'
# known values are 'plain', 'distributed', and 'rt' (see samples below)
# type = distributed

# document source(s) to index
# multi-value, mandatory
# document IDs must be globally unique across all sources
source = product

# index files path and file name, without extension
# mandatory, path must be writable, extensions will be auto-appended
path = /usr/local/coreseek/var/data/product

# document attribute values (docinfo) storage mode
# optional, default is 'extern'
# known values are 'none', 'extern' and 'inline'
docinfo = extern

# memory locking for cached data (.spa and .spi), to prevent swapping
# optional, default is 0 (do not mlock)
# requires searchd to be run from root
mlock = 0

# a list of morphology preprocessors to apply
# optional, default is empty
#
# builtin preprocessors are 'none', 'stem_en', 'stem_ru', 'stem_enru',
# 'soundex', and 'metaphone'; additional preprocessors available from
# libstemmer are 'libstemmer_XXX', where XXX is algorithm code
# (see libstemmer_c/libstemmer/modules.txt)
#
# morphology = stem_en, stem_ru, soundex
# morphology = libstemmer_german
# morphology = libstemmer_sv
morphology = none

# minimum word length at which to enable stemming
# optional, default is 1 (stem everything)
#
# min_stemming_len = 1


# stopword files list (space separated)
# optional, default is empty
# contents are plain text, charset_table and stemming are both applied
#
# stopwords = /usr/local/coreseek/var/data/stopwords.txt


# wordforms file, in "mapfrom > mapto" plain text format
# optional, default is empty
#
# wordforms = /usr/local/coreseek/var/data/wordforms.txt


# tokenizing exceptions file
# optional, default is empty
#
# plain text, case sensitive, space insensitive in map-from part
# one "Map Several Words => ToASingleOne" entry per line
#
# exceptions = /usr/local/coreseek/var/data/exceptions.txt


# minimum indexed word length
# default is 1 (index everything)
min_word_len = 1

# charset encoding type
# optional, default is 'sbcs'
# known types are 'sbcs' (Single Byte CharSet) and 'utf-8'
charset_type = zh_cn.utf-8

# charset definition and case folding rules "table"
# optional, default value depends on charset_type
#
# defaults are configured to include English and Russian characters only
# you need to change the table to include additional ones
# this behavior MAY change in future versions
#
# 'sbcs' default value is
# charset_table = 0..9, A..Z->a..z, _, a..z, U+A8->U+B8, U+B8, U+C0..U+DF->U+E0..U+FF, U+E0..U+FF
#
# 'utf-8' default value is
# charset_table = 0..9, A..Z->a..z, _, a..z, U+410..U+42F->U+430..U+44F, U+430..U+44F


# ignored characters list
# optional, default value is empty
#
# ignore_chars = U+00AD


# minimum word prefix length to index
# optional, default is 0 (do not index prefixes)
#
# min_prefix_len = 0


# minimum word infix length to index
# optional, default is 0 (do not index infixes)
#
# min_infix_len = 0


# list of fields to limit prefix/infix indexing to
# optional, default value is empty (index all fields in prefix/infix mode)
#
# prefix_fields = filename
# infix_fields = url, domain


# enable star-syntax (wildcards) when searching prefix/infix indexes
# search-time only, does not affect indexing, can be 0 or 1
# optional, default is 0 (do not use wildcard syntax)
#
# enable_star = 1


# expand keywords with exact forms and/or stars when searching fit indexes
# search-time only, does not affect indexing, can be 0 or 1
# optional, default is 0 (do not expand keywords)
#
# expand_keywords = 1


# n-gram length to index, for CJK indexing
# only supports 0 and 1 for now, other lengths to be implemented
# optional, default is 0 (disable n-grams)
#
# ngram_len = 1


# n-gram characters list, for CJK indexing
# optional, default is empty
#
# ngram_chars = U+3000..U+2FA1F


# phrase boundary characters list
# optional, default is empty
#
# phrase_boundary = ., ?, !, U+2026 # horizontal ellipsis


# phrase boundary word position increment
# optional, default is 0
#
# phrase_boundary_step = 100


# blended characters list
# blended chars are indexed both as separators and valid characters
# for instance, AT&T will result in 3 tokens ("at", "t", and "at&t")
# optional, default is empty
#
# blend_chars = +, &, U+23


# whether to strip HTML tags from incoming documents
# known values are 0 (do not strip) and 1 (do strip)
# optional, default is 0
html_strip = 0

# what HTML attributes to index if stripping HTML
# optional, default is empty (do not index anything)
#
# html_index_attrs = img=alt,title; a=title;


# what HTML elements contents to strip
# optional, default is empty (do not strip element contents)
#
# html_remove_elements = style, script


# whether to preopen index data files on startup
# optional, default is 0 (do not preopen), searchd-only
#
# preopen = 1


# whether to keep dictionary (.spi) on disk, or cache it in RAM
# optional, default is 0 (cache in RAM), searchd-only
#
# ondisk_dict = 1


# whether to enable in-place inversion (2x less disk, 90-95% speed)
# optional, default is 0 (use separate temporary files), indexer-only
#
# inplace_enable = 1


# in-place fine-tuning options
# optional, defaults are listed below
#
# inplace_hit_gap = 0 # preallocated hitlist gap size
# inplace_docinfo_gap = 0 # preallocated docinfo gap size
# inplace_reloc_factor = 0.1 # relocation buffer size within arena
# inplace_write_factor = 0.1 # write buffer size within arena


# whether to index original keywords along with stemmed versions
# enables "=exactform" operator to work
# optional, default is 0
#
# index_exact_words = 1


# position increment on overshort (less than min_word_len) words
# optional, allowed values are 0 and 1, default is 1
#
# overshort_step = 1


# position increment on stopword
# optional, allowed values are 0 and 1, default is 1
#
# stopword_step = 1


# hitless words list
# positions for these keywords will not be stored in the index
# optional, allowed values are 'all', or a list file name
#
# hitless_words = all
# hitless_words = hitless.txt
ngram_len = 0
charset_dictpath  = /usr/local/mmseg/etc/
}

index delta : product {
source = delta
path = /usr/local/coreseek/var/data/delta
docinfo = extern
mlock = 0
morphology = none
min_word_len = 1
charset_type = zh_cn.utf-8
html_strip = 0
  ngram_len = 0
charset_dictpath  = /usr/local/mmseg/etc/

}

# inherited index example
#
# all the parameters are copied from the parent index,
# and may then be overridden in this index definition
#index test1stemmed : test1
#{
#path = /usr/local/coreseek/var/data/test1stemmed
#morphology = stem_en
#}


# distributed index example
#
# this is a virtual index which can NOT be directly indexed,
# and only contains references to other local and/or remote indexes
#index dist1
#{
# 'distributed' index type MUST be specified
#type = distributed

# local index to be searched
# there can be many local indexes configured
#local = test1
#local = test1stemmed

# remote agent
# multiple remote agents may be specified
# syntax for TCP connections is 'hostname:port:index1,[index2[,...]]'
# syntax for local UNIX connections is '/path/to/socket:index1,[index2[,...]]'
#agent = localhost:9313:remote1
#agent = localhost:9314:remote2,remote3
# agent = /var/run/searchd.sock:remote4

# blackhole remote agent, for debugging/testing
# network errors and search results will be ignored
#
# agent_blackhole = testbox:9312:testindex1,testindex2


# remote agent connection timeout, milliseconds
# optional, default is 1000 ms, ie. 1 sec
#agent_connect_timeout = 1000

# remote agent query timeout, milliseconds
# optional, default is 3000 ms, ie. 3 sec
#agent_query_timeout = 3000
#}


# realtime index example
#
# you can run INSERT, REPLACE, and DELETE on this index on the fly
# using MySQL protocol (see 'listen' directive below)
#index rt
#{
# 'rt' index type must be specified to use RT index
#type = rt

# index files path and file name, without extension
# mandatory, path must be writable, extensions will be auto-appended
#path = /usr/local/coreseek/var/data/rt

# RAM chunk size limit
# RT index will keep at most this much data in RAM, then flush to disk
# optional, default is 32M
#
# rt_mem_limit = 512M

# full-text field declaration
# multi-value, mandatory
#rt_field = title
#rt_field = content

# unsigned integer attribute declaration
# multi-value (an arbitrary number of attributes is allowed), optional
# declares an unsigned 32-bit attribute
#rt_attr_uint = gid

# RT indexes currently support the following attribute types:
# uint, bigint, float, timestamp, string
#
# rt_attr_bigint = guid
# rt_attr_float = gpa
# rt_attr_timestamp = ts_added
# rt_attr_string = author
#}

#############################################################################
## indexer settings
#############################################################################

indexer
{
# memory limit, in bytes, kilobytes (16384K) or megabytes (256M)
# optional, default is 32M, max is 2047M, recommended is 256M to 1024M
mem_limit = 256M

# maximum IO calls per second (for I/O throttling)
# optional, default is 0 (unlimited)
#
# max_iops = 40


# maximum IO call size, bytes (for I/O throttling)
# optional, default is 0 (unlimited)
#
max_iosize = 1048576


# maximum xmlpipe2 field length, bytes
# optional, default is 2M
#
# max_xmlpipe2_field = 4M


# write buffer size, bytes
# several (currently up to 4) buffers will be allocated
# write buffers are allocated in addition to mem_limit
# optional, default is 1M
#
# write_buffer = 1M


# maximum file field adaptive buffer size
# optional, default is 8M, minimum is 1M
#
# max_file_field_buffer = 32M
}

#############################################################################
## searchd settings
#############################################################################

searchd
{
# [hostname:]port[:protocol], or /unix/socket/path to listen on
# known protocols are 'sphinx' (SphinxAPI) and 'mysql41' (SphinxQL)
#
# multi-value, multiple listen points are allowed
# optional, defaults are 9312:sphinx and 9306:mysql41, as below
#
# listen = 127.0.0.1
# listen = 192.168.0.1:9312
# listen = 9312
# listen = /var/run/searchd.sock
listen = 9312
listen = 9306:mysql41

# log file, searchd run info is logged here
# optional, default is 'searchd.log'
log = /usr/local/coreseek/var/log/searchd.log

# query log file, all search queries are logged here
# optional, default is empty (do not log queries)
query_log = /usr/local/coreseek/var/log/query.log

# client read timeout, seconds
# optional, default is 5
read_timeout = 5

# request timeout, seconds
# optional, default is 5 minutes
client_timeout = 300

# maximum amount of children to fork (concurrent searches to run)
# optional, default is 0 (unlimited)
max_children = 10

# PID file, searchd process ID file name
# mandatory
pid_file = /usr/local/coreseek/var/log/searchd.pid

# max amount of matches the daemon ever keeps in RAM, per-index
# WARNING, THERE'S ALSO PER-QUERY LIMIT, SEE SetLimits() API CALL
# default is 1000 (just like Google)
max_matches = 1000

# seamless rotate, prevents rotate stalls if precaching huge datasets
# optional, default is 1
seamless_rotate = 1

# whether to forcibly preopen all indexes on startup
# optional, default is 0 (do not preopen)
preopen_indexes = 0

# whether to unlink .old index copies on successful rotation.
# optional, default is 1 (do unlink)
unlink_old = 1

# attribute updates periodic flush timeout, seconds
# updates will be automatically dumped to disk this frequently
# optional, default is 0 (disable periodic flush)
#
# attr_flush_period = 900


# instance-wide ondisk_dict defaults (per-index value take precedence)
# optional, default is 0 (precache all dictionaries in RAM)
#
# ondisk_dict_default = 1


# MVA updates pool size
# shared between all instances of searchd, disables attr flushes!
# optional, default size is 1M
mva_updates_pool = 1M

# max allowed network packet size
# limits both query packets from clients, and responses from agents
# optional, default size is 8M
max_packet_size = 8M

# crash log path
# searchd will (try to) log crashed query to 'crash_log_path.PID' file
# optional, default is empty (do not create crash logs)
#
# crash_log_path = /usr/local/sphinx/var/log/crash


# max allowed per-query filter count
# optional, default is 256
max_filters = 256

# max allowed per-filter values count
# optional, default is 4096
max_filter_values = 4096


# socket listen queue length
# optional, default is 5
#
# listen_backlog = 5


# per-keyword read buffer size
# optional, default is 256K
#
# read_buffer = 256K


# unhinted read size (currently used when reading hits)
# optional, default is 32K
#
# read_unhinted = 32K


# max allowed per-batch query count (aka multi-query count)
# optional, default is 32
#max_batch_queries = 32


# max common subtree document cache size, per-query
# optional, default is 0 (disable subtree optimization)
#
# subtree_docs_cache = 4M


# max common subtree hit cache size, per-query
# optional, default is 0 (disable subtree optimization)
#
# subtree_hits_cache = 8M


# multi-processing mode (MPM)
# known values are none, fork, prefork, and threads
# optional, default is fork
#
#workers = threads # for RT to work


# max threads to create for searching local parts of a distributed index
# optional, default is 0, which means disable multi-threaded searching
# should work with all MPMs (ie. does NOT require workers=threads)
#
# dist_threads = 4


# binlog files path; use empty string to disable binlog
# optional, default is build-time configured data directory
#
# binlog_path = # disable logging
# binlog_path = /usr/local/coreseek/var/data # binlog.001 etc will be created there


# binlog flush/sync mode
# 0 means flush and sync every second
# 1 means flush and sync every transaction
# 2 means flush every transaction, sync every second
# optional, default is 2
#
# binlog_flush = 2


# binlog per-file size limit
# optional, default is 128M, 0 means no limit
#
# binlog_max_log_size = 256M
}

# --eof--
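
The configuration above reads and writes a `sph_counter` table (columns `counter_id` and `max_doc_id`), but the post does not show how it is created. A minimal sketch that prints a suitable definition; the column types are an assumption modelled on the delta-index example in the sphinx documentation, and the function name `counter_table_sql` is hypothetical:

```shell
#!/bin/sh
# Sketch: print DDL for the counter table used by the main+delta scheme.
# Column names come from the config above; the INTEGER types are assumed.

counter_table_sql() {
    cat <<'SQL'
CREATE TABLE IF NOT EXISTS sph_counter (
    counter_id INTEGER PRIMARY KEY NOT NULL,
    max_doc_id INTEGER NOT NULL
);
SQL
}
```

It could be applied with something like `counter_table_sql | mysql -u root -p sphinx`, where `sphinx` is the `sql_db` value from the config. The table has to exist before the first run of indexer, since the `product` source writes to it in `sql_query_pre`.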


For how to build the Chinese dictionary, see http://www.coreseek.cn/opensource/mmseg/
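
Indexing Chinese text depends on `charset_dictpath` pointing at a directory that actually holds the mmseg dictionary. A minimal sketch that pulls every `charset_dictpath` out of a config file and checks each directory; the dictionary file name `uni.lib` is an assumption about the mmseg install, and `check_dictpaths` is a hypothetical helper. Paths containing spaces are not handled.

```shell
#!/bin/sh
# Sketch: check that each charset_dictpath in a config holds a dictionary.
# The default file name (uni.lib) is an assumption; pass another as $2.

check_dictpaths() {
    conf=$1
    dict_file=${2:-uni.lib}
    status=0
    # Take the value after '=' on non-comment charset_dictpath lines.
    paths=$(sed -n 's/^[[:space:]]*charset_dictpath[[:space:]]*=[[:space:]]*//p' "$conf" | sort -u)
    if [ -z "$paths" ]; then
        echo "no charset_dictpath found in: $conf"
        return 1
    fi
    for dir in $paths; do
        if [ -f "${dir%/}/$dict_file" ]; then
            echo "ok: $dir"
        else
            echo "missing $dict_file in: $dir"
            status=1
        fi
    done
    return "$status"
}
```

For the config in this post the call would be `check_dictpaths /usr/local/coreseek/etc/sphinx.conf`.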
// see the official documentation for details on each source option
Next, build the index
# cd /usr/local/coreseek/
# bin/indexer --config /usr/local/coreseek/etc/sphinx.conf product
Start searchd
# bin/searchd --config /usr/local/coreseek/etc/sphinx.conf
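
To confirm that searchd really started, one option is to look at the `pid_file` set in the searchd section of the config. A minimal sketch, assuming the `/usr/local/coreseek/var/log/searchd.pid` path from the config above; `searchd_running` is a hypothetical helper, and it should be run as the same user that started searchd (root in this post), since `kill -0` fails on processes you are not allowed to signal:

```shell
#!/bin/sh
# Sketch: report whether the process recorded in searchd's pid_file is alive.

searchd_running() {
    pid_file=${1:-/usr/local/coreseek/var/log/searchd.pid}
    [ -f "$pid_file" ] || return 1
    pid=$(tr -d '[:space:]' < "$pid_file")
    # Reject empty or non-numeric content before handing it to kill.
    case $pid in
        ''|*[!0-9]*) return 1 ;;
    esac
    # Signal 0 only tests that the process exists; nothing is sent.
    kill -0 "$pid" 2>/dev/null
}
```

Usage: `searchd_running && echo "searchd is up" || echo "searchd is not running"`.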
# bin/search --config /usr/local/coreseek/etc/sphinx.conf <string to search for>
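
The config defines a `delta` index next to `product`, but the commands above only build `product`. A minimal sketch of how the delta could be refreshed and later merged into the main index while searchd is running, using indexer's `--rotate` and `--merge` options. The helper names (`run`, `refresh_delta`, `merge_delta`) and the `DRY_RUN` switch are hypothetical conveniences, not part of coreseek:

```shell
#!/bin/sh
# Sketch: refresh the delta index and fold it into the main index.
# Set DRY_RUN=1 to print the commands instead of running them.

CORESEEK_HOME=${CORESEEK_HOME:-/usr/local/coreseek}
CONF=${CONF:-$CORESEEK_HOME/etc/sphinx.conf}

run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Re-index only the rows above the max ID stored in sph_counter;
# --rotate asks the running searchd to pick up the new index files.
refresh_delta() {
    run "$CORESEEK_HOME/bin/indexer" --config "$CONF" delta --rotate
}

# Merge delta into product (destination index first, then source index).
merge_delta() {
    run "$CORESEEK_HOME/bin/indexer" --config "$CONF" --merge product delta --rotate
}
```

A cron job might call `refresh_delta` every few minutes and `merge_delta` (or a full rebuild of `product`) once a day. With this config, only a full rebuild of `product` updates `sph_counter`.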
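As for calling it from PHP, the coreseek source ships with an API.
This post is only meant to help beginners understand how to configure sphinx. For more, visit the coreseek official website: http://www.coreseek.cn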


