Installing the IK Analysis Chinese Tokenizer for Elasticsearch


1. Install Elasticsearch; see the documentation: http://www.fecmall.com/topic/672

2. Download the matching IK Analysis version. The document above installs Elasticsearch 6.1.3, so the IK Analysis plugin must also be version 6.1.3.

Download URL: https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v6.1.3/elasticsearch-analysis-ik-6.1.3.zip

If you are running a different version, change the version number in both places in the URL (v6.1.3/ and elasticsearch-analysis-ik-6.1.3) to match your Elasticsearch version before downloading.
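
For example, with the version in a shell variable (a quick sketch; set ES_VERSION to your actual Elasticsearch version):

ES_VERSION=6.1.3
wget https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v${ES_VERSION}/elasticsearch-analysis-ik-${ES_VERSION}.zip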

3. Install: go into the Elasticsearch installation directory

3.1 Download the zip archive

cd ./plugins
wget https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v6.1.3/elasticsearch-analysis-ik-6.1.3.zip

You can also upload the file via FTP. After the download completes you will have the archive elasticsearch-analysis-ik-6.1.3.zip.
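
If wget is not available on the server, another option is to copy the archive up from your own machine, for example with scp (the user, host, and target path below are placeholders; adjust them to your server):

scp elasticsearch-analysis-ik-6.1.3.zip root@your-server:/usr/local/elasticsearch/plugins/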

3.2 Unzip the archive, delete the zip file, and rename the folder

Unzip: unzip elasticsearch-analysis-ik-6.1.3.zip

Delete the archive (this is required; otherwise Elasticsearch will report an error on startup): rm -f elasticsearch-analysis-ik-6.1.3.zip

Rename the extracted folder elasticsearch to ik.
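
For example, from inside the plugins directory:

mv elasticsearch ik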

All of the steps above must be performed; otherwise Elasticsearch will fail to start.

3.3 Restart Elasticsearch

ps -ef | grep elastic
Kill the process found above, then start Elasticsearch with the command below:
su elasticsearch -c "/usr/local/elasticsearch/bin/elasticsearch -d"
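
As one possible way to script the restart (a rough sketch; it assumes a single Elasticsearch process on the machine, so verify the PID from ps before killing anything):

# stop the running node, matching on its main class in the command line
kill $(pgrep -f org.elasticsearch.bootstrap)
# start it again as the elasticsearch user, daemonized
su elasticsearch -c "/usr/local/elasticsearch/bin/elasticsearch -d"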

If it starts without errors, the installation succeeded.
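
To double-check that the plugin actually loaded, you can also list the installed plugins; the output should contain a row with analysis-ik and its version:

curl -XGET http://127.0.0.1:9200/_cat/plugins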

4. Verify

4.1 Using the default analyzer

curl -XGET http://127.0.0.1:9200/_analyze?pretty -H 'Content-Type:application/json' -d'               
{
  "text": "听说看这篇博客的哥们最帅、姑娘最美"
}' 

Result:

[root@iZ942k2d5ezZ plugins]# curl -XGET http://127.0.0.1:9200/_analyze?pretty -H 'Content-Type:application/json' -d'               
> {
>   "text": "听说看这篇博客的哥们最帅、姑娘最美"
> }' 
{
  "tokens" : [
    {
      "token" : "听",
      "start_offset" : 0,
      "end_offset" : 1,
      "type" : "<IDEOGRAPHIC>",
      "position" : 0
    },
    {
      "token" : "说",
      "start_offset" : 1,
      "end_offset" : 2,
      "type" : "<IDEOGRAPHIC>",
      "position" : 1
    },
    {
      "token" : "看",
      "start_offset" : 2,
      "end_offset" : 3,
      "type" : "<IDEOGRAPHIC>",
      "position" : 2
    },
    {
      "token" : "这",
      "start_offset" : 3,
      "end_offset" : 4,
      "type" : "<IDEOGRAPHIC>",
      "position" : 3
    },
    {
      "token" : "篇",
      "start_offset" : 4,
      "end_offset" : 5,
      "type" : "<IDEOGRAPHIC>",
      "position" : 4
    },
    {
      "token" : "博",
      "start_offset" : 5,
      "end_offset" : 6,
      "type" : "<IDEOGRAPHIC>",
      "position" : 5
    },
    {
      "token" : "客",
      "start_offset" : 6,
      "end_offset" : 7,
      "type" : "<IDEOGRAPHIC>",
      "position" : 6
    },
    {
      "token" : "的",
      "start_offset" : 7,
      "end_offset" : 8,
      "type" : "<IDEOGRAPHIC>",
      "position" : 7
    },
    {
      "token" : "哥",
      "start_offset" : 8,
      "end_offset" : 9,
      "type" : "<IDEOGRAPHIC>",
      "position" : 8
    },
    {
      "token" : "们",
      "start_offset" : 9,
      "end_offset" : 10,
      "type" : "<IDEOGRAPHIC>",
      "position" : 9
    },
    {
      "token" : "最",
      "start_offset" : 10,
      "end_offset" : 11,
      "type" : "<IDEOGRAPHIC>",
      "position" : 10
    },
    {
      "token" : "帅",
      "start_offset" : 11,
      "end_offset" : 12,
      "type" : "<IDEOGRAPHIC>",
      "position" : 11
    },
    {
      "token" : "姑",
      "start_offset" : 13,
      "end_offset" : 14,
      "type" : "<IDEOGRAPHIC>",
      "position" : 12
    },
    {
      "token" : "娘",
      "start_offset" : 14,
      "end_offset" : 15,
      "type" : "<IDEOGRAPHIC>",
      "position" : 13
    },
    {
      "token" : "最",
      "start_offset" : 15,
      "end_offset" : 16,
      "type" : "<IDEOGRAPHIC>",
      "position" : 14
    },
    {
      "token" : "美",
      "start_offset" : 16,
      "end_offset" : 17,
      "type" : "<IDEOGRAPHIC>",
      "position" : 15
    }
  ]
}

4.2 Using the ik_smart analyzer:

curl -XGET http://127.0.0.1:9200/_analyze?pretty -H 'Content-Type:application/json' -d'               
{
  "analyzer": "ik_smart",
  "text": "听说看这篇博客的哥们最帅、姑娘最美"
}' 

Result:

[root@iZ942k2d5ezZ plugins]# curl -XGET http://127.0.0.1:9200/_analyze?pretty -H 'Content-Type:application/json' -d'               
> {
>   "analyzer": "ik_smart",
>   "text": "听说看这篇博客的哥们最帅、姑娘最美"
> }' 
{
  "tokens" : [
    {
      "token" : "听说",
      "start_offset" : 0,
      "end_offset" : 2,
      "type" : "CN_WORD",
      "position" : 0
    },
    {
      "token" : "看",
      "start_offset" : 2,
      "end_offset" : 3,
      "type" : "CN_CHAR",
      "position" : 1
    },
    {
      "token" : "这篇",
      "start_offset" : 3,
      "end_offset" : 5,
      "type" : "CN_WORD",
      "position" : 2
    },
    {
      "token" : "博客",
      "start_offset" : 5,
      "end_offset" : 7,
      "type" : "CN_WORD",
      "position" : 3
    },
    {
      "token" : "的",
      "start_offset" : 7,
      "end_offset" : 8,
      "type" : "CN_CHAR",
      "position" : 4
    },
    {
      "token" : "哥们",
      "start_offset" : 8,
      "end_offset" : 10,
      "type" : "CN_WORD",
      "position" : 5
    },
    {
      "token" : "最",
      "start_offset" : 10,
      "end_offset" : 11,
      "type" : "CN_CHAR",
      "position" : 6
    },
    {
      "token" : "帅",
      "start_offset" : 11,
      "end_offset" : 12,
      "type" : "CN_CHAR",
      "position" : 7
    },
    {
      "token" : "姑娘",
      "start_offset" : 13,
      "end_offset" : 15,
      "type" : "CN_WORD",
      "position" : 8
    },
    {
      "token" : "最美",
      "start_offset" : 15,
      "end_offset" : 17,
      "type" : "CN_WORD",
      "position" : 9
    }
  ]
}

4.3 The difference between ik_max_word and ik_smart

ik_max_word: performs the finest-grained segmentation. For example, 中华人民共和国国歌 is split into 中华人民共和国, 中华人民, 中华, 华人, 人民共和国, 人民, 人, 民, 共和国, 共和, 和, 国国, 国歌, exhausting every possible combination.

ik_smart: performs the coarsest-grained segmentation. For example, 中华人民共和国国歌 is split into 中华人民共和国, 国歌.
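
To compare the two yourself, rerun the _analyze request from section 4.2 with "analyzer": "ik_max_word":

curl -XGET http://127.0.0.1:9200/_analyze?pretty -H 'Content-Type:application/json' -d'
{
  "analyzer": "ik_max_word",
  "text": "中华人民共和国国歌"
}'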

5. Regarding the application extension: http://addons.fecmall.com/44669378

If you want to use Chinese word segmentation, configure it as follows:

In @fecelastic\models\elasticSearch\Product, change

public static $langAnalysis = [
    'zh' => 'cjk', // China
    'kr' => 'cjk', // Korea
    'jp' => 'cjk', // Japan
    'en' => 'english',
    'fr' => 'french',
    'de' => 'german',
    'it' => 'italian',
    'pt' => 'portuguese',
    'es' => 'spanish',
    'ru' => 'russian',
    'nl' => 'dutch',
    'br' => 'brazilian',
];

so that 'zh' => 'cjk' becomes 'zh' => 'ik_smart',

Then run the following from the application root:

./yii elasticsearch/clean
./yii elasticsearch/updatemapping

Then run the data synchronization:

cd ./vendor/fancyecommerce/fecshop/shell/search
sh fullSearchSync.sh

I am not sure whether it was due to low memory or something else, but after I installed this Chinese analysis plugin, Elasticsearch became very slow, so I have not verified the result yet; you can verify it yourself.
