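ML inference search response processor

The ml_inference search response processor invokes registered machine learning (ML) models and adds their outputs as new fields to the documents in the search results.

Prerequisites:
Before using the ml_inference search response processor, you must have either a local ML model hosted on your UDB-SX cluster or an externally hosted model connected to your UDB-SX cluster through the ML Commons plugin. For more information about local models, see "Using ML models within UDB-SX". For more information about externally hosted models, see "Connecting to externally hosted models".

Syntax

The following is the syntax for the ml_inference search response processor: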

{
  "ml_inference": {
    "model_id": "<model_id>",
    "function_name": "<function_name>",
    "full_response_path": "<full_response_path>",
    "model_config":{
      "<model_config_field>": "<config_value>"
    },
    "model_input": "<model_input>",
    "input_map": [
      {
        "<model_input_field>": "<document_field>"
      }
    ],
    "output_map": [
      {
        "<new_document_field>": "<model_output_field>"
      }
    ],
    "override": "<override>",
    "one_to_one": false
  }
}

Request fields

The following table lists the required and optional parameters for the ml_inference search response processor.

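| Parameter | Data type | Required/Optional | Description |
| :--- | :--- | :--- | :--- |
| model_id | String | Required | The ID of the ML model used by the processor. |
| function_name | String | Optional for externally hosted models. Required for local models. | The function name of the ML model configured in the processor. For local models, valid values are sparse_encoding, sparse_tokenize, text_embedding, and text_similarity. For externally hosted models, the valid value is remote. Default is remote. |
| model_config | Object | Optional | Custom configuration options for the ML model. For externally hosted models, if set, this configuration overrides the default connector parameters. For local models, you can add model_config to model_input to override the model configuration set during registration. For more information, see the model_config object. |
| model_input | String | Optional for externally hosted models. Required for local models. | A template that defines the input field format expected by the model. Each local model type might use a different set of inputs. For externally hosted models, default is "{ \"parameters\": ${ml_inference.parameters} }". |
| input_map | Array | Optional for externally hosted models. Required for local models. | An array specifying how to map document fields in the search response to the model input fields. Each element of the array is a map in the "<model_input_field>": "<document_field>" format and corresponds to one model invocation for a document field. If no input mapping is specified for an externally hosted model, then all document fields are passed to the model directly as input. The size of the input_map indicates the number of times the model is invoked (the number of Predict API requests). |
| <model_input_field> | String | Optional for externally hosted models. Required for local models. | The model input field name. |
| <document_field> | String | Optional for externally hosted models. Required for local models. | The name or JSON path of the document field in the search response that is used as the model input. |
| output_map | Array | Optional for externally hosted models. Required for local models. | An array specifying how to map the model output fields to new fields in the search response document. Each element of the array is a map in the "<new_document_field>": "<model_output_field>" format. |
| <new_document_field> | String | Optional for externally hosted models. Required for local models. | The name of the new field in the document in which the model's output (specified by model_output) is stored. If no output mapping is specified for an externally hosted model, then all fields from the model output are added to the new document field. |
| <model_output_field> | String | Optional for externally hosted models. Required for local models. | The name or JSON path of the field in the model output to be stored in the new_document_field. |
| full_response_path | Boolean | Optional | Set this parameter to true if the model_output_field contains a full JSON path to the field instead of the field name. The model output is then fully parsed to get the value of the field. Default is true for local models and false for externally hosted models. |
| ignore_missing | Boolean | Optional | If true and any of the input fields defined in the input_map or output_map are missing, then this processor is ignored. Otherwise, a missing field causes a failure. Default is false. |
| ignore_failure | Boolean | Optional | Specifies whether the processor continues execution even if it encounters an error. If true, then this processor is ignored and the search continues. If false, then any failure causes the search to be canceled. Default is false. |
| override | Boolean | Optional | Relevant if a document in the response already contains a field with the name specified in <new_document_field>. If override is false, then the input field is skipped. If true, then the existing field value is overridden by the new model output. Default is false. |
| max_prediction_tasks | Integer | Optional | The maximum number of model invocations that can run concurrently during document search. Default is 10. |
| one_to_one | Boolean | Optional | Set this parameter to true to invoke the model once (make one Predict API request) for each document. The default value (false) invokes the model once with all documents from the search response, making a single Predict API request. |
| description | String | Optional | A brief description of the processor. |
| tag | String | Optional | An identifier tag for the processor. Useful for debugging to distinguish between processors of the same type. |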

The input_map and output_map mappings support standard JSON path notation for specifying complex data structures.
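
As a rough illustration of the override parameter described in the table, the following Python sketch stores a model output in a document only when the target field is absent or override is true. The apply_output helper is hypothetical and is not part of the processor; it only mirrors the documented rule.

```python
def apply_output(document, new_field, value, override=False):
    """Return a copy of `document` with `value` stored in `new_field`.

    Mirrors the documented `override` rule: an existing field is kept
    unless `override` is true. (Hypothetical helper, for illustration only.)
    """
    result = dict(document)
    if new_field in result and not override:
        return result  # the existing value is kept and the new output is skipped
    result[new_field] = value
    return result


doc = {"passage_text": "hello world", "passage_embedding": [0.0]}
kept = apply_output(doc, "passage_embedding", [0.1, 0.2])
replaced = apply_output(doc, "passage_embedding", [0.1, 0.2], override=True)
print(kept["passage_embedding"])
print(replaced["passage_embedding"])
```

With the default (override set to false), the first call leaves the existing passage_embedding untouched, while the second call replaces it.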

Setup

Create an index named my_index and index a document to illustrate the mappings:

POST /my_index/_doc/1
{
  "passage_text": "hello world"
}

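Using the processor

Follow these steps to use the processor in a pipeline. You must provide a model ID when creating the processor. Before testing a pipeline that uses the processor, make sure that the model is successfully deployed. You can check the model state using the Get Model API.

For local models, you must provide a model_input field that specifies the model input format. Add any input fields in model_config to model_input.

For remote models, the model_input field is optional, and its default value is "{ \"parameters\": ${ml_inference.parameters} }".

Example: Local model

The following example shows you how to configure an ml_inference search response processor with a local model.

Step 1: Create a pipeline

The following example shows you how to create a search pipeline for the huggingface/sentence-transformers/all-distilroberta-v1 local model. The model is a pretrained sentence transformer model hosted in your UDB-SX cluster.

If you invoke the model using the Predict API, then the request appears as follows: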

POST /_plugins/_ml/_predict/text_embedding/cleMb4kBJ1eYAeTMFFg4
{
  "text_docs":[ "today is sunny"],
  "return_number": true,
  "target_response": ["sentence_embedding"]
}

Using this schema, specify the model_input as follows:

 "model_input": "{ \"text_docs\": ${input_map.text_docs}, \"return_number\": ${model_config.return_number}, \"target_response\": ${model_config.target_response} }"

In the input_map, map the passage_text document field to the text_docs field expected by the model:

"input_map": [
  {
    "text_docs": "passage_text"
  }
]
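
To see how the model_input template and the input_map fit together, the following Python sketch fills the ${input_map.*} and ${model_config.*} placeholders with JSON values. The render_model_input helper and its regular expression are assumptions made for illustration; the processor's actual substitution logic may differ.

```python
import json
import re

# Matches placeholders such as ${input_map.text_docs} or ${model_config.return_number}.
PLACEHOLDER = re.compile(r"\$\{(input_map|model_config)\.(\w+)\}")


def render_model_input(template, input_map_values, model_config):
    """Fill ${input_map.*} and ${model_config.*} placeholders with JSON values."""
    sources = {"input_map": input_map_values, "model_config": model_config}

    def substitute(match):
        source, key = match.group(1), match.group(2)
        return json.dumps(sources[source][key])

    return PLACEHOLDER.sub(substitute, template)


template = (
    '{ "text_docs": ${input_map.text_docs}, '
    '"return_number": ${model_config.return_number}, '
    '"target_response": ${model_config.target_response} }'
)
body = render_model_input(
    template,
    {"text_docs": ["today is sunny"]},
    {"return_number": True, "target_response": ["sentence_embedding"]},
)
request = json.loads(body)
print(request)
```

The rendered body has the same shape as the Predict API request shown earlier in this example.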

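Because you specified the field to be converted into embeddings as a JSON path, you need to set the full_response_path to true. The full JSON document is then parsed to obtain the input field: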

"full_response_path": true

The text in the passage_text field will be used to generate embeddings:

{
  "passage_text": "hello world"
}

The Predict API request returns the following response:

{
  "inference_results" : [
    {
      "output" : [
        {
          "name" : "sentence_embedding",
          "data_type" : "FLOAT32",
          "shape" : [
            768
          ],
          "data" : [
            0.25517133,
            -0.28009856,
            0.48519906,
            ...
          ]
        }
      ]
    }
  ]
}

The model generates embeddings in the $.inference_results.*.output.*.data field. The output_map maps this field to the newly created passage_embedding field in the search response document:

"output_map": [
  {
    "passage_embedding": "$.inference_results.*.output.*.data"
  }
]
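
The following Python sketch shows how a path such as $.inference_results.*.output.*.data can be resolved against the Predict API response. The resolve_path helper is a hypothetical, minimal resolver that handles only dotted keys and the * wildcard over lists; it is not the JSON path implementation used by the processor. The embedding is shortened to the three values shown above, and the shape is adjusted to match.

```python
def resolve_path(data, path):
    """Resolve a dotted path such as "$.a.*.b"; "*" fans out over list items."""
    keys = [key for key in path.split(".") if key not in ("$", "")]
    current = [data]
    for key in keys:
        next_values = []
        for value in current:
            if key == "*":
                next_values.extend(value)  # assumes the value is a list
            else:
                next_values.append(value[key])
        current = next_values
    return current


predict_response = {
    "inference_results": [
        {
            "output": [
                {
                    "name": "sentence_embedding",
                    "data_type": "FLOAT32",
                    "shape": [3],
                    "data": [0.25517133, -0.28009856, 0.48519906],
                }
            ]
        }
    ]
}

matches = resolve_path(predict_response, "$.inference_results.*.output.*.data")
document = {"passage_text": "hello world"}
document["passage_embedding"] = matches[0]  # one output, so one match
print(document["passage_embedding"])
```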

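To configure an ml_inference search response processor with a local model, specify the function_name explicitly. In this example, the function_name is text_embedding. For information about valid function_name values, see Request fields.

The following is the final configuration of an ml_inference search response processor with a local model: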

PUT /_search/pipeline/ml_inference_pipeline_local
{
  "description": "search passage and generates embeddings",
  "response_processors": [
    {
      "ml_inference": {
        "function_name": "text_embedding",
        "full_response_path": true,
        "model_id": "<your model id>",
        "model_config": {
          "return_number": true,
          "target_response": ["sentence_embedding"]
        },
        "model_input": "{ \"text_docs\": ${input_map.text_docs}, \"return_number\": ${model_config.return_number}, \"target_response\": ${model_config.target_response} }",
        "input_map": [
          {
            "text_docs": "passage_text"
          }
        ],
        "output_map": [
          {
            "passage_embedding": "$.inference_results.*.output.*.data"
          }
        ],
        "ignore_missing": true,
        "ignore_failure": true
      }
    }
  ]
}

Step 2: Run the pipeline

Run the following query, providing the pipeline name in the request:

GET /my_index/_search?search_pipeline=ml_inference_pipeline_local
{
  "query": {
    "term": {
      "passage_text": {
        "value": "hello"
      }
    }
  }
}

Response

The response confirms that the processor has generated text embeddings in the passage_embedding field:

{
  "took": 288,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 0.00009405752,
    "hits": [
      {
        "_index": "my_index",
        "_id": "1",
        "_score": 0.00009405752,
        "_source": {
          "passage_text": "hello world",
          "passage_embedding": [
            0.017304314,
            -0.021530833,
            0.050184276,
            0.08962978,
            ...]
        }
      }
    ]
  }
}

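Example: Externally hosted text embedding model

The following example shows you how to configure an ml_inference search response processor with an externally hosted model.

Step 1: Create a pipeline

The following example shows you how to create a search pipeline for an externally hosted text embedding model. The model requires an input field and generates results in a data field. It converts the text in the passage_text field into text embeddings and stores the embeddings in the passage_embedding field. The function_name is not explicitly specified in the processor configuration, so it defaults to remote, signifying an externally hosted model: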

PUT /_search/pipeline/ml_inference_pipeline
{
  "description": "Generate passage_embedding when search documents",
  "response_processors": [
    {
      "ml_inference": {
        "model_id": "<your model id>",
        "input_map": [
          {
            "input": "passage_text"
          }
        ],
        "output_map": [
          {
            "passage_embedding": "data"
          }
        ]
      }
    }
  ]
}

When making a Predict API request to an externally hosted model, all necessary fields and parameters are usually contained within a parameters object:

POST /_plugins/_ml/models/cleMb4kBJ1eYAeTMFFg4/_predict
{
  "parameters": {
    "input": [
      {
        ...
      }
    ]
  }
}

When specifying the input_map for an externally hosted model, you can reference the input field directly instead of providing its dot path parameters.input:

"input_map": [
  {
    "input": "passage_text"
  }
]
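
The following Python sketch illustrates why input can be referenced directly: the mapped field ends up inside the parameters object of the request body. The build_predict_body helper is hypothetical and handles a single document. It only mirrors the default model_input template, { "parameters": ${ml_inference.parameters} }, and is not the processor's actual code.

```python
import json


def build_predict_body(document, mapping):
    """Build one Predict API body from a single input_map element."""
    parameters = {
        model_field: document[document_field]
        for model_field, document_field in mapping.items()
    }
    # Mirrors the default template { "parameters": ${ml_inference.parameters} }.
    return {"parameters": parameters}


document = {"passage_text": "hello world"}
input_map = [{"input": "passage_text"}]

# One body per input_map element, since each element is one model invocation.
bodies = [build_predict_body(document, mapping) for mapping in input_map]
print(json.dumps(bodies[0]))
```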

Step 2: Run the pipeline

Run the following query, providing the pipeline name in the request:

GET /my_index/_search?search_pipeline=ml_inference_pipeline
{
  "query": {
    "match_all": {
    }
  }
}

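The response confirms that the processor has generated text embeddings in the passage_embedding field. The document within _source now contains both the passage_text and passage_embedding fields: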

{
  "took": 288,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 0.00009405752,
    "hits": [
      {
        "_index": "my_index",
        "_id": "1",
        "_score": 0.00009405752,
        "_source": {
          "passage_text": "hello world",
          "passage_embedding": [
            0.017304314,
            -0.021530833,
            0.050184276,
            0.08962978,
            ...]
        }
      }
    ]
  }
}

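Example: Externally hosted large language model

This example demonstrates how to configure an ml_inference search response processor to work with an externally hosted large language model (LLM) and map the model's response to the search extension object. Using the ml_inference processor, you can have an LLM summarize search results directly in the response. The summary is included in the ext field of the search response, giving users access to AI-generated insights alongside the original search results.

Prerequisite

You must configure an externally hosted LLM for this use case. For more information about externally hosted models, see "Connecting to externally hosted models". Once you register the LLM, you can use the following request to test it. This request requires providing the prompt and context fields: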

POST /_plugins/_ml/models/KKne6JIBAs32TwoK-FFR/_predict
{
  "parameters": {
    "prompt":"\n\nHuman: You are a professional data analysist. You will always answer question: Which month had the lowest customer acquisition cost per new customer? based on the given context first. If the answer is not directly shown in the context, you will analyze the data and find the answer. If you don't know the answer, just say I don't know. Context: ${parameters.context.toString()}. \n\n Assistant:",
    "context":"Customer acquisition cost: January: $50, February: $45, March: $40. New customers: January: 500, February: 600, March: 750"
  }
}

The response contains the model output in the inference_results field:

{
  "inference_results": [
    {
      "output": [
        {
          "name": "response",
          "dataAsMap": {
            "response": """ Based on the data provided:

                        - Customer acquisition cost in January was $50 and new customers were 500. So cost per new customer was $50/500 = $0.10
                        - Customer acquisition cost in February was $45 and new customers were 600. So cost per new customer was $45/600 = $0.075
                        - Customer acquisition cost in March was $40 and new customers were 750. So cost per new customer was $40/750 = $0.053
            
                        Therefore, the month with the lowest customer acquisition cost per new customer was March, at $0.053."""
          }
        }
      ],
      "status_code": 200
    }
  ]
}
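
The arithmetic in the model's answer can be checked with a few lines of Python. This sketch reads the figures the same way the model did, dividing each month's stated acquisition cost by that month's new customers. The variable names are illustrative only.

```python
# Figures copied from the context field of the test request.
costs = {"January": 50, "February": 45, "March": 40}
new_customers = {"January": 500, "February": 600, "March": 750}

# Cost per new customer for each month.
cost_per_customer = {
    month: costs[month] / new_customers[month] for month in costs
}
lowest = min(cost_per_customer, key=cost_per_customer.get)

for month, value in cost_per_customer.items():
    print(f"{month}: ${value:.3f}")
print(lowest)  # March
```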

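Step 1: Create a pipeline

Create a search pipeline for the registered model. The model requires a context field as input. The model response summarizes the text in the review field and stores the summary in the ext.ml_inference.llm_response field of the search response: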

PUT /_search/pipeline/my_pipeline_request_review_llm
{
  "response_processors": [
    {
      "ml_inference": {
        "tag": "ml_inference",
        "description": "This processor is going to run llm",
        "model_id": "EOF6wJIBtDGAJRTD4kNg",
        "function_name": "REMOTE",
        "input_map": [
          {
            "context": "review"
          }
        ],
        "output_map": [
          {
            "ext.ml_inference.llm_response": "response"
          }
        ],
        "model_config": {
          "prompt": "\n\nHuman: You are a professional data analysist. You will always answer question: Which month had the lowest customer acquisition cost per new customer? based on the given context first. If the answer is not directly shown in the context, you will analyze the data and find the answer. If you don't know the answer, just say I don't know. Context: ${parameters.context.toString()}. \n\n Assistant:"
        },
        "ignore_missing": false,
        "ignore_failure": false
      }
    }
  ]
}

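In this configuration, you've provided the following parameters:

The model_id parameter specifies the ID of the generative AI model.
The function_name parameter is set to REMOTE, indicating that the model is hosted externally.
The input_map parameter maps the review field from the document to the context field expected by the model.
The output_map parameter specifies that the model's response should be stored in ext.ml_inference.llm_response in the search response.
The model_config parameter includes a prompt that tells the model how to process the input and generate a summary.

Step 2: Index sample documents

Index some sample documents to test the pipeline: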

POST /_bulk
{"index":{"_index":"review_string_index","_id":"1"}}
{"review":"Customer acquisition cost: January: $50, New customers: January: 500."}
{"index":{"_index":"review_string_index","_id":"2"}}
{"review":"Customer acquisition cost: February: $45, New customers: February: 600."}
{"index":{"_index":"review_string_index","_id":"3"}}
{"review":"Customer acquisition cost: March: $40, New customers: March: 750."}

Step 3: Run the pipeline

Run a search query using the pipeline:

GET /review_string_index/_search?search_pipeline=my_pipeline_request_review_llm
{
  "query": {
    "match_all": {}
  }
}

The response includes the original documents and the generated summary in the ext.ml_inference.llm_response field:

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 3,
      "relation": "eq"
    },
    "max_score": 1,
    "hits": [
      {
        "_index": "review_string_index",
        "_id": "1",
        "_score": 1,
        "_source": {
          "review": "Customer acquisition cost: January: $50, New customers: January: 500."
        }
      },
      {
        "_index": "review_string_index",
        "_id": "2",
        "_score": 1,
        "_source": {
          "review": "Customer acquisition cost: February: $45, New customers: February: 600."
        }
      },
      {
        "_index": "review_string_index",
        "_id": "3",
        "_score": 1,
        "_source": {
          "review": "Customer acquisition cost: March: $40, New customers: March: 750."
        }
      }
    ]
  },
  "ext": {
    "ml_inference": {
      "llm_response": """ Based on the context provided:

      - Customer acquisition cost in January was $50 and new customers were 500. So the cost per new customer was $50/500 = $0.10

      - Customer acquisition cost in February was $45 and new customers were 600. So the cost per new customer was $45/600 = $0.075

      - Customer acquisition cost in March was $40 and new customers were 750. So the cost per new customer was $40/750 = $0.053

      Therefore, the month with the lowest customer acquisition cost per new customer was March, as it had the lowest cost per customer of $0.053."""
    }
  }
}

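Example: Reranking search results using a text similarity model

The following example shows you how to configure an ml_inference search response processor with a text similarity model.

Prerequisite

You must configure an externally hosted text similarity model for this use case. For more information about externally hosted models, see "Connecting to externally hosted models". Once you register the text similarity model, you can use the following request to test it. This request requires that you provide the text and text_pair fields within the inputs field: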

POST /_plugins/_ml/models/Ialx65IBAs32TwoK1lXf/_predict
{
  "parameters": {
    "inputs":
    {
      "text": "I like you",
      "text_pair": "I hate you"
    }
  }
}

The model returns similarity scores for each input document:

{
  "inference_results": [
    {
      "output": [
        {
          "name": "response",
          "dataAsMap": {
            "label": "LABEL_0",
            "score": 0.022704314440488815
          }
        }
      ],
      "status_code": 200
    }
  ]
}

Step 1: Index sample documents

Create an index and add some sample documents:

POST _bulk
{"index":{"_index":"demo-index-0","_id":"1"}}
{"diary":"I hate you"}
{"index":{"_index":"demo-index-0","_id":"2"}}
{"diary":"I love you"}
{"index":{"_index":"demo-index-0","_id":"3"}}
{"diary":"I dislike you"}

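Step 2: Create a search pipeline

In this example, you'll create a search pipeline that uses a text similarity model in one-to-one inference mode, processing each document in the search results individually. This setup allows the model to make one prediction request per document, providing specific relevance insights for each search hit. When using input_map to map the search request to query text, the JSON path must start with $._request or _request: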

PUT /_search/pipeline/my_rerank_pipeline
{
  "response_processors": [
    {
      "ml_inference": {
        "tag": "ml_inference",
        "description": "This processor runs ml inference during search response",
        "model_id": "Ialx65IBAs32TwoK1lXf",
        "model_input":"""{"parameters":{"inputs":{"text":"${input_map.text}","text_pair":"${input_map.text_pair}"}}}""",
        "function_name": "REMOTE",
        "input_map": [
          {
            "text": "diary",
            "text_pair":"$._request.query.term.diary.value"
          }
        ],
        "output_map": [
          {
            "rank_score": "$.score"
          }
        ],
        "full_response_path": false,
        "model_config": {},
        "ignore_missing": false,
        "ignore_failure": false,
        "one_to_one": true
      },
      "rerank": {
        "by_field": {
          "target_field": "rank_score",
          "remove_target_field": true
        }
      }
    }
  ]
}

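In this configuration, you've provided the following parameters:

The model_id parameter specifies the unique identifier of the text similarity model.
The function_name parameter is set to REMOTE, indicating that the model is hosted externally.
The input_map parameter maps the diary field from each document to the text input of the model and maps the search query term to the text_pair input.
The output_map parameter maps the model's score to a field named rank_score in each document.
The model_input parameter formats the input for the model, ensuring that it matches the structure expected by the Predict API.
The one_to_one parameter is set to true, ensuring that the model processes each document individually rather than batching multiple documents together.
The ignore_missing parameter is set to false, causing the processor to fail if the mapped fields are missing from a document.
The ignore_failure parameter is set to false, meaning that the entire pipeline fails if the ML inference processor encounters an error.
The rerank processor is applied after ML inference. It reorders the documents based on the rank_score field generated by the ML model and then removes this field from the final results.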
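
The reranking step described above can be sketched in Python as follows. The rerank_by_field function is a hypothetical, simplified stand-in for the rerank processor's by_field behavior, not its implementation. The rank_score values are borrowed from the _score values in this example's sample response, on the assumption that the final _score equals the model's score.

```python
def rerank_by_field(hits, target_field, remove_target_field=True):
    """Sort hits by a numeric _source field (descending) and use it as _score."""
    reranked = []
    for hit in hits:
        source = dict(hit["_source"])  # copy so the input hits stay unchanged
        score = source[target_field]
        if remove_target_field:
            del source[target_field]
        reranked.append({**hit, "_score": score, "_source": source})
    reranked.sort(key=lambda item: item["_score"], reverse=True)
    return reranked


hits = [
    {"_id": "2", "_source": {"diary": "I love you", "rank_score": 0.022628736}},
    {"_id": "3", "_source": {"diary": "I dislike you", "rank_score": 0.0073115323}},
    {"_id": "1", "_source": {"diary": "I hate you", "rank_score": 0.040183373}},
]
reranked_hits = rerank_by_field(hits, "rank_score")
for hit in reranked_hits:
    print(hit["_id"], hit["_score"], hit["_source"])
```

Because remove_target_field is true, the rank_score field no longer appears in _source after reranking, matching the configuration above.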

Step 3: Run the pipeline

Now perform a search using the created pipeline:

GET /demo-index-0/_search?search_pipeline=my_rerank_pipeline
{
  "query": {
    "term": {
      "diary": {
        "value": "today"
      }
    }
  }
}

The response includes the original documents and their reranked scores:

{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 3,
      "relation": "eq"
    },
    "max_score": 0.040183373,
    "hits": [
      {
        "_index": "demo-index-0",
        "_id": "1",
        "_score": 0.040183373,
        "_source": {
          "diary": "I hate you"
        }
      },
      {
        "_index": "demo-index-0",
        "_id": "2",
        "_score": 0.022628736,
        "_source": {
          "diary": "I love you"
        }
      },
      {
        "_index": "demo-index-0",
        "_id": "3",
        "_score": 0.0073115323,
        "_source": {
          "diary": "I dislike you"
        }
      }
    ]
  },
  "profile": {
    "shards": []
  }
}

Next steps

See a comprehensive example of reranking by a field using an externally hosted cross-encoder model.