Text Summarization
Text summarization is a natural language processing (NLP) technique that extracts the most relevant information from a document and presents it in a concise, coherent form.
Summarization works by sending the model a prompt that instructs it to summarize the text, as in the example below:
Please summarize the following text:
<text>
Lorem ipsum dolor sit amet, consectetur adipiscing elit,
sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.
Sem fringilla ut morbi tincidunt augue interdum velit euismod in.
Quis hendrerit dolor magna eget est.
</text>
To have the model perform the summarization task, we use prompt engineering: we give the model plain-text instructions describing what to expect in the data and the desired format of the response.
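As a minimal sketch of this kind of prompt engineering, the instruction-plus-tags prompt shown above can be assembled with a small helper; the function name is our own, but the wording and `<text>` tags follow the example in this section.

```python
# Illustrative helper (not part of any SDK): wrap input text in
# <text> tags under a summarization instruction, as shown above.

def build_summary_prompt(text: str) -> str:
    """Build a summarization prompt with the input wrapped in <text> tags."""
    return (
        "Please summarize the following text:\n"
        f"<text>\n{text}\n</text>"
    )

print(build_summary_prompt("Lorem ipsum dolor sit amet."))
```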
Some common use cases for summarization:
Academic papers
Legal documents
Financial reports
One key challenge is handling large documents that exceed the model's token limit. Another is producing high-quality summaries.
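One common workaround for the token-limit challenge (not demonstrated in this section) is to split a long document into smaller chunks and summarize each one. A rough sketch, using word count as a crude stand-in for token count:

```python
# Rough sketch: split a long document into word-bounded chunks so each
# piece fits within a model's context limit. Word count only
# approximates token count; a real tokenizer would be more accurate.

def chunk_text(text: str, max_words: int = 500) -> list[str]:
    """Split text into chunks of at most max_words words each."""
    words = text.split()
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]

chunks = chunk_text("word " * 1200, max_words=500)
print(len(chunks))  # 1200 words -> 3 chunks
```

Each chunk can then be summarized separately, and the partial summaries combined or summarized again.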
In this section we send a small amount of data (string data) to the Amazon Bedrock API along with an instruction to summarize the corresponding text.
We will generate a summary of the following link:
We first use an Amazon Titan model, then an Anthropic Claude model.
The code:
import json
import os
import sys
import boto3
import botocore
boto3_bedrock = boto3.client('bedrock-runtime')
# The prompt begins with the instruction `Please provide a summary of the following text.` and wraps the text in `<text>` tags.
prompt = """
Please provide a summary of the following text. Do not add any information that is not mentioned in the text below.
<text>
AWS took all of that feedback from customers, and today we are excited to announce Amazon Bedrock, \
a new service that makes FMs from AI21 Labs, Anthropic, Stability AI, and Amazon accessible via an API. \
Bedrock is the easiest way for customers to build and scale generative AI-based applications using FMs, \
democratizing access for all builders. Bedrock will offer the ability to access a range of powerful FMs \
for text and images—including Amazons Titan FMs, which consist of two new LLMs we’re also announcing \
today—through a scalable, reliable, and secure AWS managed service. With Bedrock’s serverless experience, \
customers can easily find the right model for what they’re trying to get done, get started quickly, privately \
customize FMs with their own data, and easily integrate and deploy them into their applications using the AWS \
tools and capabilities they are familiar with, without having to manage any infrastructure (including integrations \
with Amazon SageMaker ML features like Experiments to test different models and Pipelines to manage their FMs at scale).
</text>
"""
body = json.dumps({
    "inputText": prompt,
    "textGenerationConfig": {
        "maxTokenCount": 1024,
        "stopSequences": [],
        "temperature": 0,
        "topP": 1
    },
})
modelId = 'amazon.titan-tg1-large' # change this to use a different version from the model provider
accept = 'application/json'
contentType = 'application/json'
try:
    # Specify the request parameters `modelId`, `accept`, and `contentType`.
    response = boto3_bedrock.invoke_model(body=body, modelId=modelId, accept=accept, contentType=contentType)
    response_body = json.loads(response.get('body').read())
    print(response_body.get('results')[0].get('outputText'))
except botocore.exceptions.ClientError as error:
    if error.response['Error']['Code'] == 'AccessDeniedException':
        print(f"\x1b[41m{error.response['Error']['Message']}"
              "\nTo troubleshoot this issue please refer to the following resources."
              "\nhttps://docs.aws.amazon.com/IAM/latest/UserGuide/troubleshoot_access-denied.html"
              "\nhttps://docs.aws.amazon.com/bedrock/latest/userguide/security-iam.html\x1b[0m\n")
    else:
        raise error
We use the InvokeModel API to send requests to foundation models; the code above is an example request that sends text to Amazon Titan Text Large. The inference parameters in textGenerationConfig depend on the model being used. For Amazon Titan Text, they include:
temperature: lower values produce a sharper distribution and more deterministic responses, while higher values flatten the distribution and yield more random responses. (float; default 0, maximum 1.5)
Running the code above prints the output:
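The Titan request body above can be produced by a small convenience helper; this is only an illustrative sketch (the function name is ours), with field names taken from the request shown earlier.

```python
import json

# Convenience sketch: build a Titan text-generation request body.
# Field names mirror the request above; the helper itself is
# illustrative, not part of the Bedrock SDK.

def make_titan_body(prompt: str, temperature: float = 0.0,
                    max_tokens: int = 1024) -> str:
    """Serialize a Titan request body with the given inference parameters."""
    return json.dumps({
        "inputText": prompt,
        "textGenerationConfig": {
            "maxTokenCount": max_tokens,
            "stopSequences": [],
            "temperature": temperature,
            "topP": 1,
        },
    })

body = make_titan_body("Please provide a summary of the following text. ...")
```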
Performing the same task with Claude:
import json
import boto3
import botocore
boto3_bedrock = boto3.client('bedrock-runtime')
prompt = """
Human: Please provide a summary of the following text.
<text>
AWS took all of that feedback from customers, and today we are excited to announce Amazon Bedrock, \
a new service that makes FMs from AI21 Labs, Anthropic, Stability AI, and Amazon accessible via an API. \
Bedrock is the easiest way for customers to build and scale generative AI-based applications using FMs, \
democratizing access for all builders. Bedrock will offer the ability to access a range of powerful FMs \
for text and images—including Amazons Titan FMs, which consist of two new LLMs we’re also announcing \
today—through a scalable, reliable, and secure AWS managed service. With Bedrock’s serverless experience, \
customers can easily find the right model for what they’re trying to get done, get started quickly, privately \
customize FMs with their own data, and easily integrate and deploy them into their applications using the AWS \
tools and capabilities they are familiar with, without having to manage any infrastructure (including integrations \
with Amazon SageMaker ML features like Experiments to test different models and Pipelines to manage their FMs at scale).
</text>
Assistant:"""
body = json.dumps({
    "prompt": prompt,
    "max_tokens_to_sample": 4096,
    "temperature": 0.5,
    "top_k": 250,
    "top_p": 0.5,
    "stop_sequences": []
})
modelId = 'anthropic.claude-v2'
accept = 'application/json'
contentType = 'application/json'
response = boto3_bedrock.invoke_model(body=body, modelId=modelId, accept=accept, contentType=contentType)
response_body = json.loads(response.get('body').read())
print(response_body.get('completion'))
Output:
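Unlike Titan, Claude's text-completion API expects the prompt to be framed with Human: and Assistant: markers, as in the code above. That framing can be factored into a helper; a sketch under the same prompt wording as above (the function name is ours):

```python
# Illustrative helper: build a Claude text-completions prompt in the
# Human:/Assistant: format used above; not part of any SDK.

def build_claude_prompt(text: str) -> str:
    """Frame a summarization request in Claude's Human/Assistant format."""
    return (
        "\n\nHuman: Please provide a summary of the following text.\n"
        f"<text>\n{text}\n</text>\n\n"
        "Assistant:"
    )

p = build_claude_prompt("Some announcement text.")
```

The prompt ends with "Assistant:" so the model's completion begins immediately after that marker.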