AWS SageMaker

LiteLLM supports Llama 2 models deployed on AWS SageMaker.

API KEYS

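!pip install boto3

import os

# Replace the empty strings with your AWS credentials and the region of your SageMaker endpoint
os.environ["AWS_ACCESS_KEY_ID"] = ""
os.environ["AWS_SECRET_ACCESS_KEY"] = ""
os.environ["AWS_REGION_NAME"] = ""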
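A quick way to fail fast on missing configuration is to check that all three variables are set before calling LiteLLM. The sketch below is a hypothetical helper (`missing_aws_env_vars` is not part of LiteLLM or boto3); it only checks that each variable is set and non-empty, not that the credentials are valid.

```python
import os

# The three variables from the API KEYS section above.
REQUIRED_AWS_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_REGION_NAME")


def missing_aws_env_vars(environ=os.environ):
    """Return the names of required AWS variables that are unset or empty.

    Hypothetical helper for illustration; not part of LiteLLM.
    """
    return [name for name in REQUIRED_AWS_VARS if not environ.get(name)]
```

For example, run `if missing_aws_env_vars(): raise RuntimeError(...)` before the first `completion()` call.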

Usage

import os
from litellm import completion

os.environ["AWS_ACCESS_KEY_ID"] = ""
os.environ["AWS_SECRET_ACCESS_KEY"] = ""
os.environ["AWS_REGION_NAME"] = ""

response = completion(
    model="sagemaker/jumpstart-dft-meta-textgeneration-llama-2-7b",
    messages=[{"role": "user", "content": "Hello, how are you?"}],
    temperature=0.2,
    max_tokens=80,
)
print(response)
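LiteLLM returns responses in the OpenAI chat-completion format, so the generated text sits at `response["choices"][0]["message"]["content"]`. The sketch below uses a hand-built dict in that shape as a stand-in for a real `completion()` result; `response_text` is a hypothetical helper, and it assumes dict-style indexing works on the object LiteLLM returns.

```python
def response_text(response):
    """Return the assistant text from an OpenAI-format chat completion.

    Hypothetical helper; assumes dict-style indexing on the response object.
    """
    return response["choices"][0]["message"]["content"]


# Hand-built stand-in for a completion() result, for illustration only.
example = {
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "I'm doing well, thanks!"},
            "finish_reason": "stop",
        }
    ]
}
print(response_text(example))
```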

Passing credentials as parameters - Completion()

Instead of setting environment variables, you can pass AWS credentials directly as parameters to `litellm.completion()`:

from litellm import completion

# Fill in your AWS credentials and the region of your SageMaker endpoint
response = completion(
    model="sagemaker/jumpstart-dft-meta-textgeneration-llama-2-7b",
    messages=[{"role": "user", "content": "Hello, how are you?"}],
    aws_access_key_id="",
    aws_secret_access_key="",
    aws_region_name="",
)
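The keyword arguments mirror the environment variables from the API KEYS section (`AWS_ACCESS_KEY_ID` maps to `aws_access_key_id`, and so on). If credentials come from somewhere other than the process environment, such as a per-account config dict, a small helper can translate them. `aws_credential_kwargs` below is hypothetical, not a LiteLLM function.

```python
# Environment-variable names (API KEYS section) mapped to completion() keyword arguments.
ENV_TO_KWARG = {
    "AWS_ACCESS_KEY_ID": "aws_access_key_id",
    "AWS_SECRET_ACCESS_KEY": "aws_secret_access_key",
    "AWS_REGION_NAME": "aws_region_name",
}


def aws_credential_kwargs(source):
    """Build completion() keyword arguments from a mapping keyed like the env vars.

    Hypothetical helper; keys that are missing or empty are skipped.
    """
    return {kwarg: source[var] for var, kwarg in ENV_TO_KWARG.items() if source.get(var)}
```

The result can be unpacked into the call: `completion(model=..., messages=..., **aws_credential_kwargs(config))`.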

Usage - Streaming

SageMaker currently does not support streaming, so LiteLLM fakes it: it gets the complete response and then returns the response string in chunks.

import os
from litellm import completion

os.environ["AWS_ACCESS_KEY_ID"] = ""
os.environ["AWS_SECRET_ACCESS_KEY"] = ""
os.environ["AWS_REGION_NAME"] = ""

response = completion(
    model="sagemaker/jumpstart-dft-meta-textgeneration-llama-2-7b",
    messages=[{"role": "user", "content": "Hello, how are you?"}],
    temperature=0.2,
    max_tokens=80,
    stream=True,
)

# Each chunk carries the next piece of the response string
for chunk in response:
    print(chunk)
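To make the simulated streaming concrete, here is a minimal sketch of the idea: a complete string is split into fixed-size pieces, and each piece is wrapped in an OpenAI-style streaming chunk under `choices[0]["delta"]["content"]`. The chunk size, the chunk shape, and the helper names `fake_stream` and `collect` are assumptions for illustration; this is not LiteLLM's actual implementation.

```python
def fake_stream(text, chunk_size=5):
    """Yield OpenAI-style streaming chunks built from an already-complete string.

    Sketch of the idea only; chunk size and chunk shape are assumptions.
    """
    for start in range(0, len(text), chunk_size):
        yield {"choices": [{"index": 0, "delta": {"content": text[start:start + chunk_size]}}]}


def collect(chunks):
    """Rebuild the full text from delta chunks (a missing or None content counts as "")."""
    return "".join(chunk["choices"][0]["delta"].get("content") or "" for chunk in chunks)


full = "I'm doing well, thanks for asking!"
pieces = list(fake_stream(full))
print(len(pieces), collect(pieces) == full)
```

`collect` shows how a caller can rebuild the full text from the chunks; as long as the chunks share this shape, it works the same whether the stream is real or simulated.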

AWS SageMaker Models

Here are examples of calling Llama 2 models deployed on SageMaker with LiteLLM:

Model Name	Function Call
Meta Llama 2 7B	`completion(model='sagemaker/jumpstart-dft-meta-textgeneration-llama-2-7b', messages=messages)`
Meta Llama 2 7B (Chat/Fine-tuned)	`completion(model='sagemaker/jumpstart-dft-meta-textgeneration-llama-2-7b-f', messages=messages)`
Meta Llama 2 13B	`completion(model='sagemaker/jumpstart-dft-meta-textgeneration-llama-2-13b', messages=messages)`
Meta Llama 2 13B (Chat/Fine-tuned)	`completion(model='sagemaker/jumpstart-dft-meta-textgeneration-llama-2-13b-f', messages=messages)`
Meta Llama 2 70B	`completion(model='sagemaker/jumpstart-dft-meta-textgeneration-llama-2-70b', messages=messages)`
Meta Llama 2 70B (Chat/Fine-tuned)	`completion(model='sagemaker/jumpstart-dft-meta-textgeneration-llama-2-70b-f', messages=messages)`
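In every row, the model string is `sagemaker/` followed by an endpoint name. The `jumpstart-dft-...` names appear to be the default names SageMaker JumpStart gives these deployments, so substitute your own endpoint name if you chose a different one. A hypothetical helper for building the string, assuming that `sagemaker/<endpoint-name>` convention:

```python
def sagemaker_model(endpoint_name: str) -> str:
    """Build a LiteLLM model string for a SageMaker endpoint.

    Hypothetical helper; assumes the "sagemaker/<endpoint-name>" convention from the table.
    """
    if not endpoint_name:
        raise ValueError("endpoint_name must be a non-empty string")
    return f"sagemaker/{endpoint_name}"


# Example: the Llama 2 13B chat endpoint name from the table
print(sagemaker_model("jumpstart-dft-meta-textgeneration-llama-2-13b-f"))
```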
