LangChain - 1.3.3 Caching & Streaming & Token usage

Reference: https://python.langchain.com/docs/modules/model_io/llms/llm_caching

 


 

 

Caching

Use caching to reduce the number of API calls.

  • (Prepare): llm = OpenAI(model_name="gpt-3.5-turbo-instruct", n=2, best_of=2)
  • InMemoryCache: set_llm_cache(InMemoryCache())
  • SQLiteCache: set_llm_cache(SQLiteCache(database_path=".langchain.db")) (the .langchain.db file is created automatically if it does not exist)
  • (Check): time two identical llm.invoke("Tell me a joke") calls with datetime.datetime.now(); the first call is not yet in the cache and takes longer, the second returns the cached result almost immediately.

 

 

InMemoryCache

import datetime
from langchain.globals import set_llm_cache
from langchain_openai import OpenAI

# To make the caching really obvious, let's use a slower model.
llm = OpenAI(model_name="gpt-3.5-turbo-instruct", n=2, best_of=2)


from langchain.cache import InMemoryCache

set_llm_cache(InMemoryCache())

# The first time, it is not yet in cache, so it should take longer
varNow = datetime.datetime.now()
result = llm.invoke("Tell me a joke")

print('Output>', result, datetime.datetime.now() - varNow)


# The second time the response is already in the cache, so it should be much faster
varNow = datetime.datetime.now()
result = llm.invoke("Tell me a joke")

print('Output>', result, datetime.datetime.now() - varNow)

Output>

Why don't scientists trust atoms?

Because they make up everything. 0:00:00.769805
Output>

Why don't scientists trust atoms?

Because they make up everything. 0:00:00.000299

 

  • The first run happens before anything is cached; the second run reuses the cached result. Compare the execution times above (about 0.77 s vs 0.3 ms). A sketch for clearing the cache follows below.
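
If you want to force a fresh API call after caching, the cache object can be emptied. A minimal sketch, assuming the InMemoryCache instance is kept in a variable so its clear() method can be called (the per-model cache=False flag is also an assumption about this LangChain version's API):

from langchain.cache import InMemoryCache
from langchain.globals import set_llm_cache
from langchain_openai import OpenAI

# Keep a reference to the cache object so it can be cleared later
cache = InMemoryCache()
set_llm_cache(cache)

llm = OpenAI(model_name="gpt-3.5-turbo-instruct", n=2, best_of=2)

llm.invoke("Tell me a joke")    # slow: real API call, result is stored in the cache
llm.invoke("Tell me a joke")    # fast: served from the in-memory cache

cache.clear()                   # drop all cached entries
llm.invoke("Tell me a joke")    # slow again: the cache was emptied

# Caching can also be bypassed for a single model instance (assumed flag)
uncached_llm = OpenAI(model_name="gpt-3.5-turbo-instruct", cache=False)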

 

SQLite Cache

import datetime
from langchain.globals import set_llm_cache
from langchain_openai import OpenAI

# To make the caching really obvious, let's use a slower model.
llm = OpenAI(model_name="gpt-3.5-turbo-instruct", n=2, best_of=2)


# We can do the same thing with a SQLite cache
from langchain.cache import SQLiteCache

set_llm_cache(SQLiteCache(database_path=".langchain.db"))


# The first time, it is not yet in cache, so it should take longer
varNow = datetime.datetime.now()
result = llm.invoke("Tell me a joke")
print('Output>', result, datetime.datetime.now() - varNow)


# The second time the response comes from the SQLite cache, so it should be much faster
varNow = datetime.datetime.now()
result = llm.invoke("Tell me a joke")

print('Output>', result, datetime.datetime.now() - varNow)

Output>

Why couldn't the bicycle stand up by itself? Because it was two-tired. 0:00:00.917748
Output>

Why couldn't the bicycle stand up by itself? Because it was two-tired. 0:00:00.072242

  • Only the cache setup changes to SQLiteCache; the rest of the code is identical.
  • Cached lookups are somewhat slower than with the in-memory cache (about 72 ms vs 0.3 ms above), since results are read back from disk; in exchange the cache persists on disk, as sketched below.
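
The trade-off for the slower lookups is persistence: because the entries live in the .langchain.db file, they survive a process restart, unlike InMemoryCache. A minimal sketch (run it twice in separate processes; the second run should be fast even on its first invoke):

import datetime
from langchain.cache import SQLiteCache
from langchain.globals import set_llm_cache
from langchain_openai import OpenAI

set_llm_cache(SQLiteCache(database_path=".langchain.db"))
llm = OpenAI(model_name="gpt-3.5-turbo-instruct", n=2, best_of=2)

# On a fresh database this call hits the API; on a re-run it is read from disk
varNow = datetime.datetime.now()
result = llm.invoke("Tell me a joke")
print('Output>', result, datetime.datetime.now() - varNow)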

 

 


 

Reference: https://python.langchain.com/docs/modules/model_io/llms/streaming_llm

 


Streaming

Streaming is supported by default: all LLMs implement the Runnable interface, which comes with a default streaming implementation.

 

from langchain_openai import OpenAI

llm = OpenAI(model="gpt-3.5-turbo-instruct", temperature=0, max_tokens=512)
for chunk in llm.stream("Write me a song about sparkling water."):
    print(chunk, end="", flush=True)
  • The output is printed chunk by chunk as the tokens stream in; an async variant is sketched below.
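
The same Runnable interface also has an async variant. A minimal sketch using astream(), assuming the code runs inside an asyncio event loop (e.g. a web handler or asyncio.run):

import asyncio
from langchain_openai import OpenAI

llm = OpenAI(model="gpt-3.5-turbo-instruct", temperature=0, max_tokens=512)

async def main():
    # astream() yields chunks asynchronously as they arrive
    async for chunk in llm.astream("Write me a song about sparkling water."):
        print(chunk, end="", flush=True)

asyncio.run(main())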

 

 


Reference: https://python.langchain.com/docs/modules/model_io/llms/token_usage_tracking

 


 

Tracking token usage

Token usage for OpenAI API calls can be tracked with the get_openai_callback context manager.

 

from langchain_community.callbacks import get_openai_callback
from langchain_openai import OpenAI

llm = OpenAI(model_name="gpt-3.5-turbo-instruct", n=2, best_of=2)

with get_openai_callback() as cb:
    result = llm.invoke("Tell me a joke")
    print('Output>', cb)


# Multiple calls inside the same context manager are accumulated in one callback
with get_openai_callback() as cb:
    result = llm.invoke("Tell me a joke")
    result2 = llm.invoke("Tell me a joke")
    print('Output>', cb.total_tokens)

Output> Tokens Used: 37
        Prompt Tokens: 4
        Completion Tokens: 33
Successful Requests: 1
Total Cost (USD): $7.2e-05
Output> 78
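
Token tracking combines naturally with the caching shown at the top of this post. A minimal sketch; the expectation that a cached call adds no new tokens or cost is an assumption about the callback's behavior, not something verified here:

from langchain.cache import InMemoryCache
from langchain.globals import set_llm_cache
from langchain_community.callbacks import get_openai_callback
from langchain_openai import OpenAI

set_llm_cache(InMemoryCache())
llm = OpenAI(model_name="gpt-3.5-turbo-instruct", n=2, best_of=2)

with get_openai_callback() as cb:
    llm.invoke("Tell me a joke")   # real API call: tokens and cost are counted
    llm.invoke("Tell me a joke")   # cache hit: expected to add no tokens or cost
    print('Output>', cb.prompt_tokens, cb.completion_tokens, cb.total_cost)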