Reference: https://python.langchain.com/docs/modules/model_io/llms/llm_caching
Caching
Caching is used to reduce the number of API calls for repeated identical prompts.
Method | Code | Etc. |
(Prepare) | llm = OpenAI(model_name="gpt-3.5-turbo-instruct", n=2, best_of=2) | |
InMemoryCache | set_llm_cache(InMemoryCache()) | |
SQLiteCache | set_llm_cache(SQLiteCache(database_path=".langchain.db")) | The database file is created automatically if it does not exist |
(Check) | Call llm.invoke("Tell me a joke") twice and time each call (full code below) | The first call is not yet cached and is slow; the second is served from the cache |
InMemoryCache
import datetime
from langchain.globals import set_llm_cache
from langchain_openai import OpenAI
# To make the caching really obvious, let's use a slower model.
llm = OpenAI(model_name="gpt-3.5-turbo-instruct", n=2, best_of=2)
from langchain.cache import InMemoryCache
set_llm_cache(InMemoryCache())
# The first time, it is not yet in cache, so it should take longer
varNow = datetime.datetime.now()
result = llm.invoke("Tell me a joke")
print('Output>', result, datetime.datetime.now() - varNow)
# The second time it is already in the cache, so it goes faster
varNow = datetime.datetime.now()
result = llm.invoke("Tell me a joke")
print('Output>', result, datetime.datetime.now() - varNow)
Output>
Why don't scientists trust atoms?
Because they make up everything. 0:00:00.769805
Output>
Why don't scientists trust atoms?
Because they make up everything. 0:00:00.000299
- The first call runs before anything is cached; the second call reuses the cached result, so compare the two elapsed times (0.769805 s vs. 0.000299 s above).
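If you keep a reference to the cache object instead of constructing it inline, you can also empty it between runs. A minimal sketch, reusing the llm defined above; clear() is the cache's reset method, and the cache variable name is just for illustration:
from langchain.cache import InMemoryCache
from langchain.globals import set_llm_cache

cache = InMemoryCache()
set_llm_cache(cache)

llm.invoke("Tell me a joke")   # first call: goes to the API and fills the cache
llm.invoke("Tell me a joke")   # second call: answered from the cache
cache.clear()                  # drop all cached entries
llm.invoke("Tell me a joke")   # cache is empty again, so this hits the API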
SQLite Cache
import datetime
from langchain.globals import set_llm_cache
from langchain_openai import OpenAI
# To make the caching really obvious, let's use a slower model.
llm = OpenAI(model_name="gpt-3.5-turbo-instruct", n=2, best_of=2)
# We can do the same thing with a SQLite cache
from langchain.cache import SQLiteCache
set_llm_cache(SQLiteCache(database_path=".langchain.db"))
# The first time, it is not yet in cache, so it should take longer
varNow = datetime.datetime.now()
result = llm.invoke("Tell me a joke")
print('Output>', result, datetime.datetime.now() - varNow)
# The second time it is already in the cache, so it goes faster
varNow = datetime.datetime.now()
result = llm.invoke("Tell me a joke")
print('Output>', result, datetime.datetime.now() - varNow)
Output>
Why couldn't the bicycle stand up by itself? Because it was two-tired. 0:00:00.917748
Output>
Why couldn't the bicycle stand up by itself? Because it was two-tired. 0:00:00.072242
- Only the cache backend changes to SQLite; the rest of the code is the same as the in-memory example.
- The cache hit is somewhat slower than with the in-memory cache (0.072242 s vs. 0.000299 s above), since the result is read back from the database file on disk.
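Unlike the in-memory cache, the SQLite cache persists across interpreter restarts because it lives in the .langchain.db file. A minimal sketch for checking and resetting that file; the db_path variable is just for illustration:
import os

from langchain.cache import SQLiteCache
from langchain.globals import set_llm_cache

db_path = ".langchain.db"          # same path as above
cache = SQLiteCache(database_path=db_path)
set_llm_cache(cache)

# Entries written in a previous run are still served from this file,
# so a repeated prompt avoids a new API call even after a restart.
print('Cache file exists?', os.path.exists(db_path))

# To start clean, clear the cache (or simply delete the file).
cache.clear()
# os.remove(db_path)               # alternative: remove the file itself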
Reference: https://python.langchain.com/docs/modules/model_io/llms/streaming_llm
Streaming
Streaming is supported by default.
from langchain_openai import OpenAI

llm = OpenAI(model="gpt-3.5-turbo-instruct", temperature=0, max_tokens=512)
for chunk in llm.stream("Write me a song about sparkling water."):
    print(chunk, end="", flush=True)
- The response is printed chunk by chunk as it streams in.
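Streaming also works asynchronously through the same Runnable interface. A minimal sketch using astream inside an asyncio event loop; the main coroutine name is just for illustration:
import asyncio

from langchain_openai import OpenAI

llm = OpenAI(model="gpt-3.5-turbo-instruct", temperature=0, max_tokens=512)

async def main():
    # astream yields chunks as they arrive, like stream but awaitable
    async for chunk in llm.astream("Write me a song about sparkling water."):
        print(chunk, end="", flush=True)

asyncio.run(main())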
Reference: https://python.langchain.com/docs/modules/model_io/llms/token_usage_tracking
Tracking token usage
Token usage of OpenAI API calls can be tracked with get_openai_callback.
from langchain_community.callbacks import get_openai_callback
from langchain_openai import OpenAI

llm = OpenAI(model_name="gpt-3.5-turbo-instruct", n=2, best_of=2)

with get_openai_callback() as cb:
    result = llm.invoke("Tell me a joke")
print('Output>', cb)

with get_openai_callback() as cb:
    result = llm.invoke("Tell me a joke")
    result2 = llm.invoke("Tell me a joke")
print('Output>', cb.total_tokens)
Output> Tokens Used: 37
Prompt Tokens: 4
Completion Tokens: 33
Successful Requests: 1
Total Cost (USD): $7.2e-05
Output> 78
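The callback object also exposes individual counters (prompt_tokens, completion_tokens, successful_requests, total_cost), so the same pattern can summarize a whole batch of prompts. A minimal sketch; the prompts list is just an example:
from langchain_community.callbacks import get_openai_callback
from langchain_openai import OpenAI

llm = OpenAI(model_name="gpt-3.5-turbo-instruct", n=2, best_of=2)

prompts = ["Tell me a joke", "Tell me a riddle"]
with get_openai_callback() as cb:
    for p in prompts:
        llm.invoke(p)

# The handler accumulates usage over every call made inside the block.
print('Requests  :', cb.successful_requests)
print('Prompt    :', cb.prompt_tokens)
print('Completion:', cb.completion_tokens)
print('Total     :', cb.total_tokens)
print('Cost (USD):', cb.total_cost)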