Reference: https://python.langchain.com/docs/modules/data_connection/document_loaders/csv
Document loaders
A document loader loads data from a source and returns it as Document objects.
Simplest loader
from langchain_community.document_loaders import TextLoader
loader = TextLoader("./index.md")
result = loader.load()
print(result)
[
Document(page_content='---\nsidebar_position: 0\n---\n# Document loaders\n\nUse document loaders to load data from a source as `Document`\'s. A `Document` is a piece of text\nand associated metadata. For example, there are document loaders for loading a simple `.txt` file, for loading the text\ncontents of any web page, or even for loading a transcript of a YouTube video.\n\nEvery document loader exposes two methods:\n1. "Load": load documents from the configured source\n2. "Load and split": load documents from the configured source and split them using the passed in text splitter\n\nThey optionally implement:\n\n3. "Lazy load": load documents into memory lazily\n', metadata={'source': '../docs/docs/modules/data_connection/document_loaders/index.md'})
]
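The output above shows that a loader returns a list of Document objects, each with page_content and metadata, and the quoted text mentions both "Load" and "Lazy load". The following is a minimal sketch of that idea using only the standard library. SimpleDocument and SimpleTextLoader are hypothetical names made up for illustration; this is not LangChain's implementation, and the temporary index.md file is created only so the example runs on its own.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Iterator
import tempfile


@dataclass
class SimpleDocument:
    """Hypothetical stand-in for a Document: text plus metadata."""
    page_content: str
    metadata: dict = field(default_factory=dict)


class SimpleTextLoader:
    """Hypothetical loader that reads one text file into one document."""

    def __init__(self, file_path: str, encoding: str = "utf-8") -> None:
        self.file_path = file_path
        self.encoding = encoding

    def lazy_load(self) -> Iterator[SimpleDocument]:
        # Yield documents one at a time instead of building a full list.
        text = Path(self.file_path).read_text(encoding=self.encoding)
        yield SimpleDocument(page_content=text, metadata={"source": self.file_path})

    def load(self) -> list[SimpleDocument]:
        # Eager variant: materialize everything lazy_load() yields.
        return list(self.lazy_load())


# Write a small sample file so the sketch does not depend on external data.
with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = str(Path(tmp_dir) / "index.md")
    Path(sample_path).write_text("# Document loaders\n", encoding="utf-8")
    docs = SimpleTextLoader(sample_path).load()

print(docs[0].page_content)
print(docs[0].metadata)
```

Keeping load() as a thin wrapper around lazy_load() means large sources can be streamed while small ones still get a plain list.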
HTML
Reference: https://python.langchain.com/docs/modules/data_connection/document_loaders/html
Unstructured HTML Loader
from langchain_community.document_loaders import UnstructuredHTMLLoader
loader = UnstructuredHTMLLoader("example_data/fake-content.html")
data = loader.load()
print(data)
BeautifulSoup4
# Requires the beautifulsoup4 package (pip install beautifulsoup4)
from langchain_community.document_loaders import BSHTMLLoader
loader = BSHTMLLoader("sample/file.html")
data = loader.load()
print(data)
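To make clearer what an HTML loader does conceptually, here is a rough sketch that pulls the visible text and the page title out of an HTML string with the standard library's html.parser. It is an illustration under assumptions, not how UnstructuredHTMLLoader or BSHTMLLoader are actually implemented: TextAndTitleParser and sample_html are hypothetical, and the shape of the metadata dict (source plus title) is only meant to resemble what a loader might attach.

```python
from html.parser import HTMLParser


class TextAndTitleParser(HTMLParser):
    """Hypothetical parser: collects visible text and the <title> separately."""

    def __init__(self) -> None:
        super().__init__()
        self.title = ""
        self.text_parts: list[str] = []
        self._in_title = False
        self._skip_depth = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif self._skip_depth == 0 and data.strip():
            # Keep only non-empty text that is not script or style content.
            self.text_parts.append(data.strip())


# Hypothetical input; the post's sample/file.html is not shown.
sample_html = (
    "<html><head><title>Test Title</title><style>p {color: red}</style></head>"
    "<body><h1>My First Heading</h1><p>My first paragraph.</p></body></html>"
)

parser = TextAndTitleParser()
parser.feed(sample_html)
parser.close()

page_content = "\n".join(parser.text_parts)
metadata = {"source": "sample/file.html", "title": parser.title}
print(page_content)
print(metadata)
```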
JSON
Reference: https://python.langchain.com/docs/modules/data_connection/document_loaders/json
JSONLoader
import json
from pathlib import Path
from pprint import pprint
# Inspect the raw JSON first to see its structure before choosing a jq schema.
file_path = 'sample/json_sample.json'
data = json.loads(Path(file_path).read_text())
pprint(data)
Using jq_schema
from langchain_community.document_loaders import JSONLoader
from pprint import pprint
# JSONLoader requires the jq package (pip install jq).
loader = JSONLoader(
    file_path='sample/json_sample.json',
    jq_schema='.message[].content',  # the "content" field of every item in "message"
    text_content=False,
)
data = loader.load()
pprint(data)
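The jq expression '.message[].content' can be hard to read at first, so here is a plain-Python sketch of what it selects. The JSON below is a hypothetical sample (the real sample/json_sample.json is not shown in this post), and the dicts with page_content and a 1-based seq_num only approximate the shape of the documents a loader might return; treat this as an illustration, not as JSONLoader's implementation.

```python
import json

# Hypothetical sample data, assumed to have a top-level "message" array.
raw_json = """
{
  "message": [
    {"sender": "user_a", "content": "Hello"},
    {"sender": "user_b", "content": "Hi there"}
  ]
}
"""

data = json.loads(raw_json)

# .message   -> take the value under the "message" key
# []         -> iterate over every element of that array
# .content   -> take the "content" field of each element
contents = [item["content"] for item in data["message"]]

# Each extracted value becomes one document-like dict with a running index.
documents = [
    {"page_content": str(content), "metadata": {"seq_num": i}}
    for i, content in enumerate(contents, start=1)
]
print(documents)
```

If the key in your file is named differently (for example "messages"), the schema string has to change to match, since jq selects by exact key name.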