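I use langchain's PyPDFLoader to load a PDF, split it into chunks, convert the chunks to embeddings, then retrieve the relevant chunks and pass them to ChatGPT.
For data that sits in tables, some of the answers contain incorrect values. How can I improve the accuracy? Do I need to switch to a different loader?
(partial code)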
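from langchain.chains import RetrievalQA
from langchain.chat_models import ChatOpenAI
from langchain.document_loaders import PyPDFLoader
from langchain.embeddings import OpenAIEmbeddings
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import Chroma

# Load the PDF and split it into chunks before embedding
loader = PyPDFLoader(PATH)
documents = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=10)
texts = text_splitter.split_documents(documents)

# Embed the chunks and persist them to Chroma
embeddings = OpenAIEmbeddings()
db = Chroma.from_documents(texts, embeddings, persist_directory=db_path)
db.persist()

# Reopen the persisted store (should be the same directory as db_path) and build the QA chain
persist_directory = 'PATH'
db2 = Chroma(persist_directory=persist_directory, embedding_function=embeddings)
llm = ChatOpenAI(
    temperature=0.2,
    openai_api_key=openai_api_key,
    model_name="gpt-3.5-turbo",
    max_tokens=500,
)
qa = RetrievalQA.from_chain_type(llm, chain_type="map_reduce", retriever=db2.as_retriever())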
Because of cost, I am using gpt-3.5-turbo. How should the tables in the PDF be parsed so that the data can be retrieved correctly?
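One direction that might be worth trying, as a rough sketch rather than a confirmed fix: a character-based splitter can cut a table so that a row ends up in a chunk without its column headers, and the model then has to guess what each number means. If the tables can be extracted as rows first (for example with a table-aware extractor such as pdfplumber's `page.extract_tables()`, which is an assumption here and not something used in the code above), each row can be rewritten as a self-describing line before embedding. The function name `table_to_row_texts` and the sample data below are hypothetical and only for illustration.

```python
from typing import List, Optional, Sequence


def table_to_row_texts(
    table: Sequence[Sequence[Optional[str]]], title: str = ""
) -> List[str]:
    """Turn a table (first row = header) into one self-describing string per data row."""
    if not table:
        return []
    header = [(cell or "").strip() for cell in table[0]]
    row_texts = []
    for row in table[1:]:
        cells = [(cell or "").strip() for cell in row]
        # Pair each value with its column name so the row still makes sense on its own
        pairs = [f"{name}: {value}" for name, value in zip(header, cells) if value]
        if not pairs:
            continue  # skip rows that are completely empty
        prefix = f"{title} | " if title else ""
        row_texts.append(prefix + "; ".join(pairs))
    return row_texts


# Hypothetical sample table, in the list-of-rows shape a table extractor might return
sample = [
    ["Product", "Q1 sales", "Q2 sales"],
    ["Widget A", "120", "135"],
    ["Widget B", "80", None],
]
row_texts = table_to_row_texts(sample, title="Sales table")
for text in row_texts:
    print(text)
```

Each resulting string could then be wrapped in a langchain Document and indexed alongside (or instead of) the character-split chunks, so that every retrieved piece of table data carries its own column names. Whether this is enough for gpt-3.5-turbo to answer correctly would need to be checked against the actual PDF.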