By Unstructured's estimate, about 80% of enterprise data lives in hard-to-use formats such as HTML, PDF, CSV, PNG, and PPTX. Unstructured extracts and transforms this data so it can be used with the major vector databases and LLM frameworks.
If you are in the devcontainer, Unstructured is already running. Otherwise, start Unstructured with Docker.
Warning: this image is around 3 GB.
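If you would rather start the container from Python than type the Docker command by hand, the sketch below builds and runs a `docker run` invocation. It assumes the public `unstructured-api` image at `downloads.unstructured.io/unstructured-io/unstructured-api:latest` and its `--port`/`--host` server arguments, following the unstructured-api README. The container name and the port mapping are illustrative choices, so check the README for the current image tag before relying on it.

```python
import subprocess

# Assumption: image name and server arguments follow the unstructured-api README;
# verify the current tag there before use.
IMAGE = "downloads.unstructured.io/unstructured-io/unstructured-api:latest"


def docker_run_command(port: int = 8000, name: str = "unstructured-api") -> list[str]:
    """Return the argv for a detached, auto-removed API container exposed on `port`."""
    return [
        "docker", "run",
        "-p", f"{port}:8000",   # map host port -> container port 8000
        "-d", "--rm",           # run in the background, remove the container when it stops
        "--name", name,         # illustrative container name
        IMAGE,
        "--port", "8000", "--host", "0.0.0.0",  # arguments passed to the API server
    ]


def start_api(port: int = 8000) -> None:
    """Launch the container; raises CalledProcessError if docker exits non-zero."""
    subprocess.run(docker_run_command(port), check=True)
```

Call `start_api()` once. The first run pulls the roughly 3 GB image, so expect it to take a while.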
Replace test.pdf with the name of a PDF file on your local machine.
In the devcontainer:
Outside the devcontainer:
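Here is a minimal Python sketch, using only the standard library, that posts a PDF to the running API and reads back the extracted elements. It assumes the API is reachable at `http://localhost:8000` and exposes the `/general/v0/general` endpoint with a multipart `files` field, as described in the unstructured-api README. Inside the devcontainer the host name may be different. The `http://unstructured:8000` value used in the usage note is a hypothetical service name, not something this document specifies. The helper names `build_partition_request` and `partition_pdf` are illustrative.

```python
import json
import urllib.request
import uuid
from pathlib import Path

# Assumption: API reachable on localhost:8000 (e.g. container started with -p 8000:8000).
BASE_URL = "http://localhost:8000"
ENDPOINT = "/general/v0/general"


def build_partition_request(pdf_path: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a multipart/form-data POST that uploads one file under the "files" field."""
    path = Path(pdf_path)
    boundary = uuid.uuid4().hex
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        f'Content-Disposition: form-data; name="files"; filename="{path.name}"\r\n'.encode(),
        b"Content-Type: application/pdf\r\n\r\n",
        path.read_bytes(),
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    return urllib.request.Request(
        base_url + ENDPOINT,
        data=body,
        method="POST",
        headers={
            "accept": "application/json",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


def partition_pdf(pdf_path: str, base_url: str = BASE_URL) -> list:
    """Send the file to the running API and return the decoded JSON list of elements."""
    req = build_partition_request(pdf_path, base_url)
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

Outside the devcontainer, `partition_pdf("test.pdf")` should return a list of element dictionaries. Inside it, pass the appropriate base URL, for example `partition_pdf("test.pdf", "http://unstructured:8000")` if that is the hypothetical service name your setup uses.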