80% of enterprise data exists in difficult-to-use formats like HTML, PDF, CSV, PNG, PPTX, and more. Unstructured effortlessly extracts and transforms complex data for use with every major vector database and LLM framework.
If you are in the devcontainer, Unstructured is already running. Otherwise, start Unstructured with Docker.
Warning: this image is around 3 GB!
Replace test.pdf with the name of a PDF file on your local machine.
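As a rough illustration of sending a local PDF to the running service, here is a minimal sketch using only the Python standard library. It assumes the container exposes the Unstructured API at `http://localhost:8000/general/v0/general` and accepts the upload in a multipart form field named `files`, as the open-source unstructured-api project documents. Neither the port nor the path is stated in this text, so adjust them to match your setup. The helper names `build_partition_request` and `partition_pdf` are made up for this example.

```python
import json
import mimetypes
import urllib.request
import uuid
from pathlib import Path

# Assumption: the Unstructured API container is reachable at this address.
API_URL = "http://localhost:8000/general/v0/general"


def build_partition_request(pdf_path, api_url=API_URL):
    """Build a multipart/form-data POST that uploads one file in the 'files' field."""
    path = Path(pdf_path)
    boundary = uuid.uuid4().hex
    content_type = mimetypes.guess_type(path.name)[0] or "application/octet-stream"
    # Multipart part header, followed by the raw file bytes and the closing boundary.
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="files"; filename="{path.name}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return urllib.request.Request(
        api_url,
        data=head + path.read_bytes() + tail,
        method="POST",
        headers={
            "Accept": "application/json",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


def partition_pdf(pdf_path, api_url=API_URL):
    """Send the file to the service and return the decoded JSON response."""
    with urllib.request.urlopen(build_partition_request(pdf_path, api_url)) as resp:
        return json.load(resp)


# Example usage (requires the service to be running):
#     elements = partition_pdf("test.pdf")
#     for element in elements:
#         print(element.get("type"), "-", element.get("text"))
```

If the assumptions hold, the response should be a JSON list of extracted elements. The request is built separately from the network call so you can inspect it before sending anything.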