Unstructured for document processing

80% of enterprise data exists in difficult-to-use formats like HTML, PDF, CSV, PNG, PPTX, and more. Unstructured effortlessly extracts and transforms complex data for use with every major vector database and LLM framework.


Running the Unstructured API

If you are in the devcontainer, Unstructured is already running. Otherwise, start it with Docker.

Warning: this image is around 3 GB.

docker run -it -p 8000:8000 --rm quay.io/unstructured-io/unstructured-api:latest --port 8000 --host 0.0.0.0
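
The API may need a moment after the container starts before it accepts requests. The following is a minimal sketch of a readiness check, assuming curl is installed. wait_for_api is a hypothetical helper name, not part of Unstructured, and the default URL assumes the port mapping used in the command above.

```shell
# Hypothetical helper: poll a URL until the server answers or we give up.
# Usage: wait_for_api [url] [max_tries]
wait_for_api() {
  local url="${1:-http://localhost:8000/general/v0/general}"
  local tries="${2:-30}"
  local i=0
  while [ "$i" -lt "$tries" ]; do
    # curl exits 0 as soon as it receives any HTTP response, even an
    # error status, so this only checks that the port is being served.
    if curl -s -o /dev/null --max-time 5 "$url"; then
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  return 1
}
```

For example, `wait_for_api && echo "API is up"` waits up to roughly 30 seconds before giving up.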

Call the API

Replace test.pdf with the path to a PDF file on your local machine.

In the devcontainer:

curl -X 'POST' 'http://unstructured:8000/general/v0/general' -H 'accept: application/json' -H 'Content-Type: multipart/form-data' -F 'files=@test.pdf'

Outside the devcontainer:

curl -X 'POST' 'http://localhost:8000/general/v0/general' -H 'accept: application/json' -H 'Content-Type: multipart/form-data' -F 'files=@test.pdf'
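
The two commands differ only in the host name, so they could be wrapped in one helper. This is a sketch under assumptions: UNSTRUCTURED_HOST, unstructured_url, and partition_file are hypothetical names introduced here for illustration, not something Unstructured or the devcontainer defines.

```shell
# Hypothetical helpers: choose the host from an assumed environment variable.
# Set UNSTRUCTURED_HOST=unstructured inside the devcontainer; the default
# of localhost matches the Docker port mapping used outside it.
unstructured_url() {
  local host="${UNSTRUCTURED_HOST:-localhost}"
  printf 'http://%s:8000/general/v0/general\n' "$host"
}

# Send one local file to the API and print the JSON response.
partition_file() {
  local file="$1"
  if [ ! -f "$file" ]; then
    echo "partition_file: no such file: $file" >&2
    return 1
  fi
  # -F makes curl send multipart/form-data and set the Content-Type
  # header (with boundary) itself, so it is not passed explicitly here.
  curl -sS -X POST "$(unstructured_url)" \
    -H 'accept: application/json' \
    -F "files=@${file}"
}
```

Usage: `partition_file test.pdf` outside the devcontainer, or `UNSTRUCTURED_HOST=unstructured partition_file test.pdf` inside it.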