During development you may be constrained in which models you can run by the hardware you have available. In production you will hopefully have the resources to run larger models and therefore improve performance.
There is a lot of infrastructure built around the OpenAI API, for example client libraries for most programming languages, and we can leverage this.
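For example, with the official openai Python client you can point the base URL at a local, OpenAI-compatible server and reuse the same code you would write against OpenAI itself. The following is only a sketch: it assumes a recent (v1.x) version of the client, that the API described below is listening on localhost:8080, and a model name that you should replace with whatever /v1/models reports.

```python
# Minimal sketch: reuse the official OpenAI Python client against a local server.
# The base URL, API key and model name below are assumptions for illustration.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",  # local OpenAI-compatible endpoint
    api_key="not-needed-locally",         # placeholder; a local server typically ignores it
)

response = client.chat.completions.create(
    model="ggml-gpt4all-j",  # assumed model id; check /v1/models for what is actually loaded
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(response.choices[0].message.content)
```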
The endpoints we'd ideally like to have implemented for us are the following:
/completions
/chat/completions
/embeddings
/engines/<any>/embeddings
/v1/completions
/v1/chat/completions
/v1/embeddings
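As an illustration of the embeddings endpoints listed above, the sketch below requests an embedding once the API is running (see the docker command that follows). The base URL and model id are assumptions; use a model reported by /v1/models that actually supports embeddings.

```python
# Sketch: request an embedding from the local /v1/embeddings endpoint.
# Base URL, API key and model id are assumptions for illustration.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed-locally")

embedding = client.embeddings.create(
    model="ggml-gpt4all-j",                 # assumed model id
    input="BionicGPT runs models locally.",
)
print(len(embedding.data[0].embedding))     # dimensionality of the returned vector
```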
We've packaged the gpt4all model along with LocalAI into a container.
To start the API, run:
docker run -p 8080:8080 -it --rm ghcr.io/purton-tech/bionicgpt-model-api
and you should get:
7:23AM DBG no galleries to load
7:23AM INF Starting LocalAI using 4 threads, with models path: /build/models
7:23AM INF LocalAI version: v1.22.0 (bed9570e48581fef474580260227a102fe8a7ff4)
┌───────────────────────────────────────────────────┐
│ Fiber v2.48.0 │
│ http://127.0.0.1:8080 │
│ (bound on host 0.0.0.0 and port 8080) │
│ │
│ Handlers ............ 31 Processes ........... 1 │
│ Prefork ....... Disabled PID ................. 7 │
└───────────────────────────────────────────────────┘
Then try:
curl http://localhost:8080/v1/models
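The same check can be made through a client library. This is only a sketch, assuming the container above is running and a recent (v1.x) openai Python client:

```python
# Sketch: list the models the local server exposes -- the client-library
# equivalent of `curl http://localhost:8080/v1/models`. URL and key are assumptions.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed-locally")
for model in client.models.list():
    print(model.id)
```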