install something from docker

the docker build that comes with the apt-install from ubuntu does not always cut the cake

using portainer:

sudo docker volume create portainer_data sudo docker run -d -p 8000:8000 -p 9443:9443 –name portainer –restart=always -v /var/run/docker.sock:/var/run/docker.sock -v portainer_data:/data portainer/portainer-ce:2.21.3

using github to build containers:

see github actions (and actions.rst in the doc)

Project Dockerfiles:

the project dockerfiles are within their own directory: - leecher - embedder - postgres

leecher:

this container waits for an incoming file in the dockervolume : and then converts pdf to txt

embedder:

this container chops the txt-file and inserts into a postgres database

postgres:

this a combination of management and a vectordatabase (contains the script to create the ‘document_chunks’ table

Containers

  • the Dockerfiles help to build docker images

  • the docker compose file help to build containers

compose

there are docker-compose.yml files in : - postgres - rag

The one in postgres : - creates 2 containers (database + management) - inits the postgres database as a vectordatabase and creates a table

The one in rag : - creates a volume where you can copy pdf files - creates a volume where converted text files are stored - defines environment variables to access the database (to be changed on your environment!) docker compose up -d

how to copy pdf files to the container?

  • the dockervolumes are created using the docker-compose files

  • dockervolumes link a directory in docker to a directory on the filesystem

cp 270123.pdf /var/snap/docker/common/var-lib-docker/volumes/rag_leech_data/_data

using github actions to build docker containers

each time there is is a push toward the github repository, automatically a build of the docker images gets triggered.

I use multiple Dockerfiles, thus multiple Docker images, and I couldn’t not figure out the easy-way how to build them with a single script.

So … multiple scripts, which each build a single image.

By default the image gets a name like this repo:main. This can be modified!

TRICK : multiple containers with github

  • copy docker-publish.yml to docker-publish2.yml

  • change IMAGE_NAME: ‘najnesnaj/embed’

(./embedder is de directory in the repo that contains the Dockerfile)

and change :
# Build and push Docker image for embedder

first login to github

echo “ghp_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxzK” | docker login ghcr.io -u najnesnaj –password-stdin

Login Succeeded

now I can download the image naj@naj-Latitude-5520:/usr/src/ragtime$ sudo docker pull ghcr.io/najnesnaj/embed:main

elasticsearch:

  • sudo docker run -d –name elasticsearch -p 9200:9200 -e “discovery.type=single-node” docker.elastic.co/elasticsearch/elasticsearch:8.15.3

  • sudo docker run –name kibana -p 5601:5601 –link elasticsearch:elasticsearch kibana:8.15.3

Kibana has not been configured.

Go to http://0.0.0.0:5601/?code=003763 to get started.

in the elasticsearch container generate token (bin/elasticsearch-create-enrollment-token -s kibana) elasticsearch-users useradd test elasticsearch-users passwd test elasticsearch-users roles -a kibana_admin test