I don’t usually reach out via blog post, but I’ve entered every permutation that I can think of into Google’s search box, and nothing seems to be working out for me. I am hoping that someone who reads this post can provide some actual insight into why some of my Docker containers, on the same custom network, are not talking to one another while others are.
This project is basically a test wherein I have a three-container setup: database, API server, and website. The database is using the MSSQL 2017 image, the API server is using the node:16-alpine image, and the website is using the nginx:alpine image. These containers are running on an Ubuntu Linux machine on the other side of the room, and I have a public domain pointing to it. My goal is to use a sub-domain to access the website via HTTPS, which should then contact the API server to get and put data from and to the database. Simple!
I have been running the MSSQL container for some time with great success. It’s opened port 1433 so I can use SSMS on my desktop machine to connect to it for development purposes. I’ve been developing a non-containerized API server which can connect to the containerized MSSQL server, and a non-containerized website which can connect to the non-containerized API.
Ultimately, I will need to convert my ongoing Star Citizen org website project into this or a similar design. It’s using the same database and the same Node-based API server, but right now the API server and website are running locally, outside of any containers, during development. I want to separate the three layers so I can update each without affecting the others. Were I to serve the web content from the same server as the API, any web change would mean rebuilding and redeploying both the web and API server content, which is not something I’d want to do. And because I ultimately do not know where this project will be hosted, I cannot guarantee that the host environment will be amenable to binding host-visible paths from the containers.
The API and web containers have custom Dockerfiles, and all three targets have their own docker-compose.yml files.
And finally, I am not a Linux guy. I know as much as I need to know in order to do what I need to do in the moment, and through that I am learning, but please know that any assumption that I will be able to understand something like “ggh -xG | /ferk/murp/dubi/sh .” without context or links to an “idiot’s guide to Linux” is grossly out of whack.
The API server is running Node, and the server is using Express and the MSSQL library for accessing the database. Those particulars should be well beyond this immediate issue, however, so I will not include my server code here.
```dockerfile
# Use Node 16 alpine as parent image
FROM node:16-alpine

# Change the working directory on the Docker image to /app
WORKDIR /app

# Copy package.json and package-lock.json to the /app directory
COPY package.json package-lock.json ./

# Install dependencies
RUN npm install

# Copy the rest of project files into this image
COPY . .

# Expose application port. The API server listens here.
EXPOSE 4791

# Start the application
CMD npm start
```
The website is built using CRA and features two components: one which contacts the API endpoint on load, and one which accepts a custom URL to attempt a connection. I added the latter after the former failed, and I wanted to be able to reshape the endpoint to test several iterations in case I had gotten the hardcoded test endpoint wrong.
```dockerfile
# Multi-stage
# 1) Node image for building frontend assets
# 2) nginx stage to serve frontend assets

# Name the node stage "builder"
FROM node:16 AS builder

# Set working directory
WORKDIR /app

# Copy all files from current directory to working dir in image
COPY . .

# Install node modules and build assets (custom NPM script setup)
RUN npm install && npm run build:staging

#-----

# nginx stage for serving content
FROM nginx:alpine

# Set working directory to nginx asset directory
WORKDIR /usr/share/nginx/html

# Remove default nginx static assets
RUN rm -rf ./*

# Copy static assets from builder stage
COPY --from=builder /app/build .

# Containers run nginx with global directives and daemon off
ENTRYPOINT ["nginx", "-g", "daemon off;"]
```
These are extremely basic, and the web server Dockerfile was cribbed from a website which explained how to do exactly what I want to do, so I just ran with it as is…I realize this is probably not the best option (I know it’s not the only option) but for testing purposes I assume it should work fine unless this setup is somehow contributing to the overall issue.
The MSSQL Dockerfile is not included because I only pulled the image via the compose file.
Each of these “work” in that they’ll spin up containers from the specified images, and everything seems in order. I know the MSSQL file is OK, and the API file seems OK as it results in a container which can communicate with the database container. However, the web server container, or the web server container in relation to the API container, may include points of failure which are contributing to my issues.
```yaml
version: '2'

services:
  mssql:
    image: mcr.microsoft.com/mssql/server:2017-latest
    container_name: mssql_2017
    environment:
      - ACCEPT_EULA=y
      - MSSQL_SA_PASSWORD=[MY_AWESOME_PASSWORD]
    ports:
      - 1433:1433
    volumes:
      - mssql-data:/var/opt/mssql/data
      - ./backup:/var/opt/mssql/backup
    restart: unless-stopped
    networks:
      - web_bridge

volumes:
  mssql-data:

networks:
  web_bridge:
    external: true
```
```yaml
version: '2'

services:
  test_api:
    image: test_api:latest
    container_name: test_api
    hostname: test_api
    restart: unless-stopped
    networks:
      - web_bridge
    ports:
      - "4791:4791"

networks:
  web_bridge:
    external: true
```
```yaml
version: "2"

services:
  webtest:
    image: test_web
    container_name: test_web
    networks:
      - web_bridge
    restart: "unless-stopped"
    labels:
      - "traefik.enable=true"
      - "traefik.docker.network=web_bridge"
      # Port 443 secure
      - "traefik.http.routers.webtest.rule=Host(`sub.domain.com`)"
      - "traefik.http.routers.webtest.entrypoints=websecure"
      - "traefik.http.routers.webtest.tls=true"
      - "traefik.http.routers.webtest.tls.certresolver=letsEncrypt"

networks:
  web_bridge:
    external: true
```
The Wrench in the Works
You might notice in the web server’s compose file that there are labels pointing back to Traefik, the edge-router software. Because I want to use a single domain pointed at this machine, I’m using Traefik to take in all port 80 and 443 traffic and distribute it to the appropriate container. I won’t get into how Traefik works (I am not an expert), but know that I do have other containers behind Traefik on this machine which are similarly configured, and which are working just fine. I have confidence that Traefik is configured correctly, but I wanted to let you know, dear reader, that it’s involved in this process. In case you want edification, here’s the compose file I’m using for Traefik.
```yaml
version: '3'

services:
  reverse-proxy:
    # The official v2 Traefik docker image
    image: traefik:v2.9
    # Name it
    container_name: traefik_proxy
    # Tells Traefik to listen to docker
    command:
      - "--accesslog=true"
      - "--accesslog.filepath=[MY_PATH]/access.log"
      - "--providers.docker=true"
      - "--entrypoints.web.address=:80"
      - "--entrypoints.websecure.address=:443"
      - "--providers.docker.exposedbydefault=false"
      # Lets Encrypt
      - "--certificatesresolvers.letsEncrypt.acme.tlschallenge=true"
      - "--certificatesresolvers.letsEncrypt.acme.email=firstname.lastname@example.org"
      - "--certificatesresolvers.letsEncrypt.acme.storage=/acme.json"
      - "--certificatesresolvers.letsEncrypt.acme.tlschallenge=true"
    networks:
      - web_bridge
    ports:
      # The HTTP port
      - "80:80"
      # Secure HTTP port
      - "443:443"
    restart: unless-stopped
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.traefik-http.entrypoints=web"
    volumes:
      # So that Traefik can listen to the Docker events
      - /var/run/docker.sock:/var/run/docker.sock

networks:
  web_bridge:
    external: true
```
I have run the database and API server containers and have been able to hit a test API endpoint via a web browser on port 4791 to verify that the API server and the database are, indeed, communicating. The connection between those two containers seems solid.
The web server, however, cannot resolve the container-slash-host name http://test_api of the API server. Here’s the dev console and network traffic output from the browser when I am trying to access the site https://sub.domain.com (obviously not the actual URL) which is, in turn, attempting to access the API endpoint http://test_api/dbtest/list which I have verified connects to the database through other means.
Oddly enough, when I enter into the running web server container, I can ping the API server by name.
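One reading of that ping result, offered as an assumption to test rather than a confirmed diagnosis: ping runs inside the container, where Docker's own DNS answers for test_api, whereas a CRA build is static JavaScript that nginx hands to the visitor's browser. The request to http://test_api/dbtest/list would then be issued from the browser's machine, which sits outside the Docker network. The sketch below is a hypothetical helper (`looksDockerInternal` is my own name, not part of any library) that flags endpoints whose hostname is a single label such as test_api, the kind of name that usually only resolves inside a Docker network.

```typescript
// Heuristic only: single-label hostnames such as "test_api" are typically
// resolvable inside a Docker network but not from a visitor's browser.
function looksDockerInternal(endpoint: string): boolean {
  const { hostname } = new URL(endpoint);

  // "localhost" resolves everywhere, so it is not Docker-specific.
  if (hostname === "localhost") return false;

  // IPv4 literals need no DNS lookup at all.
  if (/^\d{1,3}(\.\d{1,3}){3}$/.test(hostname)) return false;

  // IPv6 literals appear bracketed in URL.hostname.
  if (hostname.startsWith("[")) return false;

  // No dot means no domain suffix: likely a container or service name.
  return !hostname.includes(".");
}

console.log(looksDockerInternal("http://test_api/dbtest/list"));
console.log(looksDockerInternal("https://sub.domain.com/"));
```

If the first call reports `true` for the endpoint compiled into the site, that is a hint the browser is being asked to resolve a name only Docker knows about.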
There are a few gotchas which probably aren’t helping my situation. The first is that the web server only accepts traffic on port 443, and only does so thanks to Traefik’s ability to take instruction from an associated container whose labels demand secure traffic. Referring to the web server compose file above, you can see that it’s using Let’s Encrypt, which Traefik helps to facilitate, so it’s pretty black-box and simple to set up. However, the API server, which I had hoped could live quietly within the Docker network and not have to open itself to the outside world, is not accepting traffic on 443. Because of this, the website throws errors when attempting to contact an insecure endpoint from a secure application. In an attempt to get around this (for testing purposes), I’ve added the following meta tag to the web application’s header:
<meta http-equiv="Content-Security-Policy" content="upgrade-insecure-requests">
I suspect that, while this might not be the root cause of my name resolution issues, it’s certainly not helping the situation. I haven’t yet investigated how to get a secure connection working for the API server, through Traefik or otherwise, or whether that’s possible or even necessary when the idea is that the web server talks to the API server strictly through the Docker network.
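To make the effect of that meta tag concrete, here is a minimal sketch of what `upgrade-insecure-requests` does to a request URL as I understand the directive: it swaps the `http` scheme for `https` before the request leaves the browser, and nothing more. The function name `upgradeInsecureRequest` is hypothetical and only models the rewrite; it is not a browser API.

```typescript
// Models the URL rewrite performed by the CSP directive
// "upgrade-insecure-requests": http becomes https, host and path are kept.
function upgradeInsecureRequest(requestUrl: string): string {
  const url = new URL(requestUrl);

  if (url.protocol === "http:") {
    // An implicit port 80 becomes an implicit port 443;
    // an explicit port such as 4791 is left untouched.
    url.protocol = "https:";
  }

  return url.href;
}

console.log(upgradeInsecureRequest("http://test_api/dbtest/list"));
console.log(upgradeInsecureRequest("http://test_api:4791/dbtest/list"));
```

Two things stand out when I look at it this way. The upgraded request still targets the name test_api, so the directive does nothing for name resolution. And the endpoint as written carries no port, so it would aim at 443 after the upgrade (80 before it), while the API's Dockerfile exposes 4791 and the server does not speak TLS.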
I have read that Docker custom networks attempt to use the host’s DNS info for name resolution, but I have also read that Docker is smart enough to use its own internal networking DNS, which is how it manages to allow containers to communicate via container names. I have set the hostname for the API server in the docker-compose file so there’s no ambiguity. I admit that I have not tried the IP, but I don’t know if I could rely on the same IP being assigned to the API container in the future, or in another Docker environment.
How You Can Help
My goal is to expose the web server to the world on port 443, through Traefik and using built-in Let’s Encrypt certificate assignment. The web server should be able to get/post data from/to an API server hosted in another container within the same Docker network.
The immediate issue is that the web server apparently cannot resolve the Docker network name of the API server container despite the Docker documentation and independent blog posts around the Internet claiming that this is entirely possible; indeed, I have two containers in play which are doing this without issue.
I’m hoping I’ve dumped enough info here for someone to tap their screen and say out loud “ah, you moron. There’s your issue!”, but to comment on it in a nicer fashion. I am aware that this might not be the best way to go about achieving what I am trying to achieve, and that’s OK; this is mainly a test that covers a lot of bases that I want to see working. If there’s a piece of the puzzle as explained that you might see out of order, or missing, please let me know.