I don’t usually reach out via blog post, but I’ve entered every permutation that I can think of into Google’s search box, and nothing seems to be working out for me. I am hoping that someone who reads this post can provide some actual insight into why some of my Docker containers, on the same custom network, are not talking to one another while others are.

The Idea

This project is basically a test with a three-container setup: database, API server, and website. The database uses the MSSQL 2017 image, the API server uses the node:16-alpine image, and the website uses the nginx:alpine image. These containers run on an Ubuntu Linux machine on the other side of the room, and I have a public domain pointing to it. My goal is to use a subdomain to access the website via HTTPS; the website should then contact the API server to get data from and put data into the database. Simple!

History

I have been running the MSSQL container for some time with great success. It publishes port 1433 so I can connect to it with SSMS from my desktop machine for development purposes. I’ve been developing a non-containerized API server which can connect to the containerized MSSQL server, and a non-containerized website which can connect to the non-containerized API.

Ultimately, I will need to convert my ongoing Star Citizen org website project to this or a similar design. It uses the same database and the same Node-based API server, but right now the API server and website run locally, outside of any containers, during development. I want to separate the three layers so I can update each one without affecting the others; if I served the web content from the same server as the API, every web change would mean rebuilding and redeploying both the web and API server content, which I don’t want to do. And because I don’t know where this project will ultimately be hosted, I cannot guarantee that the host environment will allow binding host paths into the containers.

The API and web containers have custom Dockerfiles, and all three targets have their own docker-compose.yml files.

And finally, I am not a Linux guy. I know as much as I need to know in order to do what I need to do in the moment, and through that I am learning, but please know that any assumptions that I will be able to understand something like “ggh -xG | /ferk/murp/dubi/sh .” without context or links to an “idiots guide to Linux” are grossly out of whack.

Dockerfiles

API Server

The API server is running Node, and the server is using Express and the MSSQL library for accessing the database. Those particulars should be well beyond this immediate issue, however, so I will not include my server code here.

# Use Node 16 alpine as parent image
FROM node:16-alpine

# Change the working directory on the Docker image to /app
WORKDIR /app

# Copy package.json and package-lock.json to the /app directory
COPY package.json package-lock.json ./

# Install dependencies
RUN npm install

# Copy the rest of project files into this image
COPY . .

# Expose application port. The API server listens here.
EXPOSE 4791

# Start the application (exec form: npm runs directly instead of under /bin/sh)
CMD ["npm", "start"]
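The server code is left out here, but since the API container reaches the database by name, this is a minimal TypeScript sketch of how its connection settings might be assembled. The environment variable names (`DB_HOST`, `DB_PORT`, `DB_USER`, `DB_PASSWORD`, `DB_NAME`) and the `buildDbConfig` helper are hypothetical; the object shape mirrors the config the `mssql` package’s `connect()` accepts, and the default host is the `container_name` from the MSSQL compose file.

```typescript
// Hypothetical helper: builds an mssql-style connection config from environment
// variables, defaulting to the Docker network name of the database container.
interface DbConfig {
  server: string;
  port: number;
  user: string;
  password: string;
  database: string;
  options: { encrypt: boolean; trustServerCertificate: boolean };
}

type Env = Record<string, string | undefined>;

function buildDbConfig(env: Env): DbConfig {
  return {
    // On a shared user-defined network, Docker's embedded DNS resolves the
    // container name "mssql_2017" to that container's current IP.
    server: env.DB_HOST ?? "mssql_2017",
    // Containers reach each other on the container port; the "1433:1433"
    // host mapping only matters for tools outside Docker, such as SSMS.
    port: Number(env.DB_PORT ?? "1433"),
    user: env.DB_USER ?? "sa",
    password: env.DB_PASSWORD ?? "",
    database: env.DB_NAME ?? "master",
    // Assumption: no CA-signed certificate on the test server, so trust what it presents.
    options: { encrypt: true, trustServerCertificate: true },
  };
}

console.log(buildDbConfig({}).server); // mssql_2017
```

In the real server this would presumably be called as `buildDbConfig(process.env)` and the result handed to `sql.connect()`; keeping the host name in configuration means the same image can point elsewhere without a rebuild.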

Web Server

The website is built using CRA and features two components: one which contacts the API endpoint on load, and one which accepts a custom URL to attempt a connection. I added the latter after the former failed, and I wanted to be able to reshape the endpoint to test several iterations in case I had gotten the hardcoded test endpoint wrong.

# Multi-stage
# 1) Node image for building frontend assets
# 2) nginx stage to serve frontend assets

# Name the node stage "builder"
FROM node:16 AS builder

# Set working directory
WORKDIR /app

# Copy all files from current directory to working dir in image
COPY . .

# install node modules and build assets (custom NPM script setup)
RUN npm install && npm run build:staging

#-----

# nginx stage for serving content
FROM nginx:alpine

# Set working directory to nginx asset directory
WORKDIR /usr/share/nginx/html

# Remove default nginx static assets
RUN rm -rf ./*

# Copy static assets from builder stage
COPY --from=builder /app/build .

# Containers run nginx with global directives and daemon off
ENTRYPOINT ["nginx", "-g", "daemon off;"]

These are extremely basic, and the web server Dockerfile was cribbed from a website which explained how to do exactly what I want to do, so I ran with it as is. I realize this is probably not the best option (I know it’s not the only one), but for testing purposes I assume it should work fine, unless this setup is somehow contributing to the overall issue.

There is no MSSQL Dockerfile; I pull the image directly in the compose file.

Compose Files

Each of these files “works” in that it spins up a container from the specified image, and everything seems in order. I know the MSSQL file is OK, and the API file seems OK because it produces a container that can communicate with the database container. However, the web server container, or its relationship to the API container, may include points of failure which are contributing to my issues.

MSSQL

version: '2'
services:
  mssql:
    image: mcr.microsoft.com/mssql/server:2017-latest
    container_name: mssql_2017
    environment:
      - ACCEPT_EULA=y
      - MSSQL_SA_PASSWORD=[MY_AWESOME_PASSWORD]
    ports:
      - 1433:1433
    volumes:
      - mssql-data:/var/opt/mssql/data
      - ./backup:/var/opt/mssql/backup
    restart: unless-stopped
    networks:
      - web_bridge

volumes:
    mssql-data:

networks:
  web_bridge:
    external: true

API Server

version: '2'
services:
  test_api:
    image: test_api:latest
    container_name: test_api
    hostname: test_api
    restart: unless-stopped
    networks:
      - web_bridge
    ports:
      - "4791:4791"
      
networks:
  web_bridge:
    external: true

Web Server

version: "2"
services:
  webtest:
    image: test_web
    container_name: test_web
    networks:
      - web_bridge
    restart: "unless-stopped"
    labels:
      - "traefik.enable=true"
      - "traefik.docker.network=web_bridge"
     
      #Port 443 secure
      - "traefik.http.routers.webtest.rule=Host(`sub.domain.com`)"
      - "traefik.http.routers.webtest.entrypoints=websecure"
      - "traefik.http.routers.webtest.tls=true"
      - "traefik.http.routers.webtest.tls.certresolver=letsEncrypt"

networks:
  web_bridge:
    external: true

The Wrench in the Works

You might notice in the web server’s compose file that there are labels pointing back to Traefik, the edge-router software. Because I want to use a single domain pointed at this machine, I’m using Traefik to take in all port 80 and 443 traffic and distribute it to the appropriate container. I won’t get into how Traefik works (I am not an expert), but know that I have other containers behind Traefik on this machine which are similarly configured and working just fine. I’m confident that Traefik is configured correctly, but I wanted to let you know, dear reader, that it’s involved in this process. In case you want edification, here’s the compose file I’m using for Traefik.

version: '3'

services:
  reverse-proxy:
    # The official v2 Traefik docker image
    image: traefik:v2.9
    # Name it
    container_name: traefik_proxy
    # Tells Traefik to listen to docker
    command:
      - "--accesslog=true"
      - "--accesslog.filepath=[MY_PATH]/access.log"
      - "--providers.docker=true"
      - "--entrypoints.web.address=:80"
      - "--entrypoints.websecure.address=:443"

      - "--providers.docker.exposedbydefault=false"
      # Lets Encrypt
      - "--certificatesresolvers.letsEncrypt.acme.tlschallenge=true"
      - "--certificatesresolvers.letsEncrypt.acme.email=me@email.com"
      - "--certificatesresolvers.letsEncrypt.acme.storage=/acme.json"
    networks:
      - web_bridge
    ports:
      # The HTTP port
      - "80:80"
      # Secure HTTP port
      - "443:443"

    restart: unless-stopped
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.traefik-http.entrypoints=web"
    volumes:
      # So that Traefik can listen to the Docker events
      - /var/run/docker.sock:/var/run/docker.sock

networks:
  web_bridge:
    external: true
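Traefik’s job here boils down to matching each request’s Host header against the router rules declared in container labels. The TypeScript below is a toy model of that idea, not how Traefik is implemented; `Route`, `routes`, and `pickBackend` are hypothetical names. It encodes the single rule from the web server’s labels. The backend port of 80 is an assumption based on the nginx image’s default. The API container carries no Traefik labels, so it gets no route, and Traefik answers requests it cannot route with a 404.

```typescript
// Toy model only: picks a backend for a request by comparing its Host header
// with router rules, similar in spirit to a Traefik Host(`...`) rule.
interface Route {
  host: string;    // the name inside Host(`...`)
  backend: string; // container the router forwards to
  port: number;    // port on that container (assumed: nginx default 80)
}

// The one rule defined by the webtest labels; test_api has no labels, so no route.
const routes: Route[] = [
  { host: "sub.domain.com", backend: "test_web", port: 80 },
];

function pickBackend(hostHeader: string, table: Route[]): string | null {
  // A Host header may include a port ("sub.domain.com:443"); match on the name only.
  const name = hostHeader.split(":")[0].toLowerCase();
  const match = table.find((r) => r.host.toLowerCase() === name);
  // null stands in for Traefik's 404 when no router matches.
  return match ? `${match.backend}:${match.port}` : null;
}

console.log(pickBackend("sub.domain.com", routes)); // test_web:80
console.log(pickBackend("test_api:4791", routes));  // null
```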

The Problems

I have run the database and API server containers and have been able to hit a test API endpoint via a web browser on port 4791 to verify that the API server and the database are, indeed, communicating. The connection between those two containers seems solid.

The web server, however, cannot resolve the container-slash-host name http://test_api of the API server. Here’s the dev console and network traffic output from the browser when I am trying to access the site https://sub.domain.com (obviously not the actual URL) which is, in turn, attempting to access the API endpoint http://test_api/dbtest/list which I have verified connects to the database through other means.

Oddly enough, when I open a shell inside the running web server container, I can ping the API server by name.
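One detail of the endpoint is worth a second look: `http://test_api/dbtest/list` carries no port number, so an HTTP client sends that request to port 80, while the API Dockerfile and compose file have the server listening on 4791. The hypothetical `effectivePort` helper below is a TypeScript sketch using the standard WHATWG `URL` class (available in both browsers and Node); it only makes that default-port rule explicit.

```typescript
// Hypothetical helper: reports which TCP port a request to this URL will hit.
// URL.port is "" whenever the URL relies on its scheme's default port.
function effectivePort(raw: string): number {
  const url = new URL(raw);
  if (url.port !== "") return Number(url.port);
  // Defaults for the two schemes used in this setup.
  return url.protocol === "https:" ? 443 : 80;
}

console.log(effectivePort("http://test_api/dbtest/list"));      // 80
console.log(effectivePort("http://test_api:4791/dbtest/list")); // 4791
```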

There are a few gotchas which probably aren’t helping my situation. The first is that the web server only accepts traffic on port 443, and does so only thanks to Traefik’s ability to take instructions from a container whose labels demand secure traffic. In the web server compose file above, you can see that it’s using Let’s Encrypt, which Traefik helps to facilitate, so it’s pretty black-box and simple to set up. However, the API server, which I had hoped could live quietly within the Docker network without opening itself to the outside world, is not accepting traffic on 443. Because of this, the website throws errors when a secure page attempts to contact an insecure endpoint. In an attempt to get around this (for testing purposes), I’ve added the following meta tag to the web application’s HTML head:

<meta http-equiv="Content-Security-Policy" content="upgrade-insecure-requests">

I suspect that, while this might not be the root cause of my name resolution issues, it’s certainly not helping the situation. I haven’t yet investigated how to get a secure connection working for the API server, whether through Traefik or some other way, or whether that’s possible or even necessary when the idea is that the web server talks to the API server strictly through the Docker network.
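For reference, the `upgrade-insecure-requests` directive makes the browser rewrite insecure request URLs before sending them: the scheme becomes `https` while host and path are untouched, and an implicit port 80 becomes an implicit 443. Applied to `http://test_api/dbtest/list`, that yields a request to `https://test_api/dbtest/list`, which is port 443 on a host named `test_api`. The hypothetical `upgradeInsecure` function below is a TypeScript sketch of just that URL rewrite, not of the rest of the Content Security Policy machinery.

```typescript
// Rough model of the URL rewrite performed by `upgrade-insecure-requests`:
// http becomes https; host, path, and any explicit port are kept.
function upgradeInsecure(raw: string): string {
  const url = new URL(raw);
  if (url.protocol === "http:") {
    // An implicit port stays implicit, so 80 effectively becomes 443.
    url.protocol = "https:";
  }
  return url.href;
}

console.log(upgradeInsecure("http://test_api/dbtest/list")); // https://test_api/dbtest/list
```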

I have read that Docker custom networks attempt to use the host’s DNS info for name resolution, but I have also read that Docker is smart enough to use its own internal networking DNS, which is how it manages to allow containers to communicate via container names. I have set the hostname for the API server in the docker-compose file so there’s no ambiguity. I admit that I have not tried the IP, but I don’t know if I could rely on the same IP being assigned to the API container in the future, or in another Docker environment.
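One possibility worth ruling out: Docker’s embedded DNS answers only for processes running inside containers attached to the network, which would explain why `ping test_api` works from a shell in the web container. The components in a CRA build, however, run in the visitor’s browser, and that machine’s resolver has no entry for `test_api`. As a sketch, assuming Node’s built-in `dns` module (so it would run in the node:16-alpine API image or on a machine with Node, not in nginx:alpine, after compiling to JavaScript), the hypothetical `resolveServiceName` helper looks a name up at runtime instead of pinning an IP, and returns `null` wherever the local resolver cannot find it.

```typescript
import { promises as dns } from "dns";

// Hypothetical helper: asks the local resolver for a name's current address.
// Inside a container on web_bridge the query goes to Docker's embedded DNS,
// so "test_api" maps to whatever IP that container has right now.
async function resolveServiceName(name: string): Promise<string | null> {
  try {
    // Family 4 (IPv4 only), assuming the network hands out IPv4 addresses.
    const { address } = await dns.lookup(name, 4);
    return address;
  } catch {
    // ENOTFOUND and similar: this machine's resolver does not know the name.
    return null;
  }
}

resolveServiceName("test_api").then((addr) =>
  console.log(addr ?? "test_api does not resolve from here"),
);
```

Running the same check in two places (inside a container on `web_bridge`, and on the desktop machine where the browser runs) should show whether the name resolves only within Docker.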

How You Can Help

My goal is to expose the web server to the world on port 443, through Traefik and using built-in Let’s Encrypt certificate assignment. The web server should be able to get/post data from/to an API server hosted in another container within the same Docker network.

The immediate issue is that the web server apparently cannot resolve the Docker network name of the API server container despite the Docker documentation and independent blog posts around the Internet claiming that this is entirely possible; indeed, I have two containers in play which are doing this without issue.

I’m hoping I’ve dumped enough info here for someone to tap their screen and say out loud “ah, you moron. There’s your issue!”, but to comment on it in a nicer fashion. I am aware that this might not be the best way to go about achieving what I am trying to achieve, and that’s OK; this is mainly a test that covers a lot of bases that I want to see working. If there’s a piece of the puzzle as explained that you might see out of order, or missing, please let me know.

1 Comment

  • Nimgimli

    November 27, 2022 - 9:30 AM

    I am not an expert, but when I run, say, a WordPress site in a Docker instance and specify the database hostname in wp-config, I have to use a weird hostname.

    For example my compose file says

    mysql_wp:
      image: mysql:latest
      volumes:

    Since I’m not setting a hostname it seems to default to mysql_wp as hostname.

    But in wp-config I had to set the hostname to be: wordpress_mysql_wp_1

    WordPress being the name of the container, mysql_wp being the hostname and 1 is coming from I dunno where.

    But I got that value by doing a “docker ps” to list the running containers and I grabbed the hostname from there.

    Disclaimer: Self taught and not an expert and I dunno if this is even the issue.
