Right now, I am still working on the data management app. Currently, it runs a server as a Node app to serve the endpoints and connects to a Dockerized SQLServer instance via the mssql library using Tedium under the hood. I had originally used bare Tedium, but it’s a bit…overwrought? When I switched to the mssql wrapper it cut down the amount of code in my factory by at least 2/3 so I’m happy with that switch.

The issue is that while I generally live and breathe SQLServer, having used it professional almost exclusively for almost 20 years, it is proprietary, top-heavy, and expensive. I couldn’t ever assume to release anything related to this project relying on this platform. I could switch to mySQL, of course, and I have tried, but it’s too close to SQLServer in word and deed while also being just a bit off center from what I am familiar with that I keep trying to force it to behave like SQLServer, and then get frustrated when it takes me longer to learn how to do the same thing I normally could, but differently.

Now I am re-considering switching to Firebase’s Firestore, their NoSQL data storage option. I am not a fan of NoSQL, as I might have mentioned before, not because I think there’s anything inherently wrong with it — I’ve used Mongo at my day job before and the paradigm is fun and flexible — but my issue stems from my relational thinking being “persona non grata” when it comes to the file-based mechanisms of NoSQL.

Both relational and NoSQL databases store data, but data rarely exists in such a small quantum that a single table is sufficient. Data is spread out across several different silos within the same database or collection, but again, data elements represented by individual silos are rarely disconnected from one another.

In a relational database, we connect data silos (tables) via keys, primary — the identifier in the “master” table — and foreign — the same identifier, but recorded in another table with other, relatable data. We can then link primary and foreign data via these keys. If one table holds user demographic information, and another table holds user addresses, we can link the two via a shared “user ID” field. This opens a lot of opportunities for modeling data, as we don’t have to have just one-to-one relations between the users and the addresses; one user could have multiple addresses if the address table has several records with the same user ID value. The act of splitting up data so that one table equals one concern is called “normalization”. Data should not be duplicated; if our user table contained addresses, and a user needed to have more than one address, we’d have to duplicate the user info for every unique address info we added. This causes all kinds of headaches for managing the user portion of the data, since changing one aspect like a last name or suffix would require all of that user’s records to be updated. Miss one, and everything becomes out of sync.

In a NoSQL database, relationships are available, but in limited quantities. NoSQL databases tend to store their data in “collections”, and even then, the collections are written to files on disk. NoSQL imposes limits on how much data can or should be put into a collection, because when all of the rules are followed, NoSQL can be blazingly fast. Part of that speed is due to the practice of de-normalizing data; what is a no-no in a relational data world is pretty much the “golden rule” for NoSQL. Using our users and address example, we would store the user and address data together, and we would have to update all user records for the same ID if we changed a last name. Because NoSQL only has to query that one collection from a limited number of entries within that collection, we trade organization for speed and flexibility. With a relational database we have to create tables ahead of time, and if we need to add or remove a column, we have to re-scaffold the table, modify any stored procedures that use it, and any entities outside of the server that consume it. With NoSQL, it’ll take any field we pass to any collection, and it will record it. If we want to add a middle name to our existing collection, we simply pass in a middle name for new records going forward. This is done without having to manipulate the database itself. It does not, however, back-fill previous records, so we would have X number of fields for some records, and X+1 for others, all in the same collection, making bookkeeping regarding which fields are available and which are not kind of a headache.

So with such a glowing review of relational databases and a rather “wet noodle” recounting of NoSQL, why would I consider switching from the former to the latter? Convenience, for one. I had spent a lot of time about a year or two ago setting up a factory for working with Firestore, so I wouldn’t have to reinvent the wheel, just refresh myself on the physics of it. Secondly, I wrote about how I was messing around with PlayCanvas for visualization. I have to put this on the back burner because currently, I would need to host my Node project and the SQLServer somewhere “in the cloud” in order to develop PlayCanvas against it, and that’s a whole set of pain points in and of themselves. Moving to Firestore I would just need to host my Node project somewhere, which is relatively easy to do, as Firestore is hosted by Google and already “in the cloud”. Finally, all of the logic would need to be baked into the server; right now, I have a suite of stored procs which are internal to the SQLServer instance and while I could export and migrate them (and the table schema) to another SQLServer instance, it’s still within the MSSQL ecosystem. Putting the business logic completely within the server would allow for greater portability should I need to move away from Firestore and pick up another NoSQL implementation like Mongo.

I will finish up my bastardized admin utility so I can edit planets and stations (I can currently edit systems, for all the good it’ll do me), and at that point I’ll stop and see about switching to Firestore. Thankfully I have thought ahead: the server is set up in such a way that I can swap out my data access layer for a different implementation and not have to screw around with the core logic, so making the change would only be time consuming, not tedious (pardon the geeky pun).

Sound off!

This site uses Akismet to reduce spam. Learn how your comment data is processed.