Node is very easy to get started with, especially when you are using Express. During development you run your app, your database and anything else you need on the same machine.
In fact, when you are starting out, you often have the same setup in production: everything is on one machine. To modify your configuration, you just log in to the server and quickly change whatever you need.
As your app grows, your needs grow too, and you begin to scale. You have multiple front-end servers running your application, plus more servers running your database and many other services.
It is no longer easy. You cannot just log in to one server. You have to make each change everywhere, and it has to happen quickly.
What you are facing are two well-known problems: configuration management and service discovery.
Configuration management tells your Node app how it should be configured, and also how all the other services, such as the database, are configured.
Service discovery tells your Express app which services exist and where they are available. For example, it can tell you that there are three Redis instances, one that accepts only writes and two that accept only reads.
What is so hard?
Initially, all of this information lives in your code. Your code stores which databases you have, their passwords, their IP addresses and anything else you might need.
Security
Even when everything is running on the same machine, there is already a security problem. Your Git repository is not the place for secrets such as passwords: even after you remove them, they remain available in older commits.
This is easily solved by keeping secrets in a separate file or in environment variables, both outside of source control. That works with one or two servers, but once you have hundreds it is no longer sufficient.
Each time you need to change something, you have to log in to every server, which is practically impossible with hundreds of servers or more.
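Here is a minimal sketch of the environment-variable approach in Node. The variable names (DB_HOST, DB_USER, DB_PASSWORD, DB_PORT) are hypothetical examples and not a convention from this article. The app reads its secrets at startup and refuses to start if one is missing.

```javascript
'use strict';

// Reads database settings from environment variables so that secrets never
// live in the Git repository. The variable names are hypothetical examples.
function loadDbConfig(env = process.env) {
  const required = ['DB_HOST', 'DB_USER', 'DB_PASSWORD'];
  const missing = required.filter((name) => !env[name]);
  if (missing.length > 0) {
    // Fail fast at startup instead of failing later on the first query.
    throw new Error(`Missing environment variables: ${missing.join(', ')}`);
  }
  return {
    host: env.DB_HOST,
    user: env.DB_USER,
    password: env.DB_PASSWORD,
    // Optional setting with a default (27017 is MongoDB's default port).
    port: Number(env.DB_PORT || 27017),
  };
}

module.exports = { loadDbConfig };
```

On each server you would then start the app with something like `DB_HOST=10.0.0.5 DB_USER=app DB_PASSWORD=... node app.js`.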
Immediate Configuration Changes
This brings us to the next problem. When you change your configuration, for example by setting a new database password, all front-end servers should know about it immediately.
It is just not an option to go and change each and every server manually.
Service Availability & Discovery
The bottleneck of a web application is often the database. At peak traffic in particular, you may need to add more database replicas for reading.
How do you let your front-end servers know about those new replicas? And how do they find out when a replica is no longer available?
Again, changing every configuration file manually is slow and error-prone.
Wouldn’t it be nice if your front-end servers automatically and immediately knew when some configuration changes or when you add another database server?
Fortunately, there are tools that help with exactly that: ZooKeeper and zkfarmer.
ZooKeeper
ZooKeeper is a simple solution that can help you with all of the problems I mentioned above.
First of all, if you are working with Node & Express, you are probably not a big fan of Java, but ZooKeeper runs on Java.
Don’t worry, it will not hurt your scalability. Not only is ZooKeeper fast, but in this setup it is never touched by user requests and it holds very little data, so it needs little memory or processing power.
In practice, we will use ZooKeeper to keep many different files in sync across your servers.
What is ZooKeeper?
At its core, ZooKeeper is a simple database with a few very simple features.
First, it stores its data like a file tree. The root node is /, which can have a child node such as /child, which in turn can have deeper children such as /child/deep.
Each node can have children, just like a folder, but unlike a folder each node can also contain data.
However, a node is not intended to hold lots of data: by default it is limited to about 1 MB, and ZooKeeper nodes usually hold much less than that. That is still more than enough for a small JSON structure, for example.
Another critical feature is ephemeral nodes. An ephemeral node exists only while the session of the client that created it is alive. As soon as that client disconnects and its session ends, the node is deleted. Ephemeral nodes cannot have children, but they can have data.
ZooKeeper also makes it easy to set up multiple instances that automatically synchronize with each other.
These are just some of the great features ZooKeeper provides. If you are interested, you can read about the others at zookeeper.apache.org.
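To make the tree and ephemeral nodes concrete, here is a toy in-memory model written only for illustration. It is not the ZooKeeper API or any client library. Paths form a tree, and every node can hold data up to an approximate 1 MB limit. Nodes created with a session id (a hypothetical `owner` value) behave like ephemeral nodes: they cannot have children, and they vanish when that session is closed.

```javascript
'use strict';

// Toy model of ZooKeeper's data model, for illustration only (not its API).
const MAX_DATA_BYTES = 1024 * 1024; // approximate default per-node limit (~1 MB)

class ZnodeTree {
  constructor() {
    // Each path maps to { data, children, owner }; the root "/" always exists.
    this.nodes = new Map([['/', { data: '', children: new Set(), owner: null }]]);
  }

  static parentOf(path) {
    const idx = path.lastIndexOf('/');
    return idx === 0 ? '/' : path.slice(0, idx);
  }

  static nameOf(path) {
    return path.slice(path.lastIndexOf('/') + 1);
  }

  // owner === null creates a persistent node; a session id creates an ephemeral one.
  create(path, data = '', owner = null) {
    if (this.nodes.has(path)) throw new Error(`Node exists: ${path}`);
    if (Buffer.byteLength(data) > MAX_DATA_BYTES) throw new Error('Data too large');
    const parent = this.nodes.get(ZnodeTree.parentOf(path));
    if (!parent) throw new Error(`No parent for: ${path}`);
    if (parent.owner !== null) throw new Error('Ephemeral nodes cannot have children');
    this.nodes.set(path, { data, children: new Set(), owner });
    parent.children.add(ZnodeTree.nameOf(path));
  }

  getData(path) {
    const node = this.nodes.get(path);
    if (!node) throw new Error(`No node: ${path}`);
    return node.data;
  }

  getChildren(path) {
    const node = this.nodes.get(path);
    if (!node) throw new Error(`No node: ${path}`);
    return [...node.children].sort();
  }

  // Simulates a client session ending: all of its ephemeral nodes disappear.
  closeSession(owner) {
    for (const [path, node] of this.nodes) {
      if (node.owner === owner) {
        this.nodes.delete(path);
        this.nodes.get(ZnodeTree.parentOf(path)).children.delete(ZnodeTree.nameOf(path));
      }
    }
  }
}

module.exports = { ZnodeTree, MAX_DATA_BYTES };
```

For example, suppose you create /services, then /services/mongo, then an ephemeral /services/mongo/10.0.0.5 owned by `'session-1'`. After `closeSession('session-1')`, /services/mongo has no children again. This is the same behavior you will see with zkfarmer later in this article.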
How do I connect to ZooKeeper?
You can connect directly from Node, just like with any other database. The ZooKeeper web site only talks about Java and C interfaces, but client libraries exist for many more languages and platforms, including Node.
Another option is to use the command-line client, bin/zkCli.sh, which ships with ZooKeeper.
zkfarmer
However, today we are not going to program against ZooKeeper directly. Instead, we are going to use another great tool, developed by the nice people at Dailymotion, called zkfarmer.
It is a Python tool, but again, don’t worry: it will not hurt your scalability, since no user traffic ever touches it.
It syncs JSON files, and also folders, with ZooKeeper nodes, without any programming on your side. Isn’t that nice?
Getting your hands dirty
No more talking, let’s do something. Our example project will have the following servers:
- 1 ZooKeeper server
- 1 Mongo server for writing
- 2 Mongo servers for reading
- 2 front-end servers running your Node & Express app
It might sound a little complicated, but ZooKeeper really starts to shine when you use it with more machines.
We are not going to look at how to install Mongo or deploy your Node code here. You can find most of that information in my other articles.
Installing ZooKeeper
Let’s install ZooKeeper on your ZooKeeper server.
Java
First you need to install Java. How you do that depends on your machine, so I will not go into detail, but on Linux you will typically run one of the following
Both of these install just the runtime, without the development kit, which is all you need to run ZooKeeper.
ZooKeeper
Once you have Java, download ZooKeeper and extract it from its archive.
On a Linux server, here is an example of how to do it
You can find the latest version on the ZooKeeper download page.
Once you are inside the ZooKeeper folder, create a file conf/zoo.cfg with the following content
It tells ZooKeeper where to listen for incoming connections and where to store its data. tickTime is the basic time unit, in milliseconds, that ZooKeeper uses for heartbeats and timeouts; you don’t need to worry about it.
From the ZooKeeper folder, run the following command
Congratulations, ZooKeeper should be up and running!
Let’s connect to it by issuing the following command from the ZooKeeper folder
Then do the following
You just played with ZooKeeper. First, you listed all nodes under the root / node. Then you created a /services node with the string just_small_data as its data. Finally, you created a child of that node called mongo with the string {} as its data.
We have just prepared the ZooKeeper nodes that zkfarmer will use in the rest of this article.
Installing zkfarmer
Next, we are going to install zkfarmer on the ZooKeeper server, on each of the three Mongo servers and on both front-end servers.
Python
Fortunately, Python comes with practically every Linux distribution, so getting it on your server should not be difficult. You will also need the Python setuptools package.
Then clone the zkfarmer repository and install zkfarmer
You are done: everything is installed. I know it was not fun, but it is worth it.
Service discovery
Go to the Mongo server used for writing and do the following.
The enabled file indicates that the server is accepting connections, the write file that it accepts writes, and the read file that this Mongo server does not accept reads, so it should be used only for writing.
Now run the following
Now, when you check your ZooKeeper server, you will see that it has a node /services/mongo/<mongo-write-server-ip> and that its content is
Cool, right?
When you stop the running zkfarmer from above with Ctrl+C, you will see that the node disappears. So when your server dies for any reason, ZooKeeper removes the node as soon as that server’s session ends.
Now let’s go to one of the Mongo read servers and do the following
Then run zkfarmer
Repeat the exact same commands on the other Mongo read server. As you can see, this time zookeeper/read contains 1 and zookeeper/write contains 0, showing that these two servers are only for reading.
Front-end servers
On the front-end servers, let’s assume the following application structure
where app.js is the entry point of your application. Now run the following in the root folder of your project. This command does not return.
Check out the content of conf/mongo.json; it should look something like
When one of these servers dies, this file is updated automatically. When you change a value, for example enabled or anything else, the change is also immediately reflected in this JSON file.
To use the file, all you have to do is require it in app.js, for example with require('./conf/mongo.json').
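Here is a minimal sketch of how app.js might turn that file into one write server and a pool of read servers. It assumes conf/mongo.json maps each server IP to its flags, with values such as "1" and "0". The helper names are hypothetical, and you should check the exact shape of the file zkfarmer writes on your servers.

```javascript
'use strict';

// ASSUMPTION: the parsed conf/mongo.json looks like
// { "<ip>": { "enabled": "1", "read": "0", "write": "1" }, ... }.
// Adjust the key names if your zkfarmer output differs.
function isOn(flag) {
  // Flags come from small text files, so accept "1", "1\n" and 1 alike.
  return String(flag).trim() === '1';
}

function pickMongoServers(servers) {
  const writers = [];
  const readers = [];
  for (const [host, flags] of Object.entries(servers)) {
    if (!flags || !isOn(flags.enabled)) continue; // skip disabled servers
    if (isOn(flags.write)) writers.push(host);
    if (isOn(flags.read)) readers.push(host);
  }
  if (writers.length === 0) throw new Error('No Mongo server accepts writes');
  return {
    writer: writers[0],
    readers,
    // Random choice spreads reads across replicas; null if none is available.
    randomReader() {
      return readers.length > 0 ? readers[Math.floor(Math.random() * readers.length)] : null;
    },
  };
}

module.exports = { pickMongoServers };
```

In app.js you would call `pickMongoServers(require('./conf/mongo.json'))` at startup. Use `writer` for writes and `randomReader()` for reads. Keep in mind that `require` caches the parsed JSON for the lifetime of the process, so the app does not see later changes to the file until it restarts.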
It is that simple. However, we are not done yet. When conf/mongo.json is updated, how do you notify your app that something has changed?
Let’s assume you are using systemd to run your Node application, just like we did in Hosting & Deploying NodeJS Apps on Ubuntu and Hosting & Deploying NodeJS Apps on CentOS.
Then replace the zkfarmer from above with
Each time conf/mongo.json changes, systemd will restart your application and the new file will be loaded.
Adding and removing Mongo servers becomes a breeze, and your app is always aware of the current set of servers.
Configuration Management
The same mechanism can be used to propagate configuration changes. Imagine that your Node application uses S3 to store files. You can use ZooKeeper to provide your S3 credentials.
Go to the server where ZooKeeper is installed and create the following
Notice the --common flag at the end. Unlike before, this time a new node named /services/common is created which, just like before, contains JSON data.
Again, you can use it on your front-end servers with the following
Each time your configuration changes, it is automatically updated in ZooKeeper and on your front-end servers, which are then restarted.
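On the front-end side, here is a minimal sketch of reading those credentials. It assumes zkfarmer writes the common node to conf/common.json, and that you stored an `s3` object with `accessKeyId`, `secretAccessKey` and `bucket` keys. Both the file name and the key names are hypothetical, so adapt them to the JSON you actually created.

```javascript
'use strict';

// ASSUMPTION: the shared configuration looks like
// { "s3": { "accessKeyId": "...", "secretAccessKey": "...", "bucket": "..." } }.
// These key names are hypothetical examples.
function getS3Settings(common) {
  const s3 = common && common.s3;
  if (!s3) throw new Error('No "s3" section in the common configuration');
  for (const key of ['accessKeyId', 'secretAccessKey', 'bucket']) {
    if (typeof s3[key] !== 'string' || s3[key] === '') {
      throw new Error(`Missing S3 setting: ${key}`);
    }
  }
  // Return a copy so the rest of the app cannot mutate the loaded config.
  return { accessKeyId: s3.accessKeyId, secretAccessKey: s3.secretAccessKey, bucket: s3.bucket };
}

module.exports = { getS3Settings };
```

You would call `getS3Settings(require('./conf/common.json'))` once at startup and pass the result to your S3 client. A missing key then fails loudly when the app boots instead of on the first upload.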
Next
From here, I can see several directions you can take.
First, you can use systemd to manage both your ZooKeeper and your zkfarmer instances.
Second, you can run multiple ZooKeeper instances on different servers. It is really easy, and it makes your ZooKeeper installation much more reliable.
Third, you can make ZooKeeper bind to an IP address on a private network. That way, ZooKeeper and your configuration are safe from outside access.
Finally, simple as it is, ZooKeeper can help in many more situations. Try to identify a few in your current Node & Express infrastructure.
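On the point about running multiple instances: the reliability gain comes from quorums. A ZooKeeper ensemble keeps working only while a strict majority of its servers is up. This small helper does the arithmetic, and its function names are just for illustration.

```javascript
'use strict';

// A ZooKeeper ensemble serves requests only while a strict majority
// (a quorum) of its servers is running.
function quorumSize(ensembleSize) {
  return Math.floor(ensembleSize / 2) + 1;
}

// How many servers can fail while a quorum is still available.
function toleratedFailures(ensembleSize) {
  return ensembleSize - quorumSize(ensembleSize);
}

for (const n of [1, 2, 3, 4, 5]) {
  console.log(`${n} server(s): quorum ${quorumSize(n)}, survives ${toleratedFailures(n)} failure(s)`);
}

module.exports = { quorumSize, toleratedFailures };
```

The output shows that 3 servers survive 1 failure and 5 survive 2, while 4 servers still survive only 1. That is why ensembles usually have an odd number of members.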