Here's The Issue:
Loading any web application takes time.
Clearly this was not acceptable, so he set about doing everything he could think of to make things as fast as possible.
But how does a browser determine when a script has run for too long? As you’d expect, the major browser vendors implement different techniques and limits:
Firefox uses a timed limit of 10 seconds.
Safari uses a timed limit of 5 seconds.
Chrome does not limit execution but detects when the browser crashes or becomes unresponsive.
Several of the browsers allow you to configure the execution limit parameters, but that’s not something I’d personally recommend. I won’t publish the details here because someone, somewhere will use it as a “fix” for their unresponsive page! Google it if you like, but tweaking browser settings for badly-behaved code does not address the root of the problem.
Here's The Solution:
The best solution is to avoid long-running client-side tasks. Ideally, no event handler should take longer than a few dozen milliseconds. Intensive processing jobs should normally be handled by the server and retrieved with a page refresh or an Ajax call.
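When some processing really does have to stay on the client, one common way to keep each handler short is to split the work into small batches and yield between them. The following is only a minimal sketch of that idea; `processInChunks`, its parameters and the batch size are hypothetical names chosen for illustration.

```javascript
// Hypothetical helper: process `items` in small batches so that no single
// callback runs for long. `schedule` defaults to setTimeout, which lets the
// browser handle input and repaint between batches.
function processInChunks(
  items,
  handleItem,
  chunkSize = 100,
  onDone = () => {},
  schedule = (fn) => setTimeout(fn, 0)
) {
  let index = 0;

  function runBatch() {
    const end = Math.min(index + chunkSize, items.length);
    for (; index < end; index++) {
      handleItem(items[index], index);
    }
    if (index < items.length) {
      schedule(runBatch); // yield, then continue with the next batch
    } else {
      onDone();
    }
  }

  runBatch();
}

// Usage: sum 1,000 numbers in batches of 100.
let total = 0;
processInChunks(
  Array.from({ length: 1000 }, (_, i) => i + 1),
  (n) => { total += n; },
  100,
  () => console.log("done, total =", total)
);
```

The right batch size depends on how expensive each item is, so it is worth measuring rather than guessing.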
In order to render the HTML page for any web app, the Node.js application needs to retrieve a lot of data for the application in question.
At minimum, this means it needs to retrieve data from the user’s current browsing session to check they’re logged in, and it needs to pull in data about the user (e.g. the user’s name, which sites they have access to, their API key and the parameters of their subscription) and about the site in question (site name, unique token, etc.).
In order to retrieve this data, the application needed to make several calls to internal API functions, many of which could take a second or more to complete. Each request was made by a separate Express middleware, which meant they ran in series: each request waited for the previous one to complete before starting.
Since Node.js is well suited to running multiple asynchronous functions in parallel, and since many of these internal API requests didn’t depend on each other, it made sense to parallelize them: fire off all the requests at once and continue once they’ve all completed. I achieved this with the aid of the (incredibly useful) async module.
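To make the idea concrete, here is a rough sketch of firing independent requests together. It uses the built-in `Promise.all` rather than the async module mentioned above, and the three fetch functions are hypothetical stand-ins that only simulate a delay.

```javascript
// Hypothetical stand-ins for the internal API calls; each resolves after a
// simulated delay instead of making a real request.
const delay = (ms, value) =>
  new Promise((resolve) => setTimeout(() => resolve(value), ms));

const fetchSession = () => delay(30, { loggedIn: true });
const fetchUser = () => delay(50, { name: "Example User" });
const fetchSite = () => delay(40, { siteName: "example-site" });

// The three requests do not depend on each other, so start them all at once
// and continue when every one of them has completed.
async function loadPageData() {
  const [session, user, site] = await Promise.all([
    fetchSession(),
    fetchUser(),
    fetchSite(),
  ]);
  return { session, user, site };
}

// Usage: the total wait is roughly the slowest request, not the sum of all.
const started = Date.now();
loadPageData().then((data) => {
  console.log(data, `loaded in ~${Date.now() - started} ms`);
});
```

Note that `Promise.all` rejects as soon as any one request fails, so error handling for the whole page load belongs around the `await`.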
A further fix is to cache any data that has already been fetched, so it does not have to be requested again.
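A cache like this can be very small. The sketch below is one possible shape, assuming an in-memory store with a time-to-live; `createCache`, `loader` and the TTL value are hypothetical and not taken from the application described here.

```javascript
// Hypothetical in-memory cache with a time-to-live. `loader` is whatever
// function performs the slow lookup; `now` is injectable to ease testing.
function createCache(loader, ttlMs, now = Date.now) {
  const entries = new Map(); // key -> { value, expiresAt }

  return function get(key) {
    const hit = entries.get(key);
    if (hit && hit.expiresAt > now()) {
      return hit.value; // still fresh: skip the slow lookup
    }
    const value = loader(key);
    entries.set(key, { value, expiresAt: now() + ttlMs });
    return value;
  };
}

// Usage: the loader runs once per key within the TTL window.
let lookups = 0;
const getUser = createCache((id) => {
  lookups += 1;
  return { id, name: `user-${id}` };
}, 60 * 1000);

getUser(1);
getUser(1); // served from the cache
getUser(2);
console.log(lookups); // 2
```

The TTL is the trade-off to tune: longer values save more requests but serve staler data.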
Loading JS and CSS
One way around this problem would be to break each individual component into its own file and include them all individually — that way any files that don’t get changed frequently can sit in the browser’s HTTP cache and not be requested. The problem with this, though, is that there would be a lot of files, some of them incredibly small. And (especially on mobile browsers), the overhead of loading that many individual resources vastly outweighs the extra overhead we had before of re-downloading unchanged content.
Along with this, he also switched all of our static (JS and CSS) asset loading to be served through CloudFront, Amazon Web Services’ content delivery network. This means content is served from the nearest possible geographic location to the user.
We also found some optimizations to prevent loading or storing duplicate code. By de-duplicating the caching and requests based on a digest of each resource’s contents, we were able to cut out unnecessary requests and storage.
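As an illustration of de-duplicating by content digest, here is a small sketch that runs in Node.js. The store, its method names and the choice of SHA-256 are assumptions made for the example; the article does not say which digest or storage the loader really uses.

```javascript
const { createHash } = require("node:crypto");

// Digest of a resource's contents; identical contents give identical keys.
function digest(contents) {
  return createHash("sha256").update(contents).digest("hex");
}

// Hypothetical store keyed by digest. `Map` stands in for whatever cache the
// loader really uses (for example localStorage in the browser).
function createResourceStore() {
  const byDigest = new Map(); // digest -> contents
  const byName = new Map();   // resource name -> digest

  return {
    add(name, contents) {
      const key = digest(contents);
      if (!byDigest.has(key)) byDigest.set(key, contents); // store once
      byName.set(name, key);
      return key;
    },
    get(name) {
      return byDigest.get(byName.get(name));
    },
    storedCopies() {
      return byDigest.size;
    },
  };
}

// Usage: two names with the same contents share one stored copy.
const store = createResourceStore();
store.add("widget-a/util.js", "function noop() {}");
store.add("widget-b/util.js", "function noop() {}");
store.add("app.css", "body { margin: 0; }");
console.log(store.storedCopies()); // 2
```

The same digest can also be used to decide whether a resource needs to be requested at all: if the digest is already in the store, the request can be skipped.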
With these intelligent changes to resource loading we were able to cut down the total number of HTTP requests necessary to render the app to one (just the page itself), which meant that for users quickly switching between web apps for different sites, each page would load within a few seconds.
But I believe we could still do better: build tools such as gulp and grunt can now minify JS and CSS so that they load faster.
Directly Fetching Data
All the user, site and subscription data described in the first two steps was being fetched via a secure internal HTTP API to our internal account system. We were able to cut out the internal HTTP component completely, instead including a node module directly in the application and querying our databases directly. This allowed us much finer-grained control over exactly what data we were fetching, as well as eliminating a huge amount of overhead.
Give More Importance To The Client Side
Thanks to all these changes, all that differed between the apps for different sites was a config object passed to the loader on initialisation. It didn’t make sense, therefore, to be reloading the entire page when simply switching between sites or between Now and Trends, if all of the important resources had already been loaded.
With a little bit of rearranging of the config object, we were able to include all of the data necessary to load any of the web pages accessible to the user. Throw in some HTML5 History with pushState and the popstate event, and we’re now able to switch between sites or pages without making a single HTTP request or even fetching scripts out of the localStorage cache. This means that switching between pages now takes a couple of hundred milliseconds, rather than several seconds.
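In outline, client-side switching with the History API can look like the sketch below. It is an assumption-heavy illustration: `createSiteSwitcher`, the shape of `config.sites` and the `/sites/<id>` URL pattern are all hypothetical, and the history and event objects are passed in so the function can be exercised outside a browser.

```javascript
// Hypothetical sketch: `config.sites` holds everything needed to render any
// site the user can access, so switching is a local render plus a history
// entry, with no HTTP request.
function createSiteSwitcher(config, render, historyApi, events) {
  function show(siteId) {
    render(config.sites[siteId]);
  }

  // Back/forward buttons fire "popstate" with the state pushed earlier.
  events.addEventListener("popstate", (event) => {
    if (event.state && event.state.siteId) {
      show(event.state.siteId);
    }
  });

  return function switchTo(siteId) {
    historyApi.pushState({ siteId }, "", `/sites/${siteId}`);
    show(siteId);
  };
}

// In a browser this would be wired up roughly as:
//   const switchTo = createSiteSwitcher(config, renderApp, window.history, window);
//   switchTo("my-site");
```

Calling `pushState` does not itself fire `popstate`, which is why `switchTo` renders directly after pushing the new entry.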
So far all this has been about reducing load times and getting to a usable web app in the shortest time possible. But we’ve also done a lot to optimise the application itself to make sure it’s as fast as possible.
Avoid Big Complex Libraries: for example, jQuery UI is great for flexibility and working around all manner of browser quirks, but we don’t support a lot of the older browsers so the code bloat is unnecessary. We were able to replace our entire usage of jQuery UI with some clever thinking and 100-or-so lines of concise JS (we also take advantage of things like HTML5’s native drag-and-drop).
Check Weak Spots In Popular Libraries: for example, we use moment with moment-timezone for a lot of our date and time handling. However, moment-timezone is woefully inefficient (especially on mobile) if you’re using it a lot. With a little bit of hacking we added a few optimizations of our own and made it much better for our use-case.
Don't Use Slow Animations: a lot of studies have been posted about this in the past, and it really makes a difference. Simply reducing some CSS transition times from 500ms to 250ms, and cutting others out entirely, made the whole app UI feel snappier and more responsive.
Visual Feedback: one of the big things I found when using Trends was that switching between time frames just felt slow. It took under a second, but because there was a noticeable delay between clicking on the timeframe selector and anything actually happening, things felt broken. Fetching new data from our API is always going to take some time; it’s not going to be instant. So instead I show a loading spinner on each widget. Nothing is actually any faster, but the whole experience feels more responsive. There is immediate visual feedback when you click the button, so you know it’s working properly.
Use Flat Design For Steady Performance: it may well just be a design trend, but cutting out superficial CSS gradients and box shadows does wonders for render performance, because the browser no longer has to spend CPU power rendering all these fancy CSS effects.
Even after all these optimizations and tweaks, there’s still plenty of room for improvement, especially on mobile, where CPU power, memory, rendering performance, latency and bandwidth are all significantly more limited than they are on the desktop.
I’ve been using MongoDB in production since mid-2013 and have learned a lot over the years about scaling the database. I run multiple MongoDB clusters, but the one storing the historical data handles the most throughput and is the one I shall focus on in this article, going through some of the things we’ve done to scale it.
Use Dedicated Hardware, and SSDs
All my MongoDB instances run on dedicated servers. I’ve had bad experiences with virtualisation because I sometimes have no control over the host, and databases need guaranteed performance from disk I/O. When running on shared storage (e.g., a SAN) this is difficult to achieve unless you can get guaranteed throughput from things like AWS’s Provisioned IOPS on EBS (which are backed by SSDs).
MongoDB doesn’t really have many bottlenecks when it comes to CPU because CPU-bound operations are rare (usually things like building indexes), but what really causes problems is CPU steal, when other guests on the host are competing for the CPU resources.
The way we can combat these problems is to eliminate the possibility of CPU steal and noisy neighbours by moving onto dedicated hardware. And we can avoid problems with shared storage by deploying the dbpath onto locally mounted SSDs.
Use Multiple Databases To Benefit From Improved Concurrency
Running the dbpath on an SSD is a good first step but you can get better performance by splitting your data across multiple databases, and putting each database on a separate SSD with the journal on another.
Locking in MongoDB is managed at the database level so moving collections into their own databases helps spread things out - mostly important for scaling writes when you are also trying to read data. If you keep databases on the same disk you’ll start hitting the throughput limitations of the disk itself. This is improved by putting each database on its own SSD by using the directoryperdb option. SSDs help by significantly alleviating i/o latency, which is related to the number of IOPS and the latency for each operation, particularly when doing random reads/writes. This is even more visible for Windows environments where the memory mapped data files are flushed serially and synchronously. Again, SSDs help with this.
The journal is always within a directory so you can mount this onto its own SSD as a first step. All writes go via the journal and are later flushed to disk so if your write concern is configured to return when the write is successfully written to the journal, making those writes faster by using an SSD will improve query times. Even so, enabling the directoryperdb option gives you the flexibility to optimise for different goals (e.g., put some databases on SSDs and some on other types of disk, or EBS PIOPS volumes, if you want to save cost).
It’s worth noting that filesystem based snapshots where MongoDB is still running are no longer possible if you move the journal to a different disk (and so different filesystem). You would instead need to shut down MongoDB (to prevent further writes) then take the snapshot from all volumes.
Use Hash-based Sharding For Uniform Distribution
Every item we monitor (e.g., a server) has a unique MongoID and we use this as the shard key for storing the metrics data.
The query index is on the item ID (e.g. the server ID), the metric type (e.g. load average) and the time range; but because every query always has the item ID, it makes it a good shard key. That said, it is important to ensure that there aren’t large numbers of documents under a single item ID because this can lead to jumbo chunks which cannot be migrated. Jumbo chunks arise from failed splits where they’re already over the chunk size but cannot be split any further.
To ensure that the shard chunks are always evenly distributed, we’re using the hashed shard key functionality in MongoDB 2.4. Hashed shard keys are often a good choice for ensuring uniform distribution, but if you end up not using the hashed field in your queries, you could actually hurt performance because then a non-targeted scatter/gather query has to be used.
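To see why hashing evens things out, consider IDs that only ever increase (MongoIDs start with a timestamp, so they behave this way). The sketch below compares range-style placement with hash-based placement for the newest IDs. It is an illustration only: the MD5-and-modulo function is a stand-in, not MongoDB's actual hashing or chunk logic, and the numeric IDs are hypothetical.

```javascript
const { createHash } = require("node:crypto");

// Map an ID to one of `shardCount` buckets using the first 4 bytes of its
// MD5 digest. Illustration only: this is not MongoDB's actual chunk logic.
function shardFor(id, shardCount) {
  const hash = createHash("md5").update(String(id)).digest();
  return hash.readUInt32BE(0) % shardCount;
}

// Count how many of `ids` land on each shard under a placement function.
function distribution(ids, shardCount, pick) {
  const counts = new Array(shardCount).fill(0);
  for (const id of ids) counts[pick(id, shardCount)] += 1;
  return counts;
}

// Hypothetical monotonically increasing IDs.
const ids = Array.from({ length: 10000 }, (_, i) => 1000000 + i);

// Range-style placement: contiguous blocks of 2,500 IDs per shard, so the
// newest IDs all land on the last shard.
const rangePick = (id, n) =>
  Math.min(n - 1, Math.floor((id - 1000000) / 2500));

const newest = ids.slice(-1000);
console.log("newest 1000, range :", distribution(newest, 4, rangePick));
console.log("newest 1000, hashed:", distribution(newest, 4, shardFor));
```

With range placement every new write hits one shard, while the hashed version spreads the same writes roughly evenly across all four.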
Let MongoDB Delete Data With TTL Indexes
The majority of our users are only interested in the highest resolution data for a short period and more general trends over longer periods, so over time we average the time series data we collect then delete the original values. We actually insert the data twice - once as the actual value and once as part of a sum/count to allow us to calculate the average when we pull the data out later. Depending on the query time range we either read the average or the true values - if the query range is too long then we risk returning too many data points to be plotted. This method also avoids any batch processing so we can provide all the data in real time rather than waiting for a calculation to catch up at some point in the future.
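The double-write idea can be sketched without a database at all. In the example below an in-memory store stands in for the two collections; the hourly bucket size, the six-hour cut-off between raw and averaged reads, and all names are hypothetical choices for illustration.

```javascript
// Hypothetical in-memory stand-in for the two writes described above: the
// raw value, plus a running sum/count per hour for later averaging.
const HOUR_MS = 60 * 60 * 1000;

function createMetricStore() {
  const raw = [];           // high-resolution points (short-lived)
  const hourly = new Map(); // hour start -> { sum, count }

  return {
    record(timestampMs, value) {
      raw.push({ timestampMs, value });
      const hour = Math.floor(timestampMs / HOUR_MS) * HOUR_MS;
      const bucket = hourly.get(hour) || { sum: 0, count: 0 };
      bucket.sum += value;
      bucket.count += 1;
      hourly.set(hour, bucket);
    },
    // Short ranges read raw points; long ranges read one average per hour.
    query(fromMs, toMs, maxRawRangeMs = 6 * HOUR_MS) {
      if (toMs - fromMs <= maxRawRangeMs) {
        return raw.filter(
          (p) => p.timestampMs >= fromMs && p.timestampMs < toMs
        );
      }
      return [...hourly.entries()]
        .filter(([hour]) => hour >= fromMs && hour < toMs)
        .map(([hour, b]) => ({ timestampMs: hour, value: b.sum / b.count }));
    },
  };
}

// Usage: three points across two hours, read back over a 24-hour range.
const metrics = createMetricStore();
metrics.record(0, 1.0);
metrics.record(60000, 3.0);
metrics.record(HOUR_MS, 5.0);
console.log(metrics.query(0, 24 * HOUR_MS)); // hourly averages: 2 and 5
```

Because the sum and count are updated at write time, the average is available immediately and no batch job has to catch up later.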
Removal of the data after a period of time is done by using a TTL index. This is set based on surveying our customers to understand how long they want the high resolution data for. Using the TTL index to delete the data is much more efficient than doing our own batch removes and means we can rely on MongoDB to purge the data at the right time.
Inserting and deleting a lot of data can have implications for data fragmentation, but using a TTL index helps because it automatically activates PowerOf2Sizes for the collection, making disk usage more efficient, although as of MongoDB 2.6 this storage option becomes the default.
Take Care Over Query And Schema Design
The biggest hit on performance I have seen is when documents grow, particularly when you are doing huge numbers of updates. If the document size increases after it has been written then the entire document has to be read and rewritten to another part of the data file with the indexes updated to point to the new location, which takes significantly more time than simply updating the existing document.
As such, it’s important to design your schema and queries to avoid this, and to use the right modifiers to minimise what has to be transmitted over the network and then applied as an update to the document. A good example of what you shouldn’t do when updating documents is to read the document into your application, update the document, then write it back to the database. Instead, use the appropriate update operators, such as $set, $unset and $inc (set, remove and increment), to modify documents directly.
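As a rough illustration, an in-place update is just a small document of operators. `$inc` and `$set` are standard MongoDB update operators; the field names and the `buildMetricUpdate` helper below are hypothetical, and the driver call is shown only as a comment.

```javascript
// Build an update document that changes only the fields that need to change.
// The field names (sum, count, lastValue, updatedAt) are hypothetical.
function buildMetricUpdate(value, timestampMs) {
  return {
    // Increment the running totals in place on the server.
    $inc: { sum: value, count: 1 },
    // Overwrite fixed-size fields without touching the rest of the document.
    $set: { lastValue: value, updatedAt: timestampMs },
  };
}

// With the official Node.js driver this would be sent as, for example:
//   collection.updateOne({ itemId, metric, hour }, update, { upsert: true })
// so the document is never read into the application first.
const update = buildMetricUpdate(0.75, 1700000000000);
console.log(JSON.stringify(update));
```

Only the operator document crosses the network, and numeric fields that are overwritten or incremented do not make the stored document grow.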
Consider Network Throughput & Number Of Packets
Assuming 100Mbps networking is sufficient is likely to cause you problems, perhaps not during normal operations, but probably when you have some unusual event like needing to resync a secondary replica set member.
When cloning the database, MongoDB is going to use as much network capacity as it can to transfer the data over as quickly as possible before the oplog rolls over. If you’re doing 50-60Mbps of normal network traffic, there isn’t much spare capacity on a 100Mbps connection so that resync is going to be held up by hitting the throughput limits.
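A quick back-of-the-envelope calculation shows how much the spare capacity matters. The 500 GB data set below is a hypothetical figure, and the estimate ignores protocol overhead and any disk limits.

```javascript
// Estimate how long a full resync takes with the bandwidth left over after
// normal traffic. Uses decimal units: 1 GB = 8,000 megabits.
function resyncHours(dataGB, linkMbps, normalTrafficMbps) {
  const spareMbps = linkMbps - normalTrafficMbps;
  if (spareMbps <= 0) return Infinity; // no headroom at all
  const seconds = (dataGB * 8000) / spareMbps;
  return seconds / 3600;
}

// Hypothetical 500 GB data set with 60 Mbps of normal traffic.
console.log(resyncHours(500, 100, 60).toFixed(1));  // "27.8" hours on 100 Mbps
console.log(resyncHours(500, 1000, 60).toFixed(1)); // "1.2" hours on 1 Gbps
```

If the oplog covers less time than the first figure, the resync on the 100Mbps link cannot finish before the oplog rolls over.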
Also keep an eye on the number of packets being transmitted over the network - it’s not just the raw throughput that is important. A huge number of packets can overwhelm low quality network equipment - a problem we saw several years ago at our previous hosting provider. This will show up as packet loss and be very difficult to diagnose.
Optimizing and scaling an application is an incremental process - there’s rarely one thing that will give you a big win. All of these tweaks and optimisations together help the application load quickly and make back-end operations faster.
Ultimately, all this ensures that our clients get an excellent product, and behind the scenes we know that data is being written quickly and safely and that we can scale it as we continue to grow.
Thanks for reading!