A Content Delivery Network (CDN) is a geographically distributed network of servers and their associated data centers that help deliver content to users with minimal delay.
It does this by bringing content closer to users' geographic locations through strategically placed data centers called Points of Presence (PoPs). CDNs also rely on caching servers that store and deliver cached files to speed up web page load times and reduce bandwidth consumption. Below we go into more detail on exactly how CDNs work.
CDN services are essential for businesses that rely on delivering content to users.
Consider the following:
- Major news publications with readers in many countries
- Social media sites that need to deliver multimedia content on user feeds
- Entertainment websites such as Netflix that offer high definition web content in real time
- E-commerce platforms with millions of customers
- Gaming companies with highly graphic content accessed by geographically distributed users
All of these companies must ensure accelerated content delivery, service availability, resource scalability, and web application security. This is where CDN services shine as a unique advantage.
A brief history of CDNs
Content delivery networks (or CDNs) were created nearly twenty years ago to address the challenge of quickly sending huge amounts of data to end users over the Internet. Today they have become the driving force behind website content distribution and continue to be researched and improved upon by academia and commercial developers.
The first content distribution networks were built in the late 1990s and are still responsible for 15-30 percent of global internet traffic. Since then, the growth of broadband content and the associated audio, video, and data streaming over the Internet has led to the development of more CDNs. In general, the evolution of CDNs can be classified into four generations:
Pre-formation period: Before CDNs were actually created, the necessary technologies and infrastructure were developed. This period was characterized by the growth of server farms, hierarchical caching, improvements in web servers, and the deployment of caching proxies. Mirroring, caching, and multihoming were also technologies that paved the way for the creation and growth of CDNs.
First Generation: Early iterations of CDNs focused primarily on delivering dynamic and static content, as these were the only two types of content on the web. The main mechanisms were therefore the creation and deployment of replicas, intelligent routing, and edge computing methods. Applications and data were distributed across the servers.
Second Generation: CDNs focused on streaming audio and video content, and Video-on-Demand services such as Netflix, as well as news services, arrived. This generation also pioneered the delivery of website content to mobile users and saw the use of P2P techniques and cloud computing.
Third Generation: The third generation of CDNs is where we are now, and it is still evolving through new research and development. We can expect CDNs to be increasingly shaped for the community in the future, meaning that the systems will be driven by ordinary users and individuals. Self-configuration is expected to be the new technological mechanism, along with self-management and autonomous delivery of content. The quality of experience for end users is expected to be the main driver in the future.
CDNs initially evolved to cope with extreme bandwidth pressures, as demand for video streaming was growing along with the number of CDN service providers. With connectivity advances and new consumer trends in each generation, the price of CDN services has dropped, enabling them to become a mass-market technology. And as cloud computing has become widely adopted, CDNs have played a key role at all levels of business operations. They are critical for models such as SaaS (Software as a Service), IaaS (Infrastructure as a Service), PaaS (Platform as a Service), and BPaaS (Business Process as a Service).
How does a CDN work?
CDNs work by reducing the physical distance between a user and the origin (a web or application server). A CDN is a globally distributed server network that stores content much closer to the user than the origin does. To understand this better, it is useful to examine how a user accesses web content from a website with and without a CDN.
Without CDN
When a user accesses the website in their browser, the connection is established as follows. The website name is resolved to an IP address by the local DNS, or LDNS (such as the DNS server provided by the ISP or a public DNS resolver). If the LDNS cannot resolve the name itself, it recursively requests resolution from upstream DNS servers. Ultimately, the request can reach the authoritative DNS server where the zone is hosted. This DNS server resolves the name and the address is returned to the user.
Then the user's browser connects directly to the origin and downloads the content of the website. Each subsequent request is served directly from the origin, and static resources are cached locally on the user's computer. If another user from the same or another location tries to access the same site, they go through the same sequence. Each time, user requests hit the origin and the origin responds with the content. Each step along the way adds a delay, or "latency". If the origin is far from the user, response times suffer from significant latency, resulting in a poor user experience.
With a CDN
In the presence of a CDN, however, the process is slightly different. When the LDNS receives a user-initiated DNS request, it forwards the request to one of the CDN's DNS servers. These servers are part of the Global Server Load Balancing (GSLB) infrastructure. The GSLB provides load balancing: it continuously measures network conditions across the internet and tracks information about all available resources and their performance. With this knowledge, the GSLB resolves the DNS request to the best-performing edge address (usually one near the user). An “edge” is a collection of servers that caches and serves web content.
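The selection step can be illustrated with a minimal sketch. It assumes the GSLB already holds a latency measurement and a health flag for each edge; the `Edge` class, the `resolve_to_edge` function, and the documentation-range IP addresses are hypothetical, and real GSLB systems weigh many more signals than latency alone.

```python
from dataclasses import dataclass


@dataclass
class Edge:
    address: str
    latency_ms: float      # assumed measurement toward the user's resolver
    healthy: bool = True   # assumed result of a health check


def resolve_to_edge(edges):
    """Return the address of the healthy edge with the lowest latency."""
    candidates = [e for e in edges if e.healthy]
    if not candidates:
        raise LookupError("no healthy edge available")
    return min(candidates, key=lambda e: e.latency_ms).address


# Hypothetical measurements: the closest edge is down, so the next best wins.
edges = [
    Edge("192.0.2.10", 120.0),
    Edge("192.0.2.20", 18.0),
    Edge("192.0.2.30", 9.0, healthy=False),
]
print(resolve_to_edge(edges))  # 192.0.2.20
```

The unhealthy edge is skipped even though it has the lowest latency, which mirrors the idea that the GSLB tracks both availability and performance.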
Once the DNS resolution is complete, the user sends the HTTPS request to the edge. If the edge does not already hold the requested content, the GSLB servers help the edge servers forward the request along the optimal path to the origin. The edge servers then retrieve the requested data, deliver it to the end user who requested it, and store it locally. Subsequent user requests are served from this local copy without having to query the origin server. Content stored on the edge can be delivered even if the origin becomes unavailable for any reason.
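The behavior described here is essentially a pull-through cache. The following is a simplified sketch under stated assumptions: the `EdgeCache` and `Origin` classes are hypothetical, and expiry (TTLs), cache-control headers, and eviction are deliberately left out.

```python
class Origin:
    """Stand-in for an origin server that can be switched off."""

    def __init__(self):
        self.up = True

    def fetch(self, path):
        if not self.up:
            raise ConnectionError("origin unavailable")
        return f"content of {path}"


class EdgeCache:
    """Serve from the local store; go to the origin only on a miss."""

    def __init__(self, fetch_from_origin):
        self._fetch = fetch_from_origin
        self._store = {}
        self.origin_hits = 0

    def get(self, path):
        if path in self._store:          # cache hit: origin is not contacted
            return self._store[path]
        self.origin_hits += 1            # cache miss: fetch, then store locally
        body = self._fetch(path)
        self._store[path] = body
        return body


origin = Origin()
edge = EdgeCache(origin.fetch)
edge.get("/index.html")          # first request goes to the origin
edge.get("/index.html")          # second request is served from the edge
origin.up = False                # the origin goes down
print(edge.get("/index.html"))   # content of /index.html
print(edge.origin_hits)          # 1
```

Only the first request reaches the origin, and the cached copy is still delivered after the origin becomes unavailable. A path that was never cached would still fail in that situation.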
Why use a CDN?
CDNs help businesses deliver content to end users effectively by minimizing latency, improving website performance, and reducing bandwidth costs.
Another useful feature of CDNs is that they allow edge servers to prefetch content in advance. This ensures that the data you deliver is already stored across all CDN data centers. In CDN parlance, these data centers are called Points of Presence (or “PoPs”). PoPs help minimize round-trip time by bringing web content closer to the website visitor.
For example, suppose you run an advertising campaign and advertise your service or product to millions of potential customers. You might expect a large number of customers to rush to your site after reading the post. If you're dealing with influencers who have good audience engagement rates, traffic volume can see an even greater spike. Can you be sure that your origin server will be able to handle this volume spike all at once?
In such a scenario, a CDN can help distribute the load among the edge servers so that every user receives a response. Since only a small fraction of the requests will reach the origin, your servers will not experience massive traffic spikes, 502 errors, or overloaded upstream network channels.
Advantages of CDNs
Depending on the size and needs of your business, the benefits of CDNs can be divided into four areas:
Improved web page load times
By enabling web content delivery closer to website visitors using a nearby CDN server (among other optimizations), a CDN gives visitors faster web page load times. Visitors are generally more likely to click away from or bounce off a website with a high page load time. Slow loading can also negatively affect the web page's ranking on search engines. So having a CDN can reduce bounce rates and increase the amount of time people spend on the site. In other words, a website that loads quickly will keep more visitors around for longer.
Reduction of bandwidth costs
Whenever an origin server responds to a request, bandwidth is consumed. Bandwidth consumption is a major expense for companies. Through caching and other optimizations, CDNs are able to reduce the amount of data an origin server has to provide, thereby reducing hosting costs for website owners.
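A rough back-of-the-envelope calculation shows the effect. This sketch assumes the cache hit ratio is measured by bytes rather than by request count; the function name and the numbers are illustrative only.

```python
def origin_traffic_gb(total_traffic_gb, cache_hit_ratio):
    """Traffic the origin still has to serve, given a byte-level hit ratio."""
    if not 0.0 <= cache_hit_ratio <= 1.0:
        raise ValueError("cache_hit_ratio must be between 0 and 1")
    return total_traffic_gb * (1.0 - cache_hit_ratio)


# Illustrative numbers: 10,000 GB delivered to users, 90% served from cache.
served_by_origin = origin_traffic_gb(10_000, 0.9)
print(round(served_by_origin))  # 1000
```

With these assumed figures, the origin serves about one tenth of the total traffic, and the rest is absorbed by the edge.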
Increased availability and redundancy of content
Large amounts of web traffic or hardware failure can disrupt normal website operation and cause downtime. Due to its distributed nature, a CDN can handle more web traffic and resist hardware failure better than many origin servers. Also, if one or more CDN servers go offline for some reason, other operational servers can pick up the web traffic and keep the service uninterrupted.
Website security improvement
The same process by which CDNs handle traffic spikes makes them ideal for mitigating DDoS attacks. These are attacks in which malicious actors overwhelm your application or origin servers by sending a huge number of requests. When the server goes down due to the volume, the downtime affects the availability of the website to customers. A CDN essentially acts as a DDoS protection and mitigation platform, with the GSLB and edge servers distributing the load across the entire network capacity. CDNs can also provide certificate management, including automatic generation and renewal of certificates.
How else can a CDN be useful?
A CDN is not limited to the benefits explained above. A modern CDN platform offers many more benefits to your business and technical teams.
It can be used to manage access from different regions of the planet. While you allow access for some regions, you can deny access for others.
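As a minimal sketch of region-based access control, assume the edge has already mapped the client IP address to a two-letter country code. The allow-list, the function names, and the status codes chosen here are hypothetical.

```python
# Hypothetical allow-list of ISO 3166-1 alpha-2 country codes.
ALLOWED_COUNTRIES = {"DE", "FR", "US"}


def is_allowed(country_code, allowed=ALLOWED_COUNTRIES):
    """Return True if requests from this country may be served."""
    return country_code.upper() in allowed


def status_for(country_code):
    """Map the access decision to an HTTP status code."""
    return 200 if is_allowed(country_code) else 403


print(status_for("de"))  # 200
print(status_for("BR"))  # 403
```

The same check could be inverted into a deny-list if it is easier to name the regions to block than the regions to allow.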
You can easily offload application logic to the edge, close to your customers. Request and response headers and bodies can be processed and transformed, requests can be routed between different origins based on request attributes, and authentication tasks can be delegated to the edge.
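The three tasks mentioned here (header transformation, attribute-based routing, and delegated authentication) can be sketched in a single handler. Everything in this example is an assumption made for illustration: the request is modeled as a plain dictionary, and the origin host names and the `X-Edge-Processed` header are hypothetical.

```python
def handle_at_edge(request):
    """Decide what the edge does with a request before any origin sees it."""
    path = request["path"]
    headers = dict(request.get("headers", {}))  # copy, do not mutate the input

    if path.startswith("/api/"):
        # Delegated authentication: reject unauthenticated API calls at the edge.
        if "Authorization" not in headers:
            return {"action": "reject", "status": 401, "origin": None}
        origin = "api.origin.example"
    else:
        origin = "static.origin.example"

    # Header transformation: mark the request as processed by the edge.
    headers["X-Edge-Processed"] = "1"
    return {"action": "forward", "origin": origin, "headers": headers}


print(handle_at_edge({"path": "/api/orders"})["status"])           # 401
print(handle_at_edge({"path": "/img/logo.png"})["origin"])         # static.origin.example
```

Rejecting the unauthenticated API call at the edge means the request never consumes origin resources.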
Large amounts of traffic require an infrastructure for logging and processing for further analysis. CDNs collect logs and provide an interface to conveniently analyze visitor generated data.
It is only natural that something becomes easy to use when you are already familiar with it. For this reason, CDN edges are normally based on NGINX. This means that you can perform tasks using standard NGINX directives.
For example, our team spent thousands of hours extending NGINX.
Data security and CDN
Information security is an integral part of a CDN. CDNs help protect a website's data in the following ways.
By providing TLS / SSL certificates
A CDN can help secure a site by providing Transport Layer Security (TLS) / Secure Sockets Layer (SSL) certificates that ensure a high standard of authentication, encryption, and integrity. These are certificates that guarantee compliance with certain protocols in the transfer of data between a user and a website.
When data is transferred over the Internet, it becomes vulnerable to interception by malicious actors. This is solved by encrypting the data using a protocol such that only the intended recipient can decrypt and read the information. TLS and SSL are protocols that encrypt data sent over the Internet in this way; TLS is the more advanced successor to SSL. You can tell that a website uses a TLS/SSL certificate if its address starts with https:// instead of http://, indicating that communication between the browser and the server is encrypted.
Mitigate DDoS attacks
Because the CDN is deployed at the edge of the network, it acts as a high-security virtual barrier against attacks on your website and web application. Its distributed infrastructure and edge location also make a CDN ideal for stopping DDoS floods. Since these floods need to be mitigated outside of your core network infrastructure, the CDN processes them across different PoPs according to where they originate, preventing server saturation.
Blocking of bots and crawlers
CDNs are also capable of blocking threats and preventing abusive bots and crawlers from using bandwidth and server resources. This helps limit other spam and hacking attacks and keeps bandwidth costs low.
Static and dynamic acceleration
Static content refers to assets that do not need to be generated, processed, or modified before being delivered to end users. These could be images or other multimedia files, binary files of any kind, or static parts of your application such as HTML, CSS, and JavaScript libraries, or even JSON, HTML, or any other type of dynamic response that doesn't change often. Such content can be preloaded onto the edge in advance, as mentioned above. Then, when you need to invalidate that content and remove it from the edge servers, you can purge the desired paths.
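Preloading and purging can be sketched with a small in-memory store. This is an illustration only: the `EdgeStore` class and its methods are hypothetical, and real CDNs expose these operations through their own control panels or APIs, usually with richer matching rules than a simple path prefix.

```python
class EdgeStore:
    """Toy model of the content held by one edge server."""

    def __init__(self):
        self._objects = {}

    def prefetch(self, origin_content):
        """Preload a mapping of path -> body before any user asks for it."""
        self._objects.update(origin_content)

    def purge(self, prefix):
        """Remove every cached path that starts with the prefix; return the count."""
        doomed = [path for path in self._objects if path.startswith(prefix)]
        for path in doomed:
            del self._objects[path]
        return len(doomed)

    def cached_paths(self):
        return sorted(self._objects)


store = EdgeStore()
store.prefetch({
    "/css/site.css": "body {}",
    "/img/logo.png": "<png bytes>",
    "/img/hero.jpg": "<jpg bytes>",
})
print(store.purge("/img/"))    # 2
print(store.cached_paths())    # ['/css/site.css']
```

After the purge, the next request for an image would be treated as a cache miss and fetched from the origin again.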
Dynamic acceleration applies to something that cannot be cached on the edge due to its dynamic nature. Imagine a WebSocket application listening for events from a server or API endpoint whose response is different, depending on credentials, geographic location, or other parameters. It is difficult to exploit the edge caching mechanism in a way similar to caching static content. In some cases, tighter integration between the app and the CDN can help; however, in some cases, something other than caching should be used. For dynamic acceleration, the optimized network infrastructure of CDNs and advanced request / response routing algorithms are used.
Billing model or "What do I pay?"
Conventionally, with a CDN you pay for the traffic consumed by your end users and for the number of requests. Additionally, HTTPS requests require more processing resources than HTTP requests, which creates a greater load on the CDN provider's equipment. Because of this, you may pay additional fees for HTTPS requests, while HTTP requests are not billed at an additional cost.
When computation moves to the edge, CPU time becomes a billable item. Requests may pass through different processing pipelines and, as a result, require different amounts of CPU time. In that case it is not practical to bill by request count alone; it is more practical to bill by the amount of traffic plus the CPU time used.
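The "traffic + CPU time" model amounts to a simple sum. In the sketch below the function name and all prices are invented for illustration and do not reflect any provider's actual rates.

```python
def edge_bill(traffic_gb, cpu_seconds, price_per_gb, price_per_cpu_second):
    """Bill = traffic cost + CPU time cost, independent of request count."""
    return traffic_gb * price_per_gb + cpu_seconds * price_per_cpu_second


# Hypothetical usage and prices: 500 GB of traffic and 10 hours of CPU time.
total = edge_bill(
    traffic_gb=500,
    cpu_seconds=36_000,
    price_per_gb=0.02,
    price_per_cpu_second=0.0001,
)
print(round(total, 2))  # 13.6
```

Under this model, two customers with the same request count can receive different bills if one of them runs heavier processing per request.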
Who Uses CDN?
CDNs are used by companies of various sizes to optimize network presence and availability and to provide a superior user experience for customers. CDNs are particularly popular in the following industries:
- Advertising
- Digital publishing
- Online video and audio
- Gaming
- Online education
- E-commerce
- Public sector
- Government
- Financial services
- SaaS