WP Rocket is one of the most widely used WordPress plugins for improving website performance. According to the figures published on its website, the plugin is currently active on over 3 million WordPress installations.
However, given its wide circulation, its multiple license plans and the fact that WP Rocket is one of the most coveted and most pirated WordPress plugins ever, we estimate that it is actually running on at least 12 million WordPress sites worldwide.
This plugin offers many useful features to optimize site caching and improve page load times. WP Rocket saves the cache as files on disk and offers many configuration options to optimize site performance.
Although WP Rocket is a very popular caching solution compared to its competitors, it has a serious bug that could affect the ranking and indexing of your site. The bug affects the "Last-Modified" field, which search engines use to understand whether a page has been modified recently. If this field is not set correctly, search engines may not index your site correctly, and this could lead to a decrease in organic traffic.
It is therefore important to take this bug into account when evaluating WP Rocket for your online business. In this article we take a detailed look at the bug and the consequences it could have on your website. If you want to get the most out of your online business and make sure your site ranks well in search engines, you should avoid using WP Rocket until this bug is fixed permanently.
An overview of the Last-Modified HTTP header.
The "Last-Modified" HTTP header is a response header used in client-server communication to report the date and time at which a web resource was last modified. The server sends it to the client as part of the HTTP response.
Its main purpose is to help clients determine if a web resource has changed since it was last accessed, so that they can avoid downloading the same resource again if it hasn't changed.
In this way, the "Last-Modified" field helps to reduce network traffic and improve website performance. When a client requests a resource it already holds a copy of, it can include an "If-Modified-Since" header in the request, carrying the Last-Modified date of its saved copy. If the resource has not been modified since that date, the server responds with a "304 Not Modified" status and no body, thus avoiding transferring the resource again.
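The exchange described above can be sketched in a few lines of Python. This is only an illustration of the comparison a server performs, not code from WordPress or WP Rocket; the function name `respond` and its parameters are assumptions made for the example.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime
from typing import Optional


def respond(last_modified: datetime, if_modified_since: Optional[str]):
    """Return (status, headers) for a GET on a resource last changed at `last_modified` (UTC)."""
    # HTTP dates have one-second resolution, so drop microseconds before comparing.
    last_modified = last_modified.replace(microsecond=0)
    headers = {"Last-Modified": format_datetime(last_modified, usegmt=True)}

    if if_modified_since is not None:
        try:
            client_copy = parsedate_to_datetime(if_modified_since)
            if client_copy.tzinfo is None:
                # A "-0000" zone parses as a naive datetime; treat it as UTC.
                client_copy = client_copy.replace(tzinfo=timezone.utc)
        except (TypeError, ValueError):
            client_copy = None  # unparsable header: ignore it and send the full body
        if client_copy is not None and last_modified <= client_copy:
            return 304, headers  # Not Modified: no body is transferred

    return 200, headers  # full response with body


modified = datetime(2022, 12, 24, 10, 0, 0, tzinfo=timezone.utc)
print(respond(modified, None))                              # first visit: 200
print(respond(modified, "Sat, 24 Dec 2022 10:00:00 GMT"))   # revisit, unchanged: 304
```

The 304 branch is only reachable when the date the server reports stays put while the content stays put, which is the whole point of the header.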
Last Modified and Google crawlers.
As described above, the "Last-Modified" HTTP header reports the date and time a web resource was last modified. If this field is configured incorrectly and always returns a recent date, regardless of when the resource was actually last modified, there may be unintended consequences for Google crawlers.
In particular, if the "Last-Modified" header always returns the current date and time, Google crawlers could misinterpret the resource as a new page, even if it hasn't actually been modified. This could lead to a decrease in website performance in terms of indexing and positioning in search results.
Also, if the "Last-Modified" field is set incorrectly, the server may send a "200 OK" response even though the resource hasn't been modified since it was last accessed. This could lead to an increase in network traffic and a decrease in website performance, as Google's crawlers could download the same resource again, even though it has not actually been modified.
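To make the effect concrete, here is a deliberately simplified Python model of a crawler that remembers the last Last-Modified value it saw and sends it back as If-Modified-Since. It is not a description of how Googlebot actually works; the function name, the visit schedule and the assumption that the cache is rebuilt before every visit are all hypothetical.

```python
from datetime import datetime, timedelta, timezone


def count_full_downloads(header_dates):
    """Count the visits on which the crawler receives a full 200 response.

    `header_dates` is the Last-Modified value the server reports at each successive visit.
    """
    downloads = 0
    remembered = None  # Last-Modified value stored from the previous full download
    for reported in header_dates:
        if remembered is None or reported > remembered:
            downloads += 1  # 200 OK: the body is transferred again
            remembered = reported
        # otherwise the server answers 304 Not Modified and nothing is transferred
    return downloads


published = datetime(2022, 12, 24, tzinfo=timezone.utc)
visits = [published + timedelta(days=d) for d in (1, 8, 15, 22)]

# Policy A: the header reflects the content, which never changes after publication.
content_based = [published for _ in visits]
# Policy B (assumed worst case): the cache is rebuilt one hour before every visit
# and the header follows the rebuild time.
cache_based = [visit - timedelta(hours=1) for visit in visits]

print(count_full_downloads(content_based))  # 1
print(count_full_downloads(cache_based))    # 4
```

With a content-based date the page is downloaded once and then answered with 304; with a date that moves on every cache rebuild, every visit in this model becomes a full download.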
Incorrect Last-Modified and Google Bot Crawl Budget.
Google's crawl budget is the amount of resources that the search engine is willing to devote to crawling and indexing a given website. This budget depends on many factors, including the quality of the website, how often the content is updated, and the speed of the website.
A Last-Modified HTTP header that always shows a fresh date can be detrimental to the crawl budget, because it can cause the search engine to spend effort on resources that haven't actually been modified. In particular, if the Last-Modified header is updated every time the cache is regenerated, even if the content hasn't changed, Google could interpret the resource as a new or updated page and crawl it again unnecessarily.
This would lead to ineffective use of Google's crawling budget, which could be better used to crawl other more relevant and up-to-date resources. Also, ineffective use of Google's crawl budget could slow down the indexing of the website's most important resources.
To avoid this problem, it is important to ensure that the Last-Modified HTTP header is only updated when the resource has actually changed. In this way, it is possible to guarantee correct management of the Google crawl budget and improve the overall indexing of the website.
WP Rocket and the Last-Modified bug in detail.
When WP Rocket caches WordPress posts and pages to disk, it replaces the Last-Modified date of the article with the date on which the cache was generated or regenerated. This means that every time the cache is regenerated, WP Rocket updates the Last-Modified field, even if the post or page has not been modified or updated in years.
Specifically, WP Rocket reads the modification date of the cache file produced on disk and sends that date as the Last-Modified HTTP header. The offending code is the following:
As we can see from the above code, WP Rocket builds the Last-Modified header by formatting the modification date of the cache file on disk, which it obtains via the filemtime() PHP function.
PHP's filemtime() function is a native function that returns the time a file was last modified, as a Unix timestamp. It is very useful for checking whether a file on disk has changed and for deciding when a cached file is stale. What it reports, however, is when the file was written, not when the content it holds was last edited.
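The distinction between "file written" and "content edited" is easy to reproduce. The sketch below uses Python's `os.path.getmtime()`, which reads the same filesystem modification time that PHP's filemtime() reports; the file name and the HTML snippet are made up for the example.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    cache_file = os.path.join(tmp, "index.html")  # hypothetical cache file name
    html = "<p>Merry Christmas!</p>"

    with open(cache_file, "w", encoding="utf-8") as fh:
        fh.write(html)
    # Pretend the cache was first written on 2022-12-24 00:00:00 UTC.
    first_build = 1671840000
    os.utime(cache_file, (first_build, first_build))
    mtime_before = os.path.getmtime(cache_file)

    # Regenerate the cache with byte-for-byte identical content.
    with open(cache_file, "w", encoding="utf-8") as fh:
        fh.write(html)
    mtime_after = os.path.getmtime(cache_file)

    print(mtime_after > mtime_before)  # True: the timestamp moved, the content did not
```

Any header derived from that timestamp therefore tracks cache rebuilds rather than edits to the post.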
This bug may cause some problems for search engines and other crawlers that use the Last-Modified field to determine if a resource has been modified since it was last accessed. If the Last-Modified field is updated every time the cache is regenerated, even if the post or page hasn't been modified, then crawlers could misinterpret the resource as a new page, even if it hasn't actually been modified.
To avoid the Last-Modified bug, WP Rocket should have updated the HTTP header in the produced cache files by retrieving the last modified date of the post or page directly from the WordPress database. This way, the Last-Modified field would always be updated only when the resource was actually modified, ensuring proper indexing by search engines.
To implement this solution, WP Rocket should have used the WordPress “get_post_modified_time()” function to retrieve the last modified date of the post or page. This function returns the date and time the post or page was last modified, which can be used to properly update the Last-Modified field in the HTTP header.
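The logic of that fix is small enough to sketch. The example below is written in Python purely to illustrate the idea and is not WP Rocket or WordPress code: the `Post` class and `last_modified_header` function are hypothetical stand-ins, with `modified_gmt` playing the role of the value that get_post_modified_time() would supply in a real plugin.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from email.utils import format_datetime


@dataclass
class Post:
    # Hypothetical stand-in for a WordPress post record.
    post_id: int
    modified_gmt: datetime  # when an editor last changed the post (UTC)


def last_modified_header(post: Post) -> str:
    """Build the Last-Modified value from the post itself, ignoring any cache file timestamp."""
    return format_datetime(post.modified_gmt.replace(microsecond=0), usegmt=True)


post = Post(post_id=1, modified_gmt=datetime(2022, 12, 24, 9, 30, tzinfo=timezone.utc))
print(last_modified_header(post))  # Sat, 24 Dec 2022 09:30:00 GMT
```

Because the value depends only on the post, rebuilding the cache a hundred times leaves the header unchanged, and it moves only when the post is actually edited.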
We have officially reported the bug to WP Rocket.
The error we have discovered is very serious, especially because it does not look like an oversight by the developers but like a lack of the fundamentals of what the Last-Modified HTTP header is and what it is for. Developing a plugin intended for millions, even tens of millions, of websites without having in mind the basic logic of how header management should work is a very serious matter.
We therefore reported the matter directly to WP Rocket, including a hypothetical example in the report: a post published on December 24th to wish readers a merry Christmas would be served with a Last-Modified date later than December 24th, even though the post has never been edited.
They have ignored us for now and instead of thanking us …
It may seem absurd, but in the face of such a serious problem reported to technical support, they merely informed us that they cannot respond to our request because our WP Rocket license has expired.
It's not a joke, but it's really what they replied to us in the email that we report below.
One wonders whether a company that acts with such carelessness should continue to enjoy the full trust of its customers, who, unaware of how serious the problem is, end up damaging their own sites or their clients' sites without knowing it.
For now we have followed up on our request, calling things by their name and specifying that we are not the ones asking them for support: we are the ones offering it to them, free of charge, given their lack of knowledge on the subject.
We are hopeful of a fix as soon as possible. In the meantime, if you are already using WP Rocket, we advise you to monitor Google's crawl statistics for your site and measure any negative effects.