Introduction
Over the past few decades, the Internet has become an essential part of the daily lives of billions of people. Surfing the web is a common activity for both personal and professional purposes, involving individuals of all ages and backgrounds. However, this growing dependence on the Internet has also led to an increase in illegal behavior, ranging from simple insults on social media to more serious crimes such as computer fraud, online scams, and child pornography crimes.
Behind every digital action there are traces, often recorded in web server logs, that is, files that collect information about the activities users carry out while browsing. This data can reveal who visited a particular page, uploaded a file or left a comment.
For this reason, webmasters, hosting providers and system administrators are sometimes asked to respond to requests from judicial authorities, who need access to web server logs to identify those responsible for illegal acts. This information must, however, be handled in accordance with the law, in particular Italian legislation and the General Data Protection Regulation (GDPR).
This article looks at how to retain web server logs lawfully, meeting both the requirements of the authorities and those of personal data protection, and offers practical guidance to IT professionals.
What is GDPR?
The General Data Protection Regulation (GDPR), adopted by the European Union and applicable since 25 May 2018, is one of the most important regulatory frameworks for the protection of personal data. Its main objective is to ensure that the data collected and processed by organizations are handled transparently and securely, and with respect for the rights of individuals.
The GDPR applies to organizations established in the EU and to organizations outside the EU that offer goods or services to, or monitor the behavior of, individuals in the EU. Personal data means any information relating to a natural person who can be identified, directly or indirectly, such as a name, an IP address, an email address or location data.
What is a web server log?
A web server is software designed to receive and respond to HTTP and HTTPS requests from clients, typically browsers or other applications. In other words, it is the component that makes websites and web applications available on the Internet. Among the best-known and most widely used web servers are:
- Apache HTTP Server: One of the longest-running and most popular web servers, very versatile and supported by a large open source community.
- Nginx: Famous for its efficiency in handling simultaneous connections, often used as both a web server and a reverse proxy.
- LiteSpeed: Valued for its high performance, especially in shared hosting environments and with CMS like WordPress.
Every web server generates files called logs, which record the activity on the server. Web server logs contain a detailed history of how the server handled users' requests. Typically recorded information includes:
- User's IP address: Identifies the device from which the request comes.
- Date and time of the request: Indicates when the user interacted with the server.
- URLs requested: Specifies which resources were requested (pages, files, images, etc.).
- HTTP method used: Reports the type of operation performed, such as GET, POST, or DELETE.
- Server response: Reports the status code returned by the server, such as 200 (success), 404 (page not found), or 500 (server error).
- User agent: Contains information about the browser, operating system and device used by the user.
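To make these fields concrete, here is a minimal sketch that parses a single line in the "combined" log format supported by both Apache and Nginx. It assumes well-formed lines (no escaped quotes inside the user agent); the regular expression and the function name parse_line are illustrative, not part of either server.

```python
import re
from datetime import datetime

# "combined" format: %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-Agent}i"
COMBINED = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<url>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<user_agent>[^"]*)"'
)


def parse_line(line: str) -> dict:
    """Split one combined-format log line into IP, time, method, URL, status and user agent."""
    match = COMBINED.match(line)
    if match is None:
        raise ValueError(f"not a combined log line: {line!r}")
    entry = match.groupdict()
    # Timestamps look like 10/Oct/2024:13:55:36 +0200
    entry["time"] = datetime.strptime(entry["time"], "%d/%b/%Y:%H:%M:%S %z")
    entry["status"] = int(entry["status"])
    return entry


if __name__ == "__main__":
    sample = ('203.0.113.7 - - [10/Oct/2024:13:55:36 +0200] '
              '"POST /comments HTTP/1.1" 201 512 "-" '
              '"Mozilla/5.0 (X11; Linux x86_64)"')
    e = parse_line(sample)
    print(e["ip"], e["time"].isoformat(), e["method"], e["url"], e["status"])
```

Note that the first field, the IP address, is exactly the value that later sections show can be wrong when a proxy sits in front of the server.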
This data is essential for the operation, security and optimization of websites, allowing you to diagnose problems, monitor traffic and detect suspicious activity. However, much of this information, such as IP addresses, can be considered personal data under the GDPR, requiring careful and regulatory-compliant management.
Italian law and log retention
In Italy, the management and retention of web server logs are governed by a complex set of rules intertwined with the General Data Protection Regulation (GDPR). These rules often add specific details and obligations to the European framework, especially in relation to security, judicial investigations and crime prevention.
The Electronic Communications Code
A central reference is the Electronic Communications Code (Legislative Decree 259/2003), which, together with Article 132 of the Personal Data Protection Code (Legislative Decree 196/2003), establishes precise obligations for providers of telecommunications and internet access services. In particular, traffic data collected for the purpose of detecting and prosecuting crimes must be retained for a minimum period of 12 months. This data includes:
- Source and destination of the communication (IP addresses and port numbers).
- Date, time and duration of the connection.
- Protocol used (for example, HTTP, HTTPS, or FTP).
The requirement to retain for at least one year is designed to ensure that information needed for an investigation can be retrieved by the competent authorities.
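The fields listed above map naturally onto a small record type. The following is a minimal sketch, assuming the 12-month period is counted from the date of the communication; the class TrafficRecord, its field names and the helper add_months are hypothetical, and clamping 29 February to 28 February is one reasonable convention rather than one prescribed by the law.

```python
import calendar
from dataclasses import dataclass
from datetime import datetime, timezone


def add_months(moment: datetime, months: int) -> datetime:
    """Calendar-aware month addition; clamps to the last valid day (31 Jan + 1 month -> end of Feb)."""
    index = moment.month - 1 + months
    year, month = moment.year + index // 12, index % 12 + 1
    day = min(moment.day, calendar.monthrange(year, month)[1])
    return moment.replace(year=year, month=month, day=day)


@dataclass(frozen=True)
class TrafficRecord:
    src_ip: str        # source of the communication
    src_port: int
    dst_ip: str        # destination of the communication
    dst_port: int
    start: datetime    # date and time of the connection
    duration_s: float  # duration in seconds
    protocol: str      # e.g. "HTTP", "HTTPS", "FTP"

    def retain_until(self, months: int = 12) -> datetime:
        """End of the retention period counted from the communication date."""
        return add_months(self.start, months)


if __name__ == "__main__":
    rec = TrafficRecord("198.51.100.23", 51544, "192.0.2.10", 443,
                        datetime(2024, 2, 29, 9, 30, tzinfo=timezone.utc), 12.5, "HTTPS")
    print(rec.retain_until().isoformat())
```

Counting in calendar months rather than a fixed 365 days avoids being a day short across a leap year.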
Obligations for service providers
Internet access, hosting and telecommunications providers are subject to stricter rules than other operators. In Italy, these entities must retain logs not only for technical or administrative purposes but also to respond to requests from judicial authorities. Failure to comply with these obligations may result in administrative or criminal sanctions, depending on the severity of the violation.
Hosting providers, while not expressly equated with telecommunications providers, have indirect obligations to cooperate with the authorities. For example, web server logs may be requested to identify who uploaded illegal content to a hosted platform or to track down the perpetrator of a fraud.
Prolonged storage in specific cases
In special situations, such as investigations into serious crimes (e.g. terrorism or child pornography), the retention of logs may be extended beyond the standard terms. In such cases, upon request of judicial authorities, data may be retained for a longer period, provided that the request is justified and occurs in compliance with the legislation on the protection of personal data.
Differences between GDPR and Italian law
Unlike the GDPR, which leaves room for more general interpretations regarding the duration of retention, the Italian regulation is often more detailed and prescriptive. This creates a complex framework, where IT professionals must balance:
- Minimum duration: In Italy, the retention of traffic data for at least 12 months is mandatory for public safety purposes and for the detection of illegal activities.
- Maximum duration: The GDPR imposes the principle of “storage limitation”, whereby data must not be retained longer than is necessary for the stated purposes.
- Compatibility between regulations: Italian law can expand the requirements of the GDPR, but it cannot contradict them. For example, it cannot require the retention of logs for an indefinite period without a solid legal basis.
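The balance between the minimum and maximum durations can be expressed as a simple check run against each log file. This is a minimal sketch, assuming the 12-month minimum described above and a hypothetical internal maximum of 18 months: the GDPR sets no fixed number, so the maximum must come from your own documented purposes. The names Action and retention_action are illustrative.

```python
import calendar
from datetime import date
from enum import Enum


class Action(Enum):
    RETAIN = "retain"  # inside the mandatory minimum period
    REVIEW = "review"  # past the minimum: keep only while a documented purpose still applies
    DELETE = "delete"  # past the maximum set by the internal policy


def add_months(day: date, months: int) -> date:
    """Calendar-aware month addition, clamping to the last day of shorter months."""
    index = day.month - 1 + months
    year, month = day.year + index // 12, index % 12 + 1
    return day.replace(year=year, month=month, day=min(day.day, calendar.monthrange(year, month)[1]))


def retention_action(log_date: date, today: date,
                     min_months: int = 12, max_months: int = 18) -> Action:
    """Classify a log file by age against a legal minimum and a (hypothetical) policy maximum."""
    if max_months < min_months:
        raise ValueError("the policy maximum cannot be shorter than the legal minimum")
    if today < add_months(log_date, min_months):
        return Action.RETAIN
    if today < add_months(log_date, max_months):
        return Action.REVIEW
    return Action.DELETE
```

Rejecting a maximum shorter than the minimum encodes the point made above: the storage-limitation principle can shorten retention only down to the period national law requires.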
Other regulatory references
Other legislative instruments that influence log retention in Italy include:
- Personal Data Protection Code (Legislative Decree 196/2003): Harmonized with the GDPR, it specifies the criteria for retaining personal data at national level.
- Anti-terrorism law (Law 155/2005, converting Decree-Law 144/2005): Introduces specific cooperation obligations for the monitoring and retention of data related to potential terrorist threats.
- AGCOM Regulation: For digital content providers, it imposes indirect responsibilities in the management of data relating to users and online activities.
Mistakes to avoid in managing logs and IPs when using CDN and Reverse Proxy
When using a CDN (Content Delivery Network) or reverse proxy technology such as Varnish, Nginx, Cloudflare or Caddy, an intermediate layer is introduced between the user and the origin server which, if not configured correctly, can alter the data recorded in the logs. A common mistake is to record the IP address of the reverse proxy in the log files instead of the real IP of the user who made the request. This happens because, in the default configuration, the origin server sees as the source address that of the proxy or CDN node that forwarded the request, not that of the end client.
This issue has significant implications, because logging the proxy's IP address undermines the logs for several purposes. From a security perspective, it makes it difficult to identify the actual origin of attacks such as intrusion attempts or other suspicious activity. In terms of legal compliance, it can impair the ability to respond to requests from law enforcement authorities, who often rely on logs to trace the user responsible for an illegal online activity. In traffic analysis, failing to record users' real IPs leads to inaccurate data, reducing the reliability of key metrics for website optimization and management.
This challenge becomes even more relevant in complex scenarios, where traffic is distributed across multiple CDN nodes or passes through several proxy layers. To avoid these errors, the system must be configured to preserve the user's original IP, using the headers provided by the technology in use, such as X-Forwarded-For or CF-Connecting-IP. Without such a configuration, the logs not only become unusable for investigative purposes but can also expose the service provider to potential regulatory violations under the GDPR or Italian legislation.
Why does this error occur?
Reverse proxies and CDNs act as intermediaries between the user and the origin server. When a request passes through a proxy like Varnish or a CDN like Cloudflare, the origin web server sees only the IP address of the proxy, not that of the actual client. This is because the proxy opens its own connection to the origin server, whose source address is the proxy's; the user's IP reaches the origin only if the proxy forwards it, typically in a header, and the origin is configured to read it.
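The mechanism can be illustrated with a short sketch of how an origin might recover the client address from X-Forwarded-For. It walks the hop chain from right to left and stops at the first address that is not a trusted proxy, so a forged entry sent by the client is ignored; this is similar in spirit to NGINX's real_ip_recursive option. The trusted networks and function names below are hypothetical examples.

```python
import ipaddress
from typing import Optional

# Hypothetical trusted hops: your own proxies plus the CDN ranges in front of the origin.
TRUSTED_PROXIES = [ipaddress.ip_network(n) for n in ("10.0.0.0/8", "173.245.48.0/20")]


def is_trusted(address: str) -> bool:
    """True if the address parses and belongs to a trusted proxy network."""
    try:
        ip = ipaddress.ip_address(address)
    except ValueError:
        return False  # malformed entries are never trusted
    return any(ip in net for net in TRUSTED_PROXIES)


def real_client_ip(peer_ip: str, x_forwarded_for: Optional[str]) -> str:
    """Return the first untrusted address, walking the hop chain from right to left.

    peer_ip is the source address of the TCP connection the origin actually sees.
    X-Forwarded-For entries are believed only while every hop to their right is
    a trusted proxy, so a client cannot spoof its address by sending the header.
    """
    chain = [hop.strip() for hop in (x_forwarded_for or "").split(",") if hop.strip()]
    chain.append(peer_ip)  # the directly connected peer is the last hop
    for hop in reversed(chain):
        if not is_trusted(hop):
            return hop
    return chain[0]  # every hop is a trusted proxy: the leftmost is the best available


if __name__ == "__main__":
    # The leftmost entry was forged by the client and is ignored.
    print(real_client_ip("10.0.0.5", "192.0.2.66, 198.51.100.23"))
```

Taking the leftmost entry instead would be simpler, but it is the one value the client fully controls.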
How to avoid logging proxy IP
With Varnish
Varnish, a caching reverse proxy, does not handle HTTPS traffic directly and relies on a server such as Nginx or Caddy for SSL/TLS termination. To ensure that the user's real IP is logged, the TLS terminator must pass it on and Varnish must be configured to log it. For example:
- Configure Nginx or Caddy to add the X-Forwarded-For header to the request forwarded to Varnish. This header carries the client's original IP address; on Nginx, setting it with proxy_set_header X-Forwarded-For $remote_addr; (rather than appending to a value sent by the client) prevents a client from injecting a forged address.
- On Varnish, configure logging to read the IP from the X-Forwarded-For header rather than the address of the direct connection, which belongs to the TLS terminator.
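Example configuration in Varnish VCL:
sub vcl_recv {
    # client.ip is the TLS terminator (Nginx/Caddy), not the user, so it is not used here.
    # Varnish (4.0 and later) appends client.ip to X-Forwarded-For before vcl_recv runs,
    # so the first entry is the address set by the terminator, i.e. the real client.
    set req.http.X-Real-IP = regsub(req.http.X-Forwarded-For, "^\s*([^,\s]+).*$", "\1");
}
The log line can then use this header instead of the connection address, for example with a varnishncsa -F format string containing %{X-Real-IP}i in place of %h.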
With Cloudflare and other CDNs
Cloudflare and similar services (Akamai, Imperva's Incapsula, Sucuri, Fastly) provide specific guides to restore the user's real IP. You need to:
- Configure the origin web server (e.g. NGINX or Apache) to read the custom header added by the CDN, such as CF-Connecting-IP (Cloudflare) or True-Client-IP (Akamai).
- Enable the Real IP module (ngx_http_realip_module) in NGINX or the equivalent module in Apache (mod_remoteip), specifying the CDN's IP ranges as trusted proxies.
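Example configuration for NGINX with Cloudflare:
# Excerpt of Cloudflare's IP ranges; the complete, current list is published by Cloudflare
set_real_ip_from 173.245.48.0/20;
set_real_ip_from 103.21.244.0/22;
# Take the client address from this header for requests coming from the ranges above
real_ip_header CF-Connecting-IP;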
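The logic behind set_real_ip_from and real_ip_header can be sketched in a few lines: the CDN header is believed only when the connection actually comes from a CDN range; otherwise the connection address is kept. This is an illustrative approximation, not NGINX's implementation; CDN_RANGES and ip_for_log are hypothetical names, and the two ranges are the same excerpt used in the NGINX example.

```python
import ipaddress
from typing import Mapping

# Excerpt of Cloudflare's IPv4 ranges; the full list is longer and changes over time.
CDN_RANGES = [ipaddress.ip_network(n) for n in ("173.245.48.0/20", "103.21.244.0/22")]


def ip_for_log(peer_ip: str, headers: Mapping[str, str],
               header: str = "CF-Connecting-IP") -> str:
    """Trust the CDN's client-IP header only when the peer belongs to a CDN range."""
    peer = ipaddress.ip_address(peer_ip)
    if any(peer in net for net in CDN_RANGES):
        # HTTP header names are case-insensitive
        value = next((v for k, v in headers.items() if k.lower() == header.lower()), "")
        try:
            return str(ipaddress.ip_address(value.strip()))
        except ValueError:
            pass  # header missing or malformed: this sketch keeps the connection address
    return peer_ip
```

Passing header="True-Client-IP" gives the Akamai variant; the ranges would then have to be Akamai's.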
With other CDNs
This approach also applies to other CDNs that operate as reverse proxies. For example:
- Fastly: Use the Fastly-Client-IP header.
- Sucuri: Use the X-Forwarded-For header.
- Incapsula (Imperva): Use the Incapsula-Client-IP header.
Implications and risks
Failure to properly configure real IP address logging can have significant consequences:
- GDPR Compliance and Italian Law: If the logs only contain proxy IPs, they may not be considered valid for reconstructing a user's activities in case of legal requests.
- Security: Intrusion detection or anomaly analysis tools can be ineffective if they don't work with users' real IPs.
- Debugging and traffic analysis: Visibility into real user behavior is compromised.
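A quick way to check whether existing logs suffer from this problem is to measure how many entries carry a proxy address in the client-IP field. The sketch below assumes the client IP is the first field of each line, as in the combined format; PROXY_NETWORKS is a hypothetical list to be replaced with the ranges actually in front of the origin. A share close to 1.0 suggests the logs are recording the proxy rather than the users.

```python
import ipaddress
from typing import Iterable

# Hypothetical proxy/CDN networks sitting in front of the origin.
PROXY_NETWORKS = [ipaddress.ip_network(n) for n in ("10.0.0.0/8", "173.245.48.0/20")]


def proxy_share(lines: Iterable[str]) -> float:
    """Fraction of parsable log lines whose first field (the logged client IP) is a proxy address."""
    proxy = client = 0
    for line in lines:
        first_field = line.split(" ", 1)[0]
        try:
            ip = ipaddress.ip_address(first_field)
        except ValueError:
            continue  # skip lines that do not start with an IP address
        if any(ip in net for net in PROXY_NETWORKS):
            proxy += 1
        else:
            client += 1
    total = proxy + client
    return proxy / total if total else 0.0


if __name__ == "__main__":
    import sys
    print(f"{proxy_share(sys.stdin):.1%} of entries carry a proxy address")
```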
To avoid errors in log management when using reverse proxies and CDNs, it is essential to configure the servers to record users' real IPs, using the specific headers provided by each technology. Whether you use Varnish, Nginx, Cloudflare, Akamai, Incapsula or other solutions, following the recommended practices for restoring the client IP is essential to ensure reliable and compliant logs.
Conclusion
Web server log management is crucial for legal compliance, security and transparency. In Italy, regulations require careful log retention that respects both the stringent rules of the Electronic Communications Code and the general principles of the GDPR, such as data minimization and storage limitation. This balance must be carefully maintained, especially when using technologies such as reverse proxies and CDNs.
Services such as Cloudflare, Akamai, Fastly or Varnish add complexity to the handling of real IP addresses, which can be replaced by proxy IPs if not configured correctly. This error can invalidate logs for legal or security purposes and reduce their usefulness for technical analysis. The necessary configuration must be in place to log real IPs correctly, for example by reading the specific headers set by reverse proxy technologies.
It is also essential to communicate clearly with customers, especially when they manage their own domains. Hosting providers should inform customers, at the pre-contractual stage, about how CDNs or reverse proxies can affect logging. If a customer decides to use a technology such as Cloudflare, they should notify the provider so that the system can be configured correctly.