squid-the transparent proxy server


Postby theone » Thu Aug 15, 2002 4:06 am


What is a proxy server?

A proxy server keeps a record of WWW requests and saves the results in a portion of the hard drive called the cache.

Proxy servers have two main purposes:

Improve Performance: Proxy servers can dramatically improve performance for groups of users, because they save the results of all requests for a certain amount of time. Consider the case where both user X and user Y access the World Wide Web through a proxy server. First, user X requests a certain Web page, which we'll call Page 1. Some time later, user Y requests the same page. Instead of forwarding the request to the Web server where Page 1 resides, which can be a time-consuming operation, the proxy server simply returns the Page 1 it already fetched for user X. Since the proxy server is often on the same network as the user, this is a much faster operation. Real proxy servers support hundreds or thousands of users; the major online services such as CompuServe and America Online, for example, employ arrays of proxy servers.

Filter Requests: Proxy servers can also be used to filter requests. For example, a company might use a proxy server to prevent its employees from accessing a specific set of Web sites.
What is a transparent proxy server?
In "ordinary" proxying, the client specifies the hostname and port number of a proxy in his web browsing software. The browser then makes requests to the proxy, and the proxy forwards them to the origin servers. This is all fine and good, but sometimes one of several situations arises. Either
·You want to force clients on your network to use the proxy, whether they want to or not.
·You want clients to use a proxy, but don't want them to know they're being proxied.
·You want clients to be proxied, but don't want to go to all the work of updating the settings in hundreds or thousands of web browsers.

This is where transparent proxying comes in. A web request can be intercepted by the proxy, transparently. That is, as far as the client software knows, it is talking to the origin server itself, when it is really talking to the proxy server. (Note that the transparency only applies to the client; the server knows that a proxy is involved, and will see the IP address of the proxy, not the IP address of the user. Squid may, however, pass an X-Forwarded-For header, so that the server can determine the original user's IP address if it understands that header.)
Linux includes the "transparent proxy" feature at the kernel level, starting with kernel version 2.0. Using this feature, along with Linux firewalling, it is possible to redirect all connections (originating from the local network and destined to a remote host on the Internet) to a local server called a "transparent proxy server". This process is completely transparent to the local computer (thus the name). The local computer thinks it is talking to the remote Internet host, while in fact it is connected to the local proxy server. The redirection can be made in such a way that any port can be re-routed to some other server/port.
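As a sketch, on a more recent 2.4-series kernel the redirection described above can be done with an iptables rule like the following (the interface name and port are assumptions for illustration; adjust them for your network):

```
# Redirect all outbound HTTP (port 80) traffic arriving from the LAN
# on eth1 to the local Squid instance listening on port 3128.
# "eth1" and port 3128 are assumed values - change to match your setup.
iptables -t nat -A PREROUTING -i eth1 -p tcp --dport 80 \
         -j REDIRECT --to-port 3128
```

This must be run as root on the gateway machine; older 2.0/2.2 kernels used ipfwadm/ipchains instead, with different syntax.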

·Squid is an HTTP proxy and cache. As such it keeps a lot of temporary data on the hard drive, and there is no point in backing that up. Insert "--exclude /var/spool/squid" into the appropriate tar command in your second-stage backup script, then get squid to rebuild its directory structure for you: tack a command for squid to initialize itself onto the tail end of the second-stage restore script.
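A sketch of the two script fragments being described (the archive path and the set of backed-up directories are assumptions for illustration):

```
# In the second-stage backup script: skip Squid's cache - it can be rebuilt.
tar -czf /backup/stage2.tar.gz --exclude /var/spool/squid /var /etc

# At the tail end of the second-stage restore script:
squid -z    # have squid recreate its cache directory structure
```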

·Squid is an open-source proxy caching server designed to run on Unix systems. Funded by the National Science Foundation, Squid is now a high-performance caching server with a presence in numerous ISPs and corporations around the globe.

·Squid is software that caches Internet data. It does this by accepting requests for objects that people want to download and handling their requests in their place. In other words, if a person wants to download a web page, they ask squid to get the page for them. Squid then connects to the remote server (for example http://squid.nlanr.net/) and requests the page. It then transparently streams the data through itself to the client machine, but at the same time keeps a copy. The next time someone wants that page, squid simply reads it off disk, transferring the data to the client machine almost immediately. Squid currently handles the HTTP, FTP, GOPHER, SSL and WAIS protocols. It doesn't handle things like POP, NNTP, RealAudio and others.

Squid is...
·a full-featured Web proxy cache
·designed to run on Unix systems
·free, open-source software
·the result of many contributions by unpaid (and paid) volunteers
Squid supports...
·proxying and caching of HTTP, FTP, and other URLs
·proxying for SSL
·cache hierarchies
·ICP, HTCP, CARP, Cache Digests
·transparent caching
·WCCP (Squid v2.3 and above)
·extensive access controls
·HTTP server acceleration
·caching of DNS lookups

Squid proxy server configuration notes

squid Proxy Server Configuration Introduction
The utility squid is an internet proxy server that can be used within a network to share an internet connection among all the computers on the network. One central computer, connected to the internet by any means such as dial-up, cable modem, ISDN, DSL, or T1, runs squid and thus acts as the gateway and firewall for the network. Because it is a proxy, it can log all user actions, such as the URLs visited. There are many features that can be configured in squid; this guide is meant as a quick-start guide for those who are eager to get squid working and then configure it further from there.

squid Configuration

Squid uses the configuration file squid.conf, usually located in the /etc/squid directory. Access through the proxy can be granted to individual IP addresses or to a subnet of IP addresses. In squid.conf, search for the default access control lists (acl) and add a line of the following form below them:

acl mynetwork src <subnet or individual IP>

Then add the access control list named "mynetwork" to the http_access list with the following line:

http_access allow mynetwork

The default port for the proxy is 3128. To change it, uncomment the following line and replace 3128 with the desired port:

http_port 3128
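For instance, a minimal sketch of those squid.conf lines, assuming a hypothetical local subnet of 192.168.1.0/24 and a single extra host at 192.168.1.25 (substitute your own addresses):

```
# squid.conf fragment - addresses are assumptions for illustration
acl mynetwork src 192.168.1.0/255.255.255.0     # for a subnet
acl myhost    src 192.168.1.25/255.255.255.255  # for an individual IP

http_access allow mynetwork
http_access allow myhost

http_port 3128
```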
Starting, stopping, and restarting squid
Assuming you have the runlevel scripts installed, you can use the following commands as root:

Start squid: /etc/rc.d/init.d/squid start
Restart squid: /etc/rc.d/init.d/squid restart
Stop squid: /etc/rc.d/init.d/squid stop

Alternatively, issue the following TWO commands as root:

squid -z
squid

or configure squid to start at boot time using your runlevels. (squid -z creates the cache directory structure and only needs to be run before the first start.)
Configuring squid Clients
To configure any application including a web browser to use squid, modify the proxy setting with the IP address of the squid server and the port number (default 3128).
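For command-line clients (wget, lynx, and the like), the same setting can be supplied through the standard proxy environment variables. A minimal sketch, assuming the squid server is at the hypothetical address 192.168.1.1 on the default port:

```shell
# Point command-line tools at the Squid proxy.
# 192.168.1.1 is an assumed address - substitute your squid server's IP.
http_proxy="http://192.168.1.1:3128/"
ftp_proxy="http://192.168.1.1:3128/"
export http_proxy ftp_proxy

# confirm the setting
echo "$http_proxy"   # prints http://192.168.1.1:3128/
```

Graphical browsers instead take the address and port in their proxy-settings dialog, as described above.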

Detailed how-to


This section explains ALL the network address parameters relevant to a Squid installation. Generally speaking, a Squid instance will need to communicate with:
oLocal or remote web servers
oOther Cache servers
oClients (desktop browsers or gateways)

Squid configuration needs to define the addresses (IP address + port) for every relevant server and gateway. This section focuses on communication with clients and web servers; the next section details the parameters required for communication with other cache servers in the network.
A quick note on inter-server communication: Squid listens for TCP or ICP communication on specific ports. It uses TCP to communicate with web servers and clients, and ICP to talk to other cache servers. For every such server (or client), the Squid configuration needs to assign a port number over which Squid sends requests (TCP or ICP) and listens for responses. An IP address is simply the network address at which the server is running. In short, the complete address for any server-to-server communication is an IP address + port combination. The following network parameters are relevant to Squid configuration:
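As a concrete sketch, the port-related lines in squid.conf look like this (the values shown are Squid's defaults):

```
# squid.conf fragment - listening ports
http_port 3128   # TCP port where Squid listens for client requests
icp_port  3130   # UDP port used for ICP queries between cache peers
```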

II.Peer cache servers and Squid hierarchy

The parameters described in this section are relevant when there is a Squid hierarchy in the network (i.e. more than one Squid instance running in the network with well-defined rules regarding which instance talks to which other instance, and so forth). Parameters of interest here are: number of cache servers, type of configuration (which instance communicates with which instance(s)), defining the primary cache server, mapping of specific domains to specific cache server instances, Timeouts, specification of objects that should not be cached locally etc. Relevant parameters covered by this section are:

III.Cache size

Squid supports more than one cache replacement policy. This section also touches briefly on the cache's interaction with the disk.

IV.Support for External functions

Squid has the ability to invoke certain "externally defined" functions that are NOT part of the Squid binary. Such "external" executables (programs) are usually placed in a contrib directory of the source code distribution. The most common "external" programs are FTPUser, DNS, redirectors and authenticators, and they are usually contributed by sources other than the core Squid developers.

External programs are invoked by Squid through the standard fork() and exec() calls. The number of fork-able child processes for each specific "external" program can also be defined.
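To make the redirector idea concrete, here is a minimal sketch of one written as a shell function; the URL patterns and replacement page are assumptions, and a real redirector would be a standalone script listed in squid.conf via the redirect_program directive. Squid writes one request per line on the program's stdin ("URL client-ip/fqdn ident method") and reads the possibly rewritten URL back on stdout:

```shell
# squid_redirect - hypothetical minimal redirector.
# Reads "URL client-ip/fqdn ident method" lines on stdin,
# writes the (possibly rewritten) URL on stdout.
squid_redirect() {
  while read url rest; do
    case "$url" in
      http://ads.example.com/*)
        # rewrite matching requests to a local "blocked" page
        echo "http://proxy.example.com/blocked.html"
        ;;
      *)
        echo "$url"   # pass everything else through unchanged
        ;;
    esac
  done
}

# demo: one rewritten and one ordinary request
printf 'http://ads.example.com/banner.gif 10.0.0.5/- - GET\n' | squid_redirect
printf 'http://www.example.org/index.html 10.0.0.5/- - GET\n' | squid_redirect
```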


Timeout parameters in Squid can be based on overall connection timeouts, peer-specific timeouts, site/domain-specific timeouts, request-specific timeouts, etc. Proper setting of timeout values is critical to optimal Squid performance. The relevant parameters for timeout settings are listed here.
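A few of those timeout directives, shown with illustrative values (consult the squid.conf.default shipped with your version for the actual defaults):

```
# squid.conf fragment - timeout settings (values are illustrative)
connect_timeout  2 minutes    # give up trying to connect to a server
read_timeout     15 minutes   # abort if a read stalls for this long
request_timeout  30 seconds   # wait for an HTTP request after connect
client_lifetime  1 day        # maximum time a client may stay connected
```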

VIII.Access controls

Access control settings are among the most important features of Squid. You can configure Squid to set filters for various entities and at different granularities (e.g. filters for specific protocols, for certain types of commands, for specific routers, for specified domains, etc.). The relevant parameters are described below:
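As an illustrative sketch, a pair of filters of the kind described above - blocking a set of domains and a protocol - might look like this in squid.conf (the acl names and domains are assumptions):

```
# squid.conf fragment - hypothetical access-control rules
acl BadSites dstdomain .example-banned.com .example-ads.com
acl FTP      proto     FTP

http_access deny  BadSites    # refuse requests to the listed domains
http_access deny  FTP         # refuse proxied FTP requests
http_access allow mynetwork   # "mynetwork" acl defined earlier
http_access deny  all         # default: deny everything else
```

Note that http_access rules are evaluated in order, so the deny lines must come before the allow.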

X.Administrative parameters

The parameters in this section allow the Squid admin to specify, for example, which users and groups have the right to run Squid, what host name should be displayed in error messages, which users have the authority to view cache activity details, etc.
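Typical directives in this group (the values are illustrative assumptions):

```
# squid.conf fragment - administrative settings
cache_effective_user  squid                  # run Squid as this user
cache_effective_group squid                  # ...and this group
visible_hostname      proxy.example.com      # hostname shown in error pages
cache_mgr             webmaster@example.com  # contact address shown to users
```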

XI.Httpd-accelerator options

Squid can act as a load balancer or load reducer for a particular web server. Generally, Squid keeps not only clients happy but also the web servers, by reducing the load on the server side. Some cache servers can act as web servers (and vice versa): such servers accept requests both in the standard web-request format (where only the path and filename are given) and in the proxy-specific format (where the entire URL is given). The Squid designers decided not to let Squid be configured this way; this avoids various complicated issues and reduces code complexity, making Squid more reliable. All in all, Squid is a web cache, not a web server.
By adding a translation layer into Squid, however, it can accept (and understand) web requests, since the format is essentially the same. The additional layer re-writes incoming web requests, changing the destination server and port. The re-written request is then treated as a normal request: the remote server is contacted, the data requested and the results cached. This lets Squid pretend to be a web server, rewriting requests so that they are passed on to some other web server.
For transparent caching, Squid can be configured to intercept outgoing web requests and cache them. Since the outgoing requests are in web-server format, Squid needs to translate them into cache-format requests. Strictly speaking, transparent redirection is prohibited by Internet Standard #5 ("Internet Protocol"), and HTTP assumes that no transparent redirection is taking place.
This section allows various configurations related to the accelerator mode and the transparent mode.
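With Squid 2.x, the accelerator and transparent modes described above are controlled by the httpd_accel_* directives. A sketch of a transparent-cache configuration, to be combined with the kernel-level redirection described earlier:

```
# squid.conf fragment - transparent caching with Squid 2.x
httpd_accel_host virtual          # accelerate any origin host
httpd_accel_port 80               # port of the origin servers
httpd_accel_with_proxy on         # still accept ordinary proxy requests
httpd_accel_uses_host_header on   # rebuild the URL from the Host: header
```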


As the title suggests, this section covers parameters that could not be explicitly bundled in with any of the previous categories. Examples of features covered here are:
oLimiting the growth of log files.
oDisplaying customized information to clients upon error conditions or access denial.
oDefining memory pools for Squid.
oNetwork management by enabling SNMP.
oCo-ordinating with neighbor caches by enabling WCCP.
oDirecting requests either to the origin server or to a neighbor cache.
The relevant parameters are:
· Delay pool parameters (all require the delay_pools compile-time option)
Conceptually, delay pools are bandwidth limitations - "pools" of bandwidth that drain out as people browse the Web and fill back up at a rate we specify. This can be thought of as a leaky bucket that is continually being refilled. It is useful where bandwidth charges are in place and we want to reduce bandwidth usage for web traffic.

Delay pools can do wonders when combined with ACLs. These tags let us limit the bandwidth of certain requests based on any criteria. Delay behavior is selected by ACLs (low- versus high-priority traffic, staff versus students, students versus authenticated students, and so on). ISPs can implement delay pools on a particular network to improve quality of service. To enable this feature, Squid needs to be configured with the --enable-delay-pools option.
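A minimal sketch of a single class-1 (aggregate) delay pool in squid.conf, limiting all matching traffic to roughly 64 kilobytes per second (the numbers are assumptions):

```
# squid.conf fragment - one aggregate (class 1) delay pool
delay_pools      1               # number of pools defined
delay_class      1 1             # pool 1 is class 1: one aggregate bucket
delay_parameters 1 64000/64000   # restore rate / maximum bucket size, in bytes
delay_access     1 allow all     # which requests the pool applies to
```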


Security Issues

Many current firewall designs rely on the combination of packet filtering and proxy technology (especially "transparent proxying" technology). Today, proxy systems can manage the different operation authorizations users have when surfing (for example, who is allowed to use which protocol), block unwanted surfers outside the local net from getting in, and keep a log file of users' operations - all in addition to filtering on the basis of IP address.
However, the caching ability that makes the Web run faster has its drawbacks. It can be bad for businesses advertising at Web sites, and it might even violate copyright law.
Advertisers behind a site have a problem with caching proxy servers: they have no way of knowing the number of readers behind a hit - it could be one, or a hundred thousand - without looking at the log files of the proxies. Furthermore, every copyrighted document sitting in the proxy's cache is, in fact, an unauthorized copy.
The wrong solution would be to disable caching: it would hurt performance, resulting in fewer visitors at the advertisers' sites. A better solution would be to let a caching proxy keep a copy of a Web page if the proxy promises, in return, to tell the Web server the number of hits it got for that page over a reasonable time period. No doubt advertisers would prefer more specific information about the readers, but that is something to argue about.
Other problems arise when using the Internet Cache Protocol (ICP) - a lightweight message format used for communication among Web proxy caches, implemented on top of UDP. ICP is used for object location and can be used for cache selection. Because of its connectionless nature, it is vulnerable to several methods of attack. Checking the source IP address of an ICP message provides a certain degree of protection: ICP queries should be processed only if the querying address is allowed to access the cache, and ICP replies should be accepted only from known neighbors and otherwise ignored. Trusting the validity of the address at the IP level makes ICP susceptible to IP address spoofing, which has many problematic consequences (for example, inserting bogus ICP queries, or inserting bogus ICP replies to prevent a certain neighbor from being used or to force a certain neighbor to be used). In practice, only routers can detect spoofed addresses; hosts cannot. Still, the IP Authentication Header can be used to provide cryptographic authentication for the IP packet carrying the ICP message.
A very important issue is making quantitative assessments of the influence different caching strategies have on the behavior of a proxy server with respect to values such as latency, bandwidth consumption, and overall error rates. These values depend on factors like document popularity, cache hit rates, and error rates. Trace-based simulations of the behavior of a WWW cache, using different caching parameters, algorithms, and heuristics, have yielded some interesting results. One of the most important questions is how often stale objects are served from a cache instead of the latest version on the origin server; the stale rate gives a good indication of that.


Re: squid-the transparent proxy server

Postby theone » Thu Aug 15, 2002 5:40 am

mess with the best die like the rest !!!!
get ready to rock n roll.................
