The other day I was trying to understand the different HTTP cache headers and how they work. The answers I found varied from person to person, so I spent a few hours reading various articles and wrote this one. Correct me if I am wrong. Most of the ideas come from the article below, which is an archived blog post.
Let’s say you hosted a website called ourfamily.com and you created the site in
Now, what does caching have to do with all this? The internet is slow, and the browser and the server have to talk to each other about everything, which costs time and bandwidth. So if the browser already knows the answer, it shouldn’t ask the server the question at all. For example, whenever you refresh the family photo page, the browser doesn’t have to ask the server to send the same photo over the network again.
In this big world, when the browser and the server sit on two different continents, this talking gets even more expensive. And when there are many pictures and everyone wants the same ones, as with a newspaper, nobody can afford that cost for every request. So people installed small intermediate servers called proxy caches. The browser asks these proxies for information. A proxy fetches it from the original server the first time, and when a second request for the same information arrives, from the same browser or another one, it answers immediately without going back to the original server. This speeds up the communication between servers and browsers.
This architecture also has some security issues. Proxy caches aren’t supposed to cache everything the browser and the server say to each other, because some of it may be sensitive, like usernames and passwords. All of this talking happens over HTTP, and we can control caching by setting headers on the HTTP requests and responses.
1. Pragma: no-cache
HTTP 1.0 had Pragma: no-cache, which is defined for requests (not responses). It tells intermediate caches not to answer this request from a stored copy but to go to the original server instead, for example because the communication contains sensitive information such as usernames and passwords, or because a cached copy would be of no use.
2. Expires
Another HTTP 1.0 header, which tells browsers when the page expires. This is a response header, i.e. the server sets it in the response. From it, the browser knows how long the information stays valid. Until then, browsers and proxy caches can keep the page in their cache and reuse it.
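As a rough sketch of what the server does to produce this header, the Python standard library can format the date for us. The function name expires_header and the one-hour lifetime are made up for illustration. The only fixed part is that the value is an absolute date in GMT.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


def expires_header(now: datetime, lifetime: timedelta) -> str:
    """Return a value for the Expires header: an absolute HTTP date in GMT."""
    # format_datetime(..., usegmt=True) needs a timezone-aware UTC datetime.
    return format_datetime(now + lifetime, usegmt=True)


# A fixed "now" so the example is reproducible; a server would use the current time.
now = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
print("Expires:", expires_header(now, timedelta(hours=1)))
# Expires: Mon, 01 Jan 2024 13:00:00 GMT
```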
Then HTTP 1.1 came out in 1997 with upgraded headers.
3. cache-control: public
Let’s say you have a public page, like a login page, or a public resource, like the family logo, which can be stored by any cache, whether it is the browser’s or not. Most people who request the resource will then hit a cache instead of hitting the server directly, and we get a performance boost.
cache-control: private
cache-control: private tells the proxies not to cache the response. Take the family photo page of ourfamily.com: you wouldn’t want anyone else to see this photo, but your own browser can still cache it to save bandwidth and time.
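To make the difference between public and private concrete, here is a minimal sketch of the storage decision as a cache might make it. The function names and the shared_cache flag are my own inventions for this post, and a real cache looks at much more than this one directive.

```python
def directives(cache_control: str) -> set[str]:
    """Split a Cache-Control value into lowercase directive names."""
    names = set()
    for part in cache_control.split(","):
        name = part.strip().split("=", 1)[0].lower()
        if name:
            names.add(name)
    return names


def may_store(cache_control: str, shared_cache: bool) -> bool:
    """Decide whether a cache may keep the response, looking only at `private`.

    shared_cache is True for a proxy and False for the user's own browser.
    """
    if "private" in directives(cache_control) and shared_cache:
        return False
    return True


print(may_store("public", shared_cache=True))    # family logo at a proxy: True
print(may_store("private", shared_cache=True))   # family photo at a proxy: False
print(may_store("private", shared_cache=False))  # family photo in your browser: True
```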
cache-control: no-cache
Now let’s say you changed the login page design for the New Year. The users should see the new page instead of the old one. But you already marked the page as public, so the browser will keep showing its cached copy instead of your new one. Now the most badly named directive (I think) in the caching spec comes to the rescue. We could use cache-control: public, no-cache for the login page. no-cache tells the browser, ‘Hey, you can cache this page, but before you show it to the user, check with the server. If the page has changed, the server will give you the updated one.’ This way we still save bandwidth, because the page is downloaded again only when it has actually changed.
This naming is so bad that some browsers even implemented their caches so that, if no-cache is present, they don’t use the cache at all and download a fresh copy from the server every time.
cache-control: no-store
Now you want a page that is never cached at all, like a page that shows your family business revenue. You know no-cache is not the answer. no-store to the rescue. It tells the caches (both the browser and other caches) not only ‘not to reuse the page’ but also ‘not to even store the page in the cache folder’. So it’s safe to use cache-control: private, no-cache, no-store for highly sensitive pages that must always be fresh.
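Since no-cache and no-store are so easy to mix up, here is a small sketch that sorts a cache-control value into the three behaviours described so far. The function reuse_policy and its return strings are hypothetical and only illustrate the idea. A real cache weighs many more directives.

```python
def reuse_policy(cache_control: str) -> str:
    """Classify a Cache-Control value by how a stored copy may be reused."""
    names = {part.strip().split("=", 1)[0].lower() for part in cache_control.split(",")}
    if "no-store" in names:
        # Strongest directive: nothing is written to the cache at all.
        return "never store"
    if "no-cache" in names:
        # Storing is fine, but the server must be asked before each reuse.
        return "store, but revalidate before every reuse"
    return "store and reuse while fresh"


for value in ("public", "public, no-cache", "private, no-cache, no-store"):
    print(f"{value!r:32} -> {reuse_policy(value)}")
```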
Now we want to cache the background image of the site for 1 week. We’re not going to change it often, and even if we do, it’s okay if some users see the old background for a while, because it’s not important. We could add max-age to the header, with the lifetime given in seconds (one week is max-age=604800). max-age is like the old Expires header: it tells the browser, ‘Hey, you can use your cached copy until max-age has passed. After that you have to revalidate the asset with the server.’ So what happens if you have a max-age=0 header? The browser will revalidate with the server on every request. Pretty much the same as no-cache. (?🤨)
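Here is a minimal sketch of the freshness check behind max-age, assuming the cache already knows the age of its stored copy in seconds. The helper names max_age and is_fresh are made up, and real caches compute the age more carefully than this.

```python
from typing import Optional

ONE_WEEK = 7 * 24 * 60 * 60  # max-age is given in seconds: 604800


def max_age(cache_control: str) -> Optional[int]:
    """Return the max-age value in seconds, or None if the directive is absent."""
    for part in cache_control.split(","):
        name, _, value = part.strip().partition("=")
        if name.lower() == "max-age" and value.isdigit():
            return int(value)
    return None


def is_fresh(cache_control: str, age_seconds: int) -> bool:
    """A stored copy is fresh while its age is below max-age."""
    limit = max_age(cache_control)
    return limit is not None and age_seconds < limit


three_days = 3 * 24 * 60 * 60
eight_days = 8 * 24 * 60 * 60
print(is_fresh(f"public, max-age={ONE_WEEK}", three_days))  # True: reuse the cached image
print(is_fresh(f"public, max-age={ONE_WEEK}", eight_days))  # False: revalidate first
print(is_fresh("max-age=0", 0))                             # False: stale immediately
```

The last line shows why max-age=0 behaves so much like no-cache: the copy is never fresh, so every use needs a trip to the server.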
Using max-age instead of Expires is better, because max-age is a relative time (how long the cached copy should stay fresh), whereas Expires is an absolute one that takes a date as its value, the moment the cached copy becomes stale. (An absolute date is hard to get right, because the clocks on the server and the client have to agree, and we have to keep updating the date.)
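To see the two styles side by side, this sketch turns an absolute Expires date into the relative lifetime that max-age states directly. The header values are hypothetical. I subtract the server's own Date header so that the result does not depend on the client's clock, which is the clock problem mentioned above.

```python
from email.utils import parsedate_to_datetime


def lifetime_from_expires(date_header: str, expires_header: str) -> int:
    """Freshness lifetime in seconds implied by an absolute Expires date.

    Both values come from the same response, so both use the server's clock.
    """
    sent = parsedate_to_datetime(date_header)
    expires = parsedate_to_datetime(expires_header)
    return int((expires - sent).total_seconds())


# Hypothetical response headers for the background image.
date_header = "Mon, 01 Jan 2024 12:00:00 GMT"
expires_header = "Mon, 08 Jan 2024 12:00:00 GMT"

print(lifetime_from_expires(date_header, expires_header))  # 604800, i.e. max-age=604800
```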
This is all in theory. Different browsers and proxy servers may implement these headers, honour them only partly, or ignore them, just like no-cache in some browsers. And some browsers may show stale responses, for example when the network is down. If you want to make sure a stale copy is never shown without asking the server, you can add an extra directive called must-revalidate. It tells the browser that once the cached copy has gone stale, it must revalidate the resource with the server under all circumstances. There’s a proxy-revalidate directive too, which does the same but only for proxy servers.
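The stale-copy case can be sketched as a single question: when the server cannot be reached, may the cache fall back to its stale copy? The function below is a hypothetical illustration that looks only at must-revalidate and proxy-revalidate and ignores everything else a real cache would consider.

```python
def may_serve_stale(cache_control: str, shared_cache: bool) -> bool:
    """Whether a cache may fall back to a stale copy when the server is unreachable.

    shared_cache is True for a proxy and False for the user's own browser.
    """
    names = {part.strip().split("=", 1)[0].lower() for part in cache_control.split(",")}
    if "must-revalidate" in names:
        return False  # applies to every cache
    if "proxy-revalidate" in names and shared_cache:
        return False  # applies to proxies only
    return True


print(may_serve_stale("max-age=60", shared_cache=False))                    # True
print(may_serve_stale("max-age=60, must-revalidate", shared_cache=False))   # False
print(may_serve_stale("max-age=60, proxy-revalidate", shared_cache=False))  # True
print(may_serve_stale("max-age=60, proxy-revalidate", shared_cache=True))   # False
```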
Browser Validators
Now, we’ve talked about revalidating a resource by contacting the server. How will the browser know that the resource is still valid? There are two ways to do that: the ETag and Last-Modified headers are used as validators. The server sends each resource along with these values. When a browser wants to check whether its copy is fresh or stale, it sends a request with an if-modified-since header carrying the last-modified value, OR an if-none-match header carrying the etag value. If those still match, the server answers 304 Not Modified without sending the body again, and the browser knows the resource has not changed and can continue to use its copy.
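The server side of this check could look roughly like the sketch below. The function respond and its arguments are hypothetical. It compares a single ETag as a plain string and ignores details such as weak ETags or lists of ETags, so please read it as an illustration of the idea and not as a complete implementation.

```python
from email.utils import parsedate_to_datetime


def respond(request_headers: dict, etag: str, last_modified: str) -> int:
    """Return the status code a server would send for a (possibly conditional) GET."""
    if_none_match = request_headers.get("If-None-Match")
    if if_none_match is not None:
        # When the browser sends an ETag, that comparison decides the outcome.
        return 304 if if_none_match == etag else 200
    if_modified_since = request_headers.get("If-Modified-Since")
    if if_modified_since is not None:
        changed = parsedate_to_datetime(last_modified) > parsedate_to_datetime(if_modified_since)
        return 200 if changed else 304
    return 200  # plain request: send the full resource


# Hypothetical current state of the resource on the server.
etag = '"v2"'
last_modified = "Mon, 01 Jan 2024 12:00:00 GMT"

print(respond({}, etag, last_modified))                                    # 200
print(respond({"If-None-Match": '"v2"'}, etag, last_modified))             # 304
print(respond({"If-None-Match": '"v1"'}, etag, last_modified))             # 200
print(respond({"If-Modified-Since": last_modified}, etag, last_modified))  # 304
```

A 304 answer has no body, so the browser keeps using its stored copy and only the headers travel over the network.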
I think I have explained these things reasonably well. I gathered the information from different articles and summarised it here. Please let me know if I am incorrect or if things have changed over time. Thank you.