For a scraping project, I wondered whether there was a way to find out if a webpage had been updated, without keeping track of it myself. Someone suggested the "Last-Modified" header, which supplies the date and time of last modification; there is also an "ETag" header, which supplies an opaque identifier (often a hash) of the page content.
How do you examine HTTP headers? I wrote a little piece of code to report them, online at https://bitbucket.org/dpb/report_headers .
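The linked script isn't reproduced here, but the idea can be sketched with just the Python standard library (the function names below are my own illustration, not those of the repo): issue a HEAD request and report the headers relevant to change detection.

```python
import urllib.request

# Headers that are potentially useful for change detection.
INTERESTING = ("last-modified", "etag", "content-length")

def summarize_validators(headers):
    """Given a mapping of header names to values, return the
    change-detection headers that are present (None if missing)."""
    lowered = {k.lower(): v for k, v in headers.items()}
    return {name: lowered.get(name) for name in INTERESTING}

def report_headers(url):
    """Issue a HEAD request and print the change-detection headers."""
    req = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(req) as resp:
        for name, value in summarize_validators(dict(resp.headers)).items():
            print(f"{name}: {value}")
```

Running `report_headers` against a handful of sites is enough to see how inconsistently these headers are supplied.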
Conclusion: no header is always reliable for keeping track of updated webpages. Some sites provide both "Last-Modified" and "ETag"; some provide only one of the two. Some provide neither — even major US sites like the Wall Street Journal (wsj.com) don't send "Last-Modified". In fact, WSJ doesn't even report "Content-Length". So all bets are off with headers: one can use them on sites where they are known to be supplied, but other tools are still needed for sites where they are not.
And even when headers are supplied, there is still the question of whether they are accurate.
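One saving grace: on sites that do supply these validators, you don't have to compare them yourself. HTTP conditional requests let you send the stored values back, and the server answers 304 Not Modified if nothing changed. A minimal sketch, again using only the standard library and hypothetical helper names:

```python
import urllib.request
import urllib.error

def conditional_headers(etag=None, last_modified=None):
    """Build request headers from validators stored on a previous fetch."""
    headers = {}
    if etag:
        headers["If-None-Match"] = etag
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    return headers

def fetch_if_changed(url, etag=None, last_modified=None):
    """Return the page body if it changed, or None on 304 Not Modified."""
    req = urllib.request.Request(
        url, headers=conditional_headers(etag, last_modified))
    try:
        with urllib.request.urlopen(req) as resp:
            return resp.read()
    except urllib.error.HTTPError as err:
        if err.code == 304:
            return None  # unchanged since the stored validators were taken
        raise
```

Of course, this only shifts the trust problem to the server: a site that sends inaccurate validators will produce wrong 304s, so spot-checking against actual content is still prudent.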