This post starts with a simple question:
How do I ensure this blog is still here in 200 years' time?
It’s a simple question to ask, but answering it is a significant engineering challenge, and one I want to explore here.
This blog has been going for around 22 years as I write this. I don’t update the blog that regularly but it’s been a constant presence in my life over the last two decades.
My thoughts occasionally turn to what happens to the blog when I die.
The first and most important thing to recognise is that HTML and CSS are probably going to last hundreds of years. Obviously, there is little real evidence I can muster in favour of this opinion, other than a gut feeling and some handwaving. That said, we can try to make the argument.
It is still possible to visit the very first website ever made and have it render correctly, even though the site uses markup that is no longer considered standard.
Equally, images encoded as JPEGs, PNGs and other common formats remain readable over a multi-decade time span, and there is no reason to suspect these formats will not survive for decades to come. Images taken with digital cameras decades ago are still easily viewed today.
I think it’s also reasonable to assume that any dynamic content, written in Python, PHP and the like, is going to run into significant issues over the longer term. These languages have complex toolchains and ecosystems supporting them; once those dependencies become unsupported, the rest of the ecosystem will collapse and eventually become insecure.
Equally, I feel any client-side scripting is a bad bet for similar reasons. Web presentation frameworks like React and Angular come in and out of fashion regularly.
Another thing to consider is that a blog backed by a database will need that database maintained and migrated to newer versions. At some point you’d expect a migration that results in your data being lost, or left in a format that can’t be exported between systems.
This actually happened to my blog. The very first post describes how the Microsoft Access database behind it got corrupted. Those posts probably weren’t particularly high quality, but they’re now lost forever.
The only way to maintain dynamic content over a many decade time-span is to have a company or trust that is actively maintaining the software.
One solution is just to use a managed free service. This way, you’re using dynamic content but you already have a company wrapped around the maintenance problem.
If you use a service like LiveJournal, Blogger/Blogspot, WordPress or some other similar blogging service, it’s likely your posts will last a fair amount of time.
Take for example this blog:
The author was thought to be the oldest blogger in the world at the time; she died on May 20, 2009, now over 16 years ago. Yet it seems reasonably likely that in another 16 years the blog will still be around.
A quick Google search turns up many examples of blogs abandoned around that time that are still online today.
If you are going to use a managed service, it makes sense to pick one where content preservation is a core principle of the offering. For example, it’s harder to imagine a service like Neocities deleting content or withdrawing its service than something like GitHub Pages: GitHub Pages is a side feature of GitHub, whereas for Neocities preservation is the entire point of the service.
While these blogging services offer some promise, there are still challenges. Even on the blog above, we notice something interesting: she posted a lot of Adobe Flash content, which is no longer supported.
The Adobe Flash problem speaks to a much deeper issue, one I alluded to earlier.
Products that were once “forever” products get withdrawn, deprecated, end-of-lifed. Geocitied, as it were.
Over a multi-decade time horizon, what guarantee is there that, say, GitHub Pages will remain a product offering, or that Blogger will not be shut down? Over decades, the chance of such a withdrawal has to approach 100%.
If you want something that has a reasonable chance of making it to a couple of hundred years, you’re going to need to have a transgenerational organisation to maintain your site.
At its root, the real challenge here is having a software and hardware stack that is guaranteed to be here in hundreds of years.
This seems an almost ridiculous ask at first, but my guess is that, somewhere in the world, there are already systems running that will end up lasting hundreds of years. Possible candidates are systems inside banks, or inside government departments such as defence and taxation.
Indeed, in the United States the MOCAS system has been running continuously since 1958.
Even in the private sector, it’s possible that platforms like AWS and Azure have parts of their computing stack that will run for hundreds of years.
Of course, no software engineer today expects they’re building a system that will be owned by their distant descendants, and they’re not really designing for that reality. These systems reach great age largely by accident: a system becomes essential, deeply welded into an organisation, and quickly becomes hard to replace.
Is there a way to be more intentional about designing a system for longevity?
One approach to designing systems that last many decades is to base them on technologies that have already been around for decades. The logic is that if a technology has already survived for decades and is still in use, it’s likely to still be here decades from now. Will these choices go on to survive centuries? Honestly, we don’t know, but right now this approach is the best we can do.
If we think about processor architectures, x86 first debuted in 1978. There have been many upgrades along the way, but it is still possible to directly execute machine code from that era on a modern processor.
The first Unix system was created in 1971 and the POSIX standard followed in 1988. Linux is now basically everywhere in the server space. Debian was first released in 1993, some 32 years ago, so it seems reasonable that picking something like Debian would give you an upgrade path measured in decades. Debian is “close enough” to any POSIX-compliant system that a migration could be effected if it ever stopped being maintained.
The Apache web server was first released in 1995, some 30 years ago. Estimates put its current market share at around 25% of all web traffic. It seems a reasonable bet that this web server will still be around decades from now.
Finally, we have our hypertext documents written in HTML with images served in JPEG, GIF and PNG formats. HTML came about in 1993, JPEG in 1992, GIF in 1987 and PNG in 1995. All of these technologies are already over 30 years old.
Our solution avoids any sort of active content, which spares us from managing toolchains, patching security vulnerabilities in libraries and databases, and the other challenges that come with that complexity.
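To make the “no active content” rule concrete, here is a small sketch of a pre-publish check, my own addition rather than an existing tool. It scans a site directory and flags any file whose format isn’t on the long-lived list discussed above; the allow-list itself is an assumption you would tune to your own site.

```python
from pathlib import Path

# Formats discussed above that have already survived 30+ years.
# This allow-list is an assumption; extend it to taste.
LONG_LIVED = {".html", ".htm", ".css", ".txt", ".jpg", ".jpeg", ".gif", ".png"}

def find_risky_files(root: str) -> list[Path]:
    """Return files under `root` that are not in a long-lived static format."""
    return sorted(
        p for p in Path(root).rglob("*")
        if p.is_file() and p.suffix.lower() not in LONG_LIVED
    )
```

Running something like this before publishing would catch a stray `.php` or `.js` file creeping into the archive.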
We can think of our system as a series of layers:

- Static content: the HTML, CSS and images we’re actually trying to preserve
- Web server: Apache
- Operating system: Debian
- Processor architecture: x86
Notice that with the exception of the static content, which is ultimately the service we’re trying to protect, we can swap the other components at will.
We can swap the processor architecture for ARM and use Debian for ARM, and the service still works. We can swap Apache for Nginx and the service still works. We can keep Apache and x86 but swap Debian for Red Hat.
The fact that we have this loose coupling means that if circumstances change in the future we have the ability to swap these components out with their modern replacements and still maintain our service.
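As a toy demonstration of that loose coupling: a directory of static files can be served by anything that speaks HTTP. Here even Python’s standard-library server stands in for Apache or Nginx, purely for illustration; the function below is my own sketch, not part of the stack proposed above.

```python
import functools
import http.server
import socketserver
import threading

def serve_static(directory: str, port: int = 8000) -> socketserver.TCPServer:
    """Serve `directory` over plain HTTP in a background thread.

    Any HTTP server could fill this role; the stdlib one is used here
    only to show that the static content layer is server-agnostic.
    """
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory
    )
    server = socketserver.TCPServer(("127.0.0.1", port), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

The point is that nothing in the content depends on which process answers the socket, which is exactly the property that lets us swap Apache for Nginx decades from now.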
One problem is that if you’re not using some sort of managed service, you need a maintenance programme that will preserve your site in the long term.
One option is to create a trust in your will that manages the devices that support your blog. This trust would handle domain name renewals, equipment upgrades, operating system upgrades and other activities required to keep your site online.
This trust needs to be sufficiently funded such that interest from its investments will cover the costs of maintaining the blog in the long term.
How much would this cost?
Debian has a five-year support cycle. Suppose we paid £5 a month for hosting, and every five years paid someone £500 a day for a few days to migrate services. We then need to factor in domain renewal and the cost of administering the trust.
| Service | Duration | Unit Cost | Total (5 years) |
|---|---|---|---|
| Hosting | 5 years | £60/year | £300 |
| Maintenance | 3 days | £500/day | £1,500 |
| Domain renewal | 5 years | £50 | £50 |
| Annual trust fund fee | 1 year | £300/year | £1,500 |
| Trust fund admin fees | 3 hours/year | £135/year | £675 |
| **Total** | | | **£4,025** |
This gives us £4,025 for five years of hosting and maintenance, or £805 a year. We can now work out how much capital the trust fund needs. A commonly cited safe withdrawal rate for capital preservation is 3%, so we simply divide £805 by 3% to get our answer.
\[\frac{£805}{0.03} = £26,833\]

So in theory, you could preserve your flat HTML website forever with an initial investment of £25–30k. Expensive, but not completely out of reach of a normal person.
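For the sceptical, the arithmetic is easy to check. The figures below are the estimates from the table, and the 3% withdrawal rate is the assumption noted above:

```python
# Five-year running costs, using the estimates from the table above.
costs = {
    "hosting": 5 * 12 * 5,       # £5/month over 5 years  = £300
    "maintenance": 3 * 500,      # 3 days at £500/day     = £1,500
    "domain_renewal": 50,        # one 5-year renewal     = £50
    "trust_fund_fee": 5 * 300,   # £300/year over 5 years = £1,500
    "trust_admin": 5 * 135,      # £135/year over 5 years = £675
}

total_five_years = sum(costs.values())   # £4,025
annual_cost = total_five_years / 5       # £805

SAFE_WITHDRAWAL_RATE = 0.03              # assumption: 3% for capital preservation
capital_required = annual_cost / SAFE_WITHDRAWAL_RATE

print(f"Five-year total:  £{total_five_years:,}")
print(f"Annual cost:      £{annual_cost:,.0f}")
print(f"Capital required: £{capital_required:,.0f}")  # ≈ £26,833
```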
If you really want a blog that’s going to be around for many decades or perhaps centuries, I don’t see an alternative to the trust fund option. Any free service can be altered or withdrawn, as Geocities shows.
However, if you merely want your blog to stick around for a few decades, that is easily achievable with a managed service. It may even turn out that these services preserve your content into the distant future; we don’t really know at this point.
I do think this is an important topic. So much of our culture is recorded online, and preserving it so future generations can study it will only become more important over time.