The Economics of Text Storage

I’d like to convince you that it is rarely in your best interest to delete text for space reasons. The logic goes like this:

  • Assume a very good secretary can type 150 words per minute, or 9,000 words an hour.
  • Let’s assume that the average word length is ten letters – this is deliberately high.
  • This secretary then types 9,000 words an hour, or 90,000 characters per hour. Let’s round that up generously, by more than a factor of ten, to one million characters per hour.
  • Now let’s say that you want to store not only the character, which takes about a byte, but also plenty of metadata. For each character, store:
    • The character (1 byte)
    • The timestamp (8 bytes)
    • The full file path (max 260 bytes in Windows)
    • The user name (100 bytes max?)
    • The place in the file (max 8 bytes)
    • Some other stuff
  • Note that storing text this way amounts to keeping a journal: every single micro-change made to the file is recorded.
  • Let’s say you somehow want to store a thousand bytes of data for each character, which is well over twice the total of the sized fields listed above.
  • One thousand bytes per character times one million characters per hour totals one gigabyte of data per hour. That may sound like a lot, but consider this: modern hard drives cost as little as five cents per gigabyte. You can find a 3 TB hard drive for about $170, which works out to just under six cents per gigabyte. That’s roughly five to six cents an hour to record every micro-change made.
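
If you want to check the arithmetic yourself, here is a rough sketch in Python. It takes the rounded figure of one million characters per hour as given, adds up the field sizes from the list above, and prices the result against the $170 3 TB drive. The names (FIELD_BYTES, gigabytes_per_hour and so on) are my own inventions for illustration, and I am assuming decimal units (1 GB = 10^9 bytes, 3 TB = 3,000 GB), which is how drive makers count.

```python
# Back-of-the-envelope check of the storage estimate.
# All names here are hypothetical; the numbers come from the list above.

# Sized fields stored for every typed character, in bytes.
FIELD_BYTES = {
    "character": 1,
    "timestamp": 8,
    "file_path": 260,   # Windows maximum path length
    "user_name": 100,
    "file_offset": 8,
}

CHARS_PER_HOUR = 1_000_000      # the rounded-up typing rate
BYTES_PER_CHAR_BUDGET = 1_000   # the generous per-character budget
DRIVE_PRICE_USD = 170.0         # price quoted for a 3 TB drive
DRIVE_CAPACITY_GB = 3_000       # assumption: decimal units, 1 TB = 1,000 GB


def listed_record_bytes() -> int:
    """Total size of the sized fields listed for one character."""
    return sum(FIELD_BYTES.values())


def gigabytes_per_hour() -> float:
    """Data written per hour at the budgeted record size."""
    return CHARS_PER_HOUR * BYTES_PER_CHAR_BUDGET / 1e9


def cost_per_hour_usd() -> float:
    """Storage cost of one hour of typing at the quoted drive price."""
    price_per_gb = DRIVE_PRICE_USD / DRIVE_CAPACITY_GB
    return gigabytes_per_hour() * price_per_gb


if __name__ == "__main__":
    print("listed fields per character:", listed_record_bytes(), "bytes")
    print("data per hour:", gigabytes_per_hour(), "GB")
    print("cost per hour: $%.4f" % cost_per_hour_usd())
```

The sized fields add up to 377 bytes, so the thousand-byte budget leaves plenty of room for the "other stuff", and the hourly cost comes out a little under six cents.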

I think that businesses should strongly consider this option. I should also note that this doesn’t apply to other kinds of files, like videos or pictures or audio, nor does it apply to storing machine-generated data, like system logs; those are not limited by how fast a person can type.

I should also say that there must be an easy way to replace drives as they fill up. At one gigabyte per hour, a two-terabyte drive would last one typist 2,000 hours – about a year of standard 40-hour office weeks. I suggest putting these drives in a hot-swappable machine on the network, and moving the full drives into storage.
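
To plan the swaps, a small sketch like the following estimates how long a drive lasts. It assumes the one gigabyte per hour figure from above, a 40-hour office week, and decimal units (2 TB = 2,000 GB). The function names and the typists parameter are hypothetical; the parameter is only there to show how quickly a shared drive fills when several people write to it.

```python
# Rough drive-lifetime estimate; names and defaults are illustrative assumptions.

def drive_lifetime_hours(capacity_gb: float,
                         gb_per_hour: float = 1.0,
                         typists: int = 1) -> float:
    """Hours of typing a drive can hold before it is full."""
    return capacity_gb / (gb_per_hour * typists)


def drive_lifetime_weeks(capacity_gb: float,
                         gb_per_hour: float = 1.0,
                         typists: int = 1,
                         hours_per_week: float = 40.0) -> float:
    """The same lifetime expressed in office weeks."""
    hours = drive_lifetime_hours(capacity_gb, gb_per_hour, typists)
    return hours / hours_per_week


if __name__ == "__main__":
    # One typist on a 2 TB drive.
    print(drive_lifetime_hours(2_000), "hours")
    print(drive_lifetime_weeks(2_000), "weeks")
    # Ten typists sharing the same drive.
    print(drive_lifetime_weeks(2_000, typists=10), "weeks")
```

With ten typists sharing one two-terabyte drive, the swap comes around every five weeks instead of once a year, which is why the hot-swap arrangement matters.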