Saturday, May 12, 2007

The "Forget This" Algorithm

This week, I encountered a recurring theme in my web surfing/reading. That theme is the human psychological necessity to forget things and the potential mental trauma that massive and efficient data storage technologies represent.

The first instance of this theme was when I stumbled (via Boing Boing) across a reprint of a 1983 SF story by Spider Robinson, entitled "Melancholy Elephants." I will not comment on the story specifically, because I don't want to spoil it for those of you who haven't read it.

The second instance of this theme happened while I was listening to Episode (05/10/07 - Broadband in every pot) of Buzz Out Loud. Molly was discussing the possibility of a "forget feature" for computers and echoing some of the concerns voiced in "Melancholy Elephants." It got me thinking about how we frequently attempt to mimic human processes/behaviours, such as facial recognition or dancing, with computer programs . . . so how hard would it be to teach a computer to simulate forgetfulness?

Tom and Molly were critical of the idea of using metadata (and rightfully so, IMO, because metadata is notoriously inconsistent from person to person or even day to day). I'm thinking it should instead be a formula based on how frequently a particular file/piece of information is used, possibly how long ago it was first learned, and how long it has been since it was last used.

In other words, you use your own phone number every day (i.e. giving it out to other people, not necessarily calling it), so you should never forget it. The phone number for your CPA, however, you probably only use once a year around tax season, which is why you always have to look it up.

Percentage chance of "forgetting" (i.e. archiving off of primary storage) = number of days since that file/database record was last used divided by the number of times that file and/or database record has ever been used.

EXAMPLE:

200 days since last used / 100 accesses = 2% chance it gets "forgotten"

20 days since last used / 100 accesses = 0.2% chance it gets "forgotten"

20 days since last used / 10 accesses = 2% chance it gets "forgotten"

20 days since last used / 1 access = 20% chance it gets "forgotten"
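
For anyone who wants to play with the numbers, here is a rough Python sketch of that formula. The function names (forget_chance_percent, should_forget) are made up for illustration, and capping the result at 100% and refusing records with zero accesses are my own assumptions, since the formula as written doesn't cover those cases.

```python
import random


def forget_chance_percent(days_since_last_use, total_accesses):
    """Percent chance a record gets "forgotten" (archived off primary storage).

    Implements: days since last use / number of times ever used.
    """
    if total_accesses < 1:
        # Assumption: a record that exists has been used at least once.
        raise ValueError("total_accesses must be at least 1")
    # Assumption: cap at 100 so the result is always a valid percentage.
    return min(100.0, days_since_last_use / total_accesses)


def should_forget(days_since_last_use, total_accesses, rng=random):
    """Roll the dice once: True means archive the record this time around."""
    chance = forget_chance_percent(days_since_last_use, total_accesses)
    return rng.random() * 100 < chance


if __name__ == "__main__":
    # The four examples from the post.
    for days, accesses in [(200, 100), (20, 100), (20, 10), (20, 1)]:
        chance = forget_chance_percent(days, accesses)
        print(f"{days} days / {accesses} accesses = {chance:g}% chance")
```

Running should_forget once per record on some schedule (nightly, say) would be one way to turn the percentage into an actual archive/keep decision, but how often to roll is a design choice the formula doesn't settle.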

It definitely needs some tweaking, because right now those stats are significantly better than real human memory. For instance, if twenty days ago you looked up the number for a business that you have never dealt with before or since, I doubt you are 80% likely to remember that phone number now.
