This is my response to a friend's questions about Packrat.
From jeske@... Mon Nov 30 16:50:25 1998
Date: Mon, 30 Nov 1998 16:50:25 -0800
From: David Jeske
To: Paul Bleisch
Subject: Re: killer app
Message-ID: <19981130165025.O5324@home.chat.net>
References: <8744DF3002FBD011BDDF000092970B465B95E8@iron.digitalanvil.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Mailer: Mutt 0.94.13i
In-Reply-To: <8744DF3002FBD011BDDF000092970B465B95E8@iron.digitalanvil.com>; from Paul Bleisch on Mon, Nov 30, 1998 at 04:51:50PM -0600
Status: RO
Content-Length: 7331

On Mon, Nov 30, 1998 at 04:51:50PM -0600, Paul Bleisch wrote:
> I have been thinking about the work you did on your
> uber-information manager (packrat). Then, the other
> day I was reading something in some magazine where
> some columnist predicts that end users will want
> personal firewalls in the next year. I don't doubt
> that, but it made me think more. On top of that,
> Oracle's Internet Server 8i (or whatever it is called)
> has moved to replace file system services, web, and
> application serving services from the OS to the DB.
> (Basically, there is a built in JavaVM, web server,
> and you can store java applets in the DB.)

Yeah, that new Oracle DB/filesystem thing is interesting. It can do
more than serve Java applets: it can serve as a versioned NFS drive
with database-like searching capabilities.

> I really think that PackRat is the ultimate killer
> app for the wired. If someone could set up a package
> that contained a good collection of goodies and a
> nice API to access the DB easily, information management
> becomes very easy.

Agreed, and that was my motivation... I'd really like to see the
'hierarchical filesystem' go by the wayside altogether. After all,
everything in the world is relative to something else, not to "/".
I'm talking with someone at Be who is looking over their installation
and package management, and I'm going to see what they think of
dropping the hierarchy. (i.e.
Be is a hierarchy of files which can have optional attributes,
whereas I'd like it to be a collection of files with attributes, any
of which could be a hierarchy)

> Specifically, I find myself wanting to do the following tasks daily.
> Some of these are the same as you had, but some are more complex.
> All of them seem to be solved problems, just not integrated.
>
> o e-mail. get it. sort it. catalog it. grok it for
> 'important content'. The use here is obvious. The
> hard part is overcoming the nice features of most
> mailers. Then again, most mailers suck.

Most mailers suck bigtime. It surprised me how quickly I was able to
whack up a basic mailreader with server-side HTML generation. The
biggest impediment to my using it is that there is no way to get it
to let me edit my mail in Emacs.

> o usenet. get what i want. catalog it. The use here
> is to build my knowledge base. This could be something
> as simple as an app that connects to Deja News and
> pulls down useful articles based on keywords and then
> prunes.

agreed... and IMO this app should be the same as the mailreader.

> o web. basically the same thing as the usenet. Added
> functionality of sticking a page into packrat while
> browsing.

I'd prefer it just to always stick every page I ever see into
packrat. Derived from Alan's ideas about "personal proxy server".

> o digital library. this is the most recent addition. I
> now have over 500 megs of papers/documentation/whatever
> that I need to keep track of... pain in the ass.

yup.... and they are all just filenames and completely out of
context.

> o scheduling/to-do. obvious.

Yeah, although I tend not to use electronic schedulers or todo lists.
Paper is more obtrusive for me, and that's what I need out of a
scheduler.

> o packrat replicator. allow packrat to travel easily.
> this should be as easy as connecting and hitting replicate.

yeah...
I would really like an easy way to put all my information into one
big information store (I'm thinking mostly contact information, but
anything applies) and flag only certain things to sync with my Pilot,
but have it be the same information store. Then I could make some
password-protected webpage to access the same information, and have
it sync to my Pilot automatically.

> o personal information management. e-wallet management.
> purchase tracking, etc. (I've bought a lot of books and
> stuff lately and would like to keep track of it in
> one place.)

Yeah... I haven't thought much about this. I don't use Quicken or
anything yet; I guess I don't pay much attention to my personal
finances. However, it would be nice if there was one place to put it
all.

> o publishing. publish data to friends and coworkers.

Yes... we could have the 'tuna contact publishing link' and
everyone's information would be available and up to date. There is
really no reason that allowing group publishing of data like this
should have anything specific to do with contact information either.
It's just a network replication strategy for stored data.

> Along with the data, there would need to be access (remote).
> Enter the 'attached' (builtin, whatever) webserver. Which
> brings up firewalls. Enter the personal firewall.

ahh... gotcha...

> Hmm... anyway... I am just rambling.

Sounds like you're rambling along the same lines I've been thinking.
I see this really as a movement from storing unstructured data
(files) in an unstructured world (pathnamespace) to storing
structured data (records) in a structured world (database). My
interesting thought about this is:

- traditional databases impose the structure. When a client asks for
a column (field), the database already knew about it, and it spits up
the information passively out of its datastore.

- Packrat shouldn't 'impose' the structure. Data is structured
whether we recognize it or not.
Today's systems don't have mechanisms in place to remember
information about data-structure and uniqueness. Today's systems also
don't have a mechanism to connect questions to answers.

So what I'd like to do is set up a 'type-relationship' system. If you
get a jpeg file, the system can do work to make a guess at the
filetype. If it finds something which makes sense, it can remember
it. If you then ask for all the 'pictures' in the system, it should
easily be able to bring up this jpeg file. If you ask for all the
pictures which are at least 240x200 big, it should be able to run the
appropriate software to figure out the dimensions of the jpeg file to
decide if it meets the criteria. If you either (a) do searches on a
field more often than you add records or (b) care about search speed
more than storage space, it should derive these fields and store them
in the cache when you insert the data in the system.

The hope is that as the data-mining capabilities of a system like
this demonstrate their worth, applications will expose their data in
more interesting (i.e. more structured) ways. Some of this has
already begun on BeOS. When you download files, it attaches the
source-URL to them as an attribute. Email messages are stored with
attributes on them for the important headers. However, BeOS missed
quite a few things:

(a) there is no ownership or identity information for attribute
names/types themselves, so the information is only marginally
reliable.
(b) files are still primarily stored in a hierarchy and only
secondarily have attributes.
(c) you have to pre-dictate the 'structure', because you have to tell
the FS mechanism which attributes to index.
(d) last I checked, there wasn't a way to write software which could
access the index information so you could create your own types of
searches, and their searches are pretty limited.

--
David Jeske (N9LCA) + http://www.chat.net/~jeske/ + jeske@...
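
[Editorial sketch] The 'type-relationship' idea in the message above could look roughly like the following Python. This is only an illustration under stated assumptions, not Packrat's actual design: `Store`, `sniff_type`, `jpeg_dimensions` and `fake_jpeg` are hypothetical names invented here. Attributes are derived from the raw data by registered functions and remembered in a cache, either lazily when a question is asked or eagerly at insert time for fields that are searched often.

```python
import struct


def fake_jpeg(width, height):
    """Build just enough of a JPEG (SOI, APP0, SOF0) for this sketch to parse."""
    app0 = b"\xff\xe0\x00\x04\x00\x00"
    sof0 = b"\xff\xc0\x00\x0b\x08" + struct.pack(">HH", height, width)
    return b"\xff\xd8" + app0 + sof0 + b"\x00" * 6


def sniff_type(data):
    """Guess a type from magic bytes; this sketch only recognises JPEG."""
    return "picture/jpeg" if data[:3] == b"\xff\xd8\xff" else None


def jpeg_dimensions(data):
    """Walk JPEG segments until a start-of-frame marker yields (width, height)."""
    pos = 2  # skip the two-byte start-of-image marker
    while pos + 9 <= len(data) and data[pos] == 0xFF:
        marker = data[pos + 1]
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        if 0xC0 <= marker <= 0xCF and marker not in (0xC4, 0xC8, 0xCC):
            height, width = struct.unpack(">HH", data[pos + 5:pos + 9])
            return width, height
        pos += 2 + length
    return None


class Store:
    """Hypothetical store: raw blobs plus a cache of derived attributes."""

    def __init__(self):
        self.blobs = []      # data exactly as it was inserted
        self.cache = {}      # (blob id, attribute name) -> derived value
        self.derivers = {}   # attribute name -> (function, eager flag)

    def register(self, name, fn, eager=False):
        self.derivers[name] = (fn, eager)

    def insert(self, data):
        self.blobs.append(data)
        blob_id = len(self.blobs) - 1
        # Eager attributes are derived at insert time (search-heavy fields).
        for name, (fn, eager) in self.derivers.items():
            if eager:
                self.cache[(blob_id, name)] = fn(data)
        return blob_id

    def get(self, blob_id, name):
        # Lazy attributes are derived on first question, then remembered.
        key = (blob_id, name)
        if key not in self.cache:
            fn, _ = self.derivers[name]
            self.cache[key] = fn(self.blobs[blob_id])
        return self.cache[key]

    def pictures_at_least(self, min_w, min_h):
        hits = []
        for blob_id in range(len(self.blobs)):
            if self.get(blob_id, "type") != "picture/jpeg":
                continue
            dims = self.get(blob_id, "dimensions")
            if dims and dims[0] >= min_w and dims[1] >= min_h:
                hits.append(blob_id)
        return hits


store = Store()
store.register("type", sniff_type, eager=True)   # searched often: derive on insert
store.register("dimensions", jpeg_dimensions)    # derived only when asked
store.insert(fake_jpeg(320, 240))
store.insert(fake_jpeg(100, 100))
store.insert(b"just some text")
print(store.pictures_at_least(240, 200))  # -> [0]
```

The `eager` flag is one possible way to express the trade-off named in the message: pay storage and insert time up front for fields that are searched more often than records are added.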
From jeske@home.chat.net Mon Nov 30 17:16:39 1998
Date: Mon, 30 Nov 1998 17:16:39 -0800
From: David Jeske
To: Paul Bleisch
Subject: Re: killer app
Message-ID: <19981130171639.R5324@home.chat.net>
References: <8744DF3002FBD011BDDF000092970B465B95E9@iron.digitalanvil.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Mailer: Mutt 0.94.13i
In-Reply-To: <8744DF3002FBD011BDDF000092970B465B95E9@iron.digitalanvil.com>; from Paul Bleisch on Mon, Nov 30, 1998 at 06:57:04PM -0600
Status: RO
Content-Length: 3198

On Mon, Nov 30, 1998 at 06:57:04PM -0600, Paul Bleisch wrote:
> The viewer should be the same, but the 'groper'
> (the app that inputs news into the db) is obviously
> different.

yeah... although I'm beginning to think of most of the parts of this
as little mini-data-handling-components, not really applications.
Something would go out and understand how to talk to deja-news, and
"inject" information with whatever type information it could attach.
Then another collection of little data-mining scripts would come by
and collect information from the text. Some of them would be specific
(like something made to deal with news headers), some of them would
be generic (like a text indexer). I think it's really important to
separate the injection from the data-mining, because information
stored at inject time can't be 'recovered' once lost, but information
derived from the data itself can easily be discarded and reproduced
as often as necessary.

> >I'd prefer it just to always stick every page I ever see into
> >packrat. Derived from Alan's ideas about "personal proxy server".
>
> Hmm... that is interesting. It would have to
> auto prune older data or something??

The first big point of my whole packrat thing was that I wanted to
never (manually) delete anything. I wanted to set it up to use a
collection of auto-prune and auto-backup to get rid of older data to
make space for new data.

> Part of my scheduling is done by taking my work log (a text file
> that I do syslog style work logging to). I want all of this in one
> place. Currently, I have to take this text file wherever I go. :(

gotcha..

>> - Packrat shouldn't 'impose' the structure. Data is structured
>> whether we recognize it or not. Today's systems don't have
>> mechanisms in place to remember information about data-structure
>> and uniqueness. Today's systems also don't have a mechanism to
>> connect questions to answers.
>
> Hmm.. interesting.

FWIW, this thinking is along the same lines as my thoughts on
language typing. That is, 'code has static types whether we like it
or not'. Using dynamic languages is just ignoring the static type
relationships which do exist. In addition, dynamic typing is really
just conforming to a statically typed object reflection interface.
Worse, when you bury code one level 'behind' that object reflection
interface, you often lose access to the 'second order statics' of the
code. I don't want to lose the ability to record these static 'type
requirements', even if they are second or third order. The packrat
problem is the same data-keeping in the opposite direction. Whereas
in languages you take source and compile it down, losing information
as you go, in packrat you start with a data-source and try to
data-mine 'up', looking for (a) specific data points, and (b)
connections to other data.

> I am slowly picking up DB skills... slooooowly. Too busy
> really to do much work on this stuff.

I have this feeling that relational databases may be a good place to
explore this stuff, but that they are far more bloated with things
required to implement the 'SQL Standard' than packrat needs.

--
David Jeske (N9LCA) + http://www.chat.net/~jeske/ + jeske@...
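
[Editorial sketch] The separation of injection from data-mining described in the message above might be sketched like this in Python. All names here (`Record`, `inject`, `remine`, the two miners, and the sample article) are hypothetical and only illustrate the idea: facts attached at inject time are kept apart from derived data, which can be thrown away and rebuilt from the stored text at any time.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Record:
    body: str                                    # the data itself
    injected: dict                               # known only at inject time
    derived: dict = field(default_factory=dict)  # reproducible from body


def mine_news_headers(record):
    """Specific miner: 'Name: value' lines before the first blank line."""
    headers = {}
    for line in record.body.split("\n\n", 1)[0].splitlines():
        name, sep, value = line.partition(":")
        if sep:
            headers[name.strip().lower()] = value.strip()
    return headers


def mine_words(record):
    """Generic miner: sorted unique words, a stand-in for a text indexer."""
    return sorted({w.lower() for w in re.findall(r"[A-Za-z']+", record.body)})


MINERS = {"headers": mine_news_headers, "words": mine_words}


def remine(record):
    """Discard whatever was derived before and rebuild it from the body."""
    record.derived = {name: miner(record) for name, miner in MINERS.items()}


def inject(body, **facts):
    """Store the data with the facts only the injector knows, then mine it."""
    record = Record(body=body, injected=dict(facts))
    remine(record)
    return record


article = inject(
    "Subject: killer app\nFrom: someone\n\nPackrat stores everything.",
    source="deja-news",
)
snapshot = dict(article.derived)
article.derived.clear()   # derived data is disposable...
remine(article)           # ...because it can always be reproduced
print(article.derived == snapshot)  # -> True
print(article.injected)             # still only what the injector supplied
```

Keeping `injected` out of reach of the miners is the design choice that matters here: nothing in `remine` can overwrite or lose what was recorded at inject time.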