Personal Document Archive Management System

This was an attempt at making a web interface to manage all my comics, books, and anything else I could reasonably fit into it, so calling it an “archive” seemed appropriate.

Planning

My first thought was to use Python, as it was the language I was most familiar with at the time.

I wanted to have the archive accessible over a network, so I decided a web interface was the best choice for my application.

I looked up different Python web frameworks, and Django fit my needs the best, so that is what I decided to use.

The Start

The first thing I did was sketch out the basic database structure I wanted. This started with the base unit that the user would be most interested in: the content. Each piece of content would contain a few basic things: a title, some creators, a couple of tags, an id, a file path to serve, a source URL and a source retrieved time. The creators and tags fields would be many-to-many with their own tables, and each creator and tag would have a name and an id, with creators also having an optional URL to their pages.
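As a rough illustration of that structure, here is a minimal sketch using Python's built-in sqlite3 module rather than Django models. The table names, column names, and the helper functions are assumptions made for the example; the original models are not shown here.

```python
import sqlite3

# Hypothetical table and column names, chosen to mirror the fields described
# above. Many-to-many links are modelled with two join tables.
SCHEMA = """
CREATE TABLE creator (id INTEGER PRIMARY KEY, name TEXT NOT NULL, url TEXT);
CREATE TABLE tag (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE content (
    id INTEGER PRIMARY KEY,
    title TEXT NOT NULL,
    file_path TEXT NOT NULL,
    source_url TEXT,
    source_retrieved TEXT
);
CREATE TABLE content_creator (
    content_id INTEGER NOT NULL REFERENCES content(id),
    creator_id INTEGER NOT NULL REFERENCES creator(id),
    PRIMARY KEY (content_id, creator_id)
);
CREATE TABLE content_tag (
    content_id INTEGER NOT NULL REFERENCES content(id),
    tag_id INTEGER NOT NULL REFERENCES tag(id),
    PRIMARY KEY (content_id, tag_id)
);
"""


def make_db(path=":memory:"):
    """Open a database and create the (empty) archive tables in it."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def titles_for_tag(conn, tag_name):
    """Return the titles of all content carrying the given tag, sorted."""
    rows = conn.execute(
        "SELECT c.title FROM content c "
        "JOIN content_tag ct ON ct.content_id = c.id "
        "JOIN tag t ON t.id = ct.tag_id "
        "WHERE t.name = ? ORDER BY c.title",
        (tag_name,),
    )
    return [row[0] for row in rows]
```

In Django the two join tables would normally be generated by the framework from a many-to-many field, so they would not be written out by hand like this.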

I started with four views: content, creator, tag, and home. Each piece of content would be put into the respective database tables by a script that listed a directory tree and added each subfolder to the database.
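The directory-listing step could look something like the sketch below. It assumes every subfolder at any depth counts as one piece of content, which may not match what the original script did, and the function name is made up for the example.

```python
from pathlib import Path


def find_content_folders(root):
    """Return every subfolder under root as a sorted list of relative paths.

    Assumption: each subfolder, however deeply nested, is one piece of content.
    """
    root = Path(root)
    return sorted(
        p.relative_to(root).as_posix()
        for p in root.rglob("*")
        if p.is_dir()
    )
```

Each returned path would then become one row in the content table, with the title and other fields filled in from whatever the folder contains.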

Results

This first iteration worked with my existing library, but every time I wanted to add more items, the script had to do a full reconstruction of the content, author and tag tables, which is not ideal. My next task was making a format that allowed items to be “remembered” by the scraper script.

The .meta format

As I was writing my application in Python, I decided that JSON was the easiest data format to import, and that each content folder should contain a file called .meta holding all the saved metadata. This has an issue that is obvious in hindsight: all tags and authors need to be defined in their respective content files, and keeping all of them up to date is difficult. That was something I learnt later.
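A minimal sketch of reading and writing such a file with the standard json module is below. The helper names are hypothetical, and so are any keys stored inside the file; the real .meta layout is not documented here.

```python
import json
from pathlib import Path

META_NAME = ".meta"  # file name taken from the post; everything else is assumed


def write_meta(folder, metadata):
    """Save a metadata dict as JSON in the folder's .meta file."""
    path = Path(folder) / META_NAME
    path.write_text(json.dumps(metadata, indent=2, sort_keys=True), encoding="utf-8")
    return path


def read_meta(folder):
    """Load the folder's .meta file, or return None if it has not been written yet."""
    path = Path(folder) / META_NAME
    if not path.is_file():
        return None
    return json.loads(path.read_text(encoding="utf-8"))
```

A scraper could call read_meta on each folder and only build fresh metadata when it returns None, which is one way a folder could be “remembered” between runs.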

2nd attempt

I updated the content models to have a whole bunch of extra metadata (custom path-like IDs, time fields, language and copyright, an archiver field, and a custom HTML preview field). I also added a separate archiver model that takes charge of all content of type <x>.

Results

This fixed a few issues, but a main sticking point was that the preview field was bare HTML inserted directly into a page, which is a big security vulnerability. On top of that, having each piece of content belong to one specific archiver is not a flexible model, and the whole software suffers as a result.
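To show why bare HTML is risky and what the simplest mitigation looks like, here is a small sketch using html.escape from the standard library. The function name and the wrapping div are made up for the example, and this is not how the project handled previews.

```python
from html import escape


def render_preview(preview_text):
    """Wrap a preview in a div, escaping it so any markup is shown as text."""
    return '<div class="preview">' + escape(preview_text) + "</div>"
```

Escaping everything removes the ability to write custom HTML previews at all, so a real fix would more likely pass the field through an allowlist-based sanitizer that keeps a few safe tags and drops the rest.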

Learnings

Not using JavaScript or an API system really did not help this software. If I were making this again (which I may do in the future), I would base it around a main protocol / API stack that actually manages the content, and then make a client frontend for that, instead of building a fully server-side-rendered web application.

I believe that properly looking into other database and archival / library software would help me design a better content-creator-tag system / database format. There may already be an open format for this sort of thing, or at least an industry standard that I could use.

I would also like the option of local software instead of something primarily web-based, so an API that a native client could use, and/or a locally hosted cloning / syncing system, would also be cool to look into.

Hoverth
