Archive

2024-03-02

Personal Document Archive Management System

This was an attempt at making a web interface to manage all my comics, books, and anything else I could reasonably fit into it, so calling it an “archive” seemed appropriate.

Planning

My first thought was to use Python, as it was the language I was most familiar with at the time.

I wanted to have the archive accessible over a network, so I decided a web interface was the best choice for my application.

I looked up different Python web frameworks, and Django fit my needs the best, so that is what I decided to use.

The Start

The first thing I did was sketch out the basic database structure I wanted. This started with the base unit that the user would be most interested in: the content. Each piece of content would contain a few basic things: an id, a title, some creators, a couple of tags, a file path to serve, a source URL, and a source retrieved time. The creators and tags fields would be many-to-many with their own tables, and each creator or tag would have a name and an id, with creators also having an optional URL to their pages.
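As a rough sketch of that structure, here is how the tables could look. It uses the standard-library sqlite3 module rather than Django models so that it runs on its own, and every table name, column name, and the add_content helper are assumptions made for illustration, not the original code.

```python
import sqlite3

# Hypothetical schema: one table per entity, plus two join tables
# that carry the many-to-many links described above.
SCHEMA = """
CREATE TABLE content (
    id           INTEGER PRIMARY KEY,
    title        TEXT NOT NULL,
    file_path    TEXT NOT NULL,
    source_url   TEXT,
    retrieved_at TEXT
);
CREATE TABLE creator (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE,
    url  TEXT               -- optional link to the creator's page
);
CREATE TABLE tag (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE content_creator (
    content_id INTEGER NOT NULL REFERENCES content(id),
    creator_id INTEGER NOT NULL REFERENCES creator(id),
    PRIMARY KEY (content_id, creator_id)
);
CREATE TABLE content_tag (
    content_id INTEGER NOT NULL REFERENCES content(id),
    tag_id     INTEGER NOT NULL REFERENCES tag(id),
    PRIMARY KEY (content_id, tag_id)
);
"""


def create_archive_db(path=":memory:"):
    """Open a database and create the (assumed) archive tables."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def _link(conn, table, join_table, column, content_id, name):
    """Create the creator/tag row if it is missing, then link it to the content."""
    conn.execute(f"INSERT OR IGNORE INTO {table} (name) VALUES (?)", (name,))
    row_id = conn.execute(
        f"SELECT id FROM {table} WHERE name = ?", (name,)
    ).fetchone()[0]
    conn.execute(
        f"INSERT OR IGNORE INTO {join_table} (content_id, {column}) VALUES (?, ?)",
        (content_id, row_id),
    )


def add_content(conn, title, file_path, creators=(), tags=(),
                source_url=None, retrieved_at=None):
    """Insert one piece of content and return its id."""
    cur = conn.execute(
        "INSERT INTO content (title, file_path, source_url, retrieved_at) "
        "VALUES (?, ?, ?, ?)",
        (title, file_path, source_url, retrieved_at),
    )
    content_id = cur.lastrowid
    for name in creators:
        _link(conn, "creator", "content_creator", "creator_id", content_id, name)
    for name in tags:
        _link(conn, "tag", "content_tag", "tag_id", content_id, name)
    conn.commit()
    return content_id
```

Because creators and tags live in their own tables, two items that share a creator point at the same row instead of each storing the name separately.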

I started with four views: content, creator, tag, and home. Each piece of content was put into the respective database tables by a script that walked a directory tree and added each subfolder to the database.
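The import script itself is not shown here, so the following is only a minimal sketch of the idea. It assumes that any folder directly containing at least one file counts as one piece of content; the function name scan_library and the dictionary keys are hypothetical.

```python
from pathlib import Path


def scan_library(root):
    """Walk the tree under root and describe each content folder.

    Assumption: a folder is a piece of content if it directly holds
    at least one file. Pure grouping folders are skipped.
    """
    root = Path(root)
    items = []
    folders = sorted(p for p in root.rglob("*") if p.is_dir())
    for folder in folders:
        if any(child.is_file() for child in folder.iterdir()):
            items.append({
                "title": folder.name,
                # Path relative to the library root, with forward slashes.
                "file_path": folder.relative_to(root).as_posix(),
            })
    return items
```

Each returned dictionary could then be inserted as a row, which is also why a rescan touches every item: nothing in the folders records what has already been imported.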

Results

This first iteration worked with my existing library, but every time I wanted to add more items to the database, it had to do a full reconstruction of the content, author and tag tables, which is not ideal. My next task was making a format that allowed items to be “remembered” by the scraper script.

The .meta format

As I was writing my application in Python, I decided that JSON was the easiest data format to import, and that each content folder should contain a file called .meta, which should hold all the saved metadata. This has an obvious-in-hindsight issue: all tags and authors need to be defined in their respective content files, and keeping all of them up to date becomes a problem. That was something I learnt later.
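A minimal sketch of reading and writing such a file follows. The exact keys stored in .meta are not listed in this post, so the field names in the stub below are assumptions, as are the helper names.

```python
import json
from pathlib import Path

META_NAME = ".meta"


def load_meta(folder):
    """Return the folder's saved metadata, or None if it has no .meta file."""
    meta_file = Path(folder) / META_NAME
    if not meta_file.exists():
        return None
    with meta_file.open(encoding="utf-8") as fh:
        return json.load(fh)


def save_meta(folder, meta):
    """Write the metadata dictionary to the folder's .meta file as JSON."""
    meta_file = Path(folder) / META_NAME
    with meta_file.open("w", encoding="utf-8") as fh:
        json.dump(meta, fh, indent=2, ensure_ascii=False)


def ensure_meta(folder):
    """Reuse existing metadata, or create a stub for a newly seen folder."""
    meta = load_meta(folder)
    if meta is None:
        # Hypothetical field names, mirroring the content fields above.
        meta = {
            "title": Path(folder).name,
            "creators": [],
            "tags": [],
            "source_url": None,
            "retrieved": None,
        }
        save_meta(folder, meta)
    return meta
```

With this in place, a scan can call ensure_meta on each folder and keep whatever was edited by hand, instead of rebuilding everything. It also shows the weakness mentioned above: a creator's name is repeated in every .meta file that mentions them.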

2nd attempt

I updated the content models to have a whole bunch of extra metadata (custom path-like IDs, time fields, language and copyright, an archiver field, and a custom HTML preview field). I also added a separate archiver model that takes charge of all content of type <x>.

Results

This fixed a few issues, but a main sticking point was that the preview field was bare HTML inserted straight into a page, which is a big security vulnerability. On top of that, having each piece of content belong to one specific archiver is not a flexible model, and the whole software suffers as a result.
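To illustrate the preview problem: if the stored preview is treated as text and escaped before it goes into the page, any markup in it is displayed rather than executed. This is only a sketch using the standard-library html.escape; the function name and the wrapping div are hypothetical, and it gives up rich previews entirely rather than sanitising them.

```python
from html import escape


def render_preview(raw_preview):
    """Wrap a stored preview in a div, escaping any markup it contains."""
    # escape() replaces &, <, > and (with quote=True) both quote characters,
    # so a stored <script> tag ends up as visible text, not as code.
    return '<div class="preview">' + escape(raw_preview, quote=True) + "</div>"
```

Keeping real HTML previews would need an allow-list sanitiser instead, which is a larger design decision than this sketch covers.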

Learnings

Not using JavaScript or an API system really did not help this software. If I were making this again (which I may do in the future), I would base it around a main protocol / API stack that actually manages the content, and then build a client frontend on top of that, instead of a fully server-side-rendered web server.

I believe that properly looking into other database and archival / library software would help me design a better content-creator-tag system / database format. There may already be an open format for this sort of thing, or at least an industry standard that I could use.

I would also like the option of local software instead of something primarily web-based, so an API that a native client could use, and/or a locally hosted cloning / syncing system, would also be something cool to look into.

