Archaeology and Django: mind your jargon

I have been writing small Django apps for archaeology since 2009 ‒ Django 1.0 had been released a few months earlier. I love Django as a programming framework: my initial choice was based on the ORM, at that time the only geo-enabled ORM that could be used out of the box, and years later GeoDjango still rocks. I almost immediately found out that the admin interface was a huge game-changer: instead of wasting weeks writing boilerplate CRUD, I could just focus on adding actual content and developing the frontend. Having your data model as source code (under version control) is the right thing for me and I cannot go back to using “database” programs like Access, FileMaker or LibreOffice.

Previous work with Django in archaeology

There is some prior art on using Django in the magic field of archaeology. This is what I found in the published literature on computer applications in archaeology:

I have been discussing this interaction with Diego Gnesi Bartolani for some time now and he is developing another Django app. Python programming skills are becoming more common among archaeologists and it is not surprising that databases big and small are moving away from desktop-based solutions to web-based ones.

The ceramicist’s jargon

There is one big problem with Django as a tool for archaeological data management: language. Here are some words that are Python keywords, Python built-in names, or very important in Django:

  • class (Python keyword)
  • type (Python built-in)
  • object (Python built-in)
  • form (HTML element, Django module)
  • site (Django contrib app)

Unfortunately, these words are not only generic enough to be used in everyday speech, but they are very common in the archaeologist’s jargon, especially for ceramicists.
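How Python itself treats each of these words can be checked with the standard library. The following is a minimal sketch: the `classify` helper and its three labels are my own names for illustration, and it only looks at plain Python, so the Django meanings of form and site do not show up.

```python
import builtins
import keyword

TERMS = ["class", "type", "object", "form", "site"]

def classify(term):
    """Return how plain Python treats a given word."""
    if keyword.iskeyword(term):
        return "keyword"   # cannot be used as an identifier at all
    if hasattr(builtins, term):
        return "builtin"   # legal to reuse, but doing so hides the built-in
    return "free"          # no special meaning in plain Python

for term in TERMS:
    print(f"{term}: {classify(term)}")
```

Only class is off limits to the interpreter. type and object can be reused freely, which is arguably worse, because nothing warns you.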

Class is often used to describe a generic and wide group of objects, e.g. “amphorae”, “fine ware”, “lamps”, “cooking ware” are classes of ceramic products ‒ i.e. categories. Sometimes class is also used for narrower categories such as “terra sigillata italica”, but the most accepted term in that case is ware. The definition of ware is ambiguous, and it can be based on several different criteria: chemical/geological analysis of source material; visible characteristics such as paint, decoration, manufacturing; typology. The upside is that ware has no meaning in either Python or Django.

Form and type are both used within typologies. There are contrasting uses of these two terms:

  • a form defines a broad category, tightly linked to function (e.g. dish, drinking cup, hydria, cythera) and a type defines a very specific instance of that form (e.g. Dragendorff 29); sub-types are allowed, and in my experience this is the most widespread terminology;
  • a form is a specific instance of a broader function-based category ‒ this terminology is used by John W. Hayes in his Late Roman Pottery.

These terminology problems, regardless of their cause, are complicated by translation from one language to another, and by regional/local traditions. Wikipedia has a short but useful description of the general issues of ceramic typology at the Type (archaeology) page.

Site is perhaps the best understood source of confusion, and the least problematic. First of all, everyone knows that the word site can have lots of meanings, and lots of archaeologists survive using both the website and the archaeological site meaning every day. Secondly, even though the sites app is included by default in Django, it is not so ubiquitous ‒ I have only ever used it when deploying, una tantum.

Object is a generic word. Shame on every programming language designer who ever thought it was a good idea to use such a generic word in a programming language, eventually polluting natural language in this digital age. No matter how strongly you think object is a good term to designate archaeological finds, items, artifacts, features, layers, deposits and so on, thou shalt not use object when creating database fields, programming functions, visualisation interfaces or anything else, really.

The horror is when you end up writing code like this:

from django.db import models

class Class(models.Model):
    '''A class. Both a Python class and a classification category.'''

    pass

class Type(models.Model):
    '''A type. Actually, a Python class.

    >>> t = Type()
    >>> type(t)
    <class '__main__.Type'>
    '''

Not nice.
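The trouble is not only cosmetic. As a rough sketch in plain Python (no Django needed; both helper functions are hypothetical names of mine), a lowercase class cannot be used as a name at all, while a lowercase type is accepted silently and hides the built-in for the rest of the scope:

```python
def is_valid_name(name):
    """Check whether `name = 1` compiles as a Python assignment."""
    try:
        compile(f"{name} = 1", "<sketch>", "exec")
    except SyntaxError:
        return False
    return True

def shadowed_type_demo():
    """Reuse `type` as a variable, then try to call the built-in."""
    type = "Dragendorff 29"  # a ceramic type, now hiding the built-in
    try:
        type(42)             # the built-in type() is unreachable here
    except TypeError as exc:
        return str(exc)

print(is_valid_name("class"))  # rejected by the parser
print(is_valid_name("type"))   # accepted without any warning
print(shadowed_type_demo())
```

The same shadowing applies to a variable or function argument called object.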

Is there a solution to this mess? Yes. As any serious Pythonista knows…

Explicit is better than implicit.
[…]
Namespaces are one honking great idea — let’s do more of those!

The Zen of Python

Since changing the Python syntax is not a great idea, the best solution is to prefix anything potentially ambiguous to make it explicit (as suggested by the honking idea of namespaces ‒ a prefix is a poor man’s namespace). If you follow this, or a similar approach, you won’t be left wondering if that form is an HTML form or a category of ceramic items.

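# pseudo-models.py
# Assumptions added to make the sketch loadable by Django: max_length=200
# is an arbitrary value (CharField requires one), and on_delete=models.PROTECT
# is one possible choice (ForeignKey requires on_delete since Django 2.0).
from django.db import models

class CeramicClass(models.Model):
    '''A wide category of ceramic items, comprising many forms.'''

    name = models.CharField(max_length=200)

class CeramicForm(models.Model):
    '''A ceramic form. Totally different from CeramicType.'''

    name = models.CharField(max_length=200)

class CeramicType(models.Model):
    '''A ceramic type. Whatever that means.'''

    name = models.CharField(max_length=200)
    ceramic_class = models.ForeignKey(CeramicClass, on_delete=models.PROTECT)
    ceramic_form = models.ForeignKey(CeramicForm, on_delete=models.PROTECT)
    source_ref = models.URLField()

class ArcheoSite(models.Model):
    '''A friendly, muddy, rotting archaeological site.'''

    name = models.CharField(max_length=200)

class CeramicFind(models.Model):
    '''The real thing you can touch and look at.'''

    ceramic_type = models.ForeignKey(CeramicType, on_delete=models.PROTECT)
    archeo_site = models.ForeignKey(ArcheoSite, on_delete=models.PROTECT)
    ... # billions of other fields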
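The prefixing rule used for the field names above is mechanical enough to write down. This is only a sketch of the convention: the `AMBIGUOUS` set and the `explicit_name` helper are hypothetical, and the default prefix is just the one that fits pottery.

```python
# Hypothetical list of jargon words that clash with Python or Django names.
AMBIGUOUS = {"class", "type", "object", "form", "site"}

def explicit_name(term, prefix="ceramic"):
    """Prefix a jargon term only when it is ambiguous on its own."""
    term = term.lower()
    if term in AMBIGUOUS:
        return f"{prefix}_{term}"
    return term

print(explicit_name("type"))                   # ceramic_type
print(explicit_name("site", prefix="archeo"))  # archeo_site
print(explicit_name("ware"))                   # ware
```

Words like ware pass through unchanged because they mean nothing to Python or Django.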

Published by

Stefano Costa

Archaeologist, I study the Late Antique and Early Medieval/Byzantine period on the northern side of the Mediterranean, focusing on pottery usage patterns. I'm also involved in open source and open knowledge communities, like OSGeo, the IOSA project and the Open Knowledge Foundation.

5 thoughts on “Archaeology and Django: mind your jargon”

  1. I have also been working with Django to build an archeological database and data entry/analysis interface. Which database backend are you using? Is there a particular reason why you went with a relational database rather than an ontology/RDF format?

  2. Thanks for adding your views here. I usually choose SQLite/Spatialite during the early development phase and then move to PostgreSQL/PostGIS in production (with the big caveat that it’s often running on a local network). I explored ontology-based formats a few years ago, but in the end I chose to use the standard Django backend because I found nothing that seemed to have a sustainable future (e.g. https://code.google.com/p/django-rdf/ stopped in 2008?) and I can’t afford the maintenance of such a dependency on my own. Furthermore, tools that I consider essential, like South for migrations, do not support these exotic formats. I found similar issues for document-based databases (so-called NoSQL). I found Postgres-specific Django extensions to be extremely useful. Of course shared ontologies (e.g. CIDOC CRM or more human alternatives) must be used as the reference conceptual model. Did you find any way to integrate those into Django with a reasonable effort (e.g. no need to rewrite the admin from scratch, some GeoDjango support)?

  3. I use PostgreSQL as well. There is a Django package that allows integration with Neo4j called Neo4django (https://github.com/scholrly/neo4django). However, it still needs to be used with another database backend and doesn’t have the type of spatial capabilities that PostgreSQL offers out of the box. PostgreSQL/PostGIS appears to be the best backend for archeological purposes. The complexity of archeological typology makes it difficult to use the default admin interface for artifact data entry so I’m going to have to design my own soon.

  4. Re: complexity of typology, I see what you mean ‒ there should be a collective effort towards shared repositories of reference typologies, otherwise we’re just going to reinvent the wheel every few years. On the Postgres side, I just found this new project geared towards much better coverage of Postgres features in Django: arrays could prove very useful.
