Total Open Station: a specialised format converter

It’s 2017, and nine years ago I started writing a set of Python scripts that would become Total Open Station, a humble GPL-licensed tool to download and process data from total station devices. I started from scratch, using the Python standard library and pySerial as best I could, to create a small but complete program. Under the hood, I’ve been “religiously” following the UNIX philosophy of one tool that does one thing well, embodied by the two command line programs that perform the separate steps of:

  1. downloading data via a serial connection
  2. converting the raw data to formats that can be used in GIS or CAD environments

And despite starting as an itch to scratch, I also wanted TOPS to be used by others, to provide something that was absent from the free software world at the time, and that is still unchallenged in that respect. So a basic and ugly graphical interface was created, too. It gives a more streamlined view of the work, and greatly increases the number of potential users. Furthermore, TOPS runs not just on Debian, Ubuntu or Fedora, but also on macOS and Windows, and it is well known that users of the latter operating systems are not fond of working from a terminal.

Development has always been slow. After 2011 I had only occasional use for the software myself and no access to a real total station, so my interest shifted towards giving the program a good architecture and extending the number of formats that can be imported and exported. This entailed rewriting the internal data structures to allow for more flexibility, such as differentiating between point, line and polygon geometries.
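That kind of geometric differentiation can be sketched with a few simple Python classes. This is a hypothetical illustration, not the actual TOPS data model:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Point:
    """A single measured point with coordinates and an optional label."""
    x: float
    y: float
    z: float = 0.0
    label: str = ""

@dataclass
class LineString:
    """An ordered sequence of points, e.g. a surveyed wall line."""
    points: List[Point] = field(default_factory=list)

@dataclass
class Polygon:
    """A closed ring of points, e.g. the outline of a trench."""
    ring: List[Point] = field(default_factory=list)

    def is_closed(self) -> bool:
        # A valid ring repeats its first point at the end.
        return len(self.ring) > 2 and self.ring[0] == self.ring[-1]
```

With distinct types like these, an exporter can decide whether a run of measurements becomes separate points, a polyline or a closed polygon in the output file.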

Today, I still find GUI programming out of my league and interests. If I’m going to continue developing TOPS it’s for the satisfaction of crafting a good piece of software, learning new techniques in Python or maybe rewriting entirely in a different programming language. It’s clear that the core feature of TOPS is not being a workstation for survey professionals (since it cannot compete with the existing market of proprietary solutions that come attached to most devices), but rather becoming a polyglot converter, capable of handling dozens of raw data formats and flexibly exporting to good standard formats. Flexibly exporting means that TOPS should have features to filter data, to reproject data based on fixed base points with known coordinates, to create separate output files or layers and so on. Basically, to adapt to many more needs than it does now. From a software perspective, there are a few notable examples that I’ve been looking at for a long time: Sphinx, GPSBabel and Pandoc.

Sphinx is a documentation generator written in Python, the same language I used for TOPS. You write a light markup source, and Sphinx can convert it to several formats such as HTML, ePub, LaTeX (and PDF) or groff. You can write short manuals, like the one I wrote for TOPS, or entire books. Sphinx accepts many options, mostly from a configuration file, and I borrowed a few lines of code that I liked for handling the internal dictionary (key-value hash) of all input and output formats, with conditional import of the selected module (rather than importing all the modules that won’t be used). Sphinx is clearly excellent at what it does, even though the similarities with TOPS are few. After all, TOPS has to deal with many undocumented raw formats, while Sphinx has the advantage of a single standard format. Sphinx was originally written by Georg Brandl, one of the best Python developers and a contributor to the standard library, in a highly elegant object-oriented architecture that I’m not able to replicate.
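The pattern in question can be sketched as a registry of format handlers that are imported only on demand; the module paths below are hypothetical, not the actual layout of Sphinx or TOPS:

```python
import importlib

# Map format keys to module paths. Modules are imported lazily, so the
# handlers for unused formats never get loaded (names are made up).
FORMATS = {
    'nikon_raw_v200': 'totalopenstation.formats.nikon_raw_v200',
    'leica_gsi': 'totalopenstation.formats.leica_gsi',
}

def get_parser(format_key):
    """Import and return the handler module for the requested format."""
    try:
        module_path = FORMATS[format_key]
    except KeyError:
        raise ValueError("Unknown format: %s" % format_key)
    return importlib.import_module(module_path)
```

Startup stays fast and a missing third-party dependency in one handler does not break every other format.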

GPSBabel is a venerable and excellent program for GPS data conversion and transfer. It handles dozens of formats in read/write mode, and each format has “suboptions” that are specific to it. GPSBabel also has advanced filtering capabilities, it can merge multiple input files, and for a few years now it has offered a minimal graphical interface. Furthermore, GPSBabel is integrated in GIS programs like QGIS and can work in a variety of ways thanks to its programmable command line interface. A strong difference with TOPS is that many of the GPS data formats are binary, and that the basic data structures of waypoints, tracks and routes are essentially the same (contrast that with the monster LandXML specification, or the dozens of possible combinations in a Leica GSI file). GPSBabel is written in portable C++, which I can barely read, so anything other than inspiration for the user interface is out of the question.

Pandoc is a universal document converter that reads many markup document formats and can convert to an even greater number of formats, including PDF (via LaTeX), docx and OpenDocument. The baseline format for Pandoc is an enriched Markdown. There are two very interesting features of Pandoc as a source of inspiration for a converter: the internal data representation and the Haskell programming language. The internal representation of the document in Pandoc is an abstract syntax tree that is not necessarily as expressive as the source format (think of all the typography and formatting in a printed document), but it can be serialised to/from JSON and allows filters to work regardless of the input or output format. Haskell is a functional language that I have never programmed in, although it lends itself to creating complex and efficient programs that are easily extended. Pandoc works from the command line and has a myriad of options ‒ it’s also rather common to invoke it from Makefiles or short scripts, since one tends to work iteratively on a document. I could see a future version of TOPS being rewritten in Haskell.

Scriptability and mode of use both seem important concepts to keep in mind for a data converter. For total stations, a common workflow is to download raw data, archive the original files and then convert to another format (or even insert directly into a spatial database). With the two programs totalopenstation-cli-connector and totalopenstation-cli-parser, such tasks are easily automated in a single master script (or batch procedure) using a timestamp as identifier for the job and the archived files. This means that once the right parameters for your needs are found, downloading, archiving and loading survey data in your working environment is a matter of seconds, with no point-and-click, no icons, no mistakes. Looking at GPSBabel, I wonder whether keeping the two programs separate really makes sense from a UX perspective, as it would be more intuitive to have a single totalopenstation executable. In fact, this dual approach is a direct consequence of the small footprint of totalopenstation-cli-connector, which merely acts as a convenience layer on top of pySerial.
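A master script of that kind could be sketched in Python as follows. The two executable names come from TOPS itself, but the specific flags and the format key are assumptions to be checked against each program’s --help output:

```python
import subprocess
import time

def job_names(stamp=None):
    """Derive archive and output file names from a timestamp identifier."""
    stamp = stamp or time.strftime('%Y%m%d-%H%M%S')
    return 'survey-%s.raw' % stamp, 'survey-%s.dxf' % stamp

def run_job(port='/dev/ttyUSB0'):
    """Download, archive and convert one survey session (flags are illustrative)."""
    raw_file, dxf_file = job_names()
    # 1. download from the serial port and archive the raw data as-is
    subprocess.check_call(['totalopenstation-cli-connector',
                           '-p', port, '-o', raw_file])
    # 2. convert the archived raw file to a CAD-friendly format
    subprocess.check_call(['totalopenstation-cli-parser',
                           '-i', raw_file, '-f', 'nikon_raw_v200',
                           '-o', dxf_file, '-t', 'dxf'])
```

Called from cron, a Makefile or a tiny wrapper script, the timestamp ties the archived raw file and the converted output to the same job.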

It’s also important to think about the maintainability of code: I have little interest in developing the perfect UI for TOPS, all the time spent on development comes out of my spare time (since no one is paying for TOPS), and it would be far more useful if dedicated plugins existed for popular platforms (think QGIS, gvSIG, even ArcGIS supports Python, not to mention CAD software). At this time TOPS supports ten (yes, 10) input formats out of … hundreds, I think (some of which are proprietary, binary formats). Expanding the list of supported formats is the single aim that I see as reasonable and worth pursuing.

Debian Wheezy on a Fujitsu Primergy TX200 S3

Debian Wheezy runs just fine on a Fujitsu Primergy TX200 S3 server

A few days ago I rebooted an unused machine at work, that had been operating as the main server for the local network (~40 desktops) until 3 years ago. It is a Fujitsu Primergy TX200 S3, that was in production during the years 2006-2007. I found mostly old (ok, I can see why) and contradictory reports on the Web about running GNU/Linux on it.

This is mostly a note to myself, but could serve others as well.

I chose to install Debian on it, doing a netinstall of Wheezy 7.8.0 from the netinst CD image (using an actual CD, not a USB key), and all went well with the default settings ‒ which may not be optimal, but that’s another story. While older and less beefy than its current HP companion, this machine is still good enough for many tasks. I am slightly worried by its energy consumption, to be honest.

It will be used for running web services on the local network, such as Radicale for shared calendars and address books, Mediagoblin for media archiving, etc.

Archaeology and Django: mind your jargon

I have been writing small Django apps for archaeology since 2009 ‒ Django 1.0 had been released a few months earlier. I love Django as a programming framework: my initial choice was based on the ORM, at that time the only geo-enabled ORM that could be used out of the box, and years later GeoDjango still rocks. I almost immediately found out that the admin interface was a huge game-changer: instead of wasting weeks writing boilerplate CRUD, I could just focus on adding actual content and developing the frontend. Having your data model as source code (under version control) is the right thing for me and I cannot go back to using “database” programs like Access, FileMaker or LibreOffice.

Previous work with Django in archaeology

There is some prior art on using Django in the magic field of archaeology; this is what I found in the published literature in the field of computer applications in archaeology:

I have been discussing this interaction with Diego Gnesi Bartolani for some time now, and he is developing another Django app. Python programming skills are becoming more common among archaeologists, and it is not surprising that databases big and small are moving away from desktop-based solutions towards web-based ones.

The ceramicist’s jargon

There is one big problem with Django as a tool for archaeological data management: language. Here are some words that are either Python reserved keywords or very important in Django:

  • class (Python keyword)
  • type (Python built-in)
  • object (Python built-in)
  • form (HTML element, Django module)
  • site (Django contrib app)

Unfortunately, these words are not only generic enough to be used in everyday speech, but they are also very common in the archaeologist’s jargon, especially for ceramicists.

Class is often used to describe a generic and wide group of objects, e.g. “amphorae”, “fine ware”, “lamps”, “cooking ware” are classes of ceramic products ‒ i.e. categories. Sometimes class is also used for narrower categories such as “terra sigillata italica”, but the most accepted term in that case is ware. The definition of ware is ambiguous, and it can be based on several different criteria: chemical/geological analysis of source material; visible characteristics such as paint, decoration, manufacturing; typology. The upside is that ware has no meaning in either Python or Django.

Form and type are both used within typologies. There are contrasting uses of these two terms:

  •  a form defines a broad category, tightly linked to function (e.g. dish, drinking cup, hydria, cythera) and a type defines a very specific instance of that form (e.g. Dragendorff 29); sub-types are allowed and this is in my experience the most widespread terminology;
  • a form is a specific instance of a broader function-based category ‒ this terminology is used by John W. Hayes in his Late Roman Pottery.

These terminology problems, regardless of their cause, are complicated by translation from one language to another, and regional/local traditions. Wikipedia has a short but useful description of the general issues of ceramic typology at the Type (archaeology) page.

Site is perhaps the best understood source of confusion, and the least problematic. First of all, everyone knows that the word site can have a lot of meanings, and lots of archaeologists get by using both the website meaning and the archaeological site meaning every day. Secondly, even though the sites app is included by default in Django, it is not so ubiquitous ‒ I only ever used it when deploying, as a one-off.

Object is a generic word. Shame on every programming language designer who ever thought it was a good idea to use such a generic word in a programming language, eventually polluting natural language in this digital age. No matter how strongly you think object is a good term to designate archaeological finds, items, artifacts, features, layers, deposits and so on, thou shalt not use object when creating database fields, programming functions, visualisation interfaces or anything else, really.

The horror is when you end up writing code like this:

class Class(models.Model):
    '''A class. Both a Python class and a classification category.'''

    pass

class Type(models.Model):
    '''A type. Actually, a Python class.

    >>> t = Type()
    >>> type(t)
    <class '__main__.Type'>
    '''

Not nice.

Is there a solution to this mess? Yes. As any serious Pythonista knows…

Explicit is better than implicit.
[…]
Namespaces are one honking great idea — let’s do more of those!

The Zen of Python

Since changing the Python syntax is not a great idea, the best solution is to prefix anything potentially ambiguous to make it explicit (as suggested by the honking idea of namespaces ‒ a prefix is a poor man’s namespace). If you follow this, or a similar approach, you won’t be left wondering whether that form is an HTML form or a category of ceramic items.

# pseudo-models.py

class CeramicClass(models.Model):
    '''A wide category of ceramic items, comprising many forms.'''

    name = models.CharField(max_length=100)  # max_length is required by CharField

class CeramicForm(models.Model):
    '''A ceramic form. Totally different from CeramicType.'''

    name = models.CharField(max_length=100)

class CeramicType(models.Model):
    '''A ceramic type. Whatever that means.'''

    name = models.CharField(max_length=100)
    ceramic_class = models.ForeignKey(CeramicClass)
    ceramic_form = models.ForeignKey(CeramicForm)
    source_ref = models.URLField()

class ArcheoSite(models.Model):
    '''A friendly, muddy, rotting archaeological site.'''

    name = models.CharField(max_length=100)

class CeramicFind(models.Model):
    '''The real thing you can touch and look at.'''

    ceramic_type = models.ForeignKey(CeramicType)
    archeo_site = models.ForeignKey(ArcheoSite)
    ... # billions of other fields

The OSGeo umbrella

The Open Source Geospatial Foundation ‒ OSGeo ‒ is an umbrella organization for a lot of free and open source software projects focused on geospatial technology, that is, maps 🙂

Artistic closeup of a coloured umbrella, steve trigg

Projects range from low-level programming libraries like GDAL/OGR (that is also used by ESRI software) to full-fledged desktop apps like QGIS and gvSIG and powerful web-mapping frameworks (GeoServer and MapServer). But there’s much more, really. Take a look for yourself on the OSGeo wiki.

Being an umbrella means that

OSGeo Projects are freestanding entities, handled by their own Project Steering Committees.

so I have been wondering how much interaction there is among OSGeo projects, looking at people who are members of Project Steering Committees (PSC). PSCs are the governing body of each project, and they vary a lot in membership size, structure and activity, but ultimately their purpose is to make sure that nothing happens in isolation and that decisions are consensus-driven in a democratic way, as required by the OSGeo rules.

My initial idea was simple: look at PSC membership as a graph, with members and projects as nodes, all converging into OSGeo. That makes for a nice umbrella-shaped graph!

Small black dots represent PSC members. You will notice that several members are part of more than one project. That’s the OSGeo cabal! Almost like a Swedish conspiracy!

In yellow: the cabal, revealed! Click for the larger version.

Jokes aside, there is of course some connectedness, namely in two clusters: the webmapping cluster and the “founders” cluster: GDAL, UMN MapServer, PROJ.4 (part of MetaCRS), GEOS and PostGIS are all OSGeo founding projects and are in many cases the core components of the OSGeo software stack.

The (not so) nice graphs were put together in the DOT language and plotted with the following command:

sfdp osgeo-projects-psc.gv -Tpng -o osgeo-big.png
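Generating the DOT source itself is straightforward; a minimal sketch with made-up membership data (the real lists live on the OSGeo wiki) might look like this:

```python
def psc_graph_dot(memberships):
    """Render a project -> PSC members mapping as an undirected DOT graph."""
    lines = ['graph osgeo {']
    for project, members in sorted(memberships.items()):
        # every project hangs off the OSGeo node: the umbrella shape
        lines.append('  "OSGeo" -- "%s";' % project)
        for member in members:
            lines.append('  "%s" -- "%s";' % (project, member))
    lines.append('}')
    return '\n'.join(lines)

# Made-up sample data for illustration only.
sample = {'GDAL': ['A. Developer'],
          'MapServer': ['A. Developer', 'B. Mapper']}
```

Writing the returned string to a .gv file and feeding it to sfdp as above yields the umbrella layout; members sitting on more than one PSC show up as nodes with several edges.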

The source file is found in this gist and is free to use if you want to develop new visualizations.

Of course this is by no means informative of the actual interaction within the larger OSGeo community (which is orthogonal to projects), and even within projects there is much more to look at: mailing lists, code repositories, issue trackers, etc. The nice thing here is that, to obtain the data I needed, I took the opportunity to review the available information and even fixed a few things in the wiki page linked above. Take it as a first step in developing a wider understanding of the “hidden structure” of the OSGeo community.

Innovation: Ubuntu Edge vs FairPhone

“We’re fighting for free software running on top of hardware that was manufactured by slaves.” – Vinay Gupta at #ohm2013

I have a smartphone, a Samsung Gingerbread device that does its job and allows me to use a mobile phone, check my email, use social networks and take decent pictures on the go (it does much more than that, actually). I have no need for an expensive alternative, so funding the Ubuntu Edge campaign was out of the question. The Edge crowdfunding campaign will fail, even though it has been the biggest ever in terms of the amount raised (but the Pebble campaign had more than twice as many funders, and it was ultimately successful).

Is the Edge really driving innovation? My answer is no, quoting Vinay Gupta above. Sorry, Canonical, but I want innovation-that-makes-Earth-a-better-place, not innovation-that-makes-me-more-productive. Innovation for human beings, that is.

I think the Fairphone is a much more interesting project. The campaign shares many characteristics with the Edge, but it has had much less media coverage (because there is no Mark Shuttleworth here). The Fairphone aims at creating a fairer device that respects workers, using conflict-free resources (such as tin and tantalum) and reducing the amount of waste in the production process. Interesting points:

  • they needed to reach a certain amount of pre-orders to be able to enter production;
  • they reached their goal and were able to produce 20,000 devices;
  • the final price is 325 €;
  • 13k phones sold so far ‒ more phones than the Edge campaign (see below);
  • you can still buy one, even if you didn’t contribute.

The Edge campaign raised more than 12M $. The actual number of Edge devices “bought” by backers is 5300-ish. A lot of people only donated 20 or 50 $: again, I don’t see the rationale in funding the making of a device that you can’t afford or don’t want to buy.

I didn’t even touch on the topic of Canonical, the Ubuntu community and the free and open source community. Or software at all. My next smartphone is likely going to be a FirefoxOS device. Maybe a Fairphone 3?

GFOSS news

I’ve been meaning to do this for a long time now (since 2009 apparently). I am proud to announce the first issue of GFOSS news, a newsletter from and about the GFOSS community with a focus on Italy.


The tagline is “software, dati, persone” (software, data, people) and it is exactly like that. It’s a short summary of what has been going on in the past month: software releases, community events (including the much-discussed resignation of the association’s board) and other relevant news. News items are very short, which is in itself kind of unusual for the average Italian written text.

It is nothing original of course: all news come from OSGeo and GFOSS mailing lists, Planet OSGeo, Planet GIS Italia, Twitter and other sources (with links to the original announcements).

Spread the word, and submit your ideas for the next issue!

What is coming in Total Open Station 0.4

More than one year has passed since the first release of Total Open Station (TOPS). Version 0.3 already brought support for multiple data formats and devices, the ability to export your data to common standard formats, and a programming library to create scripts around the core functionality of TOPS. We were very proud when TOPS was added to openSUSE, and it is now being added to Debian and Fedora, three of the most popular GNU/Linux distributions.

Feedback from users of TOPS 0.3 has not been as significant as we had expected, even though we have been providing a stack component for survey professionals that was totally missing on GNU/Linux, and in many cases on Mac OS too. Nevertheless, we have continued developing TOPS, admittedly at a slower pace.

TOPS 0.4 is going to feature support for new raw data formats (including initial support for the popular Leica GSI), and the core data types are being completely rewritten to allow the handling of polylines and polygons. The lines of code are fewer, making it easier to find bugs and to start hacking on your own if you want. Thanks to our contributors, we will make more languages available for the program interface.

Being a volunteer-driven project, developer time is a critical resource, but we found out that user feedback and involvement is actually the most valuable resource. With this in mind, we are going to change the project governance to make the role of contributing users more prominent.

If you use a total station as part of your daily work and you care about software freedom, please consider donating to support the development of TOPS, and submitting a bug report about the models and formats you need.

Total Open Station is a free and open source program to download, manage and export survey data from total stations. It runs on all major operating systems and supports a growing number of raw data formats.