Stefano Costa

There's more than potsherds out here

I am an archaeologist and I live in Genova

Categoria: Free Software

  • All my source code repositories are now on Codeberg


    I have moved all my personal source code repositories to Codeberg! Codeberg is the free home for your free projects. Because free and open source software deserves a free and open source platform.

    Codeberg.org is a provider of Git source code hosting, based on the Gitea software. Gitea is open source, so there are many servers around, large and small, and no centralization. Gitea is very easy to use and it follows the same visual and conceptual paradigms that are common on other platforms. Git itself, of course, is and remains decentralized. For this reason I have easily switched providers several times since I started using Git around 2008: repo.or.cz, Gitorious. But that’s the technical side of it.

    Codeberg is run by Codeberg e.V., a non-profit organization based in Germany. That means the platform is not run with the purpose of making money (not a bad thing per se, of course), gathering user data, or accumulating enough social clout to sell the entire platform. It’s a service for the community, just like Wikipedia and OpenStreetMap. And there’s more: you can join Codeberg e.V. and become an active member. You’ll need to pay a small annual fee (which helps cover the costs of infrastructure) and you can participate in the discussions and decisions that shape the future of Codeberg. When I decided to move my repositories to Codeberg, I thought it was an excellent opportunity to give strong support to this initiative and encourage more users to join us.

    Gitea is open source and it’s not difficult to run your own server. But, as any sysop knows, running thousands of separate servers doesn’t scale: the costs of hosting 1000 repositories on 1000 separate servers are a lot higher than keeping those same 1000 repositories on a single server. Keeping a server running and up to date takes time, too. So I think it’s better to keep some of the advantages of centralized platforms and change what is really wrong with them: the software must be free, the governance must be open, and the funding should come from users.

    .org hosting on .com was a historical mistake we all made. Time to correct it!

    Andreas Shimokawa

    I took the move to Codeberg as an opportunity to improve my good practice in software development and repository management:

    • I’m not creating “vanity” organizations like I did in the past: all repositories are under the steko user and I will consider creating an organization only when there is an actual community around a project (like we did for Total Open Station)
    • I’m trying to avoid overengineering with planned releases, issues, and anything that puts stress where fun and passion should be
    • I’ll be signing all Git commits with my GPG key
    • I will never again cite a source code repository in an academic paper: instead I will upload a snapshot to Zenodo, get a DOI and cite that (I made this mistake too many times in the past, even recently, only to realize how volatile these URLs can be)
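
    Signing commits, as mentioned above, only needs a one-time Git configuration; the key ID below is a placeholder:

```shell
# Tell Git which GPG key to sign with (the key ID is a placeholder)
git config --global user.signingkey ABCDEF1234567890
# Sign every commit automatically
git config --global commit.gpgsign true
# Signatures can later be checked with: git log --show-signature
```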

    If you are an open source developer, I encourage you to join me on Codeberg!


    What’s wrong with GitHub?

    In the past 10 years the (apparently) unstoppable trend has been to concentrate the development of most open source software on GitHub, a proprietary service that was acquired by Microsoft last year. When that happened, some people became worried upon realizing that one of the giant monopolies had taken possession of their favourite platform. However, GitHub was a proprietary service all along, and it was made possible by venture capital. It was silly to make it a sacred place, as it was to teach students Git as merely the tool behind GitHub, or to use GitHub Pages as the “best” option for your website. Yes, there are millions of developers using GitHub, and it makes a lot of things easy, for free… but that comes at a price. I started deleting repositories from GitHub last year; only a few remain, and I will delete those as well. While my personal repositories are moving to Codeberg, at this time Total Open Station remains on GitHub (of course I will try to move it as well, but that’s not a decision I can take by myself and it’s not even on the table).

    Some people moved to Gitlab.com, the flagship website of the company of the same name. The Community Edition of GitLab is open source, and it’s used by Debian, Gnome, Framasoft and other large projects. I had been using Gitlab.com for a few years and I dislike it from many points of view, mostly stemming from the fact that it’s funded by venture capital. There were many revealing episodes: asking female employees to wear high heels; putting forward a ‘no politics at the workplace’ stance, in order to keep quiet how hard they’re chasing big contracts with all branches of the US government, including Immigration and Customs Enforcement (ICE). They messed with users’ privacy and retracted the introduction of telemetry only after public outrage. And I’m not even touching the technical side, since GitLab started as a simple GitHub clone and is now a behemoth all about putting your projects in Kubernetes. I have deleted almost all my repositories and I will delete everything shortly (my user account may remain there, empty, because I don’t like name squatting). In short, if you self-host GitLab it may be fine, but I would never recommend putting your open source efforts on gitlab.com.

    The old Bitbucket, where I had many of my early Mercurial repositories, is dropping support for Mercurial and suffers from all the same problems as other proprietary platforms. I converted all those repositories and I’m moving them as well, even the old ones that I consider archived, because I think there’s some value in them.

  • A new home for the Total Open Station project

    As announced in a previous blog post, the Total Open Station project has a new home!

    More specifically, we’re still on GitHub but I have moved the main repository from my personal GitHub account to the “totalopenstation” organization, with myself and @psolyca as owners.

    This seems like a simple step but, in fact, it is the most important improvement that Total Open Station has ever had. It means that what has been a personal project for 12 years is now a collective effort, with a bus factor of 2.

    Before this change, we put a lot of effort into defining how the new project will work, with a detailed CONTRIBUTING document (partly inspired by the great example of the Gitea project) and a Code of Conduct (based on the Contributor Covenant). To ensure code quality, all changes must be approved through a review process before being merged into the master branch. This is enforced for all contributors, myself included! There is a process in place to expand the team with new contributors on any part of the application, including code, data samples and design.

    We also gave official status to our Matrix chat room at https://matrix.to/#/#totalopenstation:matrix.org so if you’re looking for casual tips or questions, please drop us a message there!

    We are now looking forward to sharing the next release with all our users, and with this in mind we introduced another important change: sponsorship. You can now support our work with a recurring donation, either via GitHub Sponsors or Liberapay. Since the project is run as a volunteer effort, sponsorships will help us cover the costs of operating websites, testing hardware devices, and eventually buying a dedicated domain name.

    If you use Total Open Station please let us know and maybe give us a star ★ on GitHub.

    This post was originally published on the Total Open Station project blog.

  • Total Open Station 0.5


    Total Open Station 0.5 is here!

    This release is the result of a short and intense development cycle.

    The application is now based on Python 3, which brings improved handling of data transfers and a general improvement of the underlying source code.

    An extensive test suite based on pytest was added to help developers work with more confidence and the documentation was reorganized to be more readable.

    There are only minor changes for users but this release includes a large number of bugfixes and improvements in the processing of data formats like Leica GSI, Carlson RW5 and Nikon RAW.

    The command line program totalopenstation-cli-parser has four new options:

    • --2d will drop Z coordinates so the resulting output only contains X and Y coordinates
    • --raw will include all available data in the CSV output for further processing
    • --log and --logtofile allow the logging of application output for debugging
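
    As an illustration, the new options could be combined in a single invocation. Only the four flags above come from the release notes; the input and output flag names here are assumptions, so check totalopenstation-cli-parser --help for the real ones:

```shell
# Hypothetical invocation: the --infile/--outfile names are assumptions
totalopenstation-cli-parser --infile survey.gsi --outfile survey.csv \
    --2d --raw --logtofile
```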

    If you were using a previous version of the program you can:

    • wait for your Linux distribution to upgrade
    • install with pip install --upgrade totalopenstation if you know your way around the command line on Linux or macOS
    • download the Windows portable app from the release page: this release is the first to support the Windows portable app from the start. For the moment it supports only 64-bit operating systems, but we are working to add a version for older 32-bit systems.

    But there’s more. This release marks a renewed development process and the full onboarding of @psolyca in the team. With the 0.6 release we are planning to move the repository from the personal “steko” account to an organization account and to improve the contribution guidelines, so that the future of Total Open Station does not depend on a single person. Of course we already have great plans for new features, as always listed on our issue tracker.

    If you use Total Open Station please let us know and maybe give us a star ★ on GitHub.

  • Total Open Station 0.4 release


    This article was originally published on the Total Open Station website at https://tops.iosa.it/

    After two years of slow development, I took the opportunity of some days off to finally release version 0.4, which had been available as a beta since 2017.

    No open bugs were left and this release is mature enough to hit the repositories.

    Find it on PyPI at https://pypi.python.org/pypi/totalopenstation as usual.

    Windows users, please note that the TOPS-on-a-USB-stick version will have to wait a few days more, but the beta version is equally functional.

    What’s new in Total Open Station 0.4

    The new version brings read support for 4 new formats:

    • Carlson RW5
    • Leica GSI
    • Sokkia SDR33
    • Zeiss R5

    Other input formats were improved, most notably Nikon RAW.

    DXF output was improved, even though the default template is not very useful since it is based on an old need from the time when TOPS was developed day to day on archaeological excavations.

    The work behind these new formats comes in part from the project’s new contributor, Damien Gaignon (find him as @psolyca on GitHub), who submitted a lot of other code and started helping with project maintenance as well. I am very happy to have Damien on board and, since my own usage of TOPS is almost at zero, it’s very likely that I will hand over development in the near future.

    The internal data structures for handling the conversion between input and output formats are completely new, and based on the Python GeoInterface abstraction offered by the pygeoif library. This allows going beyond single points to managing lines and polygons, even though no such feature is available at the moment. If you often record linear or polygonal features that you’re manually joining in the post-processing stage, think about helping TOPS development and you could get DXF or Shapefiles with the geometries ready to use (yes, Shapefile output is on our plans, too).
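
    The idea of the GeoInterface abstraction mentioned above can be sketched with the Python __geo_interface__ protocol (the GeoJSON-like convention that pygeoif implements); these tiny classes are illustrative, not TOPS code:

```python
# A minimal sketch of the __geo_interface__ protocol: any object exposing
# this property can be consumed by geo-aware libraries, regardless of its
# own class hierarchy. Class names here are illustrative.
class Point:
    def __init__(self, x, y):
        self.coords = (x, y)

    @property
    def __geo_interface__(self):
        return {"type": "Point", "coordinates": self.coords}


class LineString:
    def __init__(self, *points):
        self.points = points

    @property
    def __geo_interface__(self):
        return {
            "type": "LineString",
            "coordinates": [p.coords for p in self.points],
        }


line = LineString(Point(0, 0), Point(10, 5))
print(line.__geo_interface__["type"])  # → LineString
```

    The same dictionary convention extends naturally from points to lines and polygons, which is exactly the flexibility described above.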

    There were many bugfixes, more than 100 commits, 64 by Damien Gaignon and 52 by myself (to be honest, many of my own commits are just merges!).

    This version is the last built on Python 2, and work is already ongoing towards a new version that will be based on Python 3: a more mature codebase will mean a better program, without any visible drastic change.

    Photo by Scott Blake on Unsplash

  • Reproducible science for archaeologists


    On 20 February 2019, in Padova, I will be giving a workshop on reproducible science for archaeologists at the FOSS4G-IT 2019 conference. You have until Wednesday 13 February to register.

    What we will do

    This workshop guides participants through building an analysis of archaeological data, following the principles of reproducible science that are increasingly widespread internationally and across disciplines.

    Using well-known tools such as the R language and the RStudio programming environment, we will start from a few datasets and work through the analytical steps as they are translated into code: a procedure designed to make the research process explicit, with its trial-and-error mechanisms, following the principle of experimental repeatability.

    Participants will be able to actively join me in shaping the path and the final product of the workshop, exploring the most current practices of archaeological open science adopted internationally.

    We build on other workshops held in recent years in the USA by Ben Marwick and Matt Harris.

    How to register

    You can register until 13 February 2019 on this page: http://foss4g-it2019.gfoss.it/registrazione

    Registration requires a €10 fee, which covers the organizational costs of the event; none of it goes to me.

    Readings and references

    To participate you will need to have R, RStudio and, if possible, Git installed:

    Below are some links to useful readings to prepare for the workshop:

  • IOSACal on the web: quick calibration of radiocarbon dates


    Update November 2022: the web app is now discontinued and the recommended way to run IOSACal in the browser is with Jupyter in MyBinder or Google Colab. See this issue for more details.

    The IOSA Radiocarbon Calibration Library (IOSACal) is open source calibration software. IOSACal is meant to be used from the command line, and its installation, while straightforward for GNU/Linux users, is certainly not as easy as that of common desktop apps. To overcome this inconvenience, I put some effort into developing a version that is immediately usable.

    The IOSACal web app is online at https://iosacal.herokuapp.com/.

    This is a demo service, so it runs on the free tier of the commercial Heroku platform and it may take some time to load the first time you visit the website. It is updated to run with the latest version of the software (at this time, IOSACal 0.4.1, released in May).

    Since it may be interesting to try the app even if you don’t have a radiocarbon date at hand, at the click of a button you can randomly pick one from the open data Mediterranean Radiocarbon dates database, and the form will be filled for you.

    The random date picker in action

    Unfortunately, at this time it is not possible to calibrate or plot multiple dates in the web interface (but the command-line program is perfectly capable of that).

    IOSACal Web is made with Flask and the Bootstrap framework, and the app itself is of course open source.

    IOSACal is written in the Python programming language and is based on Numpy, Scipy and Matplotlib. This work wouldn’t be possible without the availability of such high quality programming libraries.

  • IOSACal 0.4


    IOSACal is an open source program for calibration of radiocarbon dates.

    A few days ago I released version 0.4, which can be installed from PyPI or from source. The documentation and website are at http://c14.iosa.it/ as usual. You will need to have Python 3 already installed.

    The main highlights of this release are the new classes for summed probability distributions (SPD) and paleodemography, contributed by Mario Gutiérrez-Roig as part of his work for the PALEODEM project at IPHES.
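
    The idea behind summed probability distributions can be sketched in a few lines of NumPy. This is a minimal illustration, not IOSACal’s actual API: several calibrated dates on a shared calendar grid are summed and renormalized into a single curve (the Gaussians here are stand-ins for real calibrated distributions).

```python
import numpy as np

# Hypothetical calendar grid (cal BP) shared by all dates
years = np.arange(5000, 4000, -1)

def gaussian_date(mu, sigma):
    """Stand-in for a calibrated date: a normalized curve on the grid."""
    p = np.exp(-0.5 * ((years - mu) / sigma) ** 2)
    return p / p.sum()

dates = [gaussian_date(4500, 30), gaussian_date(4600, 40), gaussian_date(4450, 25)]

# The SPD is the renormalized sum of the individual distributions
spd = np.sum(dates, axis=0) / len(dates)
print(round(spd.sum(), 6))  # → 1.0
```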

    A bug affecting calibrated date ranges extending to the present was corrected.

    On the technical side the most notable changes are the following:

    • requires NumPy 1.14, SciPy 1.1 and Matplotlib 2.2
    • removed dependencies on obsolete functions
    • improved the command line interface

    You can cite IOSACal in your work with the DOI https://doi.org/10.5281/zenodo.630455. This helps the author and contributors to get some recognition for creating and maintaining this software free for everyone.

  • Total Open Station: a specialised format converter


    It’s 2017 and nine years ago I started writing a set of Python scripts that would become Total Open Station, a humble GPL-licensed tool to download and process data from total station devices. I started from scratch, using the Python standard library and pySerial as best as I could, to create a small but complete program. Under the hood, I’ve been “religiously” following the UNIX philosophy of one tool that does one thing well and that is embodied by the two command line programs that perform the separate steps of:

    1. downloading data via a serial connection
    2. converting the raw data to formats that can be used in GIS or CAD environments

    And despite starting as an itch to scratch, I also wanted TOPS to be used by others, to provide something that was absent from the free software world at the time, and that is still unchallenged in that respect. So a basic and ugly graphical interface was created, too. It gives a more streamlined view of the work, and greatly increases the number of potential users. Furthermore, TOPS can run not just on Debian, Ubuntu or Fedora, but also on macOS and Windows, and it is well known that users of the latter operating systems don’t much like working from a terminal.

    Development has always been slow. After 2011 I had only occasional use for the software myself, no access to a real total station, so my interest shifted towards giving a good architecture to the program and extending the number of formats that can be imported and exported. In the process, this entailed rewriting the internal data structures to allow for more flexibility, such as differentiating between point, line and polygon geometries.

    Today, I still find GUI programming out of my league and interests. If I’m going to continue developing TOPS it’s for the satisfaction of crafting a good piece of software, learning new techniques in Python or maybe rewriting entirely in a different programming language. It’s clear that the core feature of TOPS is not being a workstation for survey professionals (since it cannot compete with the existing market of proprietary solutions that come attached to most devices), but rather becoming a polyglot converter, capable of handling dozens of raw data formats and flexibly exporting to good standard formats. Flexibly exporting means that TOPS should have features to filter data, to reproject data based on fixed base points with known coordinates, to create separate output files or layers and so on. Basically, to adapt to many more needs than it does now. From a software perspective, there are a few notable examples that I’ve been looking at for a long time: Sphinx, GPSBabel and Pandoc.

    Sphinx is a documentation generator written in Python, the same language I used for TOPS. You write a light markup source, and Sphinx can convert it to several formats like HTML, ePub, LaTeX (and PDF), groff. You can write short manuals, like the one I wrote for TOPS, or entire books. Sphinx accepts many options, mostly from a configuration file, and I took a few lines of code that I liked for handling the internal dictionary (key-value hash) of all input and output formats with conditional import of the selected module (rather than importing all modules that won’t be used). Sphinx is clearly excellent at what it does, even though the similarities with TOPS are not many. After all, TOPS has to deal with many undocumented raw formats while Sphinx has the advantage of only one standard format. Sphinx was originally written by Georg Brandl, one of the best Python developers and a contributor to the standard library, in a highly elegant object-oriented architecture that I’m not able to replicate.
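
    The conditional-import trick mentioned above can be sketched with importlib. This is an illustrative version, with stdlib modules standing in for totalopenstation’s own format modules:

```python
import importlib

# Hypothetical registry mapping format names to importable module paths;
# only the module for the selected format is imported, instead of loading
# every format module up front.
FORMATS = {
    "json": "json",
    "csv": "csv",
}

def load_format(name):
    """Import the module for the selected format on demand."""
    try:
        return importlib.import_module(FORMATS[name])
    except KeyError:
        raise ValueError(f"unknown format: {name}") from None

mod = load_format("json")
print(mod.dumps({"ok": True}))  # → {"ok": true}
```

    The benefit grows with the number of formats: startup only pays for the one module the user actually selected.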

    GPSBabel is a venerable and excellent program for GPS data conversion and transfer. It handles dozens of formats in read/write mode, and each format has “suboptions” that are specific to it. GPSBabel also has advanced filtering capabilities, it can merge multiple input files, and for a few years now there has been a minimal graphical interface. Furthermore, GPSBabel is integrated into GIS programs like QGIS and can work in a variety of ways thanks to its programmable command line interface. A strong difference from TOPS is that many GPS data formats are binary, and that the basic data structures of waypoints, tracks and routes are essentially the same across formats (contrast that with the monster LandXML specification, or the dozens of possible combinations in a Leica GSI file). GPSBabel is written in portable C++, which I can barely read, so anything other than inspiration for the user interface is out of the question.

    Pandoc is a universal document converter that reads many markup document formats and can convert to an even greater number of formats, including PDF (via LaTeX), docx and OpenDocument. The baseline format for Pandoc is an enriched Markdown. There are two very interesting features of Pandoc as a source of inspiration for a converter: the internal data representation and the Haskell programming language. The internal representation of the document in Pandoc is an abstract syntax tree that is not necessarily as expressive as the source format (think of all the typography and formatting in a printed document), but it can be serialised to/from JSON and allows filters to work regardless of the input or output format. Haskell is a functional language that I have never programmed in, although it lends itself to creating complex and efficient programs that are easily extended. Pandoc works from the command line and has a myriad of options ‒ it’s also rather common to invoke it from Makefiles or short scripts, since one tends to work iteratively on a document. I could see a future version of TOPS being rewritten in Haskell.

    Scriptability and mode of use seem both important concepts to keep in mind for a data converter. For total stations, a common workflow is to download raw data, archive the original files and then convert to another format (or even insert directly into a spatial database). With the two programs totalopenstation-cli-connector and totalopenstation-cli-parser such tasks are easily automated in a single master script (or batch procedure) using a timestamp as identifier for the job and the archived files. This means that once the right parameters for your needs are found, downloading, archiving and loading survey data in your working environment is a matter of seconds, with no point-and-click, no icons, no mistakes. Looking at GPSBabel, I wonder whether keeping the two programs separate really makes sense from a UX perspective, as it would be more intuitive to have a single totalopenstation executable. In fact, this dual approach is a direct consequence of the small footprint of totalopenstation-cli-connector, that merely acts as a convenience layer on top of pySerial.
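
    The master script described above might look like this sketch (the totalopenstation flag names are assumptions, not taken from the TOPS documentation):

```shell
#!/bin/sh
# Illustrative batch job: download, archive and convert in one go,
# using a timestamp as the identifier for the job and the archived file.
# The totalopenstation flag names below are assumptions.
STAMP=$(date +%Y%m%d-%H%M%S)              # job identifier
RAW="archive/survey-$STAMP.raw"

mkdir -p archive
totalopenstation-cli-connector --outfile "$RAW"      # download via serial
totalopenstation-cli-parser --infile "$RAW" \
    --outfile "survey-$STAMP.csv"                    # convert for GIS use
```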

    It’s also important to think about the maintainability of code: I have little interest in developing the perfect UI for TOPS, all the time spent on development is taken out of my spare time (since no one is paying for TOPS), and it would be far more useful if dedicated plugins existed for popular platforms (think QGIS, gvSIG, even ArcGIS supports Python, not to mention CAD software). At this time TOPS supports ten (yes, 10) input formats out of … hundreds, I think (some of which are proprietary, binary formats). Expanding the list of supported formats is the single aim that I see as reasonable and worth pursuing.

  • Debian Wheezy on a Fujitsu Primergy TX200 S3


    Debian Wheezy runs just fine on a Fujitsu Primergy TX200 S3 server

    A few days ago I rebooted an unused machine at work, that had been operating as the main server for the local network (~40 desktops) until 3 years ago. It is a Fujitsu Primergy TX200 S3, that was in production during the years 2006-2007. I found mostly old (ok, I can see why) and contradictory reports on the Web about running GNU/Linux on it.

    This is mostly a note to myself, but could serve others as well.

    I chose to install Debian, did a netinstall of Wheezy 7.8.0 from the netinst CD image (using an actual CD, not a USB key), and all went well with the default settings ‒ which may not be optimal, but that’s another story. While older and less beefy than its current HP companion, this machine is still good enough for many tasks. I am slightly worried by its energy consumption, to be honest.

    It will be used for running web services on the local network, such as Radicale for shared calendars and address books, Mediagoblin for media archiving, etc.

  • Archaeology and Django: mind your jargon

    I have been writing small Django apps for archaeology since 2009 ‒ Django 1.0 had been released a few months earlier. I love Django as a programming framework: my initial choice was based on the ORM, at that time the only geo-enabled ORM that could be used out of the box, and years later GeoDjango still rocks. I almost immediately found out that the admin interface was a huge game-changer: instead of wasting weeks writing boilerplate CRUD, I could just focus on adding actual content and developing the frontend. Having your data model as source code (under version control) is the right thing for me and I cannot go back to using “database” programs like Access, FileMaker or LibreOffice.

    Previous work with Django in archaeology

    There is some prior art on using Django in the magic field of archaeology, this is what I got from published literature in the field of computer applications in archaeology:

    I have been discussing this interaction with Diego Gnesi Bartolani for some time now, and he is developing another Django app. Python programming skills are becoming more common among archaeologists, and it is not surprising that databases big and small are moving away from desktop-based solutions to web-based ones.

    The ceramicist’s jargon

    There is one big problem with Django as a tool for archaeological data management: language. Here are some words that are Python reserved keywords, Python built-ins, or very important in Django:

    • class (Python keyword)
    • type (Python built-in, not a keyword)
    • object (Python built-in, not a keyword)
    • form (HTML element, Django module)
    • site (Django contrib app)

    Unfortunately, these words are not only generic enough to be used in everyday speech, but they are also very common in the archaeologist’s jargon, especially for ceramicists.

    Class is often used to describe a generic and wide group of objects, e.g. “amphorae”, “fine ware”, “lamps”, “cooking ware” are classes of ceramic products ‒ i.e. categories. Sometimes class is also used for narrower categories such as “terra sigillata italica”, but the most accepted term in that case is ware. The definition of ware is ambiguous, and it can be based on several different criteria: chemical/geological analysis of the source material; visible characteristics such as paint, decoration, manufacturing; typology. The upside is that ware has no meaning in either Python or Django.

    Form and type are both used within typologies. There are contrasting uses of these two terms:

    • a form defines a broad category, tightly linked to function (e.g. dish, drinking cup, hydria, cythera), and a type defines a very specific instance of that form (e.g. Dragendorff 29); sub-types are allowed, and this is in my experience the most widespread terminology;
    • a form is a specific instance of a broader function-based category ‒ this terminology is used by John W. Hayes in his Late Roman Pottery.

    These terminology problems, regardless of their cause, are complicated by translation from one language to another, and regional/local traditions. Wikipedia has a short but useful description of the general issues of ceramic typology at the Type (archaeology) page.

    Site is perhaps the best understood source of confusion, and the least problematic. First of all, everyone knows that the word site can have lots of meanings, and most archaeologists survive using both the website meaning and the archaeological site meaning every day. Secondly, even though the sites app is included by default in Django, it is not so ubiquitous ‒ I only ever used it when deploying, una tantum.

    Object is a generic word. Shame on every programming language designer who ever thought it was a good idea to use such a generic word in a programming language, eventually polluting natural language in this digital age. No matter how strongly you think object is a good term to designate archaeological finds, items, artifacts, features, layers, deposits and so on, thou shalt not use object when creating database fields, programming functions, visualisation interfaces or anything else, really.

    The horror is when you end up writing code like this:

    class Class(models.Model):
        '''A class. Both a Python class and a classification category.'''
    
        pass
    
    class Type(models.Model):
        '''A type. Actually, a Python class.
    
        >>> t = Type()
        >>> type(t)
        <class '__main__.Type'>
        '''
    
        pass

    Not nice.

    Is there a solution to this mess? Yes. As any serious Pythonista knows…

    Explicit is better than implicit.
    […]
    Namespaces are one honking great idea — let’s do more of those!

    The Zen of Python

    Since changing the Python syntax is not a great idea, the best solution is to prefix anything potentially ambiguous to make it explicit (as suggested by the honking idea of namespaces ‒ a prefix is a poor man’s namespace). If you follow this, or a similar approach, you won’t be left wondering whether that form is an HTML form or a category of ceramic items.

    # pseudo-models.py
    
    class CeramicClass(models.Model):
        '''A wide category of ceramic items, comprising many forms.'''
    
        name = models.CharField(max_length=100)
    
    class CeramicForm(models.Model):
        '''A ceramic form. Totally different from CeramicType.'''
    
        name = models.CharField(max_length=100)
    
    class CeramicType(models.Model):
        '''A ceramic type. Whatever that means.'''
    
        name = models.CharField(max_length=100)
        ceramic_class = models.ForeignKey(CeramicClass, on_delete=models.CASCADE)
        ceramic_form = models.ForeignKey(CeramicForm, on_delete=models.CASCADE)
        source_ref = models.URLField()
    
    class ArcheoSite(models.Model):
        '''A friendly, muddy, rotting archaeological site.'''
    
        name = models.CharField(max_length=100)
    
    class CeramicFind(models.Model):
        '''The real thing you can touch and look at.'''
    
        ceramic_type = models.ForeignKey(CeramicType, on_delete=models.CASCADE)
        archeo_site = models.ForeignKey(ArcheoSite, on_delete=models.CASCADE)
        ... # billions of other fields