Legacy archaeological data and storage devices

I found a CD and a floppy disk inside one of the lockers in our laboratory:


They are from 8 years ago, and they contain data about the ceramic finds from the 2003 campaign in Gortyna. This looked like a very interesting finding, from two related points of view:

  1. legacy storage devices;
  2. legacy data formats.

Storage devices

Nowadays, only some desktop PCs are equipped with a floppy disk drive, and most laptops have lacked one since at least 2004. I verified that the content of the floppy disk and that of the optical disk were exactly the same. My colleague made a good choice, then, when she decided to back up the content of the floppy disk to the CD-R. With the exception of netbooks and tablets, all computers have an optical drive for CDs and DVDs, and they are probably going to have one for a long time.
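The check that two media hold exactly the same content can be sketched in a few lines of Python. This is only an illustration, not the procedure I actually followed: the function names are made up for the example, and it assumes both disks are mounted as ordinary directories and that the files are small enough to be read into memory.

```python
import hashlib
from pathlib import Path


def tree_digests(root):
    """Map each file's path (relative to root) to the SHA-256 of its bytes."""
    root = Path(root)
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            rel = path.relative_to(root).as_posix()
            digests[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def compare_trees(first, second):
    """Return relative paths that are missing on one side or differ in content."""
    a, b = tree_digests(first), tree_digests(second)
    return sorted(name for name in a.keys() | b.keys() if a.get(name) != b.get(name))
```

Calling compare_trees on the two mount points (for instance /media/floppy and /media/cdrom, both hypothetical paths) returns an empty list when the copies match file by file.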

On the other hand, optical disks are the least resistant medium for digital archiving. Good quality CDs have an average life of 10-15 years, if they are stored properly and never taken out of their case. This specific disk was manufactured by TDK, which is generally speaking a good quality manufacturer. Magnetic disks (like 3.5″ floppy disks) can last much longer, if they are kept away from electromagnetic sources (e.g. a TV, radio, PC, electric cables in general). Magnetic tape is generally considered the safest archival medium, but it’s not available to common users.

Whatever storage medium is chosen, the true difference is actually made by the long-term archival strategy of your team or (better) institution. Having regular backups at secure and distant locations is perhaps easier nowadays with cloud providers like Dropbox and Ubuntu One, but a research institution should (must?) be able to define its own archival strategy under full control of its own staff. What I see in a lot of universities is instead long-term storage of digital archives organised team by team, with a diversity of approaches. I’ve written elsewhere about existing general recommendations, and it’s no surprise that all of them are from the UK, Germany or the EU.

Data formats

As for storage devices, the issue of data formats is two-fold:

  • the digital formats in which information is stored;
  • the way information is structured.

Digital formats are well known and usually associated with file extensions, such as .jpg for JPEG images, .odt for Open Document text files and so on. Open formats are better than proprietary ones, especially in the light of digital preservation. Some formats are nevertheless so widespread that it’s impossible to avoid them: even if you don’t use them yourself, your colleagues will. This is the case of Microsoft Office file formats.

The data I found on the CD are in a Microsoft Excel spreadsheet (.xls extension). Apart from that, it’s just a table that could have been saved in any other format, including a plain-text comma-separated values file (.csv) or inside a relational database. By this I mean that the format in which data is translated into bits on a machine is one thing, and how information (or data, if you prefer) is structured inside a file or any other container, and how much it is actually structured, is another.

Let’s take a look at this dataset then. As is common in spreadsheets, there are three separate sheets:

  • drawings catalogue;
  • fabrics catalogue;
  • general catalogue of all diagnostic sherds.

The third one is the main table and contains more than 660 entries from 59 contexts. It also has pointers to the two previous tables, so this should have been a relational database, had it been done right. Or not. A relational database made in 2003 would have been either Microsoft Access or FileMaker (still two incredibly popular choices). In both cases, recovering data if you don’t own a copy of the software is almost impossible. Then? Hooray for relational databases in spreadsheets! In other words, always try to use formats that can be accessed by a variety of programs, and leave the logic of your data inside the data themselves instead of assuming that other people will be able to infer it by means of conventional knowledge.
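To show what keeping the logic inside the data can mean in practice, here is a minimal Python sketch that checks the pointers between two sheets once they have been exported as plain CSV. The sample rows and the column names (sherd_id, context, fabric_id) are invented for the example and are not those of the Gortyna spreadsheet.

```python
import csv
import io

# Invented sample rows standing in for two of the sheets exported as CSV.
# The column names (fabric_id, sherd_id, context) are assumptions.
FABRICS_CSV = """fabric_id,description
F1,fine red fabric
F2,coarse cooking ware
"""

SHERDS_CSV = """sherd_id,context,fabric_id
1,101,F1
2,101,F2
3,102,F9
"""


def read_rows(text):
    """Parse CSV text into a list of dictionaries keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def dangling_pointers(rows, key, lookup_rows, lookup_key):
    """Return the rows whose `key` value has no match in the lookup table."""
    known = {row[lookup_key] for row in lookup_rows}
    return [row for row in rows if row[key] not in known]


fabrics = read_rows(FABRICS_CSV)
sherds = read_rows(SHERDS_CSV)
broken = dangling_pointers(sherds, "fabric_id", fabrics, "fabric_id")
print([row["sherd_id"] for row in broken])
```

The third sample sherd points to a fabric that is not in the catalogue, so it is the one reported. The same check works for the pointers to the drawings catalogue, and it needs nothing but files that any program can open.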

I was able to recover the entire archive and it’s now being integrated into our current documentation system.

In flight

Change happens at the margins, on the periphery, at the borders. Not having a defined, prescribed identity is a privileged condition. I don’t mind always being halfway, always in question, never precisely pertinent.

I am an anarchist. I see and hear my peers always speaking in the past tense, regretting that nothing taken for granted awaits them: a job, a family. I never hear anyone talk about “what we will do” or “I want to do it this way”, because there is always someone else telling you what to do and how to do it. This is precarity. Talking about politics always means complaining, and never proposing, planning, changing things. As if reality were impossible to manipulate and the only option were to align yourself with some ready-made idea. A party, a petition, and never a dream, a project.

I take nothing for granted. I don’t think it is obvious and necessary that others support what I do, or that it deserves attention and money. If you ask me what I do, or why I do it, I don’t have a ready, prepackaged answer. I always try to understand what it is for.

I think archaeology has a lot to do with theatre, with identity. We shape identities for those who don’t have time to build one for themselves. The important thing is to know when you are on stage. The important thing is not to fall off the stage.

Archaeological museum of Portoferraio

The civic archaeological museum of Portoferraio houses finds from all over the island of Elba and from some shipwrecks in the surrounding waters.

Roman pottery from archaeological excavations on Isola d'Elba

Fortunately it was possible to take photographs inside, although naturally the quality is not very good. Unfortunately there is no catalogue, although most of the material has been published.

Since we are excavating at Vignale, a Roman-period site facing the island of Elba, the visit was in some way a duty. The collection of Hellenistic and Roman pottery is large: the underwater finds and those from the excavations of the island’s villae maritimae, such as the Villa delle Grotte, are particularly interesting.

A look at pollen data in the Old World

Since the 19th century, the study of archaeobotanical remains has been very important for combining “strictly archaeological” knowledge with environmental data. Pollen data make it possible to assess the introduction of certain domesticated species of plants, or the presence of other species that typically grow where humans dwell. Not all pollen data come from archaeological fieldwork, but the relationship between the two sets is strong enough to take an interested look at pollen data worldwide, their availability and most importantly their openness, for which we follow the Open Knowledge Definition.

The starting point for finding pollen data is the NOAA website.

The Global Pollen Database hosted by the NOAA is a good starting point, but apparently its coverage is quite limited outside the US. Furthermore, data from 2005 onwards aren’t available via FTP in simple documented formats, but are instead downloadable as Access databases from another external website. Defining MS Access databases as a Bad Choice™ for data exchange is perhaps a euphemism.

Unfortunately, the number of databases covering single continents or smaller regions is growing, and their approaches to data dissemination show marked differences.


For both North and South America, you can get data from more than one thousand sites directly via FTP. There are no explicit terms of use. Usually, data retrieved from federal agencies are public domain data.

The README document only states NOTE: PLEASE CITE ORIGINAL REFERENCES WHEN USING THIS DATA!!!!!. Fair enough, the requirement for attribution is certainly compatible with the Open Knowledge Definition.


From the GPD website we can easily reach the European Pollen Database, which is hosted on another website, though (and things can get even more confusing, given that the NOAA website has some dead links).

You can download EPD data in PostgreSQL dump format (one file for each table, with a separate SQL script create_epd_db.sql). Data in the EPD can be restricted or unrestricted. That’s fine, let’s see how many unrestricted datasets there are. Following the database documentation, the P_ENTITY table contains the use status of each dataset:

steko@gibreel:~/epd-postgres-distribution-20100531$ cat p_entity.dump \
 | awk -F "\t" '{ print $5 }' | sort | uniq -c
 154 R 
 1092 U

which is pretty good because almost 88% of them are unrestricted (NB I write most of my programs in Python but I love one-liners that involve awk, sort and uniq). We could easily create an “unrestricted” subset and make it available for easy download to all those who don’t want to deal with restricted data.
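For those who prefer Python to one-liners, the same count can be sketched as follows. It assumes, like the awk command, that the dump is tab-separated and that the use status is the fifth field; the function names are made up for this example, and lines with fewer than five fields are simply skipped.

```python
from collections import Counter


def use_status_counts(lines, column=4):
    """Count the values found in one tab-separated column (0-based index)."""
    counts = Counter()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) > column:
            counts[fields[column]] += 1
    return counts


def unrestricted_share(counts):
    """Fraction of entities flagged 'U' among those flagged 'U' or 'R'."""
    total = counts["U"] + counts["R"]
    return counts["U"] / total if total else 0.0


# With the real dump, the counts would come from the file itself:
#     with open("p_entity.dump", encoding="utf-8") as handle:
#         counts = use_status_counts(handle)
# (the encoding is an assumption). Here the totals reported above
# are plugged in directly.
counts = Counter({"U": 1092, "R": 154})
print(f"{unrestricted_share(counts):.1%}")
```

With 1092 unrestricted entities out of 1246, this prints 87.6%, the “almost 88%” mentioned above.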

But what does “unrestricted” mean for EPD data? Let’s take a more careful look (emphasis mine):

  1. Data will be classified as restricted or unrestricted. All data will be available in the EPD, although restricted data can be used only as provided below.
  2. Unrestricted data are available for all uses, and are included in the EPD on various electronic sites.
  3. Restricted data may be used only by permission of the data originator. Appropriate and ethical use of restricted data is the responsibility of the data user.
  4. Restrictions on data will expire three years after they are submitted to the EPD. Just prior to the time of expiration, the data originator will be contacted by the EPD database manager with a reminder of the pending change. The originator may extend restricted status for further periods of three years by so informing the EPD each time a three-year period expires.

Sounds quite good, doesn’t it? “For all uses” is reassuring and the short time limit is a good trade-off. The horror comes a few paragraphs below with the following scary details:

  1. The data are available only to non-profit-making organizations and for research.

Profit-making organizations may use the data, even for legitimate uses, only with the written consent of the EPD Board, who will determine or negotiate the payment of any fee required.

Here the false assumption that only academia is entitled to perform research is taken for granted. And there are even more rules about “normal ethics”: basically, if you use EPD data in a publication, the original data author should be listed among the authors of the work. I always thought citation and attribution were invented for that exact purpose, but it looks like they have a distinctly different approach to attribution. The EPD is even deciding what the “legitimate” uses of pollen data are (I can hardly think of any possible illegitimate use).


You write “Africa” but you read “Europe” again, because most research projects are from French and English universities. For this reason, the situation is almost the same. What is even worse is that in developing countries there are far fewer people or organizations that can afford to buy those data, notwithstanding the fact that in regions under rapid development the study and preservation of environmental resources are of major importance.

Data are downloadable for individual sites using a search engine, in Tilia format (not ASCII, unfortunately). The problems come with the license.

The wording is almost exactly the same as for the EPD seen above:

Normal ethics pertaining to co-authorship of publications applies. The contributor should be invited to be a co-author if a user makes significant use of a single contributor’s site, or if a single contributor’s data comprise a substantial portion of a larger data set analysed, or if a contributor makes a significant contribution to the analysis of the data or to the interpretation of the results. The data will be available only to non-profit-making organisations and for research. Profit-making organisations may use the data for legitimate purposes, only with the written consent of the majority of the members of the Advisory board, who will determine or negotiate the payment of any fee required. Such payment will be credited to the APD.


As with dendrochronological data, there is a serious misunderstanding by universities and research centers of their role in society as places of research and innovation that are available to everyone. In other words, academia is a closed system producing data (at very high cost to society) that are only available inside its walls, but it’s all done with public money.

The only positive bit of the story, if any, is that these datasets are nevertheless available on the web, and their terms of use are clearly stated, no matter how restrictive. It would be just impossible to write a similar article about archaeological pottery, or zooarchaeological finds.

Appendix: Using pollen data

Pollen data are usually presented in the form of summary charts where both stratigraphic data and quantitative pollen data are easily readable. Each “column” of the chart stands for a species or genus. You can create this kind of visualization with free software tools.

The stratigraph package for R can be used for

plotting and analyzing paleontological and geological data distributed through through time in stratigraphic cores or sections. Includes some miscellaneous functions for handling other kinds of palaeontological and paleoecological data.

See the chart for an example of what they look like.

An example plot using the R stratigraph package

ArcheoFOSS 2010: back from Foggia

ArcheoFOSS 2010, the 5th Italian workshop on “Free software, open source e open format nei processi di ricerca archeologica”, took place in Foggia on 6 and 7 May. First of all, it was very good. I’m satisfied with this meeting. Why? Here are some thoughts I sketched while traveling back to Siena.

Lots of talks were about the results and methods of research done by MA and PhD students (myself included) – and this means one of the most important pieces of research, perhaps the most important of all, and the most underrated at the same time. Our community shows a strong connection between education and research. Making this connection stronger is part of our habits, I believe.

There was a lot of discussion about methodology, and thanks to the firm experience of our friends in Foggia we have gone beyond some stereotypes of the past years. Take for example the recognition that methodology means much more than recording, documentation or technical tools. Add the acceptance of plurality as a (positive) fact rather than a problem. End up with the epiphany that using similar tools (e.g. databases, GIS) doesn’t mean working with the same underlying methodological mindset. In Italy we have a very bad habit of not having a debate about method and theory, but with this workshop we’re clearly building a place open for discussion.

We are well distributed geographically (from many regions of Italy) and across periods and disciplines (from prehistoric to medieval archaeology, both excavation and landscape archaeologists). Despite this variability, there are some strong groups that are references for the whole community. I firmly believe that the University of Foggia should be listed among these groups from now on. Even more interestingly, there are new groups of people that look very promising for their novel approach (I am glad to see that even my department could now be listed here). The ArcheoFOSS workshop is already acting as an incubator for innovation, and in the future we will see more of that, because of the large number of young researchers involved and the friendly and encouraging environment, which is perhaps even more interesting than “open archaeology” for Italian academia. Or maybe it’s just part of the “open archaeology” agenda.

Free software works. It works from a technical perspective, obviously, but also from a social one. We have been learning its limits, its potential and the ways to improve it and share it. There’s a political vein in free software, and it combines very well with the need for a new way of doing research in archaeology. On the technical side, I am more and more excited about how creativity is encouraged, instead of being pre-ordered. We are doing humanities – it would be so silly to lose our creativity (also when it goes towards chaos and anarchy), in the name of a pseudo-scientific strictness born out of a great misunderstanding. We already won one bet in the early 2000s, but now we can play for something even more important: not just sharing software and methods, but sharing knowledge. This is our target for 2020, and what we are going to do for the next decade.

Lastly, we’re learning how to act in the real world, and not just discuss among ourselves. Take for example the creation of common tools for creating catalogues. We can do that from the bottom up, with a wide perspective that comprises technical standards, conservation and research needs – all as free software and open formats. grupporicerche already proposed some work in this direction last year, and we again invite all those who have developed databases for archaeological purposes to share them.

What’s missing? Of course, we have lots of areas for improvement. This is also because of the “multidimensional” approach of this initiative. Here I list some topics that I’m particularly interested in:

  • quantitative and statistical methods: let’s bring maths back into archaeology through computing! This is not to say that archaeology can be reduced to numbers, but on the contrary to better define the complexity we are dealing with, giving the right weight to “data” (whatever that means) and developing proper archaeological ideas
  • an inter-regional and international approach, to deal with big and not-so-big research themes in a collaborative way
  • encouraging the upgrade of old databases from obsolete, proprietary formats to open and free formats, ready for dissemination on the web
  • building a technological infrastructure for sharing our work, in the many forms it can take – or at least developing best practices for doing that on our own, taking accessibility and sustainability into account from day #0

More comments, insights and excerpts from the round table to follow in the next few days.

This post was originally published at iosa.it.

Longobardi? No, thanks

At Palazzo Bricherasio, in Turin, there is an exhibition titled Longobardi, running until January.

If you have a free weekend, and maybe find good weather as we did a few days ago, go to Turin. The city is more and more beautiful and it is a real pleasure to walk through the streets of the centre. As for the exhibition, don’t go. And I’ll explain why.


Beavers and Material Culture

And finally, what about beavers, which build dams, modifying the environment for a benefit that is not immediate (food and safe lodges)? Do they have a plan? Are dams artefacts? How is technical knowledge transmitted among beavers?


Perhaps these are only provocations without an answer, and dams are certainly not made by hand, but they can become useful starting points for thinking about what the adoption of techniques means…

Enrico Giannichedda, Uomini e cose, 2006.

At the castle of Bogli

Today I went on a hike to the castle of Bogli. The walk is not very short and the path is interrupted in some places, which makes the route rather awkward.

Practically nothing is left of the castle, apart from a few wall tops visible on the surface. “Castle” is perhaps too grand a name: given the small size, I think it was a tower controlling the valley floor. The position is built on a naturally raised stretch of the slope, and there is also a wide ditch artificially cut into the rock on the uphill side.

Some “genius” thought it a good idea to dig a couple of holes here in search of treasure, without realising that these watchtowers were manned by wretches of the lowest social class, forced to spend entire winters in isolation, without any kind of comfort, let alone precious objects.

I took a commemorative self-portrait:


ex novo

A nice, really nice idea! When I showed Laura the ex-novo.org website, she replied “There is life on Mars!”. Yes, evidently, there is life. Hard to believe, but we are not alone in this vale of tears. Six pieces (journalistic ones; luckily they are not scholarly journal articles), some with the vague, and forgotten, flavour of a pamphlet. And one question, asked out loud, which should be asked a little more often and with a little more humility: Why archaeology?.

It is not just a desire to hurt ourselves, and there is no point in wearily repeating that the university is rubbish, that the soprintendenze are rubbish, that the ministry is rubbish, that the laws are rubbish. We can do better, we can try to get out of the crisis (Brogiolo), heads down. What it takes is a healthy and critical pessimism about the present, and an equally healthy and critical optimism about the future. What it takes is for professors to teach: a craft, a way of thinking and of looking at the world, the present and living one, not what once was.

Azzena’s article raises a few smiles and tears apart a few illusions (but who still has any?), although it is self-contained and leaves no room for criticism, which is unfortunately a flaw of much Italian archaeology.

To follow, some criticism on Mario Torelli, archaeology and the PCI.