A look at pollen data in the Old World

Since the 19th century, the study of archaeobotanical remains has been very important for combining “strictly archaeological” knowledge with environmental data. Pollen data make it possible to assess the introduction of certain domesticated plant species, or the presence of other species that typically grow where humans dwell. Not all pollen data come from archaeological fieldwork, but the relationship between the two is strong enough to warrant a closer look at pollen data worldwide: their availability and, most importantly, their openness, for which we follow the Open Knowledge Definition.

The starting point for finding pollen data is the NOAA website.

The Global Pollen Database (GPD) hosted by the NOAA is a good starting point, but its coverage appears to be quite limited outside the US. Furthermore, data from 2005 onwards aren’t available via FTP in simple, documented formats; instead, they can be downloaded as Access databases from a separate website. Calling MS Access databases a Bad Choice™ for data exchange is perhaps a euphemism.

Unfortunately, the number of databases covering single continents or smaller regions is growing, and their approaches to data dissemination differ markedly.


For both North and South America, you can get data from more than one thousand sites directly via FTP. There are no explicit terms of use, but data retrieved from US federal agencies are usually in the public domain.

The README document only states “NOTE: PLEASE CITE ORIGINAL REFERENCES WHEN USING THIS DATA!!!!!”. Fair enough: a requirement for attribution is certainly compatible with the Open Knowledge Definition.


From the GPD website we can easily reach the European Pollen Database (EPD), which is hosted on yet another website, though (and things can get even more confusing, given that the NOAA website has some dead links).

You can download EPD data in PostgreSQL dump format (one file for each table, with a separate SQL script, create_epd_db.sql). Data in the EPD can be restricted or unrestricted. That’s fine; let’s see how many unrestricted datasets there are. Following the database documentation, the P_ENTITY table contains the use status of each dataset:

steko@gibreel:~/epd-postgres-distribution-20100531$ cat p_entity.dump \
 | awk -F '\t' '{ print $5 }' | sort | uniq -c
    154 R
   1092 U

which is pretty good, because almost 88% of them are unrestricted (NB: I write most of my programs in Python, but I love one-liners that involve awk, sort and uniq). We could easily create an “unrestricted” subset and make it available for easy download to all those who don’t want to mess with restricted data.
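For readers who prefer Python, here is a minimal sketch of the same count, plus the “unrestricted” subset. It assumes, as the awk command does, that p_entity.dump is tab-separated and that the fifth field holds the use status; that layout is inferred from the one-liner, not from the EPD documentation, and the function names are hypothetical.

```python
from collections import Counter

# Assumption taken from the awk one-liner: p_entity.dump is tab-separated
# and the 5th field (index 4) holds the use status, "R" or "U".
STATUS_FIELD = 4


def _status(line):
    """Return the use status of one dump line, or None if the line has
    fewer than five fields (e.g. an end-of-data marker)."""
    fields = line.rstrip("\n").split("\t")
    return fields[STATUS_FIELD].strip() if len(fields) > STATUS_FIELD else None


def count_use_status(lines):
    # Python counterpart of: awk -F '\t' '{ print $5 }' | sort | uniq -c
    return Counter(s for s in map(_status, lines) if s is not None)


def unrestricted_share(counts):
    """Fraction of counted datasets whose status is "U"."""
    total = sum(counts.values())
    return counts["U"] / total if total else 0.0


def unrestricted_rows(lines):
    """Yield only the lines whose status is "U": the unrestricted subset."""
    for line in lines:
        if _status(line) == "U":
            yield line
```

Passing an open file object to count_use_status should reproduce the 154/1092 split shown above, and unrestricted_share on those counts gives about 0.876, the “almost 88%”. Writing the lines yielded by unrestricted_rows to a new file gives an unrestricted copy of this one table only.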

But what does “unrestricted” mean for EPD data? Let’s take a more careful look (emphasis mine):

  1. Data will be classified as restricted or unrestricted. All data will be available in the EPD, although restricted data can be used only as provided below.
  2. Unrestricted data are available for all uses, and are included in the EPD on various electronic sites.
  3. Restricted data may be used only by permission of the data originator. Appropriate and ethical use of restricted data is the responsibility of the data user.
  4. Restrictions on data will expire three years after they are submitted to the EPD. Just prior to the time of expiration, the data originator will be contacted by the EPD database manager with a reminder of the pending change. The originator may extend restricted status for further periods of three years by so informing the EPD each time a three-year period expires.
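Rule 4 boils down to simple date arithmetic. A minimal sketch in Python, assuming “three years” means three calendar years counted from the submission date; restriction_expires and is_restricted are hypothetical helpers, not part of any EPD tool, and falling back to 28 February for a 29 February submission is an arbitrary choice the EPD text does not address.

```python
from datetime import date

RESTRICTION_YEARS = 3  # period stated in rule 4 of the EPD terms


def add_years(d, years):
    """Same calendar day `years` later; 29 February falls back to
    28 February (an assumption: the EPD rules do not cover this case)."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def restriction_expires(submitted, extensions=0):
    """Date when a restricted dataset becomes unrestricted, given how many
    three-year extensions the originator has requested."""
    return add_years(submitted, RESTRICTION_YEARS * (1 + extensions))


def is_restricted(submitted, on, extensions=0):
    """True if the dataset is still restricted on the date `on`."""
    return on < restriction_expires(submitted, extensions)
```

For example, a dataset submitted on 31 May 2007 and never extended would become unrestricted on 31 May 2010; with one extension, on 31 May 2013.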

Sounds quite good, doesn’t it? “For all uses” is reassuring, and the short time limit is a good trade-off. The horror comes a few paragraphs below, with the following scary details:

  1. The data are available only to non-profit-making organizations and for research.

Profit-making organizations may use the data, even for legitimate uses, only with the written consent of the EPD Board, who will determine or negotiate the payment of any fee required.

Here it is simply taken for granted that only academia is entitled to perform research, which is false. And there are even more rules about “normal ethics”: basically, if you use EPD data in a publication, the original data author should be listed among the authors of the work. I always thought citation and attribution were invented for exactly that purpose, but the EPD clearly has a different approach to attribution. The EPD even decides what counts as a “legitimate” use of pollen data (I can hardly think of any possible illegitimate use).


You write “Africa” but you read “Europe” again, because most research projects come from French and English universities. For this reason, the situation is almost the same. What is even worse is that in developing countries far fewer people or organizations can afford to buy those data, even though in regions undergoing rapid development the study and preservation of environmental resources are of major importance.

Data for individual sites can be downloaded through a search engine, in Tilia format (unfortunately not ASCII). The problems arise with the license.

The wording is almost exactly the same as for the EPD seen above:

Normal ethics pertaining to co-authorship of publications applies. The contributor should be invited to be a co-author if a user makes significant use of a single contributor’s site, or if a single contributor’s data comprise a substantial portion of a larger data set analysed, or if a contributor makes a significant contribution to the analysis of the data or to the interpretation of the results. The data will be available only to non-profit-making organisations and for research. Profit-making organisations may use the data for legitimate purposes, only with the written consent of the majority of the members of the Advisory board, who will determine or negotiate the payment of any fee required. Such payment will be credited to the APD.


As with dendrochronological data, universities and research centers seriously misunderstand their role in society as places of research and innovation available to everyone. In other words, academia is a closed system that produces data (at very high cost to society) available only inside its walls, even though it is all done with public money.

The only positive bit of the story, if any, is that these datasets are nevertheless available on the web and their terms of use are clearly stated, however restrictive. It would simply be impossible to write a similar article about archaeological pottery or zooarchaeological finds.

Appendix: Using pollen data

Pollen data are usually presented in the form of summary charts in which both stratigraphic data and quantitative pollen data are easy to read. Each “column” of the chart stands for a species or genus. You can create this kind of visualization with free software tools.

The stratigraph package for R can be used for:

plotting and analyzing paleontological and geological data distributed through time in stratigraphic cores or sections. Includes some miscellaneous functions for handling other kinds of palaeontological and paleoecological data.

See the chart below for an example of what they look like.

An example plot using the R stratigraph package
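The numbers behind such a chart are usually percentages: each taxon’s count in a sample divided by the pollen sum of that sample. The Python sketch below is not an interface to stratigraph (an R package); it only shows that conversion, assuming counts are stored in a hypothetical {depth: {taxon: count}} mapping and that the pollen sum includes every counted taxon unless a subset is given. Which taxa enter the pollen sum is a methodological choice, so treat the default as a placeholder.

```python
def pollen_percentages(samples, pollen_sum_taxa=None):
    """Express each taxon's count as a percentage of the pollen sum.

    samples: dict mapping depth -> dict of taxon -> raw count
        (a hypothetical input layout).
    pollen_sum_taxa: optional set of taxa making up the pollen sum;
        None means every counted taxon is included. Taxa outside the
        sum are left out of the result.
    Returns a list of (depth, {taxon: percent}) from top to bottom.
    """
    rows = []
    for depth in sorted(samples):
        included = {
            taxon: n
            for taxon, n in samples[depth].items()
            if pollen_sum_taxa is None or taxon in pollen_sum_taxa
        }
        total = sum(included.values())
        percents = {
            taxon: (100.0 * n / total if total else 0.0)
            for taxon, n in included.items()
        }
        rows.append((depth, percents))
    return rows


def taxon_column(rows, taxon):
    """One "column" of the diagram: (depth, percent) pairs for one taxon.

    Depths where the taxon was not recorded get 0 percent.
    """
    return [(depth, percents.get(taxon, 0.0)) for depth, percents in rows]
```

Each list returned by taxon_column is one column of the diagram, with depth on the vertical axis; drawing it is left to whatever plotting tool you prefer.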

Developing a vocal language. Standing three miles apart.

Tonight I was walking along a country road near my house, almost in the
dark. Despite the highway running less than 500 meters away, there
was an unusual moment of silence (probably everyone else in Italy was
staring at the TV), and I suddenly realized that in that silence I
would be able to hear someone shouting from the Torre del Mangia,
literally three miles away. Or vice versa, if you like.

It’s not that different from how the muezzin spreads his voice and
prayers. In a pre-industrial society there is, generally speaking,
much more silence than now. As a consequence, you can hear voices and
sounds from far away.

Now transpose this concept to … 40,000 BP and imagine how you would
use your voice to communicate with someone else. The usual theory about
the development of human language deals with social practices, like
sitting around the fire, that happen while people are in the same place.
That is fine, but to me it doesn’t explain everything: the same people
also had to communicate during the day, and if they were developing a
language that fit their needs, we may suppose they used it while
hunting as well. My idea is that the language that came out of this was
constrained by the use they made of it: if it was for communicating
from three miles away, it had to be made of distinct and recognizable
sounds. Thus it was, in a sense, a simpler language than one that can
be used when sitting around the fire.

Following this line of reasoning, a more complex language would have
developed only with new habits and the abandonment of nomadic life.
And, of course, this might also imply that shepherds would have
continued to use such a language, or at least such

I’m perfectly aware that what I have written has no connection at all
to reality (and I don’t know anything about language), but it was certainly
more interesting than watching soccer, and I had a nice walk in the dark.

Water basins are traces of clay extraction

Here around Siena there are lots of small water basins, measuring 10
meters in diameter on average. They tend to be near country houses, not
far from secondary roads.

They are used for water storage, but that was not their original purpose.
Instead, they are what was left by small-scale clay extraction: Siena is
renowned for its “crete”.

The country house where I live is in a place once called “la Fornace”
(“the kiln”), so it’s very likely that the basins around it were the last
places where clay was extracted before the kiln stopped working.

Ancient Shipwrecks of the Mediterranean

It took me about 5 days, but in the end I managed to produce an acceptable chart of the number of shipwrecks in the Mediterranean between the 5th and 7th centuries AD.



Writing a short program in R was harder than expected, given that R is much more similar to C (which I know nothing about) than to Python (the only real language I can say I know, at least roughly).

Anyway, in the end I managed, and the legendary individual weighted average has been tamed… Turning it into the subject of a university exam is a whole other story…

The code for the individual weighted averages can be found here.
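The post doesn’t spell out how the individual weighted average works, and the R code itself is only linked. One common way of counting finds dated to a range (sometimes called aoristic counting) spreads each find’s weight evenly over its possible dates; the sketch below assumes that approach and is written in Python rather than R. The function name, the 400 to 700 AD window and the 50-year bins are hypothetical choices, and the author’s own method may differ.

```python
from collections import defaultdict


def weighted_counts(date_ranges, start=400, end=700, bin_size=50):
    """Spread each dated item evenly over its date range and sum per bin.

    date_ranges: iterable of (earliest, latest) years AD, both inclusive.
    Every item carries a total weight of 1, divided equally among the
    years of its range. Years outside [start, end) are ignored, so an item
    dated partly outside the window contributes less than 1.
    Returns {bin_start_year: summed weight}, ordered by bin.
    """
    bins = defaultdict(float)
    for earliest, latest in date_ranges:
        if latest < earliest:
            raise ValueError(f"invalid range: {earliest}-{latest}")
        per_year = 1.0 / (latest - earliest + 1)
        for year in range(earliest, latest + 1):
            if start <= year < end:
                bin_start = start + (year - start) // bin_size * bin_size
                bins[bin_start] += per_year
    return dict(sorted(bins.items()))
```

Each bin’s value can be read as the expected number of wrecks in that interval, which is what a chart like the one described above plots; narrower bins give a more detailed but noisier curve.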


2442 Sociologists, anthropologists and related professionals

Sociologists, anthropologists and related professionals study and describe the structure of societies, the origins and evolution of humankind, and the interdependence between environmental features and human activities, and they make the knowledge acquired accessible so that it can inform political decisions.

Their tasks consist of:

  • a) carrying out research into the origin, evolution, structure, social characteristics, forms of organization and interdependence of human societies;
  • b) researching the origins and evolution of humankind through the study of changes in the physical environment and in cultural and social institutions;
  • c) reconstructing the evolution of humankind with the help of remains from the past such as dwellings, tools, pottery, coins, weapons or carved objects;
  • d) studying the physical and climatic characteristics of a given area or region and correlating the results of this study with economic, social and cultural activities;
  • e) advising on the application of the acquired knowledge to the definition of economic and social policies for specific population groups and regions, favourable to market development;
  • f) preparing scholarly papers and reports;
  • g) carrying out the tasks listed;
  • h) coordinating other workers.

Occupations in this unit group include the following:

  • Anthropologist
  • Archaeologist
  • Ethnologist
  • Geographer
  • Sociologist

Source: http://extranet.regione.piemonte.it


Well, the QMDAA is over, of course, and in the meantime I haven’t had a chance to write anything more on this blog.

Everything moves damn fast; there are really too many things to do if you never stop to focus on one in particular. The next big thing is surely the 2007 workshop on free archaeology, which I have started working on with Gianluca, Giancarlo and Roberto. We’ll see. Meanwhile, the topic to focus our attention on has already been chosen: data sharing. Like Science Commons, only in archaeology. It’s less simple than you might think, actually, considering the academic resistance our damned discipline is blessed with (a science? meh, the question of whether it is one is starting to bore me… I don’t know about you).

In the past few days, though, I got to enjoy the stellar read that is The Hitchhiker’s Guide to the Galaxy, which I recommend to all lovers of the genre: a smart book for smart people.


Since Sunday evening I have been in Campiglia Marittima attending a course on quantitative archaeology (where, fortunately, free software such as GRASS, QGIS and R is used extensively). Yesterday afternoon we went to the Parco Archeologico di Baratti for a guided tour, and here you can find some photos… there are so many of us!!

The course is very interesting, and the only real problem is that there are too many things to learn in too little time.