Flickr selling prints of Creative Commons pictures: a challenge, not a problem

A few weeks ago Flickr, the most popular photo-sharing website, started offering prints of Creative Commons-licensed works in their online shop, among other photographs that were uploaded under traditional licensing terms by their authors.

In short, authors get no compensation when one of their photographs is printed and sold, but they do get a small attribution notice. It has been pointed out that this is totally allowed by the license terms, and some big names seem totally fine with the idea of getting zero pennies when their work circulates in print, with Flickr keeping any profit for themselves.

Some people seemed actually pissed off and saw this as an opportunity to jump off the Flickr wagon (perhaps towards free media sharing services like Mediagoblin, or Wikimedia Commons for truly interesting photographs). Some of us, those who have been involved in the Creative Commons movement for years now, had a sense of unease: after all, the “some rights reserved” were meant to foster creativity, reuse and remixes, not as a revenue stream for Yahoo!, a huge corporation with no known mission of promoting free culture. I’m in the latter group.

But it’s OK, and it’s not really a big deal, for at least two reasons. There are just 385 pictures on display in the Creative Commons category on the Flickr Marketplace, while there are one hundred million images that are actually available for commercial use. Many are beautiful, artistic works. Some are just digital images that happen to have been favorited (or viewed) many times. But there’s one thing common to all items under the Creative Commons label: they were uploaded to Flickr. Flickr is not going out there on the Web, picking out the best photographs that are under a Creative Commons license, or even in the public domain. I guess they are not legally comfortable with doing that, even if the license totally allows it. In fact, the terms and conditions all Flickr users agreed to state that:


[…] you give to Yahoo the following licence(s):

  • For photos, graphics, audio or video you submit or make available on publicly accessible areas of the Yahoo Services, you give to Yahoo the worldwide, royalty-free and non-exclusive licence to use, distribute, reproduce, adapt, publish, translate, create derivative works from, publicly perform and publicly display the User Content on the Yahoo Services

That’s not much different from a Creative Commons Attribution license, albeit much shorter and EULA-like.

In my opinion, until the day we see Flickr selling prints of works that were not uploaded to their service, this is not bad news for creators. Some users feel screwed, but I wouldn’t be outraged, not before seeing how many homes and offices get their walls covered in CC art.

The second reason why I’m a bit worried about the reaction to what is happening is that, uhm, anyone could have been doing this for years, taking CC-licensed stuff from Flickr, and arguably at lower prices ($17.40 for an 8″ × 10″ canvas print?). Again, nobody did, at least not on a large scale. Probably this is because few people feel comfortable commercially appropriating legally available content ‒ those who don’t care do this stuff illegally anyway, Creative Commons or not. In the end, I think we’re looking at a big challenge: can we make the Commons work well for both creators and users, without creators feeling betrayed?

Featured image is Combustion [Explored!] by Emilio Kuffer.

Yet another failure for cultural heritage data in Italy

This short informative piece is written in English because I think it will be useful for anyone working on cultural heritage data, not just in Italy.

A few days ago the Istituto Centrale per il Catalogo e la Documentazione published an internal document for all offices in the Ministry of Culture (actual name is longer, but you got it), announcing imminent changes and the beginning of a process for publishing all records about cultural heritage items (I have no idea of the exact size but we’re in the millions of records). In short, all records will be publicly available, and there will be at least one image for each record ‒ you’ll get anything from small pieces of prehistoric flint to Renaissance masterpieces, and more. That’s a huge step and we can only be happy to see this, the result of decades of cataloguing, years of digital archiving and … some lobbying and campaigning too. Do you remember Beni Culturali Aperti? The response from the ICCD had been lukewarm at best, basically arguing that the new strong requirements for open government data from article 68 of the Codice dell’Amministrazione Digitale did not apply at all to cultural heritage data. So nobody was optimistic about the developments to follow.

And unfortunately pessimism was justified. Here’s an excerpt from the document published last week:

Excerpt from note prot. n. 2975 of 17/11/2014 by the Istituto Centrale per il Catalogo e la Documentazione

relevant sentence:

Le schede di catalogo verranno rese disponibili con la licenza Creative Commons CC BY-NC-SA

that would be

Catalog records will be made available under the Creative Commons CC BY-NC-SA license

And that was the (small) failure. CC BY-NC-SA is not an open license. The license makes commercial (= paid!) work with such data impossible or very difficult, at a time when the cultural heritage private sector could simply benefit from full access to this massive dataset, with zero losses for the gatekeepers. Just when it has been shown that open licenses are becoming more and more widespread, and that non-open licenses like BY-NC-SA are used less and less because they’re incompatible with anything else and inhibit reuse, someone decided that it was the right choice, against all international, European and national recommendations and regulations. We can only hope that a better choice will be made in the near future, but the record isn’t very encouraging, to be honest.

Archaeology in the Mediterranean: I don’t wanna drown in cold water

This post is the second half of the one I had prepared for this year’s Day of Archaeology (Archaeology in the Mediterranean: do not drown if you can). Due to an appropriately timed mistake, I only managed to post the first, more relaxed half of the text. Enjoy this rant.

Written and unwritten rules dictate what is correct, acceptable and ultimately recognised by your peers: it is never entirely clear who sets research agendas for entire disciplines, but ‒ just to be more specific ‒ I feel increasingly stifled by the “trade networks” research framework that has dominated Late Roman pottery studies for the past 40 years now. Invariably, at any dig site, there will be from 1 to 100,000 potsherds from which we should infer that said site was part of the Mediterranean trade network. We are all experts about our “own” material, that is, the finds that we study, and apart from a few genuine gurus most of us have a hard time recognising where one pot was made, what is the exact chronology of one amphora, and so on. But those gurus, as leaders, contribute to setting in stone what should be a temporary convention as to what terminology, chronology and to a larger extent what approach is appropriate. I can hear the drums of impostor syndrome rolling in the back.

I don’t want to drown in this sea of small ceramic sherds and imaginary trade networks, rather I really need to spend time understanding why those broken cooking pots ended up exactly where we found them, in a certain room used by a limited number of people, in that stratigraphical position.

At the same time, I’m depressingly frustrated by how mechanical and repetitive the identification of ceramic finds can be: look at shape, compare with known corpora, look at fabric, compare with more or less known corpora. If any, look at decoration, lather, rinse, repeat. My other self, the one writing open source computer programs, wonders if all of this could not be done by a mildly intelligent algorithm, liberating thousands of human neurons for more creative research. But this is heresy. We collectively do our research and dissemination as we are told, with sometimes ridiculously detailed guidelines for the preparation of digital illustrations that end up printed either on paper or on PDF (which is the same thing). Our collective knowledge is the result of a lot of work that we need to respect, acknowledge, study and pass on to the next generation.

At the end of the obligations telling you how to study your material, how to publish it, and ultimately how to think about it, you could just be happy and let yourself comfortably drown in the next grant application. Don’t do that. Do more. Follow your crazy idea and sail the winds of Mediterranean archaeology.

Migrating a database from FileMaker Pro to SQLite

FileMaker Pro is almost certainly one of the least interoperable cases of proprietary database management software. As such, it is the worst choice you could make to manage your data as far as digital preservation goes. Period.

There is data to be salvaged out there. If you find data that you care about, you’ll want to migrate content from FileMaker Pro to another database.

Things are not necessarily easy. If you work primarily on GNU/Linux (Debian in my case) it may be even more difficult. It can be done, but not without proprietary software.

Installing FileMaker Pro

You can get a 30-day free trial of FileMaker Pro (FMP): you may need to register on their website with your e-mail address. It is a regular .exe installer.

Please note that, while other shareware programs exist to extract data from a FileMaker Pro database, there is absolutely no way to do it using free and open source software, and as far as I know nobody has ever done any reverse engineering on the format. Do not waste your time trying to open the file with another program: the FileMaker Pro trial is your best choice. Also, do not waste your time and money buying another piece of proprietary software to “convert” or “export” your data: FileMaker Pro can be used to extract all of your data, including any images that were stored in the database file, and the trial comes at no cost if you already have one of the supported operating systems. After all, the format is proprietary, so it is appropriate to use the native proprietary program to open it.

Extracting data

Alphanumeric data are rather simple to extract, and result in CSV files that can easily be manipulated and imported by any program. Be aware however that you have no way to export your database schema, that is the relationships between the various tables. If you only have one table, you should not have used a database in the first place, but that’s another story.

Make sure FMP is installed correctly. Open the database file.

  1. Go to the menu and choose File → Export records
  2. Give a name for the exported file and make sure you have selected Comma-separated values (CSV) as export format
  3. A dialog will appear asking you to select which fields you want to export.
    • Make sure that “UTF-8” is selected at the bottom
    • On the left you see the available fields, on the right the ones you have chosen
    • Click Move all ‒ you should now see the fields listed at the bottom right. If you get an error complaining about Container fields, do not worry, we are going to rescue your images later (see below)
    • Export and your file is saved.
  4. Take a look at the CSV file you just saved. It should open in Notepad or any other basic text editor. A spreadsheet will work as well and may help checking for errors in the file, especially encoding errors (accented letters, special characters, newlines inside text fields, etc.)
  5. Repeat the above steps for each table. You can choose the table to export from using the drop-down list in the upper left of the export dialog.
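
The check in step 4 can also be done with a few lines of Python instead of a spreadsheet. This is only a minimal sketch, not part of the FileMaker Pro procedure: the function name check_csv is made up for illustration, and it assumes the export is a plain comma-separated file in UTF-8. It reports undecodable bytes and rows whose number of fields differs from the first row, which is how stray newlines or quotes inside text fields usually show up.

```python
import csv
import io


def check_csv(data, encoding="utf-8"):
    """Return (row_count, problems) for an exported CSV given as raw bytes.

    check_csv is a hypothetical helper, not something FileMaker Pro provides.
    """
    try:
        text = data.decode(encoding)
    except UnicodeDecodeError as exc:
        # The file was not exported with the expected encoding
        return 0, [f"not valid {encoding}: {exc}"]

    # newline="" lets the csv module deal with line breaks inside quoted fields
    rows = list(csv.reader(io.StringIO(text, newline="")))
    if not rows:
        return 0, ["file is empty"]

    problems = []
    expected = len(rows[0])
    for number, row in enumerate(rows, start=1):
        if len(row) != expected:
            problems.append(
                f"row {number}: {len(row)} fields, expected {expected}"
            )
    return len(rows), problems
```

For a file on disk you would read it in binary mode first, for example check_csv(open("finds.csv", "rb").read()), where finds.csv stands for whatever name you gave to the export.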

Extracting images from a Container Field in FileMaker Pro

If, for some unfortunate reason, image files have been stored in the same database file using a Container Field, the normal export procedure will not work and you will need to follow a slightly more complex procedure:

 Go to Record/Request/Page [First]
 Loop
     * Set Variable [$filePath; Value: Get ( DesktopPath ) & MyPics::Description & ".jpg"]
     ** Export Field Contents [MyPics::Picture; “$filePath”]
     Go to Record/Request/Page [Next; Exit after last]
 End Loop

 * Set Variable Options:
 Name: $filePath
 Value: Use one of the following formulas

 Mac: Get ( DesktopPath ) & MyPics::Description & ".jpg"

 Windows: "filewin:" & Get ( DesktopPath ) & MyPics::Description & ".jpg"

 Mac and Windows: Choose ( Abs ( Get ( SystemPlatform ) ) - 1 ;
      /* Mac OS X */
      Get ( DesktopPath ) & MyPics::Description & ".jpg" ;
      /* Windows */
      "filewin:" & Get ( DesktopPath ) & MyPics::Description & ".jpg"
 )

 Repetition: 1

 ** Export Field Contents Options:
 Specify target field: Picture
 Specify output file: $filePath

Migrating to SQLite

SQLite is a lightweight, file-based real database (i.e. based on actual SQL). You can import CSV data in SQLite very easily, if your starting data are “clean”. If not, you may want to look for alternatives.
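
One possible way to do the import is with the sqlite3 and csv modules in the Python standard library. Take the following as a sketch under a few assumptions of mine: the CSV file has no header row, so column names are passed by hand; every column is stored as TEXT, so that nothing gets silently converted to a number; import_csv and the table and column names are made up for illustration.

```python
import csv
import sqlite3


def import_csv(conn, table, csv_path, columns):
    """Load a header-less CSV file into a new table made of TEXT columns.

    Table and column names are pasted into the SQL statement, so they must
    come from you and not from untrusted input.
    """
    column_defs = ", ".join(f'"{name}" TEXT' for name in columns)
    placeholders = ", ".join("?" for _ in columns)
    conn.execute(f'CREATE TABLE "{table}" ({column_defs})')
    with open(csv_path, newline="", encoding="utf-8") as handle:
        # Each CSV row becomes one table row; a row with the wrong number
        # of fields makes sqlite3 raise an error instead of guessing
        conn.executemany(
            f'INSERT INTO "{table}" VALUES ({placeholders})',
            csv.reader(handle),
        )
    conn.commit()
```

With a connection opened through sqlite3.connect("mydata.sqlite"), a call like import_csv(conn, "finds", "finds.csv", ["id", "description"]) would create the finds table and fill it. Relationships between tables are not exported by FileMaker Pro, so they still have to be rebuilt by hand.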

Appendix: if you are on GNU/Linux

If you are on GNU/Linux, there is no way to perform the above procedure, and you will need a working copy of Microsoft Windows. The best solution is to use VirtualBox. In my case, I obtained a copy of Microsoft Windows XP and a legal serial number from my university IT department. The advantage of using VirtualBox is that you can erase FileMaker Pro and Windows once you’re done with the migration, and stay clear of proprietary software.

Let’s see how to obtain a working virtual environment:

  1. Install VirtualBox. On Debian it’s a matter of sudo apt-get install virtualbox virtualbox-guest-additions-iso
  2. Start VirtualBox and create a new machine. You will probably need to do sudo modprobe vboxdrv (in a terminal) if you get an error message at this stage
  3. Install Windows in your VirtualBox. This is the standard Windows XP install and it will take some time. Go grab some tea.
  4. … some (virtual!) reboots later …
  5. Once Windows is installed, make sure you install the VirtualBox Guest Additions from the Devices menu of the main window. Guest Additions are needed to transfer data between your regular directories and the virtual install.
  6. Install the FMP trial and reboot again as needed. Then you can open the database file you need to convert and follow the steps described above

Low back pain

I went through an acute episode of low back pain a few months ago (the so-called colpo della strega), and I’m slowly recovering to normality ‒ still no lifting of heavy weights for me. It hurt a lot, suddenly, but in retrospect it was not a surprise, because I had been having mild pain for months and I have known since 2010 that there’s the beginning of a slipped disc at L5-S1 in my spine.

I know this is very common, but I cannot help thinking about the consequences of this health issue for me as an archaeologist. I don’t call myself a field archaeologist now, but I spent 2-3 months a year in the field for several years (2003-2010) and in 2009 I did that as a profession for a while (most of the other fieldwork was done with universities). Luckily enough, but without any actual plan, in 2009 I started accumulating some experience with ceramics and I took part in several campaigns doing that instead of digging. I like digging ‒ I know very well that I am far from being good at it, because I think too much and I’m not quite a fast “identify-clean-record-dig” type ‒ but I still like it a lot. And the less I practice archaeological digging (10 sparse days last year), the more I idealise it as the real archaeology.

Obviously, the idea that archaeology is restricted to fieldwork is wrong, but I’m only fortunate that I have a job and I am not forced to prove this truth.

It hurts.

ICCD tracciati with the Unix shell and Python

ICCD catalogue records are exchanged as plain text files called tracciati. It can be useful to know how to handle these files with the simplest tools available in a Unix environment.

The topic is not particularly exciting and luckily it looks set to become obsolete soon, with the introduction of an XML format (itself already obsolete, by the way), but a few technical notes may be useful. Unix shell commands give us some basic information, and then we can move on to more elaborate steps in Python.

For a single tracciato file, we can count the records it contains by looking at the mandatory CD paragraph that opens each record:

$ cat tracciato.trc | grep -c -e "^CD:"

The main part of the command is the regular expression ^CD: which, combined with the -c option of grep, counts the number of records (1527 in our case). The regular expression is needed to avoid counting other parts of the file as well, including other fields (for example DSCD), which is what happens in the next example:

$ cat tracciato.trc | grep -c "CD:"
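
The difference between the two grep commands can be reproduced in Python, which is where we are heading anyway. This is just a sketch of mine: the function names are invented, and it assumes the whole tracciato has been read into a string.

```python
import re


def count_records(text):
    """Count lines that start with 'CD:', one for each catalogue record."""
    # re.MULTILINE makes ^ match at the start of every line, like grep does
    return len(re.findall(r"^CD:", text, flags=re.MULTILINE))


def count_loose(text):
    """Count lines containing 'CD:' anywhere, like grep -c without the ^."""
    return sum(1 for line in text.splitlines() if "CD:" in line)
```

On a file where records also contain a DSCD field, count_loose returns a higher number than count_records, because every DSCD: line contains the text CD: too.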

Let us get a list of the NCTN numbers of the records, using a similar regular expression and the cut command, with a space as the delimiter and keeping only the second field:

cat tracciato.trc | grep -e "^NCTN:" | 
cut -d " " -f 2

The list produced by the command will be rather long, so it is better to save it to a separate file, using a simple output redirection:

cat tracciato.trc | grep -e "^NCTN:" | cut -d " " -f 2 > nctn.txt

It is important to check that the data are correct. According to the ICCD standard (both the old version 2.00 of 1992 and the new version 3.00) the NCTN field must consist of 8 digits, so the actual number has to be padded with leading zeros. The number 13887 thus becomes 00013887 in the NCTN field, and so on.

In fact, from a computing point of view, 00013887 is not an integer (which would invariably be stored as 13887) but a string of 8 characters, each of which is a digit (the same applies, for example, to CAP postal codes). Unfortunately this rule is often ignored, because the NCTN field is mistakenly treated as a numeric field, for example when a database is created.

In the case I was working on in the past few days we do have a bit of everything: 5 digits, 7 digits, 8 digits.
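
As a minimal sketch of the padding rule (my own illustration, not the code from the notebook linked below), the values can be kept as strings and padded with str.zfill. The function names are made up, and the default width of 8 comes from the ICCD rule described above.

```python
from collections import Counter


def normalise_nctn(value, width=8):
    """Left-pad a catalogue number with zeros, keeping it a string."""
    value = value.strip()
    if not value.isdigit() or len(value) > width:
        # Refuse anything that does not look like a catalogue number
        raise ValueError(f"unexpected NCTN value: {value!r}")
    return value.zfill(width)


def length_report(values):
    """Count how many values there are for each length."""
    return Counter(len(value.strip()) for value in values)


print(normalise_nctn("13887"))  # 00013887
```

Running length_report on the lines of nctn.txt gives a quick picture of how many values are 5, 7 or 8 digits long before anything is changed.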

In the second part of this post we move to an IPython session (in English) to see how to process further the data we saved in the nctn.txt file and obtain a readable list of NCTN numbers: continue reading on nbviewer.

Finally, a very readable introduction to ICCD cataloguing is the one by Diego Gnesi Bartolani, which can be downloaded as a PDF from his website.


#MuseumWeek: reflections after the fact

Last month we celebrated the first #MuseumWeek, a week of high social media visibility for museums in several countries, including Italy, all of it on Twitter. It was much discussed among professionals and everyone seems to have been happy with it. I took part in #MuseumWeek too, running the @archeoliguria account of the Soprintendenza per i Beni Archeologici della Liguria (where I work). We are almost at the bottom of the list of Italian museums that officially joined.

While the week was still under way some doubts had already been collected, more about the practical side than about the initiative in general, as Linkiesta wrote. Here I would like to collect, partly as reflection and partly as grumbling, some weaknesses that I noticed during the week and that seem important to me for all the next times we take part in a similar initiative.

First of all, timing. #MuseumWeek ran from 24 to 30 March 2014, and the announcement appeared on the Twitter blog on 10 March, that is two weeks earlier. For an initiative meant to involve hundreds of institutions across half of Europe, that was short notice. The reason is easy to see: it was organised entirely top-down, with dates and schedules fixed in advance, down to the topics to be covered. You will say: improvisation is the beauty of social media, and you are a sloth! My answer is that with 5 museums to coordinate, spread across the whole region, and a technological infrastructure that is not exactly cutting-edge, it was not easy: sloths are almost always doing 4 or 5 different things at once and do not devote the whole day to social media. The slothiest sloths were those who, with #MuseumWeek already under way, were sending e-mails asking people to e-mail them 140-character messages, which they would then arrange to have published on a certain widely followed institutional account…

Those who signed up by e-mail were sent by Massimiliano D’Ostilio (of the company TTA) a short presentation revealing the themes of each day and the expected ways of interacting. In my opinion it is a trivial, purely promotional document, listing only benefits and no risks (listing risks is the ABC of project design, I believe). To give two concrete examples: Italian museums ran the risk of being insulted over the well-known problems in the management of heritage (I don’t think it happened during that week, but it does happen), while users risked finding their timeline flooded with content that was not always interesting (and I think on some days this did happen).

The schedule of themes was well suited to large, traditional museums, revolving decisively around collections, artworks and paintings: not ideal for small museums and archaeological sites. Indeed, if you reread the announcement post this preference is clear from the calibre of the museums mentioned (and, let’s hope, involved in the design, at least them). Above I wrote that these are notes for the next time we take part precisely because #MuseumWeek, beautiful and exciting as it was, was a take-it-or-leave-it offer, even though some museums interpreted it in their own way.

But in the end all of this is pointless grumbling, because #MuseumWeek was insanely cool and we were in the trending topics for a whole week and we multiplied our followers and we had umpteen thousand interactions… maybe. We were in the trending topics, but there were really a lot of us, so that was fairly predictable, besides the fact that Twitter may have decided to put #MuseumWeek in the trending topics. We did multiply our followers, no doubt, and for many small institutions just starting out on social media this was important: @archeoliguria went from about 200 to about 300 followers over the week. We did have a great many interactions, which I did not bother to count, but I have some doubts about their quality. First of all there was a very strong sense of cooperation among small and medium-sized institutions, and we cheered each other on, retweeting other museums’ messages, commenting and answering the questions of the day among ourselves, while the “external” public interacted less (I could say much less, if I had numbers). Days like #AskTheCurator and #GetCreative showed that the public, even when it gets excited, is rather lazy or simply not used to talking with the museum, also because there is a large “expert” share of the population (archaeologists, art historians, tour guides) that perhaps took part more out of a sense of belonging than out of curiosity for something new. Generalising, I am sorry to note instead that the large museums and museum networks chose once again to stay in broadcasting mode, using all their visibility to show themselves to a public with which they did not interact in the slightest. Real superstars. Speaking of superstars, the ever-growing army of talking characters that started populating Twitter after the success of the two flashy Riace bronzes also kept a rather low profile (by the way: whatever happened to them?)

All to be thrown away? Absolutely not. But all that #glitters is not #gold, and I do not believe, as @insopportabile wrote instead, that Twitter fulfilled the role of a ministry of culture by organising #MuseumWeek. We made the success ourselves, but I believe it takes a lot more to carry that social media success over to museums in the flesh: we don’t have to sell anything, but we do have to tell a great many stories.

For Aaron Swartz

I didn’t know Aaron Swartz. And yet his tragic end touched me a lot. I saw some friends and colleagues react strongly in the weeks following his death, as strong as you can be in front of a tragedy at least.

Aaron was only a few years younger than me. He had achieved so much, in so little time. He was a hero. He is a hero.

I was deeply touched and I am still sad especially because I do the kind of things that Aaron did, although on a much smaller scale. I am not a hero, of course.

In 2008 I started collecting air pollution data from a local government office. Every day, one PDF. Later I started writing web scrapers for this dataset and others. I never really got to the point where the data could be of any use. Most of this was done out of frustration.

In 2009 I got a PhD scholarship from my university and with that came a VPN account that I could use from anywhere to access digital resources for which the university had a subscription (including part of JSTOR). I gave those credentials to several friends who did not have the same privilege I had, and I didn’t worry, even though those were the same credentials used for my mailbox. You cannot even try to take your first steps into an academic career without access to these kinds of resources.

I regularly share digital copies of prints, especially the incredibly awful copies made by photographing a book. Every single person I have been working with in the last three years does this regularly: scans, photographs, “pirate” PDFs or even pre-prints, because everything will do when you need a piece of “global” knowledge for your work. I have to break the rules so regularly that it feels normal. And yet, I don’t feel guilty for any of that, except for the fact that I didn’t take the next step with access to knowledge, giving to everyone and not just to a small circle of people.

Sometime between 2008 and 2009 I helped make a copy of the entire archive of BIBAR (Biblioteca di Archeologia, mostly about medieval archaeology), hosted at my university. That’s more than 2 GB of academic papers, the same kind of content that Aaron took from JSTOR. Years later, that copy lives as a Torrent download, free of any restriction. It’s a small #pdftribute for Aaron.

An Early Christian basilica in Turin (Torino)

La reports that recent excavations in Turin (Torino) have brought to light an Early Christian basilica. That is the third Early Christian complex found in late Roman Augusta Taurinorum. Most interestingly, it is much further away from the city walls (in the upper part of the map shown here), as Egle Micheletto points out in the interview. While the size of the Early Christian city is not comparable to e.g. Mediolanum, it is still impressive!

Early Christian complexes in Turin (blue stars), with the Roman city wall in grey. The Po river (Padus/Eridanus) is in the lower right corner.

The archaeological remains will be visible to the public only when the construction project is finished in 2016, according to the article.

Is this a satisfactory account of archaeology in the public interest?

Introducing the Nobel Prize in Literature Index

I have been thinking about my own ignorance in culture recently, questioning my self-perception as an intellectual.

A good example of this is how many Nobel laureates in Literature I have never bothered to read at all. In some cases books were only assignments in high school, but I think that counts as education after all.

You can share my feeling of inadequate ignorance, too. Open the list of Nobel laureates in literature. Count how many authors you have read: only complete works or substantial parts are valid. The resulting number is your NoPLi index.

Mine is a shameful 8.