Last week a tweet from the always brilliant Jolene Smith inspired me to write down my thughts and ideas about numbering boxes of archaeological finds. For me, this includes also thinking about the physical labelling, and barcodes.
Question for people who organize things for their job. I'm giving a few thousand boxes unique IDs. should I go random or sequential?
— Jolene Smith (@aejolene) March 3, 2017
The question Jolene asks is: should I use sequential or random numbering? To which many answered: use sequential numbering, because it bears significance and can help detecting problems like missing items, duplicates, etc. Furthermore, if the number of items you need to number is small (say, a few thousands), sequential numbering is much more readable than a random sequence. Like many other archaeologists faced with managing boxes of items, I have chosen to use sequential numbering in the past. With 200 boxes and counting, labels were easily generated and each box had an associated web page listing the content, with a QR code providing a handy link from the physical label to the digital record. This numbering system was put in place during 3 years of fieldwork in Gortyna and I can say that I learned a few things in the process. The most important thing is that it’s very rare to start from scratch with the correct approach: boxes were labeled with a description of their content for 10 years before I adopted the numbering system pictured here. This sometimes resulted in absurdly long labels, easily at risk of being damaged, difficult to search since no digital recording was made. I decided a numbering system was needed because it was difficult to look for specific items, after I had digitised all labels with their position in the storage building (this often implied the need to number shelves, corridors, etc.). The next logical thing was therefore to decouple the labels from the content listing ‒ any digital tool was good here, even a spreadsheet. Decoupling box number from description of content allowed to manage the not-so-rare case of items moved from one box to another (after conservation, or because a single stratigraphic context was excavated in multiple steps, or because a fragile item needs more space …), and the other frequent case of data that is augmented progressively (at first, you put finds from stratigraphic unit 324 in it, then you add 4.5 kg of Byzantine amphorae, 78 sherds of cooking jars, etc.). Since we already had a wiki as our knowledge base, it made sense to use that, creating a page for each box and linking from the page of the stratigraphic unit or that of the single item to the box page (this is done with Semantic MediaWiki, but it doesn’t matter). Having a URL for each box I could put a QR code on labels: the updated information about the box content was in one place (the wiki) and could be reached either via QR code or by manually looking up the box number. I don’t remember the details of my reasoning at the time, but I’m happy I didn’t choose to store the description directly inside the QR code ‒ so that scanning the barcode would immediately show a textual description instead of redirecting to the wiki ‒ because that would require changing the QR code on each update (highly impractical), and still leave the information unsearchable. All this is properly documented and nothing is left implicit. Sometimes you will need to use larger boxes, or smaller ones, or have some items so big that they can’t be stored inside any container: you can still treat all of these cases as conceptual boxes, number and label them, give them URLs.
There are limitations in the numbering/labelling system described above. The worst limitation is that in the same building (sometimes on the same shelf) there are boxes from other excavation projects that don’t follow this system at all, and either have a separate numbering sequence or no numbering at all, hence the “namespacing” of labels with the GQB prefix, so that the box is effectively called GQB 138 and not 138. I think an efficient numbering system would be one that is applied at least to the scale of one storage building, but why stop there?
Turning back to the initial question, what kind of numbering should we use? When I started working at the Soprintendenza in Liguria, I was faced with the result of no less than 70 years of work, first in Ventimiglia and then in Genoa. In Ventimiglia, each excavation area got its “namespace” (like T for the Roman theater) and then a sequential numbering of finds (leading to items identified as T56789) but a single continuous sequential sequence for the numbering of boxes in the main storage building. A second, newer building was unfortunately assigned a separate sequence starting again from 1 (and insufficient namespacing). In Genoa, I found almost no numbering at all, despite (or perhaps, because of) the huge number of unrelated excavations that contributed to a massive amount of boxes. Across the region, there are some 50 other buildings, large and small, with boxes that should be recorded and accounted for by the Soprintendenza (especially since most archaeological finds are State property in Italy). Some buildings have a numbering sequence, most have paper registries and nothing else. A sequential numbering sequence seems transparent (and allows some neat tricks like the German tanks problem), since you could potentially have an ordered list and look up each number manually, which you can’t do easily with a random number. You also get the impression of being able to track gaps in a sequence (yes, I do look for gaps in numeric sequences all the time), thus spotting any missing item. Unfortunately, I have been bitten too many times by sequential numbers that turned out to have horrible bis suffixes, or that were only applied to “standard” boxes leaving out oversized items.
On the other hand, the advantages of random numbering seem to increase linearly with the number of separate facilities ‒ I could replace random with non-transparent to better explain the concept. A good way to look at the problem is perhaps to ask whether numbering boxes is done as part of a bookkeeping activity that has its roots in paper registries, or it is functional to the logistics of managing cultural heritage items in a modern and efficient way.
Logistics. Do FedEx, UPS, Amazon employees care what number sequence they use to track items? Does the cashier at the supermarket care whether the EAN barcode on your shopping items is sequential? I don’t know, but I do know that they have a very efficient system in place, in which human operators are never required to actually read numerical IDs (but humans are still capable of checking whether the number on the screen is the same as the one printed on the label). There are many types of barcode used to track items, both 1D and 2D, all with their pros and cons. I also know of some successful experiments with RFID for archaeological storage boxes (in the beautiful depots at Ostia, for example), that can record numbers up to 38 digits.
Based on all the reflections of the past years, my idea for a region- or state-wide numbering+labeling system is as follows (in RFC-style wording):
- it MUST use a barcode as the primary means of reading the numerical ID from the box label
- the label MUST contain both the barcode and the barcode content as human-readable text
- it SHOULD use a random numeric sequence
- it MUST use a fixed-length string of numbers
- it MUST avoid the use of any suffixes like a, b, bis
In practice, I would like to use UUID4 together with a barcode.
A UUID4 looks like this:
1b08bcde-830f-4afd-bdef-18ba918a1b32. It is the UUID version of a random number, it can be generated rather easily, works well with barcodes and has a collision probability that is compatible with the scale I’m concerned with ‒ incidentally I think it’s lower than the probability of human error in assigning a number or writing it down with a pencil or a keyboard. The label will contain the UUID string as text, and the barcode. There will be no explicit URL in the barcode, and any direct link to a data management system will be handled by the same application used to read the barcode (that is, a mobile app with an embedded barcode reader). The data management system will use UUID as part of the URL associated with each box. You can prepare labels beforehand and apply them to boxes afterwards, recording all the UUIDs as you attach the labels to the boxes. It doesn’t sound straightforward, but in practice it is.
And since we’re deep down the rabbit hole, why stop at the boxes? Let’s recall some of the issues that I described non-linearly above:
- the content of boxes is not immutable: one day item X is in box Y, the next day it gets moved to box Z
- the location of boxes is not immutable: one day box Y is in room A of building B, the next day it gets moved to room C of building D
- both #1 and #2 can and will occur in bulk, not only as discrete events
The same UUIDs can be applied in both directions in order to describe the location of each item in a large bottom-up tree structure (add as many levels as you see fit, such as shelf rows and columns):
item X → box Y → shelf Z → room A → building B
b68e3e61-e0e7-45eb-882d-d98b4c28ff31 → 3ef5237e-f837-4266-9d85-e08d0a9f4751 3ef5237e-f837-4266-9d85-e08d0a9f4751 → 77372e8c-936f-42cf-ac95-beafb84de0a4 77372e8c-936f-42cf-ac95-beafb84de0a4 → e895f660-3ddf-49dd-90ca-e390e5e8d41c e895f660-3ddf-49dd-90ca-e390e5e8d41c → 9507dc46-8569-43f0-b194-42601eb0b323
Now imagine adding a second item W to the same box: since the data for item Y was complete, one just needs to fill one container relationship:
b67a3427-b5ef-4f79-b837-34adf389834f → 3ef5237e-f837-4266-9d85-e08d0a9f4751
and since we would have already built our hypothetical data management system, this data is filled into the system just by scanning two barcodes on a mobile device that will sync as soon as a connection is available. Moving one box to another shelf is again a single operation, despite many items actually moved, because the leaves and branches of the data tree are naïve and only know about their parents and children, but know nothing about grandparents and siblings.
There are a few more technical details about data structures needed to have a decent proof of concept, but I already wrote down too many words that are tangential to the initial question of how to number boxes.