Using Harris Matrix Data Package with the stratigraphr package

I am working on the Harris Matrix Data Package specification with the aim of decoupling it from my own “hmdp” tool. An important step towards the adoption of a data format is to have more software implementations. With this in mind, I present a procedure to import, analyze and plot a Harris Matrix Data Package in R with the experimental stratigraphr library maintained by Joe Roe.

stratigraphr is a tidy framework for working with archaeological stratigraphy and chronology in R. It includes tools for reading, analysing, and visualising stratigraphies (Harris matrices) and sequences as directed graphs

https://stratigraphr.joeroe.io/

Let’s go!

Installing the needed libraries

Apart from the common tidyverse libraries, we need to install the stratigraphr and frictionless packages.

Please follow the installation instructions on their respective websites.
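As a rough sketch of what those instructions amount to: frictionless and the tidyverse packages are assumed to be available from CRAN, while stratigraphr is assumed here to be installed from its GitHub repository (joeroe/stratigraphr) with the remotes package. The `cran_pkgs` and `missing_pkgs` names are only illustrative, and each project's website remains the reference for current instructions.

```r
# Packages assumed to be available from CRAN
cran_pkgs <- c("frictionless", "ggraph", "tidygraph", "purrr", "tidyr", "dplyr", "remotes")

# Only install what is not already present
missing_pkgs <- setdiff(cran_pkgs, rownames(installed.packages()))
if (length(missing_pkgs) > 0) {
  install.packages(missing_pkgs)
}

# stratigraphr is experimental; assumed here to be installed from GitHub
if (!requireNamespace("stratigraphr", quietly = TRUE)) {
  remotes::install_github("joeroe/stratigraphr")
}
```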

Loading packages

library(stratigraphr)
library(ggraph)
library(frictionless)
library(purrr)
library(tidyr)
library(dplyr)

Loading datasets

We load a Harris Matrix Data Package describing figure 12 from E.C. Harris’s manual Principles of archaeological stratigraphy, as modified by T.S. Dye. Please note that we are loading a package straight from a URL, and this could be an institutional repository like Zenodo or OSF.

fig12 <- frictionless::read_package("https://codeberg.org/steko/harris-matrix-data-package/raw/branch/main/fig12/datapackage.json")
contexts <- frictionless::read_resource(fig12, "contexts")
observations <- frictionless::read_resource(fig12, "observations")
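To get a feel for what read_package() and read_resource() hand back without depending on the network, here is a minimal sketch that builds a tiny data package in memory with frictionless. The three relationships are made up for illustration and are not the fig12 data; only the resource names and the label, younger, older and url columns mirror the ones used in this post.

```r
library(frictionless)

# Toy tables (invented values, not the fig12 data)
toy_contexts <- data.frame(label = c("1", "2", "3"))
toy_observations <- data.frame(
  younger = c("1", "1", "2"),
  older   = c("2", "3", "3"),
  url     = NA_character_
)

# Build an in-memory data package with two resources
toy_pkg <- create_package()
toy_pkg <- add_resource(toy_pkg, "contexts", data = toy_contexts)
toy_pkg <- add_resource(toy_pkg, "observations", data = toy_observations)

# List the resource names, then read one resource back
resources(toy_pkg)
toy_obs <- read_resource(toy_pkg, "observations")
```

Each row of the observations resource is a single younger/older relationship, which is why the table needs reshaping before stratigraphr can use it.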

Converting Harris Matrix Data Package to the stratigraphr format

Now the observations tibble contains our initial data that must be converted to the stratigraphr native format. We modify it in place.

observations <- observations %>% pivot_wider(names_from = url, values_from = older, values_fn = list)
observations <- rename(observations, context = younger)
observations <- rename(observations, below = `NA`)

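Warning! The names_from = url parameter is a bit of a hack: it only works because every value in that column is missing (NA), so pivot_wider() gathers all the older values into a single column named NA, which we then rename to below.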
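If the url column ever holds real values, the pivot_wider() trick will no longer produce a single NA column. A possible alternative, sketched here on made-up toy data rather than the fig12 package, is to group by the younger context and collect the older ones into a list column directly. The `toy_obs` and `nested` names are illustrative.

```r
library(dplyr)

# Toy stand-in for the observations resource (invented values)
toy_obs <- tibble::tibble(
  younger = c("1", "1", "2"),
  older   = c("2", "3", "3"),
  url     = NA_character_
)

# One row per younger context, with the older contexts gathered in a list column
nested <- toy_obs %>%
  group_by(context = younger) %>%
  summarise(below = list(older), .groups = "drop")
```

This yields the same context/below shape as the pivot_wider() version, without relying on the contents of url.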

The first approach is to reuse the code from the stratigraphr documentation, but it returns an error: loading the data directly with stratigraph() only works if every context appears in the context column of the observations table. The following chunk does not work and is shown only for demonstration:

h12_graph <- stratigraph(observations, "context", "below", "below")
ggraph(h12_graph, layout = "sugiyama") +
  geom_edge_elbow() +
  geom_node_label(aes(label = context), label.r = unit(0, "mm")) +
  theme_graph()

It seems like the context column doesn’t actually contain all contexts, which makes sense because there is no duplication of relationships in the Harris Matrix Data Package format: it’s a tidy format! We can easily work around this by building the edges with strat_connect() and taking the nodes from the full contexts table.

edges <- stratigraphr::strat_connect(observations[["context"]], observations[["below"]], "below")
str(edges)
#> 'data.frame':   26 obs. of  2 variables:
#>  $ to  : chr  "2" "11" "12" "13" ...
#>  $ from: chr  "1" "1" "1" "1" ...
h12_graph <- tidygraph::tbl_graph(nodes = contexts, edges = edges, node_key = "label", directed = TRUE)
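The reason for the workaround can be seen on a toy example (invented values, not the fig12 data): a context that only ever appears as the older side of a relationship is missing from the context column, so the node list has to come from a separate table. The sketch below also builds the edge list by hand with tidyr::unnest() as a stand-in for strat_connect(), whose internals are not reproduced here, following the from = context, to = below layout visible in the str() output.

```r
library(dplyr)
library(tidyr)

# Toy nested observations: one row per context, list column of contexts below it
toy_nested <- tibble::tibble(
  context = c("1", "2"),
  below   = list(c("2", "3"), "3")
)
# Full node table, standing in for the contexts resource
toy_nodes <- tibble::tibble(label = c("1", "2", "3"))

# Contexts that never appear in the context column
missing_nodes <- setdiff(unlist(toy_nested$below), toy_nested$context)

# Hand-rolled edge list (a stand-in, not stratigraphr's own implementation)
toy_edges <- toy_nested %>%
  unnest(below) %>%
  transmute(from = context, to = below)

# node_key tells tbl_graph which node column the character endpoints refer to
toy_graph <- tidygraph::tbl_graph(
  nodes = toy_nodes, edges = toy_edges,
  node_key = "label", directed = TRUE
)

# A valid stratigraphic sequence should contain no cycles
igraph::is_dag(toy_graph)
```

Here context "3" is the one missing from the context column, and it is still present in the graph because the nodes come from the full table.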

Ready to plot!

So far so good. Let’s try plotting the Harris Matrix.

ggraph(h12_graph, layout = "sugiyama") +
  geom_edge_elbow() +
  geom_node_label(aes(label = label), label.r = unit(0, "mm")) +
  theme_graph()
Figure: a Harris Matrix visualization of archaeological stratigraphy, drawn as a graph of numbered nodes connected by orthogonal edges. Data from figure 12 of E.C. Harris’s manual Principles of archaeological stratigraphy, as modified by T.S. Dye.

It works perfectly!

We still need to include once-whole contexts from the inferences table in the picture, but apparently stratigraphr does not support this yet.

Summary: the quick way to analyze and plot archaeological stratigraphy data in R

In short, the equivalent of the stratigraphr vignette, starting from a Harris Matrix Data Package, is:

harris12 <- frictionless::read_package(file="https://codeberg.org/steko/harris-matrix-data-package/raw/branch/main/fig12/datapackage.json")
contexts <- frictionless::read_resource(harris12, "contexts")
observations <- frictionless::read_resource(harris12, "observations")
observations <- observations %>%
  pivot_wider(names_from = url, values_from = older, values_fn = list) %>%
  rename(context = younger, below = `NA`)
edges <- stratigraphr::strat_connect(observations[["context"]], observations[["below"]], "below")
h12_graph <- tidygraph::tbl_graph(nodes = contexts, edges = edges, node_key = "label", directed = TRUE)
ggraph(h12_graph, layout = "sugiyama") +
  geom_edge_elbow() +
  geom_node_label(aes(label = label), label.r = unit(0, "mm")) +
  theme_graph()

It’s slightly more verbose than the original stratigraphr vignette, and it could certainly be improved, but it’s a good way to get started with archaeological stratigraphy data in R.

