Principal Component Analysis (PCA), Archaeology and Geometric Morphometrics: some notes

tl;dr: This post outlines how archaeologists could use Principal Component Analysis, realise its true potential, and use it as a “starter” and not an “end in itself” when analysing Geometric Morphometric data, and data more generally.


With an increase in user-friendly statistical software, capable of reading many different formats of shape data (beyond just .tps files) and offering increasingly sophisticated graphical outputs and visualisations, researchers (including ourselves!) are beginning to feel more confident in trying their hand at geometric morphometric methodologies (GMM henceforth). Anyone who has signed up to weekly archaeology and GMM “alerts”, e.g. from ScienceDirect, will be all too familiar with the rapid increase in article output on GMM. And this should be praised, of course! GMM provides researchers with a replicable method for testing hypotheses about shape where lineal measurements are not robust enough (not that lineal measurements should always be abandoned!). Many examples of the literature on GMM and archaeology feature some form of Principal Component Analysis (PCA). But what is PCA? How useful is it? How can we make the most of it? Are we becoming too reliant on it? Do we even know enough about it?

This blog-post stems from two experiences: 1) the number of amendments I have made to my own PCAs over the last few months, as I have realised that these could have been more useful as visual descriptors of analysis; and 2) one Friday night of article-skimming (too cool, right?), grappling with a PCA which got through peer review and was, quite frankly, incomprehensible. I hope this post will provide food for thought when you are making your own. I also apologise for the distinct lack of diagrams (i.e. no diagrams) ahead. It is rather text-heavy, but there are plenty of examples out on the web and in articles. Much of my knowledge comes from Davis (1986) and Harper (1999); there are some terrific diagrams contained within these references, I promise!

This post is the first of a number of blog-posts introducing methods typically used by shape-lovers in their analyses. If you fancy contributing, please do contact us. Maybe you’ll disagree with some of the things said here, or perhaps I’ve missed something; please let us know below! As I said, I am not an expert on this, I’m just trying to collate everything I have learned about PCA into one blog as a handy go-to guide (and you thought I was doing this for you guys, right?).

What is Principal Component Analysis (PCA)?

Principal Component Analysis is a statistical technique which finds “components”, hypothetical variables which account for as much variance as possible within a multivariate dataset, i.e. a dataset with more than one variable (Davis, 1986; Harper, 1999). These newly-created hypothetical variables are linear combinations of the original variables, and are used to make the data easier to explore and visualise; strong patterns between different data-points can be visualised, and other underlying variables such as size (for morphometric data) can be analysed alongside them. While PCA can be used for data with only two dimensions, i.e. two variables (e.g. maximum height and maximum thickness), it is typically used for datasets of three dimensions or more, which become difficult to view as a point-cloud. To reduce the number of dimensions, the method re-expresses the variation on a new co-ordinate system, in which every data-point receives a new set of co-ordinates (its principal component scores). The new axes, or “principal components”, are combinations of the variables analysed and are uncorrelated with each other: the first principal component represents the main source of variation, the second principal component the second main source of variation, and so forth. A great interactive visualisation of this technique can be seen in Powell’s online guide. It is important to remember that PCA does not take into account any form of group structure or subdivisions (males and females, tool-types etc.); other ordination techniques, including Discriminant Function Analysis, will be more applicable if you need to do this. In shape-based literature you may come across “relative warps”. Fret not! These are principal components visualised with vectors or thin-plate spline transformation grids.
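To make the description above a little more concrete, here is a minimal sketch of PCA on just two variables, written in plain Python. Everything in it is an assumption made for illustration: the measurements are invented, the function name pca_2d is hypothetical, and the closed-form solution only works because there are two variables (real datasets would go through a statistics package instead).

```python
import math


def pca_2d(xs, ys):
    """PCA of two variables from the closed-form eigen-decomposition
    of their 2x2 sample covariance matrix [[a, b], [b, c]]."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    dx = [x - mean_x for x in xs]  # centre each variable on its mean
    dy = [y - mean_y for y in ys]
    a = sum(d * d for d in dx) / (n - 1)              # variance of x
    c = sum(d * d for d in dy) / (n - 1)              # variance of y
    b = sum(p * q for p, q in zip(dx, dy)) / (n - 1)  # covariance of x and y

    # Eigenvalues: the variances along the two new axes (largest first).
    centre = (a + c) / 2
    spread = math.hypot((a - c) / 2, b)
    eigenvalues = (centre + spread, centre - spread)

    # Direction of PC1; PC2 is perpendicular to it.
    theta = 0.5 * math.atan2(2 * b, a - c)
    pc1 = (math.cos(theta), math.sin(theta))
    pc2 = (-math.sin(theta), math.cos(theta))

    # Scores: each specimen's co-ordinates on the new axes.
    scores = [(u * pc1[0] + v * pc1[1], u * pc2[0] + v * pc2[1])
              for u, v in zip(dx, dy)]
    return eigenvalues, (pc1, pc2), scores


# Invented measurements (mm) for five hypothetical tools.
max_height = [10.0, 12.0, 14.0, 16.0, 18.0]
max_thickness = [3.0, 3.5, 4.5, 4.8, 6.0]

eigenvalues, axes, scores = pca_2d(max_height, max_thickness)
share = 100 * eigenvalues[0] / sum(eigenvalues)
print(f"PC1 accounts for {share:.1f}% of the total variance")
```

The two columns of scores are uncorrelated with each other, and their variances are the two eigenvalues, which is exactly the property described in the paragraph above.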

When can we use Principal Component Analysis?

Principal Component Analysis has long been employed for the analysis of traditional/lineal measurements, and over the last two decades GMM studies using different landmark and semi-landmark-based methods, on biological and non-biological material, have adopted PCA as well. Whether the data are a set of lineal measurements, two-dimensional outlines of handaxes, Folsom points or ceramic vessels, or even three-dimensional surfaces of crania, Principal Component Analysis can provide an initial (and I stress initial!) examination of changes in variables and shape.

Some fundamentals of Principal Component Analysis

I’ll summarise this in a set of bullet points, to make it as clear as possible to understand:

  • Each principal component has a vector of principal component coefficients (ranging from -1.0 to 1.0), which are used to create the principal component scores and which indicate the direction of the corresponding PC axis in relation to the coordinate system of the original variables;
  • The principal component variances (otherwise known as eigenvalues) sum to the total variance of the original variables. In many outputs the variance of a principal component is given as a percentage of the total variance;
  • Many outputs can represent the shape changes/deformations along a principal component axis, e.g. Principal Component 1 may represent a transformation from bottom/distal-heavy stone tools to proximal-heavy stone tools;
  • Outputs often include a “scree plot”, a plot of eigenvalues, which indicates the number of significant components that should be considered. Once this curve starts to flatten out, the remaining components may be regarded as insignificant to shape change.
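The second and fourth bullet points can be illustrated with a short sketch in plain Python. The eigenvalues below are invented, and explained_variance is a hypothetical helper rather than the output of any particular program; the row of # characters is only a crude text stand-in for a scree plot.

```python
def explained_variance(eigenvalues):
    """Return each component's percentage of the total variance,
    plus the running (cumulative) percentage."""
    total = sum(eigenvalues)  # equals the total variance of the original variables
    percentages = [100.0 * ev / total for ev in eigenvalues]
    cumulative = []
    running = 0.0
    for pct in percentages:
        running += pct
        cumulative.append(running)
    return percentages, cumulative


# Invented eigenvalues, sorted from largest to smallest.
eigenvalues = [4.0, 2.5, 1.5, 1.0, 0.6, 0.4]
percentages, cumulative = explained_variance(eigenvalues)

for i, (ev, pct, cum) in enumerate(zip(eigenvalues, percentages, cumulative), start=1):
    bar = "#" * round(pct)  # one '#' per percentage point of variance
    print(f"PC{i}: eigenvalue={ev:.2f}  {pct:5.1f}%  cumulative={cum:5.1f}%  {bar}")
```

Reading down the printed bars, the point where they stop shrinking quickly is the “flattening out” that a scree plot is meant to show.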

Hanging in there? It’ll get easier from here on, I promise.

How many principal components should you consider?

Short answer: there is no correct answer. Some people consider all principal components which account for more than 1% of variance, or which have an eigenvalue greater than 1.0 (Kaiser’s criterion, which assumes the PCA was run on standardised variables); some people just use the first two components. It depends on how much variance you wish to describe. Someone once described it to me as a town: how many streets (i.e. principal components) should you take to get to know a town? It depends on the size of the streets, and the percentage of the town you want to understand from those streets. If a town is made up of two main streets, is that enough to understand the town? You need to select enough components, or streets (if you’re still following the analogy), to explain an amount of variance that you are comfortable with. A “good” PCA plot, for archaeology anyway, typically analyses the first three principal components (i.e. PC1 vs. PC2, PC2 vs. PC3, PC1 vs. PC3), with these typically accounting for more than 75% of shape variance. When in doubt use common sense! Publications will not let you publish hundreds of these plots, so think of the audience, but most importantly use your results appropriately.
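The rules of thumb above are easy to compare side by side. The sketch below is only an illustration: the eigenvalues are invented, the three function names are hypothetical, and none of the thresholds is “correct”; they are simply the ones mentioned in the text.

```python
def components_for_target(eigenvalues, target_percent=75.0):
    """Smallest number of leading components whose cumulative share
    of the total variance reaches target_percent."""
    total = sum(eigenvalues)
    running = 0.0
    for count, ev in enumerate(eigenvalues, start=1):
        running += 100.0 * ev / total
        if running >= target_percent:
            return count
    return len(eigenvalues)


def components_above_share(eigenvalues, min_percent=1.0):
    """Number of components that each explain more than min_percent."""
    total = sum(eigenvalues)
    return sum(1 for ev in eigenvalues if 100.0 * ev / total > min_percent)


def components_kaiser(eigenvalues):
    """Number of components with an eigenvalue greater than 1.0.
    Only meaningful when the PCA was run on standardised variables."""
    return sum(1 for ev in eigenvalues if ev > 1.0)


# Invented eigenvalues, sorted from largest to smallest.
eigenvalues = [4.0, 2.5, 1.5, 1.0, 0.6, 0.4]

print("75% of variance:", components_for_target(eigenvalues, 75.0))
print("more than 1% each:", components_above_share(eigenvalues, 1.0))
print("Kaiser's criterion:", components_kaiser(eigenvalues))
```

Different rules can give quite different answers for the same dataset, which is rather the point of the town analogy: decide how much of the town you need to see first.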

Displaying and analysing the PCA results

The plots should be designed to be informative, whilst being clear at the same time. The suggestions below should not be treated as gospel, but rather designed to just make you think about your plot, and the information that can be displayed.

  1. PCA plots should have equal axes in order to accurately display the transformation of data; this needs to be better emphasised as many studies fail to modify their graphs (programs like PAST have recently added “equal axes” to their display settings – there is now no excuse!).
  2. Convex hulls can be added in order to display the entire range and distribution of different groups along the principal components displayed. 95% confidence ellipses (ellipses expected to cover roughly 95% of each group’s points, assuming the scores are approximately normally distributed) are an alternative way to display the distribution of points within groups, and are more appropriate when you have some really far-out outliers.
  3. Some programs will not display the percentage of variance (inertia) for each principal component/relative warp. If possible, add these to the axis labels so that the reader can fully understand the principal component plot.
  4. As with other graphs, include a legend and labels where appropriate.
  5. Visualising the changes in shape, whether at the extremes of the axes or by highlighting the shape of individual examples, is always a great way of communicating what is happening within a PCA plot, and it saves on words!
  6. Preference: only use colour when appropriate! Rainbow PCAs may look cool, but when you’re dealing with many different groups it’ll get trippy. Maybe try symbols instead of colours?
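Points 1 and 2 in the list above can be prepared before any plotting program is opened. The sketch below, in plain Python, computes a convex hull for each group (using Andrew's monotone chain algorithm, which is my choice here and not something the programs mentioned above are known to use) and a pair of axis limits with the same span. The scores and the function names are invented for illustration.

```python
def convex_hull(points):
    """Convex hull of 2D points (Andrew's monotone chain).
    Returns the hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # > 0 when o -> a -> b turns counter-clockwise
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def equal_axis_limits(points, pad=0.05):
    """x and y limits with the same span, centred on the data.
    The axes are only truly equal if the plot area itself is square."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    half = max(max(xs) - min(xs), max(ys) - min(ys)) / 2 * (1 + pad)
    cx = (max(xs) + min(xs)) / 2
    cy = (max(ys) + min(ys)) / 2
    return (cx - half, cx + half), (cy - half, cy + half)


# Invented (PC1, PC2) scores for two hypothetical groups.
scores_by_group = {
    "Group A": [(-0.10, 0.02), (-0.06, -0.03), (-0.08, 0.05), (-0.02, 0.01), (-0.07, 0.01)],
    "Group B": [(0.04, -0.02), (0.09, 0.03), (0.06, 0.06), (0.11, -0.01), (0.07, 0.02)],
}

for group, scores in scores_by_group.items():
    print(group, "hull:", convex_hull(scores))

all_scores = [p for scores in scores_by_group.values() for p in scores]
print("axis limits:", equal_axis_limits(all_scores))
```

A convex hull is drawn through the outermost specimens, so a single outlier will stretch it; that is the situation in which the list above suggests an ellipse instead.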

So you’ve made the PCA graph… Congratulations! Seeing visual differences in the distribution of the points? At this point you may be tempted to just document the PCA plot, discuss the findings, and conclude. There is, however, so much more you can do with the data! If you take the PC scores…

You can document variability in the principal components in other forms of visual descriptors. How about a box-plot? It’ll allow you to get another visual descriptor of how tightly clustered the data is around the first few principal components, and allows you to see how tight different groups are around certain PCs;

Test for statistical significance: maybe perform a MANOVA and see if there is actual statistical significance among the first twenty principal components. Or maybe perform Canonical Variate Analysis? You’re spoilt for choice with some statistical programs… (remember though that statistical significance is not always archaeological significance!)

Regression: plot the main source of shape variation over other factors like size, or other forms of data e.g. latitude, time.
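Two of the follow-ups above, the box-plot and the regression, need nothing more than the PC scores themselves. As a minimal sketch in plain Python (the scores, the log centroid sizes, the group names and both function names are all invented for illustration), the numbers behind a box-plot and a simple least-squares fit of PC1 on size could be obtained like this:

```python
import statistics


def box_summary(values):
    """Five-number summary, i.e. the numbers a box-plot draws."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {"min": min(values), "q1": q1, "median": median,
            "q3": q3, "max": max(values)}


def linear_regression(x, y):
    """Ordinary least-squares fit of y on x.
    Returns (slope, intercept, r_squared)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


# Invented PC1 scores for two hypothetical groups.
pc1_by_group = {
    "Site A": [-0.12, -0.08, -0.05, -0.02, 0.01],
    "Site B": [0.02, 0.06, 0.09, 0.11, 0.15],
}
for group, pc1 in pc1_by_group.items():
    print(group, box_summary(pc1))

# Invented log centroid sizes for the same ten specimens, in the same order.
log_size = [4.1, 4.3, 4.2, 4.5, 4.6, 4.6, 4.8, 4.9, 5.0, 5.2]
pc1_all = pc1_by_group["Site A"] + pc1_by_group["Site B"]
slope, intercept, r_squared = linear_regression(log_size, pc1_all)
print(f"PC1 = {slope:.3f} * log(size) + {intercept:.3f}, R^2 = {r_squared:.2f}")
```

This says nothing about statistical significance; it is just the descriptive step that would come before the MANOVA or Canonical Variate Analysis mentioned above.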

Most importantly it depends on what your hypothesis is and what your research questions are. Do not just do a technique because you can; do it because it is relevant.

And finally…

Just have fun exploring the PCA options of the different statistical programs. Many programs allow you to save your files as an .svg meaning you can truly customise it further in a variety of programs! But no rainbows, please?


References cited:

Davis, J.C. 1986. Statistics and Data Analysis in Geology. John Wiley & Sons, New York.
Harper, D.A.T. (ed.). 1999. Numerical Palaeobiology. John Wiley & Sons, New York.

Morphometrician of the Month (April)

This month we would like to introduce a new shape-lover, friend, and future collaborator: Dr. Robert Z. (Zac) Selden Jr. from the crhr:archaeology.


Name: Dr. Robert Z. Selden Jr.


Institution: Stephen F. Austin State University

What is your research on?

Geometric morphometrics of ceramic vessels, Caddo ceramics primarily, but I am also engaged in a bit of an affair with projectile points (less Paleoindian, and more Archaic). I’m interested in assemblage-level variation in ceramic shape and size (to include decorative elements) as these elements may help us to further extend our archaeological inferences related to cultural transmission, craft theory and the—potential—identification of communities of practice.

I see promise in the capacity of geometric morphometrics to assist in identifying specific vessels that might represent the beginning (or end) of a morphological tradition. While I don’t subscribe to evolutionary archaeology, my perhaps-too-lofty goal is to eventually identify (to the extent possible) the ceramic equivalent of a transitional species.

image 1

Network of ceramic types (circles/nodes) plotted by archaeological sites (lines/edges) within the study area. If there is an edge (or multiple edges) between two nodes, they both appear at the same (or multiple) site(s). If two or more types appear at multiple sites, the lines/edges between them are larger (weighted).

Upon completion of the quantitative analysis, shape attributes are ascribed qualitative identifiers (vessel shape numbers) based on the outcome of the analysis, which are used in a network analysis.

In practice, I use those data garnered from the integrated approach of geometric morphometrics and network analyses as something akin to a hypothesis engine, where the results of the analyses (which can—and should—be viewed independently of one another) tend to generate more questions than they answer. At the end of each run, I spend time reflecting on the process, and the method, while writing like a madman (I have dozens of journals full of observations, notes and possible future projects at this point), as I try to capture inferences and ancillary observations from each cycle before moving on to the next.

image 2

The same network as above, but with shape numbers added. The different colours represent different communities (defined by modularity) in which more/less vessel shapes are associated with each of the ceramic vessel types. I am currently working to plot these spatially and temporally (using R) to see how they articulate with known Caddo periods and phases.

Those questions raised, at least in part, in each iteration are helping me to refine and redress additional ancillary observations like vessel tilt and rotational asymmetry, along with a host of others. Over the past year (the last six months in particular), I have spent a lot of time on Skype asking folks whether they are seeing the same thing as I am. Many of the ancillary observations look to be quite useful, and I want to ensure that we capitalize on each of them as time becomes available.

image 3

Rotational asymmetry: the widest vessel radius is rotated 360 degrees to generate a nominal surface that is contrasted with the mesh to calculate (and illustrate) the deviation from rotational symmetry, a.k.a. asymmetry. Note: I have shifted to a new method for the formal asymmetry analysis (more on that below), but this method may be useful in finally quantifying the differences in rotational asymmetry between coil-built and wheel-built vessels (where wheel-built vessels are generally thought to be more symmetrical).

What got you into morphometrics?

The genesis of my interest in geometric morphometrics began at a pub with my graduate school roommate (imagine that), who is a biological anthropologist. He was (is, rather) using geometric morphometrics to look at a variety of biological structures (from ape jaws to rodent post-crania), and we began a discussion of how the various methods might be applied to ceramics to answer questions related to cultural transmission.

image 4

Results of assemblage-level variation in vessel shape from three Caddo sites.

At that time, I had been working to insert myself between the repatriation and reburial process for Caddo burial vessels; particularly those that fell under the purview of the Native American Graves Protection and Repatriation Act (NAGPRA), as I wanted to document these important cultural items before they were no longer available for study. It was at this point that I really began contemplating the potential contributions of geometric morphometrics to my current research design. However, the bulk of our discussions remained centred upon how to properly identify landmark and semi-landmark data points on a series of specimens that—for all intents and purposes—have only a single homologous landmark (the central base), which made me question whether this was even feasible.

CNO 3SV10 81-89-1 by Dr. Robert Z. Selden Jr. on Sketchfab


Where would you insert landmarks and semi-landmarks on the vessel above?

At this point, I want to echo Professor Collard’s statement from Morph2015, in that I see one of the principal challenges of employing a study of geometric morphometrics in archaeology as defining homologous (or for that matter, even semi-homologous) landmarks. It took me quite a while to settle on my current method of applying landmark and semi-landmark data points to the sample of 3D ceramics.

Initially, we used Cartesian coordinates that were subjectively applied, then exported those to Morphologika for analysis. While I was pleased to have taken a first stab at geometric morphometrics, I would be remiss if I didn’t mention my disappointment at the lack of replicability that I came to see as blatantly obvious in the pilot study.

In its current form (see an example in the YouTube video below), my method for applying landmark/semi-landmarks to ceramics has evolved substantially (after dozens of iterations), and continues to be refined. I have also shifted over to the geomorph package in R for the analysis.

YouTube video that outlines the process used to populate landmark/semi-landmark data points on a ceramic vessel using reference geometry in Geomagic Design X.

Subsequent to alignment, I now use Geomagic Design X (reverse-engineering software) to insert a revolving vector. The vector is defined by an algorithm rather than subjectively (although note that the algorithm still represents a bias, just not my own). I then place the single landmark at the central base, the only homologous point that I see as transcending all of the various vessel classifications (bowl, bottle, etc.); it is now defined by projecting a single point at the intersection of the 3D mesh and the revolving vector.

The basal plane (defined during alignment) is used to orient the vessel as if it were sitting on a planar surface, like the ground, which I assume to be the intent of the maker. This plane serves as the basis for a mesh sketch, used to generate and extrude a cylindrical surface around the circumference of the vessel. Deviations are calculated between the cylindrical surface and the mesh, making it possible to identify the widest point of each vessel consistently (and therefore replicably). That point is then used to insert a plane, coplanar with the central vector, along the widest profile of the vessel (defined by that widest point on the mesh surface).
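The logic of finding the widest point from cylinder-to-mesh deviations can be sketched outside of Geomagic Design X. What follows is not the software's procedure, only a plain-Python illustration under loud assumptions: the mesh is reduced to a handful of invented (x, y, z) vertices, the vessel is already aligned so that its central vector lies on the z-axis with the base at z = 0, and widest_point is a hypothetical function name.

```python
import math


def widest_point(vertices):
    """vertices: (x, y, z) mesh points of a vessel whose central axis
    is assumed to lie on the z-axis.
    Returns (vertex, radius, azimuth in radians) of the widest point."""
    # Distance of every vertex from the central axis.
    radii = [math.hypot(x, y) for x, y, _ in vertices]

    # A cylinder around the axis that just encloses the vessel.
    cylinder_radius = max(radii)

    # Deviation = gap between the cylinder and each vertex;
    # the vertex with the smallest gap is the widest point.
    deviations = [cylinder_radius - r for r in radii]
    index = deviations.index(min(deviations))

    x, y, _ = vertices[index]
    azimuth = math.atan2(y, x)  # direction of the widest profile around the axis
    return vertices[index], radii[index], azimuth


# Invented vertices (cm) standing in for a scanned mesh.
mesh_vertices = [
    (2.0, 0.0, 0.0),   # near the base
    (0.0, 6.5, 4.0),   # around the body
    (-6.0, 0.5, 5.0),
    (3.0, -2.0, 9.0),  # towards the rim
]

vertex, radius, azimuth = widest_point(mesh_vertices)
print("widest vertex:", vertex)
print(f"radius: {radius:.2f} cm, azimuth: {math.degrees(azimuth):.1f} degrees")
```

Because the result depends only on the aligned mesh and not on where an analyst chooses to click, the same vessel gives the same widest point every time, which is the replicability the paragraph above is after.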

The plane inserted along the widest profile is used as the base plane for a 3D mesh sketch, where 12 splines (six full vessel profiles) are inserted at equidistant intervals around the circumference of each vessel. Those splines are cut at the point of highest curvature along the rim. As an aside, I want to mention here that one of my goals was to generate a landmark/semi-landmark configuration that I could use for both 2D and 3D data, since we have hundreds of images in publications that can (and will) augment this initial study. The splines on the interior of the vessels were deleted (primarily because I use surface scanners, and cannot scan the interior of carinated bowls, bottles, etc.), and equidistant semi-landmarks were inserted along those splines from the central base (LM1) to the highest point of curvature on the rim along each radius, while rotating the vessel in a clockwise direction (although see note below). Once populated, those data are then exported to a .csv file.
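The step of inserting equidistant semi-landmarks along a profile can also be sketched in a few lines. This is an illustration only, not the Geomagic Design X workflow: it assumes the spline has already been exported as an ordered list of points (a polyline, which only approximates the true curve) running from the central base to the rim, and resample_equidistant is a hypothetical function name. It needs Python 3.8 or later for math.dist.

```python
import math


def resample_equidistant(profile, n_points):
    """Place n_points along a polyline so that neighbouring points are
    separated by equal arc length. Both end points are kept."""
    if n_points < 2:
        raise ValueError("need at least two points (the two ends)")
    segment_lengths = [math.dist(p, q) for p, q in zip(profile, profile[1:])]
    step = sum(segment_lengths) / (n_points - 1)

    result = [tuple(profile[0])]
    i = 0          # index of the segment currently being walked
    walked = 0.0   # arc length from the start to the beginning of segment i
    for k in range(1, n_points - 1):
        target = k * step
        # Move on until the target distance falls inside segment i.
        while i < len(segment_lengths) - 1 and walked + segment_lengths[i] < target:
            walked += segment_lengths[i]
            i += 1
        length = segment_lengths[i]
        t = (target - walked) / length if length > 0 else 0.0
        p, q = profile[i], profile[i + 1]
        result.append(tuple(a + t * (b - a) for a, b in zip(p, q)))
    result.append(tuple(profile[-1]))
    return result


# Invented 2D profile (radius, height) from the central base to the rim.
profile = [(0.0, 0.0), (4.0, 0.0), (6.0, 3.0), (5.0, 7.0), (3.0, 9.0)]

semilandmarks = resample_equidistant(profile, 10)
for number, point in enumerate(semilandmarks, start=1):
    print(f"point {number}: ({point[0]:.2f}, {point[1]:.2f})")
```

The same function works unchanged on (x, y, z) points, which fits the stated goal of one configuration for both 2D and 3D data.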

Note: I have since incorporated a minor alteration to this method, and am now splitting each of the splines at the base/body and body/lip junctures as well. This provides a means by which I can explore the correlation between the base, body and lip, and explore things like shifts in basal morphology through time (which turned out to be important—remember those ancillary observations?). Additionally, I can use those divisions to figure out which component best discriminates between the various vessel shapes.

From this point, you’re likely familiar with the remainder of the analytical process – import raw data to your favourite software, then dig in.

image 5

Another quick note here – I am now using those landmarks associated with the widest vessel profile (a) to look at variations in (b) fluctuating and (c) directional asymmetry. I think that both measures of asymmetry have some potentially interesting applications in archaeology (for a variety of artefact classes—certainly not limited to ceramics).     

Do you have any advice for budding morphometricians/shape-lovers?

Ask questions – lots of them, and be skeptical of your own work. Also, be comfortable enough in your own skin to genuinely laugh at your mistakes, shake them off, and keep failing forward. I used to skateboard when I was younger, and still remember one of my best friends telling me, “if you’re not falling, you’re not pushing yourself hard enough.” I think that the same logic applies here (but do yourself a favour and try to do your falling in the lab, rather than in print—so, full-circle back to my initial answer; ask lots of questions).

Some of my biggest gains have come from posting my ideas online and soliciting feedback from the larger community of practitioners. Join the MORPHMET list, and participate in discussions about topics or concerns that interest you. Also try to think through issues that others are having; it helps to conceptualize the process outside of those artefacts or specimen categories that you become comfortable with.

What is your go-to guide/article?

It’s not a guide or an article, but a rather short read that I think many of you would enjoy: The Shape of Time by George Kubler. If I find myself discouraged, or feeling stagnant, this gets me right back into the swing of things. Also, if you have not read The Writing Life by Annie Dillard, you should. Plenty of guidance in there that can be appreciated by graduate students and professionals alike; it truly transcends disciplines.

Favourite software?

I’m a really big fan of geomorph. I had been working with Emma Sherratt for about a year before heading to Portugal last fall for a workshop with Dean Adams, Michael Collyer and Antigoni Kaliontzopoulou. These folks have been absolutely wonderful, and I cannot thank them enough for their guidance and support as I have made my way through the various analyses.

Favourite online reference (besides this)?

I always have one eye on the geomorph blog, and frequently find myself checking the new meetings, workshops, courses, etc. page on the SUNY Stony Brook website.

Additionally, I assembled the beginnings of what I’m hoping will be a nice crowdsourced bibliography of geometric morphometric resources, for archaeology and beyond, beginning with Sarah and Christian’s list and then expanding on it a bit. You can generate a .pdf of the bibliography by clicking on “PDF” at the top of the bibliography page (please share widely), and I’m hoping that this community will help me to get this resource up to date. Send additional references with Morph2016 as the subject line (include DOI and ISSN where possible), and I’ll get them added!

Many thanks to Sarah and Christian for putting this resource together for the archaeological morphometric community. It’s nice to see more folks beginning to work through and apply these methods to archaeological problems.


Morph2016 Workshop!


We can now announce that, in conjunction with Morph2016 and The Natural History Museum, we will be organising a one-day workshop providing an introduction to geometric morphometrics for archaeologists and anthropologists.

Led by Prof. Norman MacLeod (The Natural History Museum, UK) this one-day workshop will provide a theoretical and practical overview of geometric morphometrics, from data collection, and choosing the right software, to data analysis and presentation. There will also be opportunities throughout the day to discuss your own data, should you wish to receive feedback, in addition to other techniques.

During the extended lunch there will also be an opportunity to visit the NHM anthropological and archaeological collections (I KNOW, RIGHT!?). More details about this will be outlined in due course.

The workshop will take place on 25th May 2016 (the day before Morph2016). Spaces are limited to 40 people so please do register as soon as possible, shape-lovers.

More information can be found on our Eventbrite page.

This will be a great opportunity to get a hands-on experience of GM before Morph2016. We hope to see many of you there!


What a time to be alive. (Source: GIPHY)




Morphometrician of the Month (March)

We are pleased to announce a new series here at Archaeomorph called the ‘Morphometrician of the Month’. Each month we will post a short interview with a morphometrician from varying backgrounds and academic levels to highlight the application of geometric morphometrics in their research.

We are happy to announce Dr. Tim Astrop as our first Morphometrician of the Month and are looking forward to working with him in our future Archaeomorph collaborations.


Name: Dr. Tim Astrop


Institution: The University of Bath

Academic level: Postdoctoral Research Associate

What is your research on? Currently, I am engaged in a project looking at extinction selectivity in ammonites. As a palaeobiologist I’m interested in what we can learn about life on earth from its extensive, rich and (in my opinion) undervalued fossil record. Basically, ammonites were around for 350 million years and survived several major mass extinctions, even the Permo-Triassic extinction which saw the demise of an estimated 96% of all marine species. Somehow these tenacious little molluscs survived, often as single lineages, and despite losing much of their morphological disparity to such catastrophic events, they quickly radiated in their aftermath, evolving similar morphologies again and again. We are interested in elucidating which, if any, morphological and functional features certain lineages possessed that would enable them to survive or, conversely, set them up for extinction. I currently use 3D rapid prototyping technology alongside 2D geometric morphometric methods to study the iconic shells of the group, and subject them to both simulated and experimental water-flow tests to see how form affected function (in terms of stability/drag etc).

morph of the month 1

Favourite software? I’m a superfan of Geomorph, the R package by Dean Adams and Em Sherratt. It is amazingly multi-functional and has the capability to capture and analyze 2D and 3D morphometric data in a plethora of ways. More recently I’ve been playing with Momocs, another R package that performs some really funky 2D outline analyses and produces some amazing graphical displays of your data.

Favourite online reference (besides this)? The Geomorph package website: it has loads of info and tutorials, and Em Sherratt is amazingly responsive to queries and ideas regarding the package and its development.

If you are interested in being our Morphometrician of the Month please contact us.


Statistical Support Group for Archaeologists


Over the next few months (i.e. for the foreseeable future) both of us here at ArchaeoMorph will be hosting a series of internal statistical support sessions, here at the University of Southampton (Department of Archaeology), specifically aimed towards archaeologists of all years, covering everything from arithmetic means to canonical variates.

We will advertise all support sessions here and across the Twittersphere, including the dates of our drop-in data cafe sessions and our program workshop days.

All are invited! For those who are unable to make it to these sessions: fret not! We will upload all our PowerPoint presentations to our ArchaeoMorph forum, including archaeological datasets we will use.

Today we hosted our first session (thank you to all those that attended!), providing a general introduction to statistics for archaeologists, including hypothesis construction, independent/dependent variables, graphical representations of data and descriptives (dispersion, shape, central tendency). This is now live on the forum.

Stay tuned!

^C & S

Navigating the bridge of archaeology together…



Good news! (JAS and Morph2016)

Good news shape-lovers!

Following the success of Morph2015 we can announce that our special issue proposal for the Journal of Archaeological Science has been accepted. Titled ‘Deriving meaning from metrics: examining geometric morphometric frameworks within archaeological analyses’, the edited volume will provide a review of how archaeologists are using geometric morphometric methodologies and, most importantly, the future of the methods within various sub-categories of archaeology.

Guidelines for authors will be forwarded on this week; if this is something you would like to contribute to then please contact us ASAP with a working title and an abstract.

Also, have you heard about Morph2016!?

The second Morph conference, ‘Morph2016: Morphometric Applications in Archaeology and Anthropology’, will be held on the 26th May 2016, hosted by those amazing shape-lovers at University College London. More details, including the CfP and registration, can be found on their Twitter and Eventbrite pages (@Morph2016). In conjunction with the conference we will be hosting a one-day workshop (woohoo!) with Prof. Norman MacLeod in the Natural History Museum (more details to follow this week!). We know how much effort team UCL are investing in Morph2016, so please do support your fellow shape-lovers!

I’m sure you’ll agree with us that this is an exciting time for studies involving geometric morphometrics within archaeology.


giphy 2
Our own version of this will follow shortly…

^C & S

UPDATE: Redesigned and with a forum!


We’re now pretty! Over the last few weeks we here at ArchaeoMorph have redesigned the website, giving it what we hope you will agree is a much more suitable and professional design (because we are professionals, right?).

We’ve also added the forum, and we will be continuously updating this over the next few days and from now on. Please do register and contribute; the more people who register, the more effective the forum, and the better shape-lover you’ll become!

^C & S

Sourced from Giphy.