Recently I’ve been looking at the International Barcode of Life project. The idea is to take DNA samples from animals and plants to help identify known species and discover new ones. While other projects strive to sequence the complete genome of a few species, such as humans, dogs, and red flour beetles, the barcoding project looks at a short, roughly 650-base sequence from a single gene. This short sequence may not tell the whole story of an organism, but it should be enough to identify and distinguish between species. It will be successful as a barcode if (a) all (or most) members of a species have the same (or very similar) sequences and (b) members of different species have very different sequences.
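Criteria (a) and (b) can be made concrete with a distance between sequences. The sketch below is my own illustration, not the project's method. It assumes the barcodes are already aligned to the same length, so it can use Hamming distance (the number of positions at which two sequences differ); published barcoding studies often use other measures, such as the Kimura two-parameter distance. The function names `hamming`, `barcode_gap`, and `works_as_barcode` are hypothetical. Under these assumptions, a sequence works as a barcode for a set of specimens when the largest within-species distance is smaller than the smallest between-species distance.

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def barcode_gap(specimens):
    """Given (species, sequence) pairs, return a tuple of
    (largest within-species distance, smallest between-species distance).
    The second value is None if only one species is present."""
    within, between = 0, None
    for (sp1, s1), (sp2, s2) in combinations(specimens, 2):
        d = hamming(s1, s2)
        if sp1 == sp2:
            within = max(within, d)  # criterion (a): should stay small
        else:
            # criterion (b): should stay large
            between = d if between is None else min(between, d)
    return within, between


def works_as_barcode(specimens) -> bool:
    """True if every within-species distance is below every between-species one."""
    within, between = barcode_gap(specimens)
    return between is None or within < between


# Toy 10-base sequences for illustration only, not real barcodes.
toy = [
    ("A", "ACGTACGTAC"),
    ("A", "ACGTACGTAA"),
    ("B", "TTGTACCTAC"),
    ("B", "TTGTACCTAA"),
]
print(barcode_gap(toy))       # (1, 3)
print(works_as_barcode(toy))  # True
```

Comparing every pair is quadratic in the number of specimens, which is still manageable for a data set of about a thousand entries.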
I was able to acquire a data set of 1248 barcode sequences, all of them from Lepidoptera (butterflies and moths) collected in Australia. Each entry gives the name of the specimen (if known), the location where it was collected, and a 659-base barcode, that is, a string of the letters A, C, G, and T.
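One way to hold each entry in memory is a small record type. This is a hypothetical sketch: the `Specimen` class and its field names are my own, the file format is not described here so no parser is shown, and it assumes every barcode is exactly 659 letters drawn only from A, C, G, and T (real sequence data may also contain ambiguity codes such as N, or gap characters, which this check would reject).

```python
from dataclasses import dataclass
from typing import Optional

BARCODE_LENGTH = 659      # length of every barcode in this data set
BASES = frozenset("ACGT")  # assumed alphabet: no ambiguity codes or gaps


@dataclass(frozen=True)
class Specimen:
    """One entry in the data set (hypothetical field names)."""
    name: Optional[str]  # species name, or None if unidentified
    location: str        # where the specimen was collected
    barcode: str         # 659-letter string over A, C, G, T

    def __post_init__(self):
        # Reject entries that do not match the assumed barcode shape.
        if len(self.barcode) != BARCODE_LENGTH:
            raise ValueError(
                f"expected {BARCODE_LENGTH} bases, got {len(self.barcode)}"
            )
        if not set(self.barcode) <= BASES:
            raise ValueError("barcode contains characters other than A, C, G, T")


# Illustrative entry with a made-up location and a synthetic barcode.
example = Specimen(None, "hypothetical site", "ACGT" * 164 + "ACG")
print(len(example.barcode))  # 659
```

Making the record frozen keeps entries immutable, so they can safely be used as dictionary keys or set members when grouping specimens by species.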
An Exercise in Species Barcoding