A Tutorial on Understanding and Using Genealogical DNA Testing to Help Build Family Trees

Started by Michael Brown on Thursday, March 6, 2014

Showing all 23 posts
3/6/2014 at 1:43 PM

First, I must confess that while I do have a BS in a science it was not biology let alone a sub-field that looked more specifically at genetics.

Second, from some other discussions here (and elsewhere) I think it would be extremely helpful to have something here that helps people to understand just how the procreation process works from a genetic standpoint.

I am not sure if a discussion thread is the best venue for it, but let's start here.

I am also not sure how technical we might wish to get, but some technical terms will be unavoidable. I do not wish to condescend to anyone, nor do I wish to pander to anyone. However, to make the more technical aspects as easy as possible to follow, I think the discussion should start with the most basic aspects of DNA and living organisms.

Different kinds of organisms (e.g. bacteria, fungi, plants and animals) have different cell structures and different arrangements of their genetic material. Some viruses, for example, use RNA rather than DNA as their primary genetic material. Also, animals (like other organisms whose cells have a nucleus) do not keep all of their genetic material within a single sub-structure of the individual cells. The cells of mammals have sub-structures in the cytoplasm (which is something like a soup). The cytoplasm is outside of the cell nucleus but inside the cell membrane. Some of those sub-structures are ribosomes and some are mitochondria. There are other sub-structures as well but, as far as I know, they are not important (at least not directly) to a discussion of the genetics.

What is important for us is that the genetic material (DNA for us) necessary for replicating individual cells and for procreation is not all stored in the cell nucleus. The vast majority of it is: that is the 23 pairs of chromosomes. The rest is in the mitochondria in the cytoplasm. There are theories as to why the mitochondria have their own DNA, but I do not think we need to get into that discussion.

Let's look at the mitochondria in a cell's cytoplasm. Mitochondria perform essential functions in all of our cells, including the cells necessary for procreation (that is, the spermatozoa and the ova). Somehow, during the fertilization of an ovum by a sperm cell (or shortly after), the mitochondria of the sperm are destroyed. [ Note: I have read something recently about some research which indicates there are instances of some of the mitochondria of sperm cells surviving this process, but if it does occur, it does so very infrequently. Also, if it does occur, it will affect the outcomes of mitochondrial DNA testing. ] This means that the mitochondria, and therefore the mitochondrial DNA, in our cells are contributed solely (see the Note above) by the maternal line through the mitochondria of the fertilized ovum. This is why mitochondrial DNA testing is done to confirm maternal lines.
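
To make the maternal-line idea concrete, here is a minimal Python sketch. Everything in it is made up for illustration: the toy pedigree, the names, and the function `maternal_line` are assumptions, not anything a testing company provides. It simply follows mother to mother, which is the only path mitochondrial DNA is expected to travel (ignoring the rare exceptions in the Note above).

```python
# Hypothetical toy pedigree: person -> (mother, father). None means unknown.
PEDIGREE = {
    "Tester": ("Mother", "Father"),
    "Mother": ("MaternalGrandmother", "MaternalGrandfather"),
    "Father": ("PaternalGrandmother", "PaternalGrandfather"),
    "MaternalGrandmother": ("GreatGrandmother", None),
}


def maternal_line(person, pedigree):
    """Return the chain of mothers whose mtDNA the person is expected to carry."""
    line = []
    mother = pedigree.get(person, (None, None))[0]
    while mother is not None:
        line.append(mother)
        # Step up one generation, always through the mother.
        mother = pedigree.get(mother, (None, None))[0]
    return line


print(maternal_line("Tester", PEDIGREE))
# ['Mother', 'MaternalGrandmother', 'GreatGrandmother']
print(maternal_line("Father", PEDIGREE))
# ['PaternalGrandmother']
```

Note that the tester's father has his own maternal line, but none of it reaches the tester, which is why an mtDNA test says nothing about the father's side.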

Now for the majority of the DNA necessary for our procreation and cellular replication. The 23 pairs of chromosomes in the cell nucleus can be categorized as 1 pair of sex chromosomes (the famous X- and Y-chromosomes) and 22 other pairs of chromosomes. Women have 2 X-chromosomes. Men have 1 of each (disregarding rare cases such as a male who gets 1 Y- and 2 X-chromosomes; there are usually medical conditions associated with these cases). How do they get passed to the next generation and thereby determine the sex of the offspring? In a woman's ovaries, the cells which produce the ova split the pairs of chromosomes when producing the daughter cells (the ova). Now here's a finer point about which I am unsure. I do not currently know if, in this process, all ova get the exact same X-chromosome or if some get the 1 contributed by the woman's mother while others get the 1 contributed by her father. Either way, each ovum (normally) has only a single X-chromosome. In men, the precursor cells in the testes also split the pairs of chromosomes when producing the spermatozoa. Some of the sperm will carry an X-chromosome. Some will carry the Y-chromosome. When a sperm with an X-chromosome fertilizes an ovum, the resulting embryonic cell has 2 X-chromosomes and, if it develops properly, produces a female. When a sperm with a Y-chromosome fertilizes an ovum, the resulting embryonic cell has 1 X- and 1 Y-chromosome, producing a male.

Since the splitting of the pairs of chromosomes by the precursor cells is a random process, the share of sperm with a Y-chromosome and the share with an X-chromosome should theoretically be 50% each. The fertilized ova should therefore be 50% male and 50% female. The observed numbers from a random process will not exactly match the theoretical statistics at any given time, but they should tend toward those numbers, provided that there is not some other process at play (such as, for example, cultural pressures to produce males and not females leading to the abortion of more female fetuses than male fetuses). Now, the question this suggests is: what does this mean for genetics?
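
The 50/50 expectation can be checked by simply listing the possible combinations. This is only a sketch under the idealized assumptions in the paragraph above (every ovum carries an X, and X-bearing and Y-bearing sperm are equally likely); the function name `offspring_sex_odds` is made up for illustration.

```python
from fractions import Fraction
from itertools import product


def offspring_sex_odds(egg_types=("X",), sperm_types=("X", "Y")):
    """Enumerate every egg/sperm pairing, treating all pairings as equally likely."""
    combos = list(product(egg_types, sperm_types))
    # A pairing that includes a Y-chromosome produces a male (XY).
    males = sum(1 for egg, sperm in combos if "Y" in (egg, sperm))
    total = len(combos)
    return {
        "female": Fraction(total - males, total),
        "male": Fraction(males, total),
    }


print(offspring_sex_odds())
# {'female': Fraction(1, 2), 'male': Fraction(1, 2)}
```

Real-world birth ratios differ slightly from this, so treat it as the theoretical baseline only.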

One thing it means is that the Y-chromosome DNA tests can only be used to verify or disprove DIRECT patrilineal descent/ancestry, since only the males carry the Y-chromosome. A female does not have a Y-chromosome to pass on. So, if one gets a Y-DNA test done and wishes to match to a male ancestor, then they must look for the match with a MALE. That may sound a bit redundant, but let's say that someone thinks that there may have been an NPE, a Non-Paternal Event, i.e. that at some point the true biological father of an ancestor was someone other than the man the historical record names. For comparison testing, they should be looking for a DIRECT MALE-line descendant (strictly paternal line) of the suspected true father, and not for someone whose descent from the suspected Y-DNA source passes through a FEMALE at any point. Apart from small regions at its tips, Y-chromosome DNA is not crossed in any fashion with the DNA of an X-chromosome or any of the other chromosomes that a maternal ancestor carries.
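
Here is a small Python sketch of that rule, assuming a toy set of records I invented for the purpose (the names, the `PEOPLE` table, and the functions `y_line` and `useful_y_match` are all hypothetical). It checks whether a tester's strictly father-to-son line reaches the suspected ancestor.

```python
# Hypothetical toy records: name -> (sex, father). None means father unknown.
PEOPLE = {
    "Suspected ancestor": ("M", None),
    "Son A": ("M", "Suspected ancestor"),
    "Daughter B": ("F", "Suspected ancestor"),
    "Husband of B": ("M", None),
    "Grandson via A": ("M", "Son A"),
    "Grandson via B": ("M", "Husband of B"),
}


def y_line(person, people):
    """Return the direct paternal line of a male, or [] for a female (no Y to test)."""
    sex, father = people[person]
    if sex != "M":
        return []
    line = [person]
    while father is not None:
        line.append(father)
        father = people[father][1]  # keep following father to father
    return line


def useful_y_match(tester, ancestor, people):
    """True if the tester should carry the ancestor's Y-chromosome."""
    return ancestor in y_line(tester, people)


print(useful_y_match("Grandson via A", "Suspected ancestor", PEOPLE))  # True
print(useful_y_match("Grandson via B", "Suspected ancestor", PEOPLE))  # False
```

The grandson through the daughter is just as much a descendant, but his Y-chromosome comes from his own father's line, so he is no use for this particular comparison.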

Some DNA does get mixed. This occurs with the other chromosomal pairs. This is where the autosomal DNA testing comes into play. It may be used to help locate biological relatives from ANY and ALL lines. It does, however, become much less reliable the further removed the relationship due to this mixing.
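
A bit of arithmetic shows why the autosomal signal fades. A child gets half of its autosomal DNA from each parent, so on average the share from any one ancestor halves with each generation. The sketch below only computes that average (the function name is my own invention); the actual amount inherited from a distant ancestor varies because the mixing is random.

```python
from fractions import Fraction


def expected_autosomal_share(generations_back):
    """Average fraction of autosomal DNA inherited from ONE ancestor that many generations back."""
    if generations_back < 1:
        raise ValueError("generations_back must be 1 (a parent) or more")
    return Fraction(1, 2 ** generations_back)


for n in range(1, 8):
    share = expected_autosomal_share(n)
    print(f"{n} generation(s) back: {share} (about {float(share):.2%})")
```

By 7 generations back the average share from a single ancestor is 1/128, which is under 1%.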

Additionally, here is something to remember when discussing haplogroups. There are 2 distinct sets of haplogroups to discuss. One set of haplogroups is for the mitochondrial DNA. The other is for Y-chromosome DNA. THEY ARE NOT IDENTICAL. I am unaware of the status of X-chromosome research and also of that for the 22 non-sex chromosome pairs. There may be efforts with some of the genomic studies to determine if any haplogroups exist for these. I do not follow this research religiously. I suspect that one would be able to determine haplogroups for X-chromosomes, but the problem is that of working out how to make use of them. Since a woman has 2 X-chromosomes, how can we be sure which one comes from the mother and which from the father? For males, we can be sure that the X-chromosome comes from the mother, but how can we be sure which of her parents contributed that X-chromosome to her? Any such research would most likely require larger samples with far more detailed tracking of the ancestry of the contributors to be able to begin to sort that out reliably.
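
To show how quickly the X-chromosome question branches, here is a rough Python sketch that lists which ancestors could have contributed X-chromosome DNA. It relies only on the two rules stated above: a male gets his X from his mother, and a female gets one X from each parent. The function name `x_contributors` is made up for illustration.

```python
def x_contributors(sex, generations):
    """List the ancestor paths, `generations` back, that could have passed on X-DNA."""
    paths = [((), sex)]  # (relationship path so far, sex of the person at the end of it)
    for _ in range(generations):
        next_paths = []
        for path, current_sex in paths:
            # Everyone receives an X-chromosome from their mother.
            next_paths.append((path + ("mother",), "F"))
            # Only a daughter receives her father's X-chromosome.
            if current_sex == "F":
                next_paths.append((path + ("father",), "M"))
        paths = next_paths
    return [" -> ".join(path) for path, _ in paths]


print(x_contributors("M", 2))
# ['mother -> mother', 'mother -> father']
print(len(x_contributors("M", 3)), len(x_contributors("M", 4)))
# 3 5
```

So even for a male tester, there are already 2 candidate grandparents and 3 candidate great-grandparents for any given stretch of his X-chromosome.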

I am not going to get into the statistics of probability of matching a particular person as having a common ancestor within so many generations, except to say that my understanding so far is that the statistics are a little different depending on whether the test is Y-DNA, mitochondrial or autosomal. It also depends a great deal on the number of marker sites tested for comparison purposes. I have not had a separate stand-alone statistics class (although some of the classes I have had used statistics and probabilities to a large degree), nor have I, as I already stated, followed this research religiously. For that reason, I have not yet got my head around the rates of mutation for different marker sites on Y-chromosomes and mitochondrial DNA or on other chromosomes and how those rates of mutations combine with the random statistics of the mixing of chromosomes from parents during the production of sperm and ova to predict the likelihood of a common ancestor within a certain number of generations.

I apologize for such a long post. But I think presenting the basics of the topic, as best as I currently understand them, is the best way to lay the groundwork for the discussion. Hopefully, some other person or persons here can correct any mistakes I may have made in this initial post and provide more detail to help us all understand this topic better, especially since I think that the use of DNA testing for genealogical purposes is almost definitely going to continue to increase.

3/6/2014 at 2:25 PM

http://www.geni.com/path/Michael+P+McCann+is+related+to+Michael+Bro... Hmm, well said. I recently had an all in one family test done. Do you know anyone who can tell what haplogroup I am from the raw data?

3/6/2014 at 2:35 PM

Michael, (nice name by the way) what do you mean by an all in one family test?

3/6/2014 at 4:01 PM

Ancestry.com was offering a 99 dollar Christmas special that combined both the Y-DNA and mtDNA, so basically the male and female lines that cover both sides of the family.

3/6/2014 at 4:04 PM

Oh, and you'd be my 10th cousin 2 times, Nathaniel Bassett being our common ancestor.

3/6/2014 at 4:06 PM

Michael Brown - great post, thank you very much for it.

3/6/2014 at 4:45 PM

I looked at the relationship path you posted, cousin Michael. :)

As for your question, I can only think of 1 person here (on Geni) who seems to have a handle on the statistics from postings I've seen. Maybe you can ask Justin (I think it's Justin, anyway. I can't always trust my memory) from the Rice Pudding Part 2 thread.

3/6/2014 at 4:52 PM

No problem, Erica. I do not know much more than the basics, but I think there are plenty of people who do not even remember high school biology let alone understand the newer research results which have led to vastly improved genealogical and forensic tools. I like shows like CSI and NCIS, etc. but people also need to understand that there are limitations to these methods and those limitations can largely be quantified. I just wanted to get this started and hope that someone who is actively involved with this field of science or who has done considerably more reading on it than I can contribute more details with some level of technicality without going overboard with that. It needs to be concise and succinct enough for the vast majority of people to follow it.

3/6/2014 at 5:02 PM

Thanks for the keep-it-simple-silly thread. I can understand it and I don't have a degree in rocket science. Hmm, funny you should mention NCIS and CSI. Did you know Ted Danson from Cheers/CSI Vegas is in my family tree?

3/6/2014 at 5:07 PM

Yes, that's what's so helpful about your post - helps us put in context (what we can remember) from high school biology or have gleaned from the developing use in forensic science.

The 1st thing, I think, to be aware of is that studying DNA for ancestry test results is still a very new field & science. I saw a presentation from Dr Wells of NatGeo which showed an enormous leap forward in people testing in 2013, but it's still a tiny, self selecting, and non representative database of results for comparison.

3/6/2014 at 5:26 PM

Hopefully in time as more people from different backgrounds get tests done combined with improved techniques with smaller error-bars, they can improve on the databases and results. One problem I see though is peoples fear or distrust of having to much of their own personal identifying information available to so many other people. Information that could either be used in impersonation/identity theft or misused/abused by authorities for other purposes.

3/6/2014 at 5:27 PM

Spelling and grammar corrections for my previous post: peoples should have been peoples' (possessive) and to much should have been too much.

My apologies.

3/6/2014 at 7:40 PM

This article looks good ...

How to Use DNA Testing for Genealogy Research

3/6/2014 at 7:56 PM

Thank you Erica! I had actually started to read that article a week or two ago. I did not read all of the pages, but it did seem to be a decent article.

Private User
3/8/2014 at 9:48 PM

Michael Brown! I can't imagine how I missed this thread and your wonderful intro. Thank you for starting this.

We have quite a few people here on Geni who are very knowledgeable about DNA for genealogy. We have admins for various Family Tree DNA regional and surname projects, and even a DNA blogger or two. All the ones I know have told me over and over that they want to find a way to help people who have questions. DNA geeks love to talk about DNA. No surprise in that ;)

I would love to see people Follow or Join the DNA Primer Project here on Geni: http://www.geni.com/projects/DNA-Primer-A-portal-for-genetic-geneal...

Private User
3/8/2014 at 9:51 PM

Michael McCann, it might be possible to make a good guess about your haplogroup from the raw data of your test. Can you tell us a little more about it? If they tested your yDNA, what did they tell you about it?

3/9/2014 at 3:24 PM

How one finds info on an autosomal test I do not know. I had one done. I suspect I might already know my haplogroup, at least for my dad's side: Harald Tveit Alvestrand has his listed, and because he's my father's sixth cousin I suspect some modified form of what he has might be present on my line, although I could be way off.

3/12/2014 at 4:14 PM

I am continuing to find and read more articles on DNA to further my own understanding. As I do, I will try to post some additional information from time to time to hopefully help others understand "this genetic thing". :)

I think it's time for a little more on just how DNA gets passed along.

First, most of the cells of a human body divide by a process known as mitosis. "Most of the cells" means cells of all types (muscle, connective tissue, blood, liver, lung, brain, etc.) except for those specific germ-line cells which produce the ova and spermatozoa, the sex cells which combine during fertilization to produce a zygote (the first cell of the embryo). Before mitosis, all 46 chromosomes are replicated. Initially each one is bound to its own copy, and the joined copies are called sister chromatids. To complete the mitosis, these sister chromatids are pulled apart and 1 copy of every chromosome is sequestered in each new daughter cell, so that each daughter cell gets the same 46 chromosomes (23 pairs) as the parent cell. Cross-over, in which the homologous chromosomes (the corresponding chromosomes from the individual's two parents) swap analogous sections, is mainly a feature of meiosis; it happens only rarely during mitosis. There may also be individual mutations when the DNA is copied.

However, as these cells do not give rise to the reproductive cells, any such changes from cross-over, recombination, etc. do NOT get passed on to a new generation.

The only way that can happen is from such occurrences in meiosis (a related kind of cell division which halves the number of chromosomes), resulting in ova and sperm whose non-sex chromosomes each carry a shuffled mix of the two parental copies and/or which carry new mutations, including in the sex chromosomes.
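
As a very rough picture of that shuffling, the Python sketch below swaps the tails of two equal-length strings standing in for a pair of homologous chromosomes. It is a cartoon under my own assumptions: a single cross-over at a chosen position, with "M" and "P" marking stretches that came from the individual's mother and father. Real meiosis can involve several cross-overs per chromosome at positions that are not chosen by anyone.

```python
def crossover(maternal_copy, paternal_copy, point):
    """Swap everything after `point` between two homologous chromosome copies."""
    if len(maternal_copy) != len(paternal_copy):
        raise ValueError("homologous copies must be the same length in this sketch")
    if not 0 <= point <= len(maternal_copy):
        raise ValueError("cross-over point must lie within the chromosome")
    recombinant_1 = maternal_copy[:point] + paternal_copy[point:]
    recombinant_2 = paternal_copy[:point] + maternal_copy[point:]
    return recombinant_1, recombinant_2


print(crossover("MMMMMMMM", "PPPPPPPP", 3))
# ('MMMPPPPP', 'PPPMMMMM')
```

An ovum or sperm that receives one of these recombinant chromosomes passes on a patchwork of both grandparents' DNA, which is what autosomal tests pick up.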

3/12/2014 at 4:34 PM

In that last post I left off discussion of the mitochondrial DNA. That is because that DNA is separate from the nuclear DNA (DNA contained in the cell nucleus). Mitochondrial DNA is contained in the Mitochondria, which are organelles in the cell cytoplasm external to the nucleus.

I do know that daughter cells get some of the mitochondria from the parent cell. I am not sure of 2 things, though. The first is whether the number of mitochondria is increased prior to cell division or after. The second is the process by which the Mitochondrial DNA is replicated.

3/14/2014 at 5:51 PM

I think perhaps, before we go further, it is time to discuss the structure of DNA and RNA.

Both can be either 'single-strand' or 'double-strand'.

Either way, along the length of the strand, they take a helical orientation.

Think of 'single-strand' DNA or RNA as a spiral staircase. Each 'step' of the staircase is a single nucleotide.

'Double-stranded' DNA or RNA has a nucleotide or 'step' on opposite sides of the supporting pole if you will.

Between DNA and RNA there are a total of 5 nucleotides.

They are abbreviated: A (for adenine), C (for cytosine), G (for guanine), T (for thymine), and U (for uracil).

3 of them show up in both DNA and RNA. Those 3 are A, C, and G.

T (thymine) only shows up in DNA, while U (uracil) only shows up in RNA.

In single-stranded DNA or RNA, the length of the nucleic acid molecule is often discussed in terms of the number of nucleotides.

In double-stranded DNA or RNA, the number of "base pairs" is used. The base pairs are the 2 nucleotides or 'steps' which are at the same position along the length of the molecule. These nucleotides always pair up with only certain other nucleotides. G and C always pair up. A and T always pair up in DNA, while A and U pair up in RNA.
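
The pairing rules are simple enough to write down as a lookup table. This is a minimal Python sketch (the names `DNA_PAIRS`, `RNA_PAIRS` and `partner_strand` are my own) that gives the partner for each position of a strand. It ignores the fact that the two strands run in opposite directions and just pairs position by position.

```python
# Pairing rules from the paragraph above.
DNA_PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}
RNA_PAIRS = {"A": "U", "U": "A", "G": "C", "C": "G"}


def partner_strand(strand, pairs=DNA_PAIRS):
    """Return the nucleotide that pairs with each position of `strand`."""
    return "".join(pairs[base] for base in strand.upper())


print(partner_strand("GATTACA"))             # CTAATGT
print(partner_strand("GAUUACA", RNA_PAIRS))  # CUAAUGU
```

Because each nucleotide has exactly one partner, knowing one strand tells you the other, which is why lengths are quoted in base pairs rather than counting both strands.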

One thing to remember when discussing DNA testing: SNP testing is Single Nucleotide Polymorphism testing. The intent is to look for mutations which result in the substitution of a single nucleotide (and its corresponding base-pair nucleotide). We will discuss this again.
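
In spirit, finding SNPs amounts to comparing two sequences position by position and noting where a single nucleotide differs. The Python sketch below does that for two short made-up sequences that are assumed to be already lined up; the function name `snp_differences` and the sequences are invented, and real tests work from known marker positions rather than raw comparison like this.

```python
def snp_differences(reference, sample):
    """Return (position, reference nucleotide, sample nucleotide) wherever the two differ."""
    if len(reference) != len(sample):
        raise ValueError("this sketch assumes two aligned sequences of equal length")
    return [
        (position, ref_base, sample_base)
        for position, (ref_base, sample_base) in enumerate(zip(reference, sample), start=1)
        if ref_base != sample_base
    ]


print(snp_differences("GATTACAGGT", "GATCACAGAT"))
# [(4, 'T', 'C'), (9, 'G', 'A')]
```

Positions are counted from 1 here, so the output reads as "at position 4 a T was replaced by a C, and at position 9 a G by an A".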

Also, another point: most Y-DNA and mtDNA tests do not test every nucleotide position along the DNA molecule. There are millions of base pairs on even the shortest chromosome (the mitochondrial DNA, by contrast, is only about 16,500 base pairs long).

I will continue this in a separate post.

3/14/2014 at 7:37 PM

You may be asking if it is necessary to talk about this subject at the level of individual nucleotides. Obviously, if you are getting a SNP test, it is important to understand what they are testing and that the testing is looking for changes at single nucleotides. The other tests will be discussed later.

Now, for the next installment. :)

An individual "gene" is made up of anywhere from hundreds to many thousands of base pairs of nucleotides (some are far longer). The actual functional parts (with respect to actually encoding for the production of proteins to be used by the cell) are on only one of the strands. The nucleotides on that strand in the sequence which determines that gene are organized in groups of 3 nucleotides. These groups are called codons. Each codon determines what amino acid is to be added to the chain of amino acids in order to produce the protein. This is done in a preferred direction along the DNA strand. The actual protein production is controlled by single-stranded RNA, which was made using a strand of the DNA as a template. So, if the 3 nucleotides of the functional codon on the DNA strand are AAG, for example, the RNA that is pieced together to actually construct the protein that the gene encodes for will have UUC (U in place of T, since it is RNA). There are 64 different codons. However, since there are only 20 amino acids to encode, some codons attach the same amino acid to the chain being constructed. Some codons do not add an amino acid but terminate the process or perform some other function.
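
Here is a small Python sketch of the DNA-to-RNA-to-amino-acid steps just described. The names are my own, the codon table is deliberately PARTIAL (only the handful of codons needed for the example), and the sketch ignores strand direction, so take it as an illustration rather than a working translator.

```python
from itertools import product

# DNA template nucleotide -> RNA nucleotide built against it (U takes the place of T).
TEMPLATE_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}

# PARTIAL table for illustration only; the full table has 64 entries.
PARTIAL_CODON_TABLE = {
    "AUG": "Met", "UUU": "Phe", "UUC": "Phe", "AAG": "Lys",
    "UAA": "STOP", "UAG": "STOP", "UGA": "STOP",
}


def transcribe(dna_template):
    """Build the RNA that pairs with a DNA template strand, position by position."""
    return "".join(TEMPLATE_TO_RNA[base] for base in dna_template.upper())


def translate(rna):
    """Read the RNA 3 nucleotides at a time until a stop codon is reached."""
    amino_acids = []
    for start in range(0, len(rna) - 2, 3):
        amino_acid = PARTIAL_CODON_TABLE.get(rna[start:start + 3], "?")
        if amino_acid == "STOP":
            break
        amino_acids.append(amino_acid)
    return amino_acids


ALL_CODONS = ["".join(codon) for codon in product("ACGU", repeat=3)]

print(transcribe("AAG"))                    # UUC
print(translate(transcribe("TACAAGATT")))   # ['Met', 'Phe']
print(len(ALL_CODONS))                      # 64
```

UUU and UUC both giving Phe is an example of two codons attaching the same amino acid, and the 64 comes from 4 possible nucleotides in each of 3 positions (4 x 4 x 4).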

Not all of the nucleotides on a strand of DNA are part of a "gene". That is not to say that those nucleotides are not important to the cell's functions. Some segments of DNA are used in ways other than encoding for specific protein production. Some are used to regulate some of the processes. Some segments do not as of yet serve any known purpose. [Also, as a side note, at the ends of DNA strands there are long segments of repeated nucleotide sequences. These are called telomeres. During the process of DNA replication, these telomeres become shorter, due to the physical manner of the process itself. As a result, some portion of the telomeres is lost each time the cell divides. DNA replication is a prelude to cell division. Eventually, the cell can no longer divide. This is believed to explain, in part, issues with cloning individual organisms. It is believed that telomeres help to maintain genomic integrity by insulating the rest of the chromosome so that all of the important nucleotide sequences are duplicated during the DNA replication.]

Hopefully, next time I can tie this together with the "markers" used in DNA tests.

Private User
3/14/2014 at 7:54 PM

Michael, I'm very pleased that you're going through this. It's hard for people to absorb all in one big dose, so it's great to have it broken up into pieces. I'm a little surprised you're not getting more questions. Maybe that will come later.

3/14/2014 at 8:41 PM

It is just as much for my own edification as for that of other people. I have picked up bits and pieces over the years from reading an occasional article in Smithsonian, Nature, Scientific American, news coverage of Dolly the Sheep (remember that?), etc. Lately, I have been getting a lot more of the detail and reinforcing what I already knew from HS Biology and those other sources I mentioned. I already thought of breaking it up into segments. I like to think of myself as reasonably intelligent, but I don't care HOW smart someone is, the mind has limits on how much it can absorb in a short period of time. I am trying to make sure that the pieces I put up cover enough of one part of the topic to make a cohesive point or two in as concise and succinct a manner as I can. I hope I am finding a decent balance to it all and not making any major mistakes with the science itself in the process. Maybe some people are waiting for the parts about the actual tests which are offered before asking questions. We will see. At some point I will also post a link to that DNA Primer project in here.
