Geni Podcast: The Problem with GEDCOM

Posted May 27, 2011 by Geni


Show Notes:

Transferring Your Genealogy Database Data

Many of us use genealogy database programs such as Legacy Family Tree, RootsMagic or Family Tree Maker to track our genealogy research and data. These programs usually have to be installed on your computer, and each one stores its data in its own proprietary file format.

However, there are methods you can use to transfer data between these programs.  Why would you want to do that? Well, let’s say that you are collaborating with another researcher and they’ve asked for your data file. But you are using RootsMagic while the other researcher is using Family Tree Maker and the file formats are different. There are several solutions to this problem which we’ll discuss.

The best solution, actually, might be to use an online collaborative site like Geni.com – you can invite others to look at your data and there is no need to exchange data files! But if you still use some of these database programs, here is what you need to know.

What is the GEDCOM Import/Export method of transferring data?

GEDCOM stands for GEnealogical Data COMmunication and is a universal file format for genealogy database software. Most programs will let you export your data into various GEDCOM formats so that you can then import it into another program or even an online site. With Geni.com, you can export your data in GEDCOM format and then open it with another program.
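
To make the format a little more concrete, here is a minimal sketch in Python of how a GEDCOM file can be read line by line. It assumes the line layout used by GEDCOM 5.5: a level number, an optional @XREF@ identifier, a tag, and an optional value. The sample record and the names GedcomLine and parse_gedcom_line are made up for illustration and are not taken from any particular program.

```python
from typing import NamedTuple, Optional


class GedcomLine(NamedTuple):
    level: int            # nesting depth: 0 starts a new record
    xref: Optional[str]   # record identifier such as "@I1@", if present
    tag: str              # e.g. INDI, NAME, BIRT, DATE
    value: str            # text after the tag, or "" if there is none


def parse_gedcom_line(line: str) -> GedcomLine:
    """Split one GEDCOM line into level, optional xref, tag and value."""
    parts = line.strip().split(" ", 2)
    level = int(parts[0])
    if len(parts) > 1 and parts[1].startswith("@"):
        # Record line: "0 @I1@ INDI"
        xref = parts[1]
        rest = parts[2].split(" ", 1) if len(parts) > 2 else [""]
        tag = rest[0]
        value = rest[1] if len(rest) > 1 else ""
    else:
        # Ordinary line: "2 DATE 12 MAR 1850"
        xref = None
        tag = parts[1]
        value = parts[2] if len(parts) > 2 else ""
    return GedcomLine(level, xref, tag, value)


# Hypothetical sample record, for illustration only.
SAMPLE = """\
0 @I1@ INDI
1 NAME John /Smith/
1 BIRT
2 DATE 12 MAR 1850
2 PLAC Chicago, Illinois
"""

lines = [parse_gedcom_line(l) for l in SAMPLE.splitlines() if l.strip()]
for entry in lines:
    print(entry)
```

Every genealogy program has to map these tags onto its own internal fields, which is where the problems described in the next section tend to start.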

What are the issues involved with using the GEDCOM method and why doesn’t it work very well?

GEDCOM appears to make the transfer of genealogy data back and forth between programs easy but, in fact, there can be quite a bit of data duplication and data loss when using the GEDCOM method.

Example: if your database program allows you to create your own events that are not part of the standard list of life events – such as First Holy Communion – that data might be exported as part of the GEDCOM file, but . . . the other program into which you import the data may not have a way to store that snippet of data. So what happens? For some programs, the data is simply dropped or is not imported. For others, the data is “dumped” into the Notes section and must be sorted out and reformatted.
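
The two outcomes described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the list of supported events, the function import_events, and the custom tag _FHC are all hypothetical (GEDCOM reserves tags that begin with an underscore for user-defined data), and real programs may handle unknown events differently.

```python
# Assumption: the importing program only understands these event tags.
SUPPORTED_EVENTS = {"BIRT", "DEAT", "MARR"}


def import_events(events, policy="notes"):
    """Import (tag, detail) pairs, handling unknown tags per the policy.

    policy="notes" moves unknown events into free-text notes;
    policy="drop" discards them, which is the silent data loss case.
    """
    imported, notes, dropped = [], [], []
    for tag, detail in events:
        if tag in SUPPORTED_EVENTS:
            imported.append((tag, detail))
        elif policy == "notes":
            notes.append(f"{tag}: {detail}")
        else:
            dropped.append((tag, detail))
    return imported, notes, dropped


# Hypothetical export containing one standard and one custom event.
events = [
    ("BIRT", "12 MAR 1850"),
    ("_FHC", "First Holy Communion, 5 MAY 1858"),
]

kept, notes, dropped = import_events(events, policy="notes")
print(kept)   # the birth event survives as a structured event
print(notes)  # the custom event is now just a line of text

kept2, notes2, dropped2 = import_events(events, policy="drop")
print(dropped2)  # the custom event never reaches the new database
```

In either case the custom event stops being a structured event in the new program, which is why it has to be found and re-entered by hand.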

Other issues involve multimedia. If you store photos and documents, very often this data – even the links if the items are stored in another data directory – is dropped as part of the conversion to GEDCOM format.

And sources – let’s not even go there. If you spend a lot of time citing your sources for your genealogy research – which you should do – then you will be greatly disappointed when a data transfer reformats your sources or drops them altogether.

What is being done to improve the GEDCOM file format?

There are several groups working towards improving GEDCOM. Better GEDCOM (http://bettergedcom.wikispaces.com/) is an independent community working to build a better GEDCOM file specification that serves 21st century genealogists. Their website is actually a knowledge base using the wiki format, along with discussions, to try to develop an agreed-upon standard for the GEDCOM format. BetterGEDCOM is a great place to get the latest news on the issues involved with the GEDCOM file format and what the genealogy community is trying to do to fix the problems.

The International OpenGen Alliance (http://www.opengen.org/) is another community group made up of volunteers trying to develop a single standard for genealogy data exchange. OpenGen uses a Basecamp platform which requires a membership – the platform provides a space for discussions, file uploads and other tools to collaborate with others on the issues involved with improving the GEDCOM standard.

What about AncestorSync? Does it really transfer your database anywhere you want with your images, documents, and sources intact?

AncestorSync™ (http://ancestorsync.com) is a new product that allows you to synchronize your online family tree with your genealogy database program. Right now, AncestorSync only supports certain online programs including Geni but look for this list to be expanded in the near future. AncestorSync is not a file format – it is a utility that allows you to synchronize the data you have online in programs like Geni with the genealogy database software you prefer to use.

In many cases, while an import of or export to GEDCOM is fine, once the data is imported into a program, it is difficult to keep both sets of data up to date without ripping out the entire file and doing a new import/export. The goal of AncestorSync is to only update the data that has changed.
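
The difference between a full re-import and updating only what has changed can be sketched as follows. This is a rough illustration in Python, not a description of how AncestorSync actually works: the record layout, the function names changed_records and apply_changes, and the sample data are all assumptions made for the example.

```python
def changed_records(old, new):
    """Compare two snapshots of {record_id: data} and report the differences."""
    added = {k: new[k] for k in new.keys() - old.keys()}
    removed = sorted(old.keys() - new.keys())
    updated = {k: new[k] for k in new.keys() & old.keys() if new[k] != old[k]}
    return added, updated, removed


def apply_changes(target, added, updated, removed):
    """Bring the other copy up to date by touching only the changed records."""
    target.update(added)
    target.update(updated)
    for record_id in removed:
        target.pop(record_id, None)
    return target


# Hypothetical state of the tree at the last synchronization.
last_sync = {
    "I1": {"name": "John Smith", "birth": "1850"},
    "I2": {"name": "Mary Smith", "birth": "1852"},
}

# Hypothetical current state in the desktop program.
current = {
    "I1": {"name": "John Smith", "birth": "12 MAR 1850"},  # date refined
    "I2": {"name": "Mary Smith", "birth": "1852"},         # untouched
    "I3": {"name": "Anna Smith", "birth": "1855"},         # newly added
}

added, updated, removed = changed_records(last_sync, current)
print(sorted(added), sorted(updated), removed)

# Only I1 and I3 are sent to the other copy; I2 is left alone.
online_copy = apply_changes(dict(last_sync), added, updated, removed)
```

With a GEDCOM re-import, all three records would be replaced; with a change-based approach, the untouched record is never transferred at all.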

AncestorSync developers are working hard to add new features such as selective data synching so the user can determine what events, what types of data will be synched. In addition, they are looking to expand the number of online programs they work with beyond the current set of programs.

More About Thomas MacEntee

  • Webinars: Thomas will be presenting a FREE webinar via Legacy Family Tree webinars entitled Google Forms for Genealogists on Wednesday, June 1, 2011 at 1pm Central time. Click here to register.
  • GeneaBloggers Radio: Every Friday evening from 9-10:30 pm Central time, Thomas MacEntee hosts an Internet radio show – GeneaBloggers Radio (http://www.blogtalkradio.com/geneabloggers). Via your computer, you can listen to interviews with interesting genealogists and companies involved in the genealogy industry. This Friday, May 27, 2011, the discussion is Military Records and Genealogy.

 

Transcript:

 

Grant Brunner: Welcome to the Geni podcast. I’m Grant Brunner, and with me today is Thomas MacEntee. How are you, Thomas?

Thomas MacEntee: I’m doing great, Grant, I’m wonderful.

Grant: That’s great to hear. So here’s the problem. You have your sources, you have your photos, you have your whole tree database, you’ve done that all on your computer. How do you get that to someone else? Or say you give somebody your information, or let’s say that you even want to just switch programs. Let’s say you’re using Legacy and you want to switch over to RootsMagic. How do you go about getting that? I mean like, yes, we have ways of doing it. Up to this point there have been some issues, so let’s start off really basic. Let’s just understand transferring your genealogical database. How do you go about doing that?

Thomas: There are several ways; some are easier than others. This is what it comes down to – a lot of us use genealogy database programs such as Legacy Family Tree, RootsMagic, Family Tree Maker. Those are installed on our computer physically, usually on our C drive. That is what we use to enter our genealogy data, our research information, and they all use their own proprietary file format. Now if you want to transfer the data between those programs, first off, someone’s going to say, “Why do you want to do that?” Well, let’s say you’re collaborating with another researcher and they ask for your data file. Also, you’re using RootsMagic, they’re using Family Tree Maker. There are several solutions that you can use to go back and forth and transfer the data.

Now the other thing I do want to point out also, Grant, is that the best solution might be to start with an online collaborative site like Geni.com, where you can invite others to look at your data. That way there’s no need to exchange those data files.

But even with Geni.com there’s some times when you want to export the data so that you can use it in other means or other people can use it in other programs. So basically, it comes down to using GEDCOM.

Grant: People who are new to genealogy probably don’t even know what GEDCOM stands for and what its capabilities are.

Thomas: Right, so we’re going to talk about the GEDCOM import/export method of transferring data. GEDCOM is spelled G-E-D-C-O-M – you usually see it in all caps – and it stands for Genealogical Data Communication. And so GEDCOM is an acronym where they’re taking segments of those words and making it into one word. It’s a universal file format just for genealogy database software. Most programs will let you export your data into GEDCOM, and Geni.com is one of those. You can then, once you have it in GEDCOM, import it into another program.

Grant: Yeah. So the idea of GEDCOM is really great, having one file format that everything understands to get your information blanket portable. We at Geni really like the idea of GEDCOM, and for a long time we supported GEDCOM import. We always support GEDCOM export, getting your data out, but the reason why we had to disable import is that it was causing a lot of problems. GEDCOM is far from perfect. From my understanding, it’s about 14 or 15 years old, and when you have a file format that is that old, you end up having a lot of problems that just don’t get solved with modern software. So what are some of the issues involved with GEDCOM, and what are some of the reasons why it doesn’t work so well?

Thomas: Well you’re right, Grant, it is 14 years old. Unfortunately GEDCOM, the people that had set the standard and last did the update, it really has not kept up with the times. Look at the explosion that we’ve seen in terms of programs, online programs, and software for genealogy. It’s not able to meet the standard: all the features, the bells and whistles in terms of that data. At the surface GEDCOM appears to make the transfer of genealogy data back and forth very easy, but to be honest, you can wind up with a lot of data duplication and data loss. This is what has happened to me personally. 

Here’s an example. Let’s say your database program allows you to create your own life events that are not part of the standard list of life events like birth, death, marriage. Let’s say I want to create one for First Holy Communion because my family’s Catholic and that’s one of the sacraments and that’s a life event. So I go ahead and set that up, and then I export that as part of my GEDCOM file.

But guess what? When I go to open it up in another program, what if that program doesn’t recognize the ability to add special events? What’s going to happen to that snippet of data? For some of them, the data’s just simply dropped. It’s not included, not imported.

For others, that data can wind up getting dumped in a certain area like “notes” or “exceptions,” and then you’ve got to go find it and sort it out and reformat it and it’s just more trouble than it’s worth.

Some of the other issues involve multimedia. Some early versions of programs – and some still do now – let you store your photos and documents physically. I mean the data itself, not a link to the file. What happens is, very often that information is not included when you’re doing conversion to GEDCOM format.

And then finally, sources. Part of me doesn’t even want to go there, because I spent a lot of time citing my sources, formatting my sources, and then when you do a GEDCOM export, you’re really taking a chance as to what’s going to happen with those formats. Are they going to get dropped? Are they going to get mapped over to the appropriate fields? So those are mostly the problems right now with GEDCOM.

Grant: We all agree, pretty much everybody is in agreement, that the idea of GEDCOM is a really good idea, but the problem is the implementation. Because it’s so old and hasn’t been updated, the implementation’s not so good. You end up losing information, which is really bad. For somebody who’s a historian, the idea of exporting your data and then something gets lost? It makes my heart swell. So what exactly is being done to try to improve GEDCOM?

Thomas: Well, right now there’re several groups that are working towards improving GEDCOM. One is called Better GEDCOM, and they’re at BetterGEDCOM.wikispaces.com, they’ve got a site there. They’re an independent community working to build a better GEDCOM file specification that will serve 21st-century genealogists. The website is actually built as a knowledge base using the wiki format, and you also have a series of discussions in a forum. 

Several developers and users are trying to develop an agreed-upon standard, a newer standard, for the GEDCOM format. I recommend that you go and visit it if you’re interested. It’s a great place to get the latest news on some of the issues involved with the file format and what the genealogy community is trying to do to fix the problems.

The other one is called the International OpenGen Alliance. They are at www.OpenGen.org. They’re another community, made up of volunteers, trying to develop a single standard for genealogy data exchange. Now OpenGen uses the Basecamp platform for collaboration, and that requires a membership. You’re going to have to ask for a membership. That platform provides an area for discussions, uploading files, other tools for collaboration.

Now from what I understand, there’s a disagreement between two different groups. I don’t know if it’s these groups, but basically, one wants an open source, really free and open, format for GEDCOM, while others don’t mind having one of the big players in the field like FamilySearch really set the standard. We’ve seen this in other tech venues.

Let’s take Adobe Acrobat. They developed the PDF format. It is a proprietary format, but it’s generally become accepted because it was so popular and so good as a standard. So that’s where there seems to be a major disagreement right now. You know, do we have an independent group, or should they align themselves with a powerful vendor who can sort of crowd-source this entire concept? Does that make sense?

Grant: These are both good ideas, it’s a good idea to work on an update of the GEDCOM standard. It’s a good idea. The problem with that is that, one, it moves slow. Any time you’re trying to create a standard – and it’s not just genealogy, it’s any kind of standard – that moves very, very slow. You have to get a lot of people to agree on something and that’s not easy to do. I’m not particularly fond of the import/export. It’s slow and clunky, you know. As it stands now, you lose a lot of information when you’re trying to do GEDCOM.

There’s a product that the folks over at Real-Time Collaboration are making. As of this recording, it is not available yet. The beta will be available very soon. It’s called AncestorSync. I’m curious what your thoughts are, and if you want to kind of give it a description about what it is and what it does and how it’s different than just exporting and importing your GEDCOM from one program to another?

Thomas: Sure. This is what I know. In the research that I’ve done on AncestorSync, and knowing that, as you just said, the program itself is really not available. As a techie I would love to get my hands on it and play with it, so I can only go on the information that’s out there, that the community is sharing right now. So AncestorSync is at AncestorSync.com. It’s a new product, and it allows you to synchronize your online family tree with your genealogy database program. So basically, I could have Geni.com as one of the formats that is available or will be available, and I could probably sync it with one of my programs online.

Right now, as I just said, AncestorSync only supports those certain online programs, but that’s going to be expanded in the future. I want to make it clear, AncestorSync is not a file format, so it’s not like GEDCOM. It doesn’t surprise me it’s taken 14 years to update GEDCOM.

As you say, there’s so many people that want input into it, and then there’s some people and vendors that outright want ownership, so I can see where it takes that long.

But AncestorSync is not a file format. It’s basically a utility program, and it synchronizes data, which is different than import-export. Import-export, like you said, takes a long time. If I have, what, 8,000 people in my database and I’m citing all of my sources and I have a lot of events and I have a lot of multimedia, it can take quite a while for me to do an import and export.

A syncing process, like AncestorSync? Think of Dropbox, for those of you that have Dropbox. It would sort of be like Dropbox for a genealogy file. I want what’s at Geni.com to be in sync with what’s in my genealogy program on my computer. You do the initial synchronization, which will probably take longer than anything else. Then when you change data in one, it should synchronize the data in the other program. That’s the whole idea of a program like AncestorSync. So it’s easier to keep both sets of data up to date.

Right now, with GEDCOM, this is common, what happens, and I know I have this frustration with Ancestry’s public family tree. I have a family tree on Ancestry, and I use Family Tree Maker. There’s no way I can update, through GEDCOM, just specific records on Ancestry’s tree. I have to rip the whole thing out and then import the entire GEDCOM all over again.

That’s why AncestorSync is a little bit different. It’s that, eventually, they say you’ll be able to specify even certain events that you do or do not want synchronized. Certain types of information that you do or you do not want synchronized. It will be selective, and I think this would be a great advance if it works out.

Grant: Yeah, absolutely. I have played with it quite a bit. For those of you who are interested in finding out more, I am doing a webinar this Friday. That is Friday the 27th at 8:00 PM Eastern Time. If you’re interested at all, please sign up. Go to GeniWebinars.com, GeniWebinars.com, and you’ll be able to see the webinar and sign up and watch. If you can’t make the date, don’t worry, the video will be posted later on. But it’s really great. I’ve had a chance to play with it. I am working very closely with the engineers over at Real-Time Collaboration that do AncestorSync. It’s a really great program, and they have great granular controls.

If I said, “I want to sync these generations, I only want to sync this information.” Or if you only want to do a one-time sync. Or if you say, “Hey, I only want to push information from Geni to my Legacy family tree.”

You can have a one-way sync, or you can have a two-way sync. They’ve worked it out. They’ve gone through and figured out the file format of each of the platforms that they work with, and they’re doing many, many more. The folks over at Real-Time Collaboration have really done a great job. Coming as someone who’s just had the ability to play with it and look at what they’ve done, it’s really amazing stuff. I’m really excited about it, just as a Geni user.

Thomas: Great, that sounds great. Actually I’m thinking of being a part of that webinar on Friday night, I think that would be a really good way for people to learn more.

Grant: Yeah, absolutely. So if you’re interested in taking a look, definitely come to the webinar and take a look-see. Again, the beta will be available very, very soon. If you just go over to AncestorSync.com. If you preorder – and mind you, guys, it’s really cheap. It’s $10 a year with them. That’s not $10 a month, it’s $10 a year to have your whole family tree synced perfectly back and forth. It’s really great, I’m really stoked about it. So Thomas, how can we find out more about you?

Thomas: Well, this is when I’m… Summer is a busy time for genealogists, and especially genealogy speakers like me. One thing I want to point out, Grant, is we had a very successful Google Docs for Genealogists webinar over at Legacy Family Tree last month, or earlier this month, on May 18th. As a follow-up to that, I’m doing one called Google Forms for Genealogists on Wednesday, June 1st. It’s at 1:00 Central. So basically, go to LegacyFamilyTree.com, click on the “Training” button, menu, and go to “Webinars”, and you can register. It is free, and Google Forms are just hot. I use them all the time. When I want a survey, I create it in Google Forms. We use them at genealogy societies all the time.

It’s amazing what you can do and what people are doing with these forms. They’re even creating family group sheets that they can send to family members and fill out online. And it doesn’t make the data public. You can keep that data private. So I’m really excited about this one, and that’s going to be a great webinar.

Also what I’ve got in the hopper is GeneaBloggers Radio. I am back as host this coming Friday, May 27th, starting at nine o’clock Central. That’s at BlogTalkRadio.com/GeneaBloggers.

We’re going to be focusing on military records for Memorial Day weekend. We’ve got Curt Witcher from the Allen County Public Library in to talk about the War of 1812 pension files. We’ve got Jeffrey Vaillant on, talking about Civil War ancestors and Civil War records. It’s going to be a great show, and I’m hoping people can tune in.

Grant: That’s awesome. Thank you very much, Thomas, for your time. So for the Geni Podcast, I’m Grant Brunner, thanks for listening.

 
