Wild idea: Checksum GEDCOMs?

Started by Harald Tveit Alvestrand on Tuesday, September 7, 2010
9/7/2010 at 7:49 PM

I woke up this morning to see that another couple of GEDCOM uploads had hit my particular area of the tree.

After merging for a while, I'm pretty sure I've seen this before - it's a GEDCOM that I've already merged; the spelling mistakes and odd choices are all in the same places.

Wouldn't it be wonderful if Geni could checksum GEDCOM files (or parts of GEDCOM files, possibly after throwing away information that Geni can't use anyway, like the specific IDs of nodes in the tree), and report back "I threw away this branch of your upload and connected it somewhere - this has been uploaded and integrated before"?
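A rough sketch of what such a checksum might look like, assuming the upload-specific node IDs are the `@...@` cross-reference tokens in the file. The function names here are made up for illustration, and nothing is known about how Geni actually handles uploads. The sketch blanks out the IDs, sorts the records so their order does not matter, and hashes the result:

```python
import hashlib
import re

# Matches GEDCOM cross-reference tokens such as @I12@ or @F3@.
XREF = re.compile(r"@[^@\s]+@")


def normalize_line(line):
    """Blank out upload-specific IDs and collapse runs of whitespace."""
    return " ".join(XREF.sub("@@", line).split())


def gedcom_checksum(text):
    """Hypothetical helper: SHA-256 over ID-free, order-independent records."""
    records = []
    current = []
    for raw in text.splitlines():
        line = normalize_line(raw)
        if not line:
            continue
        # A level-0 line starts a new record.
        if line.startswith("0 ") and current:
            records.append("\n".join(current))
            current = []
        current.append(line)
    if current:
        records.append("\n".join(current))
    records.sort()
    return hashlib.sha256("\n\n".join(records).encode("utf-8")).hexdigest()
```

Blanking the IDs also discards which person belongs to which family, so this is a crude "have I seen this file before" test rather than a proof that two trees are identical.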

Yes, I have been merging 5000 profiles from one GEDCOM upload, and it takes up so much time that I could have put to much better use in here!

9/8/2010 at 8:11 AM

Totally agree. Using checksums is a bit shaky though, because they rely on the files containing exactly the same information. Most users I know of get their GEDCOMs from external sources and add their own info using genealogy software before throwing the GEDCOM into Geni and other online collaboration sites. That makes it almost impossible to rely on checksums for merging unless the individuals are separated out and checked one by one.
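To illustrate the point about separating individuals, here is a minimal sketch that fingerprints each INDI record on its own instead of hashing the whole file. The function names are hypothetical, and it assumes that pointer lines (FAMS, FAMC and the like) can be ignored because they only carry file-specific IDs:

```python
import hashlib


def individual_fingerprints(text):
    """Return one SHA-256 digest per INDI record, ignoring IDs and pointers."""
    records = []
    current = None  # lines of the INDI record being read, or None
    for raw in text.splitlines():
        parts = raw.split()
        if not parts:
            continue
        if parts[0] == "0":
            # Level-0 line: start collecting only if it opens an individual.
            if len(parts) == 3 and parts[2] == "INDI":
                current = []
                records.append(current)
            else:
                current = None
            continue
        if current is None:
            continue
        # Skip lines that point at other records, e.g. "1 FAMS @F1@".
        if any(len(p) > 2 and p.startswith("@") and p.endswith("@") for p in parts):
            continue
        current.append(" ".join(parts))
    return {
        hashlib.sha256("\n".join(r).encode("utf-8")).hexdigest() for r in records
    }


def already_seen_fraction(previous, upload):
    """Share of individuals in `upload` whose fingerprint is in `previous`."""
    new = individual_fingerprints(upload)
    if not new:
        return 0.0
    return len(new & individual_fingerprints(previous)) / len(new)
```

With this approach, a file that someone has extended with their own relatives still shows a high overlap with the earlier upload, while a whole-file checksum would simply differ. Any edit to a person's record, even a corrected spelling, still changes that person's fingerprint.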

However, having a better interface for merging - like merging obvious candidates automatically (typically where some important fields do match exactly, including ancestors' names and such) - is welcome.
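As a sketch of what "obvious candidates" could mean in practice: group profiles whose key fields, including the parents' names, match exactly. The `Profile` fields and function names below are assumptions chosen for illustration, not Geni's data model:

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Profile:
    # Hypothetical minimal profile; a real one would carry far more fields.
    profile_id: str
    name: str
    birth_year: int
    father_name: str
    mother_name: str


def match_key(p):
    """Fields that must agree exactly (ignoring case and outer whitespace)."""
    return (
        p.name.strip().casefold(),
        p.birth_year,
        p.father_name.strip().casefold(),
        p.mother_name.strip().casefold(),
    )


def obvious_merge_candidates(profiles):
    """Return lists of profile IDs that share the same match key."""
    groups = defaultdict(list)
    for p in profiles:
        groups[match_key(p)].append(p.profile_id)
    return [ids for ids in groups.values() if len(ids) > 1]
```

Requiring the parents' names to match as well is what keeps two unrelated people with the same name and birth year from being offered as a merge.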

9/8/2010 at 10:54 AM

Oh, I wish there was no GEDCOM import option at all. It only makes a mess, and we have enough of that already.

9/8/2010 at 11:11 AM

Yet my experience with introducing new people to Geni who've already dabbled in any sort of computer-based genealogy is "how do I upload my GEDCOM".... people *hate* having to retype their own work, and for the stuff they did themselves, I can't blame them. It's the fact that their export will also include the random GEDCOM information *they* imported that creates the mess.

Sigh. No perfect solutions.

Geni is working on creating mechanisms to filter GEDCOM files that are uploaded. Of course, this is a HARD problem to solve. We curators were just discussing this with Noah.

