GEDCOM's - Calling all GENI users with unworkable GEDCOM's

Started by Judith "Judi" Elaine (McKee) Burns on Wednesday, November 9, 2011

Showing 1-30 of 39 posts
11/9/2011 at 12:05 PM

Judi,

I have some problems with a forest gedcom.
Mike Stangel is looking into it for me.
As a result the UTF-8 export problems are now solved, but it seems the previous gedcom imports allowed non UTF-8 data to get into the database. This data is still exported unchanged and will pose problems for gedcom import.

There are also some problems in which profiles are included in the forest export. According to Mike those should be (from the profile you export from):
The blood relatives, the current spouse's blood relatives, the blood relatives of everyone married to either set, plus the spouses of all of the above.

This could mean there are privacy concerns. It is not clear to me if there are profiles missing that should have been included and/or there are profiles present which should not have been included.

If this definition for the selection of a forest export is correct, there could be profiles you manage or added that are not included in this forest export (like family of non-current spouses?).

It seems a Basic user can only do a forest export of his or her own profile. So it could be difficult to get all entered or managed content in a gedcom export.

There are also other problems with the export: the export does not conform to gedcom standards. This may prevent most programs from reading all data from the gedcom.

Then there is a problem with the gedcom standard itself:
It allows program-defined tags (tags that start with an underscore, like _MAR).
This poses problems for the importing program; it may not know about those tags and ignore them or put the values in other fields.
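To see how many program-defined tags a given export actually contains, a small script can tally every tag that starts with an underscore. This is only a sketch: the function name count_custom_tags and the sample lines are made up for illustration, and it assumes the usual "level [@id@] TAG [value]" line layout.

```python
from collections import Counter


def count_custom_tags(lines):
    """Count tags that start with an underscore (program-defined extensions)."""
    counts = Counter()
    for line in lines:
        parts = line.strip().split(maxsplit=3)
        if len(parts) < 2 or not parts[0].isdigit():
            continue  # not a "level tag ..." line, skip it
        # Record lines look like "0 @I1@ INDI": the tag follows the @id@.
        if parts[1].startswith("@") and len(parts) > 2:
            tag = parts[2]
        else:
            tag = parts[1]
        if tag.startswith("_"):
            counts[tag] += 1
    return counts


# Made-up sample lines, not taken from a real Geni export.
sample = [
    "0 @I1@ INDI",
    "1 NAME John /Doe/",
    "1 _MAR",
    "0 @F1@ FAM",
    "1 _MAR",
    "1 MARR",
]
print(count_custom_tags(sample))
```

Running it on the whole file (for example with the lines read from open(path, encoding="utf-8", errors="replace")) gives a quick list of the tags an importing program may not know.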

11/9/2011 at 12:49 PM

Judi,

I think you should contact Geni about this.
You might try Mike, he has been very helpful.

11/9/2011 at 5:34 PM

Judi, I'll look into the _MAR fields. The other issue (FAM records that reference individuals not in the file, or INDI records that reference marriages / families that are not in the file) is something we're already working on.

11/10/2011 at 8:32 AM

The _MAR records should not be in an export in the "PAF-compatible" format. We're still working on the other, but in our experience those come up as warnings rather than errors. Your documents and images are still downloadable from Geni.

Private User
11/11/2011 at 9:24 AM

This is a report of experiences restoring a recent Geni GEDCOM Forest Export to the FamilyTreeBuilder (FTB) stand-alone genealogy program running on my PC.

It was mostly successful but with a few issues which are detailed below.

Overview:
• 1671 profiles from the Geni tree were loaded into the FTB project.

• 733 photos were downloaded directly from the Geni servers. This took 2 min 30 sec to accomplish. (A good 16 Mbps HS Internet connection was available)

• The new FTB Tree looked fine in casual viewing on the PC.

A lengthy and detailed log file (1,147 lines of text in this case) was generated by FTB during the import process. Issues reported were:

• 5 instances of errors due to the use of unstructured dates (e.g. “ABT SEP 1958 TO ABT JUN 1962”) were reported. This is a valid error that should be fixed in our tree on the Geni website.

• 17 issues related to missing FAM records in the GEDCOM file which prevented assignment of parents to 17 individuals.

• 2 issues were reported related to FAM records that WERE present, but which referenced IDs for profiles that were found to be missing in the GEDCOM file.

• Most of the data in the logfile (99%) reported the presence of non-standard tags. 365 were found during the import process. This is not really a problem; it is only due to the inclusion of Geni-specific tags in the GEDCOM (e.g. “3 NOTE {geni:county} Wood”), which FTB does not understand. These tags were properly ignored and discarded by FTB.

Conclusion: This all really boils down to one serious issue: Some work would be required to clean up the 17 reported instances of missing parental relationships in the new FTB Tree, but it is do-able. Not sure of the cause - improper entry on our part or missing data in the GEDCOM? Need to investigate further. (The parents in those 17 instances ARE properly displayed in the tree on the Geni website. Not missing there!)

In getting ready to post this, I noticed from Mike Stangel’s post above that problems related to the FAM_RECORD are a known issue and being worked on. Good to hear!

11/11/2011 at 10:04 AM

Kenneth, I suspect you'll find that those cases are people right at the boundary of the Forest export, for example your aunt's husband (uncle) is in your forest but his parents are not; I believe we may be including a "FAMC" (family) reference for his parent marriage, even though that parent marriage and the people attached to it are not included in the export. We're working on a fix.

Private User
11/11/2011 at 10:15 AM

I suggest you make this report available for Geni. I would like a copy myself just to see the challenges.

The “ABT SEP 1958 TO ABT JUN 1962” statement must be an old hidden GEDCOM tag attached to a profile, simply because Geni does not support any date qualifier other than ABT (about/ca). As far as I know, without checking the standard, the statement is correct, but a from-to date range combined with about is probably an unexpected combination, since it is more common to use BETWEEN.

Private User
11/11/2011 at 2:03 PM

The latest GEDCOM standard is 5.5.1. Version 5.5 has too many limitations, and major parts of Geni's export (like UTF-8, the most common way to encode characters other than a-z) are not supported by 5.5.

11/11/2011 at 2:51 PM

Bjørn,

What other possibilities than a gedcom download are there to get data from Geni?

Private User
11/11/2011 at 3:03 PM

Doing some programming yourself using the API.
http://www.geni.com/platform/developer/help

I am currently working myself on a package I call Desktop tools for Geni.com which might include some tools for GEDCOM - freeware/donationware of course.

11/11/2011 at 3:15 PM

Bjørn,

Thanks a lot. If you are building something, do you have a timetable?
I'm rather busy with other things at the moment, so I cannot spend too much time on Geni right now.
I did a little experimenting with a forest download to help me access the profiles: http://jjw0.comze.com/geni/geni-index.htm

Private User
11/11/2011 at 3:26 PM

As with the curator job, it is just voluntary, and it is made mostly for myself as a challenge. I have too many ideas and too little time to realize them, but parts of that package might be released as separate programs.

11/11/2011 at 3:42 PM

I hope you would release them.
The gedcom export only exports the most basic information and as far as I can determine it could be difficult or impossible to get all the people you manage in an export if you are a basic user.

Private User
11/11/2011 at 3:50 PM

The big problem with GEDCOM is that there is no real standard: everyone extends it by adding their own homebrewed tags, so to make an acceptable GEDCOM export you need to know which program it will be imported into.

11/11/2011 at 4:10 PM

Bjørn,

Most programs will ignore tags they do not know about or convert them to text fields. By replacing those tags with tags the destination program knows about, you can import more data.

But fields that are not in the export can never be imported.
An export to XML may be useful. That could perhaps be translated to different gedcom dialects more easily.

There is a Dutch program that can read a number of gedcom dialects and translate between them.
This is the download link: http://www.aldfaer.net/sitemap//index.php?q=webfm_send/132
This is for the documentation (in Dutch): http://www.aldfaer.net/sitemap//index.php?q=webfm_send/41
I think the program also needs a version of the .NET framework

The Geni forest gedcoms are rather big, so I do not know if they can be handled by the program.

If there is interest I can contact the maker and see if he is willing and has time to adjust the program to read Geni gedcoms.
But I think Geni should fix some issues with the export first.

Private User
11/11/2011 at 7:00 PM

Mike Stangel – I have only had time so far to manually track through two of the “Indexes Referenced But Not Defined” issues reported for the import of a Geni GEDCOM file into the FamilyTreeBuilder program mentioned in my earlier post.

The first instance seemed to fit the scenario you suggested, but the second didn’t. In the 2nd instance, the orphaned child’s FAMC tag references a FAM record that does not exist in the GEDCOM file. That of course could be caused by being right at the edge of the forest extent as you suggested.

But, in this case the child’s parents ARE included in the GEDCOM, and they DO have a FAM record, but that FAM record only includes HUSB and WIFE tags. There is no CHIL tag referencing the child as there should be. Thus the child cannot be assigned to his proper parents.
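The two situations described above (a FAMC that points to a FAM record missing from the file, and a FAM record that exists but has no CHIL line for the child) can both be checked mechanically. The following is a rough sketch, not the checker mentioned elsewhere in this thread: the function name check_links and the sample records are invented, and it only looks at level-1 FAMC, HUSB, WIFE and CHIL lines.

```python
import re

# level, optional @xref@, tag, optional value
LINE = re.compile(r"^\s*(\d+)\s+(?:(@[^@]+@)\s+)?(\w+)(?:\s+(.*))?$")


def check_links(lines):
    """Report FAMC/HUSB/WIFE/CHIL references that do not resolve."""
    indis, fams = set(), set()
    famc = []     # (child id, family id) taken from INDI records
    members = []  # (family id, tag, person id) taken from FAM records
    current = None
    for raw in lines:
        m = LINE.match(raw.rstrip("\r\n"))
        if not m:
            continue
        level, xref, tag, value = m.groups()
        if level == "0":
            current = (xref, tag)
            if tag == "INDI":
                indis.add(xref)
            elif tag == "FAM":
                fams.add(xref)
        elif level == "1" and current and value:
            if current[1] == "INDI" and tag == "FAMC":
                famc.append((current[0], value.strip()))
            elif current[1] == "FAM" and tag in ("HUSB", "WIFE", "CHIL"):
                members.append((current[0], tag, value.strip()))

    problems = []
    child_links = {(fam, person) for fam, tag, person in members if tag == "CHIL"}
    for child, fam in famc:
        if fam not in fams:
            problems.append(f"{child}: FAMC points to missing FAM {fam}")
        elif (fam, child) not in child_links:
            problems.append(f"{child}: FAM {fam} has no CHIL line for this child")
    for fam, tag, person in members:
        if person not in indis:
            problems.append(f"{fam}: {tag} points to missing INDI {person}")
    return problems


# Invented records that reproduce the cases discussed in this thread.
sample = """0 @I1@ INDI
1 FAMS @F1@
0 @I2@ INDI
1 FAMS @F1@
0 @I3@ INDI
1 FAMC @F1@
0 @I4@ INDI
1 FAMC @F9@
0 @F1@ FAM
1 HUSB @I1@
1 WIFE @I2@
0 @F2@ FAM
1 HUSB @I8@
""".splitlines()

for problem in check_links(sample):
    print(problem)
```

In the sample, @I3@ is the case from the second instance (parents present, FAM present, CHIL missing), @I4@ is the boundary case, and @F2@ is a family that references a person who is not in the file.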

I’ll continue to work through these as time permits.

BTW, regarding “bumping the edge of the forest export” scenario you suggested, which might also be present here – I don’t have a good understanding of what the forest extent is. I thought if a forest GEDCOM was selected via Family>Share Your Tree> GEDCOM Export that it would include the entire viewable tree up to a max of 100,000 profiles. Is that not true? Is it different for PRO vs Non-PRO users?

11/12/2011 at 1:45 AM

According to Mike this is what is included in the export:

The forest is best visualized like this: http://i234.photobucket.com/albums/ee318/geniteam/forest.jpg

That's your blood relatives, your current spouse's blood relatives, the blood relatives of everyone married to either set, plus the spouses of all of the above.

This seems to exclude the tree of ex-partners.
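To make the definition a bit more concrete, here is one possible reading of it as a small graph walk. This is only my interpretation, not Geni's code: "blood relatives" is taken to mean all ancestors plus all of their descendants, the spouses map is assumed to hold current partners only (so ex-partners never enter the walk), and all names and data are invented.

```python
def blood_relatives(person, parents, children):
    """All ancestors of person (including person) plus all their descendants."""
    ancestors, stack = {person}, [person]
    while stack:
        for p in parents.get(stack.pop(), ()):
            if p not in ancestors:
                ancestors.add(p)
                stack.append(p)
    relatives, stack = set(ancestors), list(ancestors)
    while stack:
        for c in children.get(stack.pop(), ()):
            if c not in relatives:
                relatives.add(c)
                stack.append(c)
    return relatives


def forest(root, parents, spouses):
    """One reading of the forest definition quoted above."""
    children = {}
    for child, ps in parents.items():
        for p in ps:
            children.setdefault(p, set()).add(child)

    # Your blood relatives plus your current spouse's blood relatives.
    core = blood_relatives(root, parents, children)
    for s in spouses.get(root, ()):
        core |= blood_relatives(s, parents, children)

    # The blood relatives of everyone married to either set.
    result = set(core)
    for person in core:
        for s in spouses.get(person, ()):
            result |= blood_relatives(s, parents, children)

    # Plus the spouses of all of the above.
    for person in list(result):
        result |= set(spouses.get(person, ()))
    return result


# Invented family: ub is the uncle's brother, ubw his wife, ubwf her father.
parents = {
    "me": {"mum", "dad"}, "dad": {"gp"}, "aunt": {"gp"}, "wife": {"wf"},
    "uncle": {"uf", "um"}, "ub": {"uf", "um"}, "ubw": {"ubwf"},
}
spouses = {
    "me": {"wife"}, "wife": {"me"}, "mum": {"dad"}, "dad": {"mum"},
    "aunt": {"uncle"}, "uncle": {"aunt"}, "uf": {"um"}, "um": {"uf"},
    "ub": {"ubw"}, "ubw": {"ub"},
}
members = forest("me", parents, spouses)
print(sorted(members))
```

Under this reading the uncle's brother's wife (ubw) is in the forest as a spouse, but her father (ubwf) is not, which is the kind of boundary where a dangling family reference could appear.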

11/14/2011 at 5:04 PM

Judi,

A tag starting with a _ that is not recognized should be ignored by the importing program or put in some text field (for most programs that is an option).
Gedcom files are just plain text files, so you can edit them, and you might be able to replace _MAR with a tag that is recognised.
That should make it possible to import the gedcom if there are not other issues.
It could be that you have issues with non UTF-8 characters (like I had)
If you are lucky the importing program will complain and tell you where you could find the error. You could then replace the offending character(s) and try and see if it will import then.
Missing profiles could be an error in the gedcom export, but it could also be that those profiles are outside of the forest.
(That's your blood relatives, your current spouse's blood relatives, the blood relatives of everyone married to either set, plus the spouses of all of the above [Calculated from the exported profile])
If that is the problem you would have to ask a PRO user to make an export from another profile in the tree(s) those persons are in.

11/14/2011 at 5:25 PM

No need to edit for the _MAR tags, if you request the "PAF-compatible gedcom" then we'll leave them out.

I've checked in a fix for the boundary bleeding, and expect to release it later this week.

11/14/2011 at 6:04 PM

Judi,

I think you should have a bit more patience and wait for Mike's fix later this week and then try again.

Private User
1/9/2012 at 9:58 AM

Earlier in this thread I posted regarding my experience of restoring a GENI GEDCOM forest backup to a Desktop PC Genealogy program in order to ascertain the viability of the backup. The import was generally successful, but 17 records of the 1671 total were reported as issues, later determined to be “missing FAM record” issues.

Mike Stangel commented that this was a known problem related to Forest Export boundary bleed-over and was being worked on.

I’m happy to report that in my latest experience of restoring to the Desktop PC program, this issue now appears to be fixed. Our tree now contains 2110 persons and all records were properly imported in today’s restore (from a Dec 19, 2011 GEDCOM backup).

To double check, I have a program that checks a GEDCOM backup file for orphan situations like this, and while it found the 17 in the November time-frame backup, it found none in the more recent Dec 19th backup.

Good job Geni team. Thanks!

1/10/2012 at 1:04 PM

Thanks for the report, Kenneth!

1/10/2012 at 3:46 PM

Michael Stangel

Thanks for trying to correct the export.

The last forest export I downloaded on 12 December still had some issues with the consistency. Were there any changes after that?

There were also issues with the data (non-UTF-8 data while the export format was UTF-8). You stated this was due to gedcom imports of non-UTF-8 data that was exported unchanged. Some programs have problems importing data from a gedcom with this kind of data and most users will not be able to fix this. So I'm still hoping that you will adjust the data on export when it does not conform to the export format.

1/11/2012 at 10:42 AM

I don't believe we've released any GEDCOM export changes since 12 December. We're still working on detecting the invalid UTF-8 characters, but even once we have a reliable way to detect them, I'm not sure what we could do other than just blank out the offending characters.
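For anyone who wants to check a downloaded file in the meantime, finding the offending bytes and blanking them out can be sketched in a few lines. This is not Geni's code, just an illustration: the function names and the sample bytes are made up, and the replacement step also touches any U+FFFD character that was already in the text.

```python
def find_invalid_utf8(data: bytes):
    """Return (line number, byte offset in line) for each line that is not valid UTF-8."""
    bad = []
    for number, line in enumerate(data.split(b"\n"), start=1):
        try:
            line.decode("utf-8")
        except UnicodeDecodeError as err:
            bad.append((number, err.start))  # first bad byte in this line
    return bad


def blank_invalid_utf8(data: bytes, replacement: str = "?") -> str:
    """Decode as UTF-8, swapping every undecodable byte sequence for a placeholder."""
    text = data.decode("utf-8", errors="replace")
    return text.replace("\ufffd", replacement)


# Made-up sample: the second line has a Latin-1 encoded ø (byte 0xF8),
# the third line has the same letter correctly encoded as UTF-8.
sample = (
    b"0 @I1@ INDI\n"
    b"1 NAME Bj\xf8rn /Hansen/\n"
    b"1 NAME J\xc3\xb8rgen /Berg/\n"
)
print(find_invalid_utf8(sample))
print(blank_invalid_utf8(sample))
```

Reading the file in binary mode (open(path, "rb").read()) gives the bytes to pass in. The reported line numbers make it possible to fix the names by hand instead of blanking them.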

1/11/2012 at 3:01 PM

Mike,

That would help a lot.
Some programs will not import any data from a gedcom with invalid characters in the gedcom. For non technical users this is very frustrating because they do not know how to correct the gedcom.

2/16/2012 at 10:47 PM

Judi,

A gedcom can be edited by hand (you could even use notepad for it).
From my own experience I would guess it has non utf-8 data in it.
You could try an editor like Notepad++ (http://notepad-plus-plus.org/) that can save as UTF-8: change the _MAR into _MARR with it and then save the file as UTF-8 (do not overwrite your original file).

If that does not help you probably need help from Mike Stangel.

2/17/2012 at 3:09 PM

Judi,

There are many dialects of the GEDCOM standard.
Any tag that starts with an _ belongs to a dialect and might not be understood by other programs.

I would think most programs should be able to import data with
0 @...@ FAM
1 MARR
...
1 HUSB @...@
1 WIFE @...@

If there are spaces before the level indicator, you can try removing them for one family and see if that helps;
some programs may have problems with spaces before the level indicator (the 0 and 1 in the example above).

The file should have MARR and not _MARR and some programs may be case sensitive (MARR should be in uppercase)
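Doing these clean-ups by hand gets tedious for a large file, so here is a small sketch that strips leading spaces, puts the tag in uppercase and optionally renames tags. The function name normalize_line and the rename table are only examples; which renames make sense depends on the program you import into.

```python
import re

# level, optional "@xref@ ", tag, rest of the line (the value, if any)
LINE = re.compile(r"^\s*(\d+)\s+(@[^@]+@\s+)?(\S+)(.*)$")


def normalize_line(line, renames=None):
    """Strip leading spaces, uppercase the tag and apply optional tag renames."""
    renames = renames or {}
    m = LINE.match(line)
    if not m:
        return line  # not a "level tag ..." line, leave it untouched
    level, xref, tag, rest = m.groups()
    parts = [level]
    if xref:
        parts.append(xref.strip())
    tag = tag.upper()
    parts.append(renames.get(tag, tag))
    return " ".join(parts) + rest.rstrip()


# Example rename table (an assumption, adjust it to your own file).
renames = {"_MARR": "MARR"}
for line in ["  0 @F1@ FAM", "   1 _marr", "  2 DATE ABT 1876", "1 HUSB @I1@"]:
    print(normalize_line(line, renames))
```

Write the result to a new file and keep the original export, so you can always start over if the import gets worse instead of better.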

6/30/2012 at 6:52 AM

Any recent progress to report on this key issue?

I am having some success with a new concept in handling gedcoms called Behold. Check out http://www.beholdgenealogy.com

Private User
6/30/2012 at 11:10 AM

I'm exporting the gedcoms to Family Tree Builder and the marriage details are intact. I had a few exceptions where the place name (PLAC tag) for the marriage was too long.

While the _MAR tag is present in the file (and not correct), it is repeated correctly with the MARR tag in the relationships section near the end of the file.

FTB just ignores the invalid _MAR tag and picks up the correct MARR tag.

Private User
6/30/2012 at 11:24 AM

An example of a marriage relationship near the end of the file...

0 @F6000000014400866904@ FAM
1 MARR
2 DATE ABT 1876
2 PLAC District of Penzance, Cornwall, England
2 ADDR
3 CITY District of Penzance
3 STAE Cornwall
3 CTRY England
1 HUSB @I6000000014396093862@

Geni created them and FTB accepted them - no problems
