A Geni crash - What if the worst happens?

Started by Private User on Thursday, October 27, 2011

Showing 1-30 of 115 posts
Private User
10/27/2011 at 10:31 AM

On 10/24/2011, Private User, asked

"What would we all do if Geni.com disappeared tomorrow without trace? Now I know and accept that Geni.com has x number of backups, etc. etc. And yes we saw the amazing recovery from the big crash etc. etc. But disaster recovery planning is all about what to do in case of unforeseen disaster.

So here's my scenario. Geni.com disappears without trace. We can't access its back-ups, etc.

What do we the users do?

How would we piece together our Gedcoms? Our photos? Our documents? Our timelines? Our projects? Our discussions? Our privacy? Would we form a collective to create new data centres? How would we access Fay's packets of paper?

How would we even communicate with each other? How many of us have each other's outside-Geni email addresses or phone numbers?

Where would we even congregate to start a recovery? Facebook? LinkedIn? Would we be able to organise ourselves to "buy out" the assets if that were a possibility to get us back up?

Any thoughts? What's our plan guys?"

--
To see the original, and all the responses there, go to: http://www.geni.com/discussions/98590?msg=749531 - this is in Are Geni's Goals Compatible - Or Incompatible? - and continue reading for a good bit of response and commentary.

And then there are additional comments, discussions, suggestions in Geni Pro Just Got A Whole Lot Better, starting (at least, one round does) with http://www.geni.com/discussions/99057?msg=750073

Private User
10/27/2011 at 11:11 AM

Discussing WeRelate.org, Private User said (original at: http://www.geni.com/discussions/99067?msg=746421 ):

"WeRelate has many better possibilities and tools than Geni, among others an excellent gedcom upload, a much better presentation of the data on a profile and a much better merge tool, but unfortunately WeRelate is a small site with about 2 million profiles. The site is free without charges and democratic!"

Folks might want to check it out as another sort of back-up.

10/27/2011 at 2:08 PM

There seem to be some problems in gedcom export

The gedcom seems to be able to contain characters not allowed in the UTF-8 character set.
It is inconsistent.
Mike Stangel is investigating these.

The gedcom export is incomplete:
The gedcom export only exports the links to photos and other documents. I'm not sure all the links can easily be used to batch download all the information. (Also, one would need a program to extract the links from the gedcom, or a good text editor to do that.)

The gedcom does not seem to export data from the sources tab.
I would not be surprised if it did not export data from time-line or discussions linked to the profile.

Information about the profile manager(s), followers and revisions is not exported nor are the guestbook, statistics, related projects or the on the web part of the contact information.
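
The link extraction mentioned above could be sketched roughly as follows. This is only an illustration: it assumes the media links show up in the export as plain http(s) URLs, and the sample lines and example.com addresses are made up, not taken from a real Geni export.

```python
import re

# Assumption: links appear as plain http(s) URLs in a line's value.
# Which tags Geni attaches them to is not shown in this thread.
URL_PATTERN = re.compile(r"https?://\S+")


def extract_links(gedcom_text):
    """Return the unique URLs found in GEDCOM text, in order of appearance."""
    seen = set()
    links = []
    for line in gedcom_text.splitlines():
        for url in URL_PATTERN.findall(line):
            if url not in seen:
                seen.add(url)
                links.append(url)
    return links


# Hypothetical sample lines, for illustration only.
sample = "\n".join([
    "0 @M1@ OBJE",
    "1 FORM jpg",
    "1 FILE http://example.com/photos/1.jpg",
    "0 @M2@ OBJE",
    "1 FILE http://example.com/photos/2.jpg",
    "1 FILE http://example.com/photos/1.jpg",
])
print(extract_links(sample))
```

The resulting list could then be fed to a downloader, keeping in mind that some links may require a login.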


Some output will need to be translated to other gedcom dialects to be imported correctly. An example:
1 NOTE {geni:occupation} Milkman
will not be imported to an occupation field in other programs without some editing of the gedcom file.
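
The editing described above could probably be automated. The sketch below assumes the occupation always appears on a single line in exactly the form shown in the example; notes continued over CONT or CONC lines are not handled, and the function name is only illustrative.

```python
import re

# Matches lines such as "1 NOTE {geni:occupation} Milkman".
GENI_OCCUPATION = re.compile(r"^(\d+) NOTE \{geni:occupation\} ?(.*)$")


def convert_occupation_notes(gedcom_text):
    """Rewrite Geni occupation notes as standard OCCU lines."""
    converted = []
    for line in gedcom_text.splitlines():
        match = GENI_OCCUPATION.match(line)
        if match:
            level, value = match.groups()
            converted.append(f"{level} OCCU {value}".rstrip())
        else:
            # Every other line is passed through unchanged.
            converted.append(line)
    return "\n".join(converted)


print(convert_occupation_notes("1 NOTE {geni:occupation} Milkman"))
```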

Private User
10/27/2011 at 11:11 PM

Job Waterreus, UTF-8 is not a character set; it is a method of encoding Unicode, whose goal is to cover all the alphabets in the world, and it mostly does.

Could you give examples where the encoding is wrong and why?

The problem is usually that the import program does not support Unicode fully: it either converts UTF-8 to the very limited 8-bit ANSEL (American National Standard for Extended Latin Alphabet Coded Character Set for Bibliographic Use) or just ignores the CHAR instruction in the header, which in Geni exports is correct according to the standard.

http://homepages.rootsweb.ancestry.com/~pmcbride/gedcom/55gctoc.htm

When it comes to GEDCOM: GEDCOM is a plain text file and can of course not include images or other binary data, but the references to the online data are correct, so it is up to the import program to use them.
Other Geni-internal values like managers, revisions and so on are not supported by the GEDCOM standard either, so the general problem is GEDCOM.

In general: GEDCOM is a stone-age standard that does not cover modern ways of representing data, but so far there is no replacement.

I am planning to implement a program that postprocesses a GEDCOM to download the images and replace the image links, since an efficient download is better done using the API. Links to external web pages can however not be replaced, so it is still up to the importing program to make them useful.

You are however correct that some items like occupation are tagged wrongly, since GEDCOM has an OCCU {OCCUPATION} tag that should be used, and there are some limited source references too.

Private User
10/27/2011 at 11:20 PM

Sorry, linked to an old version of the GEDCOM standard.
The latest version is 5.5.1; you can find a lot of links to it on the Internet, for example: http://www.phpgedview.net/ged551-5.pdf

10/27/2011 at 11:26 PM

Bjørn,

I have to go to work, will answer tonight.
Mike Stangel is investigating the problems.

10/28/2011 at 4:57 AM

Bjørn said: "When it comes to GEDCOM: GEDCOM is a plain text file and can of course not include images or other binary data, but the references to the online data are correct, so it is up to the import program to use them.
Other Geni-internal values like managers, revisions and so on are not supported by the GEDCOM standard either, so the general problem is GEDCOM.

In general: GEDCOM is a stone-age standard that does not cover modern ways of representing data, but so far there is no replacement."

And there you have the main reason for gedcom not being a good backup. It's as simple as that.

Private User
10/28/2011 at 5:21 AM

I use another website where I write and add pictures, in case Geni stops working: http://genebase.com. I also take backups on CD.

Private User
10/28/2011 at 10:43 AM

Note re: GEDCOM - I have looked through the GEDCOM with a word processor for info from Timeline Events. I found the auto-generated events, including descriptions that were added, but NOT the manually added events. There was nothing for any of the manually added events I looked for.

10/28/2011 at 12:47 PM

It seems it is possible to encode multi-media information in a gedcom
http://homepages.rootsweb.ancestry.com/~pmcbride/gedcom/55gcappe.htm
There will probably be very few programs that can use that.

It is also possible to use user defined keywords (Tags)

tag :=
A tag consists of a variable length sequence of alphanum characters. All user-defined tags, tags used that have not been defined in the GEDCOM standard, must begin with an underscore character (0x5F).

So it is possible to export any data, but the importing program will not know about Geni tags, so a special program is needed to read that information or the gedcom must be converted to another dialect before the info for the user defined tags can be used.
As an example: Geni used the tag _EMAIL for storing the e-mail address of the user
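
A small program for finding such user-defined tags might look like the sketch below. It only handles the basic "level, optional cross-reference, tag, value" line layout and is not a full GEDCOM parser; the sample lines and the e-mail address are invented for illustration.

```python
def parse_line(line):
    """Split one GEDCOM line into (level, xref, tag, value)."""
    parts = line.strip().split(" ", 2)
    level = int(parts[0])
    xref = None
    if len(parts) > 1 and parts[1].startswith("@"):
        # Record lines such as "0 @I1@ INDI" carry a cross-reference id
        # between the level and the tag.
        xref = parts[1]
        rest = parts[2].split(" ", 1) if len(parts) > 2 else [""]
        tag = rest[0]
        value = rest[1] if len(rest) > 1 else ""
    else:
        tag = parts[1] if len(parts) > 1 else ""
        value = parts[2] if len(parts) > 2 else ""
    return level, xref, tag, value


def user_defined_tags(gedcom_text):
    """Return the sorted set of tags that begin with an underscore."""
    tags = set()
    for line in gedcom_text.splitlines():
        if not line.strip():
            continue
        tag = parse_line(line)[2]
        if tag.startswith("_"):
            tags.add(tag)
    return sorted(tags)


# Hypothetical sample, for illustration only.
sample = "0 @I1@ INDI\n1 NAME Jan /Jansen/\n1 _EMAIL someone@example.com"
print(user_defined_tags(sample))
```

Listing the underscore tags first shows which parts of an export another program is likely to ignore or need converted.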

10/28/2011 at 12:57 PM

an example of the problematic data in the gedcom file:
1 NAME à/Sheeran/

It was probably altered by the paste action.
It came from this profile: 6000000003947938246
tree view: http://www.geni.com/family-tree/index/6000000003947938246

Private User
10/28/2011 at 1:02 PM

The profile you link to is private.
Is the à using a combining accent or is it using the real code for it?
I have reported errors on combining accents once.

10/28/2011 at 1:10 PM

the hex value is C3 I think

Private User
10/28/2011 at 1:44 PM

The profile name starts with Ó, which, correctly encoded in UTF-8, is C3 93, so if you see à you are using a text viewer that is not reading the file as UTF-8.
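
The effect can be reproduced in a few lines. This sketch assumes the viewer falls back to Latin-1 (ISO 8859-1); other 8-bit code pages would show a different second character.

```python
initial = "\u00d3"  # the letter Ó

encoded = initial.encode("utf-8")
print(encoded.hex(" "))  # the two bytes C3 93

# A viewer that assumes Latin-1 turns each byte into its own character:
# C3 becomes Ã and 93 becomes an invisible control character.
misread = encoded.decode("latin-1")
print(repr(misread))

# Reading the same bytes as UTF-8 gives the original letter back.
print(encoded.decode("utf-8") == initial)
```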

Private User
10/28/2011 at 1:54 PM

GEDCOM is not pure text. One thing the GEDCOM exporter does do correctly is to attach photos and documents. I've used it successfully.
Pity about the timeline events.

As myheritage.com keeps a copy of your tree on both your PC and their site, I'm going with it. The version on your PC comes with software, so I've copied it all to a DVD. I now have a self-contained family tree. They have the private/public similar to Geni.

Private User
10/28/2011 at 2:07 PM

But... You might have found an error here.

A stupid/classical error made by programmers not used to other letters than a-z...

You listed the GEDCOM name as à/Sheeran/ - telling me that Geni have replaced the first name Órán with an initial because it is a private profile.

If this replacement is done AFTER encoding the name to a UTF-8 stream of bytes, picking only the first byte, instead of BEFORE, you would end up with a broken UTF-8 sequence. That also explains why the period and the space after the à (C3) are gone in your viewer, probably because of a standard way of getting back in sync after detecting the error.
Michael Stangel?
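
To illustrate the kind of error described here (this is not Geni's actual code, just a sketch with made-up function names), compare cutting the name after encoding with cutting it before:

```python
def initial_after_encoding(first_name):
    """Suspected buggy order: encode first, then keep one byte."""
    return first_name.encode("utf-8")[:1]


def initial_before_encoding(first_name):
    """Safer order: keep one character, then encode it."""
    return first_name[:1].encode("utf-8")


name = "\u00d3r\u00e1n"  # Órán, with precomposed letters

broken = initial_after_encoding(name)
correct = initial_before_encoding(name)
print(broken.hex(" "))   # only the lead byte C3 survives
print(correct.hex(" "))  # the complete sequence C3 93

# A strict UTF-8 reader rejects the lone lead byte; a lenient one
# substitutes the replacement character U+FFFD.
print(repr(broken.decode("utf-8", errors="replace")))
```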

Private User
10/28/2011 at 2:16 PM

If Geni fixed the problem with timeline events in the exporter, I'd have everything I wanted. Managers and revisions are specific to Geni - revisions don't always work, so why back them up? Managers have been superseded by the new 'a Pro can do anything' approach.

Private User
10/28/2011 at 2:17 PM

Ken, GEDCOM IS pure text, but as I said: Programs following the standard are able to use the links to download the linked documents and images.

The problem, however, is that these programs will fail if they try to download data that requires a Geni login or access code, and that they have no recovery option if they hit the Geni rate limit of more than 40 requests within 10 seconds.
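
A download program could stay under such a limit with a simple sliding-window throttle. The sketch below takes the 40-requests-per-10-seconds figure from this post as its default; the class name is made up, and how Geni actually responds when the limit is exceeded is not covered.

```python
import time
from collections import deque


class SlidingWindowThrottle:
    """Allow at most max_requests calls within any window_seconds span."""

    def __init__(self, max_requests=40, window_seconds=10.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the class can be tested
        self.sleep = sleep
        self.sent = deque()  # timestamps of recent requests

    def wait_for_slot(self):
        """Block until another request may be sent, then record it."""
        while True:
            now = self.clock()
            # Forget requests that have left the window.
            while self.sent and now - self.sent[0] >= self.window_seconds:
                self.sent.popleft()
            if len(self.sent) < self.max_requests:
                self.sent.append(now)
                return
            # Window is full: wait until the oldest request expires.
            self.sleep(self.window_seconds - (now - self.sent[0]))
```

Calling wait_for_slot() before every download keeps the program below the limit instead of having to recover after hitting it.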

Private User
10/28/2011 at 2:28 PM

In which case, myheritage.com and geni.com follow the same conventions. They have interpreted everything that is in the Geni file correctly. I done a reasonably extensive check. If it's not on my PC, it's because geni didn't put it in the gedcom file - timeline events.

I'd recommend that you give it a try - it's free. As has been mentioned, geni and the other site/viewer need to understand each other's conventions. I've found that in this case, they do.

Private User
10/28/2011 at 2:29 PM

i Done? oops.

10/28/2011 at 2:39 PM

Bjørn,

That it displays here as 1 NAME à/Sheeran/ does not mean it shows like that in the program I use (which displays a colour-coded hex value), but after a copy and paste here it becomes what you see.

Mike has the gedcom file so should be able to check.

Private User
10/28/2011 at 2:53 PM

You've made a very important point, which I didn't realise. The GEDCOM file by itself doesn't contain everything (photos), but if you let the other site expand the contents, it does.

Therefore, if Geni collapses any GEDCOM file would be incomplete unless you've already let the other site resolve the links. I'm unintentionally doing this.

Private User
10/28/2011 at 3:06 PM

Sorry, - two parallel discussions here, not quite in sync.

Job: As I said, you have found an error and I have given a probable explanation for it, and I guess Mike will confirm it. The shortening of the name Órán is done on the UTF-8 byte sequence instead of by picking the first Unicode character. Mike: Also be aware of combining accents, which mean that you have to pick more than the first Unicode value to get it correct.
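
The combining-accent point can be sketched like this. It is a simplification that keeps the first code point plus any combining marks directly after it; full grapheme segmentation has more rules, and the function name is only illustrative.

```python
import unicodedata


def first_letter_with_accents(name):
    """Return the first code point plus any combining marks after it."""
    if not name:
        return ""
    end = 1
    # unicodedata.combining() is non-zero for combining marks.
    while end < len(name) and unicodedata.combining(name[end]) != 0:
        end += 1
    return name[:end]


# Órán written with combining acute accents (U+0301) instead of
# precomposed letters.
decomposed = "O\u0301ra\u0301n"

initial = first_letter_with_accents(decomposed)
print(len(initial))  # two code points: O plus the combining accent
print(unicodedata.normalize("NFC", initial) == "\u00d3")
```

Taking only decomposed[:1] here would give a plain O and silently drop the accent.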

Ken, you are correct and as Remi said: GEDCOM is not a backup format, - we should all follow his recommendation on using our own PC as the main research tool and just using Geni to display your findings.

Private User
10/28/2011 at 3:14 PM

Bjorn. Not one of my profiles was all my own work - it was very much a team effort. This was why we were using Geni as the central repository. Having a backup of just my work doesn't help much. However, we can achieve this with the other site.

10/28/2011 at 3:25 PM

I can confirm both the tree structure errors Job has reported, as well as the multi-byte truncation problem on the first initial. We'll get both fixed, though I cannot commit to a timeframe just yet.

Private User
10/28/2011 at 3:50 PM

By the way: flagging a private profile by reducing the first name to an initial - does that come out correctly in all languages/cultures?

How are Chinese, Japanese, Russian etc. names truncated?

10/28/2011 at 3:54 PM

Nice to know, Mike, and it's good that errors get identified and solved.

Ken: Not all of the information in my profiles, or in all the profiles I have access to, is mine either, but I add it to my personal genealogical software with either the source information that accompanies it or the profile manager/Geni as my source. I then use this information to look up primary sources if possible. I am getting as much info as possible into my own genealogical software, because there it will never disappear, since I'm doing multiple backups at multiple sites automatically and manually.

Then the team effort will also be stored in my personal files on my personal computers.

It's good to know that MyHeritage uses both an online and an offline program that can talk to each other and exchange data. It probably only works within MyHeritage's own software, though. What if I want to use other genealogical software?

Private User
10/28/2011 at 4:07 PM

Remi. All sites/software have that problem (GEDCOM was meant to be the solution), but MH permits sharing data without putting your own at risk. You've got the option to pull the data into your tree (just by a click) unlike Geni who will merge, whether you like it or not. The 'same' tree can have variations.

10/28/2011 at 4:23 PM

To show an example of what can be done with a gedcom back-up and Genealogica Grafica (free software) and a little editing by me
(sorry Dutch version) http://jjw0.comze.com/geni/GGframesPC.htm

10/28/2011 at 4:34 PM

Mike,

Thanks for investigating.
