Geni consistency & plausibility checker

Started by Private User on Saturday, April 29, 2017

Showing 1-30 of 503 posts
Private User
4/29/2017 at 10:45 AM

I'm working on a new feature in the SmartCopy extension that will run a consistency & plausibility check automatically against the immediate family of a focus profile on Geni. It will then inject a message box into the Geni page when issues are detected. My intent is to enable it by default and it will not require Curator authorization before working. Here is my in-work example: https://media.geni.com/p13/84/2b/00/31/53444843794dbe13/screen_shot...

If you have ideas for rules (error, warn, info), let me know. :)

4/29/2017 at 12:06 PM

Interesting. Something that I have been waiting for.
Not sure if the guys from MyHeritage can provide their existing list of rules implemented in their desktop app 😀

4/29/2017 at 12:20 PM

I like it. Maybe a link to open up the profile for editing? I think some don't realize they "can" edit ....

Private User
4/29/2017 at 1:11 PM

Um... interesting features will be:
- Expand the consistency to all people being followed
- Stores results and decisions (for example, discard some inconsistencies and being stored, so they do not popup again and again).
- Being able to visualize the complete list of inconsistencies
- Marriage is at least after 14 years old (just in case...)
- All places recorded match standard format.
- All dates are recorded correctly (sometimes dates are not correctly stored... I found this)
- Lexical consistency of names (this can be complicated)

4/29/2017 at 2:09 PM

Should it be a separate extension from the SmartCopy extension?
Not having written any Chrome extensions myself, I don't know if that makes it harder or easier, but the user groups and required permissions seem different, so thinking of them as separate extensions may be simpler for people.

Private User "Marriage is at least after 14 years old (just in case...) "

This is probably time and country based.
I have seen earlier profiles where they are much younger.

4/29/2017 at 4:33 PM

Jeff,

Please keep in mind that it takes about 9 months for a child to be born, so a father could die before the child is born.

Some other checks:
Child is born after mother has died
2 children within 9 months
Child is born before or after ending of marriage
Child is born before mother is 14 years old
Child is born when mother is older than .. (45?)
Child is born before father is 12 years old
Multiple partners at the same time
Mother has more than .. (20?) children
Age greater than 115

Enough overlap in age for marriage?
Marriage age probably should be time/region dependent
Marriage between near family (may be more likely for royalty)

Little more complicated (probably also time/region dependent):
Not enough / too many generations without dates between profiles that have dates
Noticing extreme cases may be possible without looking at time/region
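
A few of the date-based checks in this list can be sketched as follows. This is only an illustration in Python and not SmartCopy's actual code: the `Person` record, the `check_child` function, and the 300-day pregnancy allowance are assumptions, while the ages 14 and 45 are the tentative values suggested above.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional


# Hypothetical record type; a real Geni profile carries far more fields.
@dataclass
class Person:
    name: str
    birth: Optional[date] = None
    death: Optional[date] = None


# Assumed defaults based on the suggestions in this post.
MIN_MOTHER_AGE = 14
MAX_MOTHER_AGE = 45
PREGNANCY_ALLOWANCE = timedelta(days=300)  # roughly 9 months plus some slack


def years_between(earlier: date, later: date) -> int:
    """Whole years elapsed from `earlier` to `later`."""
    years = later.year - earlier.year
    if (later.month, later.day) < (earlier.month, earlier.day):
        years -= 1
    return years


def check_child(child: Person, mother: Optional[Person] = None,
                father: Optional[Person] = None) -> List[str]:
    """Return readable issues; any check with a missing date is skipped."""
    issues: List[str] = []
    if child.birth is None:
        return issues
    if mother is not None:
        if mother.death and child.birth > mother.death:
            issues.append(f"{child.name} born after death of mother {mother.name}")
        if mother.birth:
            age = years_between(mother.birth, child.birth)
            if age < MIN_MOTHER_AGE:
                issues.append(f"mother {mother.name} was only {age} at birth of {child.name}")
            elif age > MAX_MOTHER_AGE:
                issues.append(f"mother {mother.name} was {age} at birth of {child.name}")
    if father is not None and father.death:
        # A father may die during the pregnancy, so allow about 9 months.
        if child.birth > father.death + PREGNANCY_ALLOWANCE:
            issues.append(f"{child.name} born too long after death of father {father.name}")
    return issues
```

For example, `check_child(Person("Kees", date(1831, 3, 1)), mother, father)` would report two issues if the mother died in June 1830 and the father in December 1829, while a child born in May 1830 would pass both death checks.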

Private User
4/29/2017 at 4:36 PM

Dan also mentioned it as a possible separate extension, and maybe it will become one if it grows large enough to be independent, but for now I'm planning to just have it as part of SmartCopy.

1) SmartCopy has 944 weekly users, according to Google. That's a large user base that would make instant use of consistency checks on Geni, instead of starting over and rebuilding a separate user base.
2) It's easier to develop because I already have the core code in SmartCopy.
3) Maybe the consistency features will bring in new users that will utilize the main features of the extension to build the world tree.

Private User
4/29/2017 at 4:44 PM

Thanks Job - My in-work code is considering the pregnancy term and I'll work to include those additional rules.

Valentín, at this time, I think it's only feasible for this tool to consider the immediate family of a profile. Scanning multiple generations would involve too many queries and be too time consuming for something that is intended to be dynamically integrated into the webpage. Scanning all followed profiles would need something that could be run offline. I also think it would be infeasible to "Stores results and decisions (for example, discard some inconsistencies and being stored, so they do not popup again and again). " I don't have the storage and I don't think it would be good to store it in the browser storage.

Private User
4/29/2017 at 4:56 PM

So far, I've been working with exact dates. Not sure how best to deal with Circa, Before, After, Between. That's going to require a lot more logic.
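
One possible way to handle qualified dates is to turn every date into a range of possible days and only raise an issue when two ranges cannot overlap at all. The Python sketch below assumes that approach. The `DateRange` type, the `from_qualifier` helper, and the five-year window for "circa" are illustrative assumptions, not how SmartCopy or Geni represent dates.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

CIRCA_YEARS = 5  # assumed tolerance for "circa" dates


@dataclass(frozen=True)
class DateRange:
    earliest: date
    latest: date

    def definitely_before(self, other: "DateRange") -> bool:
        return self.latest < other.earliest

    def definitely_after(self, other: "DateRange") -> bool:
        return self.earliest > other.latest


def shift_years(d: date, years: int) -> date:
    """Move a date by whole years, mapping Feb 29 to Feb 28 when needed."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def from_qualifier(qualifier: str, start: date,
                   end: Optional[date] = None) -> DateRange:
    """Build a range from a qualifier; endpoints are treated as inclusive."""
    if qualifier == "exact":
        return DateRange(start, start)
    if qualifier == "circa":
        return DateRange(shift_years(start, -CIRCA_YEARS),
                         shift_years(start, CIRCA_YEARS))
    if qualifier == "before":
        return DateRange(date.min, start)
    if qualifier == "after":
        return DateRange(start, date.max)
    if qualifier == "between":
        if end is None:
            raise ValueError("'between' needs an end date")
        return DateRange(start, end)
    raise ValueError(f"unknown qualifier: {qualifier}")
```

With this, "child born circa 1850" is definitely after "mother died before 1840", but "child born circa 1842" is not, so only the first case would be flagged. The price of this design is that vague dates trigger fewer warnings than exact ones.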

Private User
4/29/2017 at 6:01 PM

The following are the consistency checks performed by MyHeritage Family Tree Builder.

The missing numbers are conditions that were not encountered during the check of my Geni extraction.

03 – Alive but too old
05 – Parent too young
06 – Parent too old
07 – Child born after death of parent
08 – Fact occurring after death
09 – Fact occurring before birth
12 – Siblings age (too close)
13 – Siblings with same first name
15 – Descendant – Ancestor age mismatch
17 – Large spouse age difference
19 – Married too young
20 – Too young to be spouse
21 – Inconsistent placename spelling
22 – Place name resembles date
25 – Inconsistent last name spelling
28 – Maiden name similar to married name

Maiden name 'Falvey' of Mary Falvey is similar to the last name of her spouse Thomas Falvey

29 – Double spaces in name
32 – Suffix in first name

First name of Gladys I O'Connor ends with the suffix 'I', which should be moved to the separate Suffix field

34 – Alias in first name

First name of Mervyn "Ned" Ellem includes "Ned", which should be moved to the separate Alias field

36 – Siblings have different last names
37 – Incorrect use of uppercase/lowercase
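
Some of the name checks in this list (29, 32, 34, 37) are simple enough to sketch with string tests. The Python below is a rough guess at how such rules could work, not how Family Tree Builder implements them. The `check_first_name` function and the short `SUFFIXES` list are assumptions for illustration.

```python
import re
from typing import List

# Assumed, deliberately short suffix list. "I" is included to match the
# example above, although it can also be a middle initial (false positive).
SUFFIXES = {"Jr", "Jr.", "Sr", "Sr.", "I", "II", "III", "IV"}

# Matches an alias written in straight double quotes, e.g. "Ned".
ALIAS_PATTERN = re.compile(r'"([^"]+)"')


def check_first_name(first_name: str) -> List[str]:
    """Return a list of issues found in a first-name field."""
    issues: List[str] = []
    if "  " in first_name:
        issues.append("double spaces in name")
    alias = ALIAS_PATTERN.search(first_name)
    if alias:
        issues.append(f'alias "{alias.group(1)}" should move to the Alias field')
    words = first_name.replace('"', " ").split()
    if len(words) > 1 and words[-1] in SUFFIXES:
        issues.append(f"suffix '{words[-1]}' should move to the Suffix field")
    if first_name.isupper() or first_name.islower():
        issues.append("incorrect use of uppercase/lowercase")
    return issues
```

Here `check_first_name('Mervyn "Ned"')` reports the alias and `check_first_name("Gladys I")` reports the suffix, mirroring the two examples quoted in the list.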

Private User
4/29/2017 at 8:03 PM

Nice! Thanks Private User

Private User
4/30/2017 at 1:03 AM

Private User you can store locally in the user's browser (I think the API is localStorage). The user should be aware that this work only applies to one browser on one computer. Doing things smartly, they can copy/paste the result.

Or... you can offer the possibility to copy/paste between browser/computers

Private User
4/30/2017 at 1:43 AM

Thumbs up from me!!

4/30/2017 at 2:39 AM

and me too.

4/30/2017 at 7:08 AM

...and from me !

Private User
4/30/2017 at 9:15 AM

"17 – Large spouse age difference "

Hello all, what do you think is a reasonable age difference to compare?

Private User
4/30/2017 at 9:26 AM

I placed an option to close the check with an "X" on the right. In my current version, that just hides it until you refresh or update the profile. As Valentin mentioned, I can use local storage in the browser, but this storage is limited (usually 5-10 MB) and could impact performance.

I'm wondering if I should make the close more persistent. So that when you click "X", it will no longer consistency check those profiles. I guess the problem with this though is that if you want to undo that, I have to create some ability to clear it.

What are your thoughts on the temporary or persistent closing of the checks for profiles that indicate issues?
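
To make the persistent option concrete, here is a minimal sketch of a per-profile "dismissed" list with a way to clear it and to copy it between browsers as text. It is written in Python as a stand-in for whatever browser storage the extension would use, and the `DismissedChecks` class and its method names are hypothetical.

```python
import json
from typing import Set


class DismissedChecks:
    """Remembers which profiles were closed with the "X"."""

    def __init__(self) -> None:
        self._ids: Set[str] = set()

    def dismiss(self, profile_id: str) -> None:
        self._ids.add(profile_id)

    def is_dismissed(self, profile_id: str) -> bool:
        return profile_id in self._ids

    def clear(self) -> None:
        """Undo every dismissal, so all profiles are checked again."""
        self._ids.clear()

    def export_text(self) -> str:
        """Serialize to JSON text that can be copied to another browser."""
        return json.dumps(sorted(self._ids))

    def import_text(self, text: str) -> None:
        """Merge dismissals pasted from another browser."""
        self._ids.update(json.loads(text))
```

Storing only profile IDs keeps the footprint small, which matters given the storage limit mentioned above, but it also means a dismissed profile stays silent even if a new and different issue appears later.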

Private User
4/30/2017 at 9:29 AM

Where's my engineer Dan Cornett? :)

4/30/2017 at 10:01 AM

Would these be over-rideable (that's not a word but you must know what I mean)?
For example previous owners of my house visited me and they mentioned that their mother's maiden name was the same as her married name although, to their knowledge they were not related and came from different parts of the country.
Historically there are examples of marriage of and indeed births to a 12 year old girl.
Large spouse age difference too will have considerable variability.

4/30/2017 at 10:09 AM

Two thumbs up Jeff, thanks for the never ending improvements to SmartCopy! This will be a very helpful feature I'm sure.

Private User
4/30/2017 at 10:22 AM

Terry Jackson (Switzer), I am considering making the values something you can change, so you could decrease the age warning from 105 to 100, or increase it to 115. I'm not sure that will be available in the first release, but I'm keeping it in mind. It's more a matter of reorganizing the SmartCopy configuration so that you're not lost in various options and can easily distinguish options for copying data from those that control consistency. Initially though, I'd like to figure out some good default values.

4/30/2017 at 10:58 AM

Yes, making the values user-adjustable solves the outliers / different cultures, and of course add an override so the warning doesn't repeat.

Spouse age difference warning 20 years
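
User-adjustable values with sensible defaults could look roughly like this. The sketch is in Python for illustration only. The `CheckSettings` class and both function names are made up, and the defaults of 105 and 20 are simply the numbers floated in this thread.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class CheckSettings:
    # Defaults are the values mentioned in the discussion, not final ones.
    max_age_years: int = 105
    max_spouse_age_gap_years: int = 20


def age_warning(birth_year: int, death_year: int,
                settings: CheckSettings = CheckSettings()) -> bool:
    """True when the lifespan exceeds the configured maximum age."""
    return death_year - birth_year > settings.max_age_years


def spouse_gap_warning(birth_year_a: int, birth_year_b: int,
                       settings: CheckSettings = CheckSettings()) -> bool:
    """True when the spouses' birth years differ by more than the limit."""
    return abs(birth_year_a - birth_year_b) > settings.max_spouse_age_gap_years


# A user who works on families with large age gaps could loosen one value
# without touching the others.
loose = replace(CheckSettings(), max_spouse_age_gap_years=30)
```

With the defaults, spouses born in 1800 and 1825 trigger the warning, while the same pair passes under the `loose` settings.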

Private User
4/30/2017 at 4:36 PM

A further extract produced more checks from FTB...

01 - Birth after death
02 - Died too old
04 - Child older than parent
10 - Death date resembles cause of death
11 - Death place resembles cause of death
14 - Descendant older than ancestor
16 - Ancestor of himself

If using FTB ensure you use version 7. Version 8 is full of bugs and is not compatible with Geni.

5/1/2017 at 12:00 PM

Thinking a bit about the UI, if embedded in the SmartCopy extension: (which, I agree, may be a way to 'pull in' more active Tree Builders as well as creating a whole new set of "Tree Quality Auditors") ....

If there are two 'tabs' (Smart Copy vs. Consistency Checks) in the configuration panel, maybe have a top-level 'disable' tick-box for the consistency checks so it can be disabled if one temporarily is having performance issues in accessing the "SmartCopy Host".

Probably something should change on the 'icon' when the consistency checking is enabled / disabled so one doesn't forget they turned it off the day before because of performance issues.

re: persistent "closing" of consistency checking.

That's a tough issue, I think.

-- in the long run, it should be part of the Geni database, similar to a data conflict ("yes, the age is over 125, but that's known based on the best references") ... and probably would be best as an "acknowledgement" of the potential issue (e.g. "Spouse age difference > X years").

-- barring that for the present time, maybe it should be a user-config option to store "close/acknowledge" info locally? Thus, if I'm on a brain-dead (RAM limited) machine, I might turn off the local persistence, but on another machine I'll turn it on. (Turning off, then back on, could act to "clear it".)

-- version 1 of the over-ride/inhibit probably should start at just the per-profile level, not all the detail items for each profile!

-- the 'possible issues' injection should work on Profile as well as Tree Views, I'd hope.

Hmmm ... for some "reference" pages, it'd be really nice to "flag" some inconsistencies on the Smart-Copy panel itself, so one can perhaps get a better sense of whether the "data-about-to-be-copied" might degrade the quality of the Geni profile(s).

-- these might include items such as "will reduce the detail of dates", "fewer location details fields", etc.

-- Maybe, in the SmartCopy panel, the textual description of the 'potential issue' could be in a "tool-tip" kind of hover popup?

(more later on dates ... got to think about that a bit more ...)

Private User
5/1/2017 at 1:24 PM

I don't expect consistency checks will cause performance issues with the site as it runs asynchronously. If the SmartCopy server is having issues, you just won't see any warnings. It's not a persistent message box. It only becomes visible when an issue is identified in the family.

"the 'possible issues' injection should work on Profile as well as Tree Views, I'd hope." - Yup it does.

5/1/2017 at 1:33 PM

The list of checks is already extensive enough that if we run all of them, some of them will be triggered on (I think) at least one out of 50 profiles (for example siblings with different birth names).
This means that we definitely need a way to say "no, this is OK, I've checked the weird stuff, and I don't want to be warned any more".
What's more, we need a way to tell *others* that "this is OK, don't worry". Which means server-side storage.
Checks that rarely trigger falsely (age > 125) are fine to run all the time. But we might want to delay tests that trigger frequently (siblings with differing last names) until we have server-side storage of "this is checked, don't worry" flags.

5/2/2017 at 7:48 AM

Along the lines of Harald's comments:
I was thinking that the "configuration options" might have a 'radio-button' selection of when-to-run:
-- "run always",
-- "never run",
-- and a third option of either "run delayed" or "run on demand".

The "run on demand" could be akin to the SmartCopy "Submit", in that it runs / displays the results of those evaluations when the user 'clicks' something. That group could (by default) include those items where it's not unusual to have "false positives", whereas the "run always" group could trigger the HTML insertion banner / icon change.

To add an additional layer of complexity (<smile> it's what I do, sometimes, before simplifying!) ...

We'd discussed having user-modifiable customization of the checks (e.g.: age > ZZ years) ... it could even be useful to have two or even three "columns" of those customizing values, where the selection of the appropriate "column" is done based on the birth/death year of the focus profile. In that way, one could have rather "tight" checks for, say post 1700's, somewhat looser checks between then and 600 CE, and even looser checks pre-600's (with the user being able to change those "selection years", of course).

And ... ... the associated 'when-to-run' would also be in each "column".

That way I could have a rule which is always evaluated in the modern era, but that same rule might never be run in "Biblical times" (even when I click the "run-extra checks" button).
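
The "columns" idea can be sketched as a small lookup table keyed by the focus profile's year. This is a Python illustration under assumptions: the `RuleColumn` and `RunMode` names and the threshold numbers are invented for the example, and only the boundary years 1700 and 600 come from the post.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class RunMode(Enum):
    ALWAYS = "run always"
    ON_DEMAND = "run on demand"
    NEVER = "never run"


@dataclass(frozen=True)
class RuleColumn:
    from_year: int            # column applies to focus years >= from_year
    max_spouse_age_gap: int   # assumed example threshold for one rule
    run_mode: RunMode


# Newest era first; the user could edit both the years and the values.
COLUMNS: List[RuleColumn] = [
    RuleColumn(1700, 20, RunMode.ALWAYS),
    RuleColumn(600, 30, RunMode.ON_DEMAND),
    RuleColumn(-10000, 40, RunMode.NEVER),
]


def column_for(year: int) -> RuleColumn:
    """Pick the first column whose starting year is not after `year`."""
    for column in COLUMNS:
        if year >= column.from_year:
            return column
    return COLUMNS[-1]


def should_run(year: int, on_demand_clicked: bool) -> bool:
    """Decide whether the rule is evaluated for a focus profile's year."""
    mode = column_for(year).run_mode
    if mode is RunMode.ALWAYS:
        return True
    if mode is RunMode.ON_DEMAND:
        return on_demand_clicked
    return False
```

In this setup a profile from 1850 is always checked, one from 1200 is checked only after the user clicks the extra-checks button, and one from the year 300 is never checked for this rule.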

Private User
5/4/2017 at 10:37 AM

My limited use of this tool sort of scares me a little bit - I'm finding way more problems than I expected, which I guess is a good thing. Hope to get it out to you all soon.

5/4/2017 at 10:55 AM

Did anyone mention "marriage date = death date" and "spouse death date = spouse death date" ? These are common profile errors.
