I am a hobby Xojo user. I want to import a Gedcom file into my program, specifically into a SQLite database.

Structure of the Database

Tables

Persons

 - ID: Integer
 - Gender: Varchar // M, F or U
 - Surname: Varchar
 - Givenname: Varchar

Relationships

 - ID: Integer
 - Husband: Integer
 - Wife: Integer

Children

 - ID: Integer
 - PersonID: Integer
 - FamilyID: Integer
 - Order: Integer

PersonEvents

 - ID: Integer
 - PersonID: Integer
 - EventType: Varchar // e.g. BIRT, DEAT, BURI, CHR
 - Date: Varchar
 - Description: Varchar
 - Order: Integer

RelationshipEvents

 - ID: Integer
 - RelationshipID: Integer
 - EventType: Varchar // e.g. MARR, DIV, DIVF
 - Date: Varchar
 - Description: Varchar
 - Order: Integer

I wrote a working Gedcom line parser. It splits a single Gedcom line into:

 - Level As Integer
 - Reference As String // optional
 - Tag As String
 - Value As String // optional

I load the Gedcom file via TextInputStream (working fine). Now I need to parse every line.

Gedcom-Individual-Sample

0 @I1@ INDI
1 NAME George /Clooney/
2 GIVN George
2 SURN Clooney
1 BIRT
2 DATE 6 MAY 1961
2 PLAC Lexington, Fayette County, Kentucky, USA

As you can see, the level numbers describe a tree structure. So I thought the best and simplest way would be to parse the file into separate objects (PersonObj, RelationshipObj, EventObj etc.) held in a JSONItem, because it is easy to get the children of a node there. Later on, I can simply read the nodes and child nodes to create the database entries. But I don't know how to write such an algorithm.

Can anyone help me, please?

Genealogy
  • As this question is rather complex and individual and probably requires some back-and-forth discussion, I think this is better asked in the Xojo forum. – Thomas Tempelmann Sep 01 '15 at 08:29
  • Hi Thomas Tempelmann ;) I have done this many times before, but it looks like no one is really interested in this area :/ That's why I'm asking here. Maybe you can give me some more input? – Genealogy Sep 01 '15 at 14:58
  • I don't even understand where your difficulties are. On one hand, you can describe the algorithm, yet you say you cannot code it. There are so many things you may or may not know, and it's a bit too much to ask that I assume you don't know anything and I provide a complete solution here, spending maybe half an hour on it all. – Thomas Tempelmann Sep 02 '15 at 10:39
  • OK, I spent the whole night writing and describing a structure and a simple parser. It's working fine, but I think the app will freeze if there is a Gedcom file with 10000 or more lines. I'll keep you informed! – Genealogy Sep 02 '15 at 10:59
  • Yes, if you have something working and need ideas to optimize or fix it, this is where people are more willing to help as they can see what you've done exactly. – Thomas Tempelmann Sep 03 '15 at 11:17

1 Answer

To parse the Gedcom lines at a good speed, try these ideas:

Read the entire file into a String and split the lines up:

dim f as FolderItem = ...
dim fileContent as String = TextInputStream.Open(f).ReadAll
fileContent = fileContent.DefineEncoding (Encodings.WindowsLatin1)
dim lines() as String = ReplaceLineEndings(fileContent,EndOfLine).Split(EndOfLine)

Parse every line using RegEx to extract its 3 columns:

dim re as new RegEx
re.SearchPattern = "^(\d+) ([^ ]+)(.*)$"
for each line as String in lines
  dim rm as RegExMatch = re.Search (line)
  if rm = nil then
    // Nothing found in this line. Decide whether that is acceptable for your files.
    break // Xojo's "break" only stops in the debugger; it does not exit the loop
    continue // -> onward with next line
  end if
  dim level as Integer = rm.SubExpressionString(1).Val
  dim code as String = rm.SubExpressionString(2)
  dim value as String = rm.SubExpressionString(3).Trim
  // ... process the level, code and value
next

The RegEx search pattern means that it looks for the start of the line ("^"), then for one or more digits ("\d+"), a blank, one or more non-blank chars ("[^ ]+"), and finally any remaining chars (".*", which may be none) before the end of the string ("$"). The parentheses around each of these groups are for extracting their results with SubExpressionString() afterwards.
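One consequence of this pattern is worth noting: for a record line such as "0 @I1@ INDI", the second group captures the cross-reference ("@I1@") and the tag ends up in the third group. A minimal sketch of how the two could be separated afterwards, assuming the variables code and value from the loop above (the reference variable is a hypothetical addition, and the sample values are hard-coded here only to keep the snippet self-contained):

```xojo
// Sample values, as the loop would produce them for "0 @I1@ INDI".
Dim code As String = "@I1@"
Dim value As String = "INDI"
Dim reference As String // hypothetical; stays empty for lines without a cross-reference

// A cross-reference is wrapped in "@" characters, e.g. @I1@ or @F12@.
If code.Len > 2 And code.Left(1) = "@" And code.Right(1) = "@" Then
  reference = code
  code = value.NthField(" ", 1)          // the real tag is the first word of the rest
  value = value.Mid(code.Len + 2).Trim   // whatever follows the tag, if anything
End If
// Here: reference = "@I1@", code = "INDI", value = ""
```

Lines without a cross-reference, such as "1 NAME George /Clooney/", pass through unchanged.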

The check for rm = nil hits whenever the line does not contain at least a number, a blank and at least one more character. If the Gedcom file is malformed or has blank lines, this may be the case.
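For the processing step left open in the loop above, one possible approach (a sketch, not necessarily what the questioner ended up with) is to skip the intermediate tree and keep only the "path" of tags that are currently open, one entry per level. The level number then tells you how far to cut the path back. The names path, reference, givenName, surname and birthDate are made up for this illustration, and the sample lines are taken from the question:

```xojo
// Sample input; in a real import, lines() comes from the file.
Dim lines() As String = Array( _
  "0 @I1@ INDI", _
  "1 NAME George /Clooney/", _
  "2 GIVN George", _
  "2 SURN Clooney", _
  "1 BIRT", _
  "2 DATE 6 MAY 1961")

Dim re As New RegEx
re.SearchPattern = "^(\d+) ([^ ]+)(.*)$"

Dim path() As String // path(n) holds the tag currently open at level n
Dim reference, givenName, surname, birthDate As String

For Each line As String In lines
  Dim rm As RegExMatch = re.Search(line)
  If rm = Nil Then Continue // blank or malformed line

  Dim level As Integer = rm.SubExpressionString(1).Val
  Dim code As String = rm.SubExpressionString(2)
  Dim value As String = rm.SubExpressionString(3).Trim

  If code.Left(1) = "@" Then // record line: the tag follows the reference
    reference = code
    code = value.NthField(" ", 1)
    value = ""
  End If

  // Cut the path back to this level (ReDim keeps the lower entries),
  // then store the current tag at its level.
  ReDim path(level)
  path(level) = code

  Select Case Join(path, ".")
  Case "INDI"
    // A new person starts here; a real import would save the previous one first.
  Case "INDI.NAME.GIVN"
    givenName = value
  Case "INDI.NAME.SURN"
    surname = value
  Case "INDI.BIRT.DATE"
    birthDate = value
  End Select
Next
// Here: givenName = "George", surname = "Clooney", birthDate = "6 MAY 1961"
```

Because each line is handled once and nothing but the short path array is kept in memory, this should also cope with files of many thousands of lines. If the tree is still wanted, the same path idea works with an array of parent nodes in place of the array of tags.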

Hope this helps.

Thomas Tempelmann
  • Thank you Thomas, I used RegEx exactly that way before. ;) I will go on with my parser, and if there are more questions, I'll let you know. Surely the discussion is not at its end yet... :D – Genealogy Sep 03 '15 at 12:00
  • If my answer is helpful, don't forget to upvote it, please. That's how you give others points and how you'll eventually get some points here, too. – Thomas Tempelmann Sep 03 '15 at 16:57
  • @ThomasTempelmann Hello, I am a little late, but is there a way to do this in reverse? I have a DB and want to make it into a Gedcom file. – Flow Oct 09 '22 at 04:29