I'm trying to get a simple example to load using the ETL loader but I must be missing something. I've followed various threads on Stack Overflow and have been going by the documentation on extractors, but I'm coming up short in my attempt.

Here's my data: vertices.csv

label,data,Date
v01,0.1234,2015-01-01 02:30
v02,0.5678,2015-02-20 15:32
v03,0.9012,2015-03-30 11:00

I'm using two JSON files to try to load this into a PLOCAL database:

vertices.json

{
    "config": {
        "log": "debug",
        "fileDirectory": "./",
        "fileName": "vertices.csv"
    }
}

and commonVertices.json

{
    "begin": [ { "let": { "name": "$filePath",  "expression": "$fileDirectory.append($fileName )" } } ],
    "config": { "log": "info" },
    "source": { "file": { "path": "$filePath" } },
    "extractor": { "csv": { "ignoreEmptyLines": true,
                            "nullValue": "N/A",
                            "columnsOnFirstLine": true,
                            "dateFormat": "yyyy-mm-dd HH:MM",
                            "columns": ["label:string","weight:float","Date:datetime"]
                          }
                 },
    "transformers": [
            { "vertex": { "class": "myVertex" } },
            { "code":   { "language": "Javascript", "code": "print('    Current record: ' + record); record;" } }
        ],
    "loader": {
        "orientdb": {
            "dbURL": "plocal:test.orientdb",
            "dbType": "graph",
            "batchCommit": 1000,
            "classes": [ { "name": "myVertex", "extends": "V" } ],
            "indexes": [ { "class": "myVertex", "fields":["label:string","Date:datetime"], "type":"NOTUNIQUE" } ]
        }
    }
}

I'm loading it using oetl.sh with the command:

$ oetl.sh commonVertices.json vertices.json

The output, with debug information, is here:

> oetl.sh commonVertices.json vertices.json
OrientDB etl v.2.2.7 (build 2.2.x@rdcab5af4dce4b538bdb4b372abba46e3fc9f19b7; 2016-08-11 15:17:33+0000) www.orientdb.com
[csv] INFO column types: {weight=FLOAT, Date=DATETIME, label=STRING}
BEGIN ETL PROCESSOR
[file] INFO Reading from file ./vertices.csv with encoding UTF-8
Started execution with 1 worker threads
[orientdb] DEBUG orientdb: found 9 vertices in class 'null'
[orientdb] DEBUG orientdb: found metadata field 'null'
Start extracting
[csv] DEBUG document={weight:0.1234,Date:null,label:v01}
[csv] DEBUG document={weight:0.5678,Date:null,label:v02}
[1:vertex] DEBUG Transformer input: {weight:0.1234,Date:null,label:v01}
[csv] DEBUG document={weight:0.9012,Date:null,label:v03}
[1:vertex] DEBUG Transformer output: v(myVertex)[#25:3]
[1:code] DEBUG Transformer input: v(myVertex)[#25:3]
    Current record: myVertex#25:3{weight:0.1234,Date:null,label:v01} v1
[1:code] DEBUG executed code=OCommandExecutorScript [text=print('    Current record: ' + record); record;], result=myVertex#25:3{weight:0.1234,Date:null,label:v01} v1
[1:code] DEBUG Transformer output: myVertex#25:3{weight:0.1234,Date:null,label:v01} v1
[2:vertex] DEBUG Transformer input: {weight:0.5678,Date:null,label:v02}
[2:vertex] DEBUG Transformer output: v(myVertex)[#26:3]
[2:code] DEBUG Transformer input: v(myVertex)[#26:3]
    Current record: myVertex#26:3{weight:0.5678,Date:null,label:v02} v1
[2:code] DEBUG executed code=OCommandExecutorScript [text=print('    Current record: ' + record); record;], result=myVertex#26:3{weight:0.5678,Date:null,label:v02} v1
[2:code] DEBUG Transformer output: myVertex#26:3{weight:0.5678,Date:null,label:v02} v1
[3:vertex] DEBUG Transformer input: {weight:0.9012,Date:null,label:v03}
[3:vertex] DEBUG Transformer output: v(myVertex)[#27:3]
[3:code] DEBUG Transformer input: v(myVertex)[#27:3]
    Current record: myVertex#27:3{weight:0.9012,Date:null,label:v03} v1
[3:code] DEBUG executed code=OCommandExecutorScript [text=print('    Current record: ' + record); record;], result=myVertex#27:3{weight:0.9012,Date:null,label:v03} v1
[3:code] DEBUG Transformer output: myVertex#27:3{weight:0.9012,Date:null,label:v03} v1
[orientdb] INFO committing
Pipeline worker done without errors:: true
all items extracted
END ETL PROCESSOR
+ extracted 3 rows (0 rows/sec) - 3 rows -> loaded 3 vertices (0 vertices/sec) Total time: 149ms [0 warnings, 0 errors]

It loads... but the date fields aren't getting populated with any data as shown by this query:

orientdb {db=test.orientdb}> SELECT FROM myVertex

+----+-----+--------+------+----+-----+
|#   |@RID |@CLASS  |weight|Date|label|
+----+-----+--------+------+----+-----+
|0   |#25:0|myVertex|0.1234|    |v01  |
|1   |#26:0|myVertex|0.5678|    |v02  |
|2   |#27:0|myVertex|0.9012|    |v03  |
+----+-----+--------+------+----+-----+

3 item(s) found. Query executed in 0.003 sec(s).

From tinkering so far, it seems the ETL will import the dates if I leave the "dateFormat" and "columns" fields out of commonVertices.json, but then it imports only the date and drops the time.

I'm a bit stuck on this one. It looks like a bug to me, but I'm new to OrientDB, so hopefully it's just a user error with a simple solution.

As always, any help is greatly appreciated!

TxAG98
  • Pay attention to the format string `"dateFormat": "yyyy-mm-dd HH:MM"`; the right format is `"dateFormat": "yyyy-MM-dd HH:mm"`. Take a look at the [javadoc](https://docs.oracle.com/javase/8/docs/api/index.html?java/time/package-summary.html) – Roberto Franchini Aug 17 '16 at 14:07
  • Just adding a useful link to another question: [http://stackoverflow.com/a/11046198/2059999](http://stackoverflow.com/a/11046198/2059999) – TxAG98 Mar 09 '17 at 00:10

1 Answer

I have tried with

"extractor": { "csv": { "ignoreEmptyLines": true,
                            "nullValue": "N/A",
                            "columnsOnFirstLine": true,
                            "dateFormat": "yyyy-MM-dd hh:mm"
                          }
                 },

and it worked

Hope it helps.
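
A note on why `hh` can appear to work with 24-hour data: assuming the ETL hands the `dateFormat` string to Java's `java.text.SimpleDateFormat` (an assumption, not something confirmed in this thread), lenient parsing accepts values such as 15 under the 12-hour letter `hh`, but reads a 12 o'clock value as midnight. The sketch below compares `hh` with `HH`. The class name `HourLetters`, the helper `reparse`, and the `12:15` timestamp are made up for illustration and are not part of OrientDB.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Locale;
import java.util.TimeZone;

public class HourLetters {
    // Parse text with the given pattern, then render it with an unambiguous 24-hour pattern.
    public static String reparse(String pattern, String text) {
        TimeZone utc = TimeZone.getTimeZone("UTC"); // fixed zone so DST cannot shift the result
        SimpleDateFormat in = new SimpleDateFormat(pattern, Locale.ROOT);
        SimpleDateFormat out = new SimpleDateFormat("yyyy-MM-dd HH:mm", Locale.ROOT);
        in.setTimeZone(utc);
        out.setTimeZone(utc);
        try {
            return out.format(in.parse(text));
        } catch (ParseException e) {
            throw new IllegalArgumentException("cannot parse: " + text, e);
        }
    }

    public static void main(String[] args) {
        // Row from vertices.csv: lenient parsing lets 15 through under hh.
        System.out.println(reparse("yyyy-MM-dd hh:mm", "2015-02-20 15:32")); // 2015-02-20 15:32
        // Hypothetical noon timestamp: hh treats 12 as hour 0 when no AM/PM marker is present.
        System.out.println(reparse("yyyy-MM-dd hh:mm", "2015-01-01 12:15")); // 2015-01-01 00:15
        // HH is the 0-23 hour letter and keeps noon as noon.
        System.out.println(reparse("yyyy-MM-dd HH:mm", "2015-01-01 12:15")); // 2015-01-01 12:15
    }
}
```

If that assumption holds, `HH` is the safer letter for 24-hour values like the ones in vertices.csv.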

Alessandro Rota
  • Thanks! Changing the formatting did work... That said, the [documentation on extractors](http://orientdb.com/docs/2.2.x/Extractor.html) appears to be wrong since their example shows a dateFormat field of "dd-mm-yyyy HH:MM". Interestingly, changing the date format from my example alone didn't make it work. I had to **remove** the `"columns": ["label:string","weight:float","Date:datetime"]` field as well or it wouldn't work. – TxAG98 Aug 17 '16 at 15:38
  • Moreover, if I remove the "columns" line **and** use the date formatting as indicated in the OrientDB documentation ("yyyy-dd-mm HH:MM") it will still load a date... but the dates actually got mangled: v01 gets assigned an incorrect date of "2017-06-01 02:01:00" instead of the correct "2015-01-01 02:30". – TxAG98 Aug 17 '16 at 15:39
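
The mangled date in the last comment is what Java's `java.text.SimpleDateFormat` produces when the month and minute letters are swapped, which suggests (but does not prove) that the ETL uses that class for `dateFormat`. In those patterns `MM` is the month and `mm` is the minute, so `yyyy-mm-dd HH:MM` reads `01` as the minute and `30` as the month, and lenient parsing rolls month 30 forward into June 2017. A minimal sketch, using a made-up class `PatternLetters` and helper `reparse`:

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Locale;
import java.util.TimeZone;

public class PatternLetters {
    // Parse text with the given pattern, then render it with a known-good pattern.
    public static String reparse(String pattern, String text) {
        TimeZone utc = TimeZone.getTimeZone("UTC"); // fixed zone so DST cannot shift the result
        SimpleDateFormat in = new SimpleDateFormat(pattern, Locale.ROOT);
        SimpleDateFormat out = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss", Locale.ROOT);
        in.setTimeZone(utc);
        out.setTimeZone(utc);
        try {
            return out.format(in.parse(text));
        } catch (ParseException e) {
            throw new IllegalArgumentException("cannot parse: " + text, e);
        }
    }

    public static void main(String[] args) {
        String v01 = "2015-01-01 02:30"; // first data row of vertices.csv

        // Swapped letters: minute=01, month=30, which rolls over to June 2017.
        System.out.println(reparse("yyyy-mm-dd HH:MM", v01)); // 2017-06-01 02:01:00
        System.out.println(reparse("yyyy-dd-mm HH:MM", v01)); // 2017-06-01 02:01:00

        // Correct letters: MM = month, mm = minute.
        System.out.println(reparse("yyyy-MM-dd HH:mm", v01)); // 2015-01-01 02:30:00
    }
}
```

Calling `in.setLenient(false)` before parsing would make the swapped patterns throw instead of silently rolling the month over, which is handy when checking a pattern outside the ETL.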