Evaluation of Metadata Representations in RDF stores

Preface:

This is the experiment description website for the paper “Evaluation of Metadata Representations in RDF stores”, which is under review for the Semantic Web Journal special issue on “Benchmarking Linked Data”. It provides links to tools and datasets, together with additional information about the results and the benchmark execution procedures, so that this work can be verified and reproduced.

Links and Tools:

Original Wikidata Experiment by Hernandez et al.

http://users.dcc.uchile.cl/~dhernand/wquery/

Benchmark execution and data transformation (wikidata) framework

https://github.com/JJ-Author/wikibase-bench

MRM Query transformation tool (used for DBpedia queries)

https://github.com/JJ-Author/meta-sparql

MRM data transformation tool (used for DBpedia datasets)

https://github.com/JJ-Author/meta-rdf/tree/sdw-dbpedia 

DBpedia Historique Extractor adaptation (used to generate revision metadata)

https://github.com/JJ-Author/Historic

DBpedia meta-rdf input format transformation (aggregated metadata and data)

https://github.com/JJ-Author/dbpedia-revision-meta-convert

Datasets:

Wikidata dataset:

The dataset files can be downloaded from the original Wikidata experiment (https://dx.doi.org/10.6084/m9.figshare.3208498.v1)

DBpedia dataset:

The final dataset files can be downloaded here. For every MRM, one version exists for the English chapter and one for the German chapter. The experiments were run using both language versions.

Every MRM links to the same revision metadata, which needs to be loaded as well.

DBpedia mini dataset:

The final dataset used for the dataset size study can be downloaded here.

Queries:

Wikidata Quins Queries:

We reused the queries from the original experiment but added support for queries that use a FILTER EXISTS statement instead of a triple pattern (f(o)ngraphs, f(o)rdr).

DBpedia Queries:

meta-rdf query translation framework

In order to translate the queries, we developed a generic tool for rewriting SPARQL queries for different MRMs.

The idea is that triple patterns within a query are replaced by special annotations, which are then translated into the appropriate patterns for the target MRM.

It consists of the following annotations:

                          

#!data(?s,?p,?o)!#

accesses a regular data triple (needed for regular data queries)

#!reif(?id,?s,?p,?o)!#

analogous to #!data, but also retrieves the statement id

#!meta(?id,?k,?v)!#

retrieves a metadata key and value for a given statement id

#!meta2(?id,?k,?v)!#

retrieves a metadata key and value that are themselves reified (due to meta-metadata) for a given statement id

|x|

x denotes a template variable, which gets replaced by a specific constant to derive query instances from the template
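As a minimal sketch of how query instances could be derived from a template, the Python snippet below replaces each |x| placeholder with a constant. The function name `instantiate`, the bindings dictionary, and the example IRI are hypothetical and are not taken from the meta-sparql tool.

```python
import re

# Matches a template variable such as |person| or |country|.
# The SPARQL operator || is not matched, because at least one
# word character is required between the two bars.
TEMPLATE_VAR = re.compile(r"\|(\w+)\|")


def instantiate(template: str, bindings: dict) -> str:
    """Replace every |name| placeholder with the constant bound to name."""
    def substitute(match):
        name = match.group(1)
        if name not in bindings:
            raise KeyError(f"no constant for template variable |{name}|")
        return bindings[name]

    return TEMPLATE_VAR.sub(substitute, template)


# Hypothetical constant, used only for illustration.
template = "SELECT ?p ?o WHERE { #!data(<|person|>,?p,?o)!# } LIMIT 1000"
print(instantiate(template, {"person": "http://example.org/person/1"}))
```

The placeholder is replaced inside the angle brackets, so the template itself decides whether the constant is used as an IRI.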

To illustrate the different behaviour, an example translation of the annotations into the standard reification MRM is shown below:

        

#!data(?s,?p,?o)!#

?dummyVar_0 a <http://www.w3.org/1999/02/22-rdf-syntax-ns#Statement>;
<http://www.w3.org/1999/02/22-rdf-syntax-ns#subject>         ?s;
<http://www.w3.org/1999/02/22-rdf-syntax-ns#predicate>         ?p;
<http://www.w3.org/1999/02/22-rdf-syntax-ns#object>                 ?o.

#!reif(?id,?s,?p,?o)!#                

?id a <http://www.w3.org/1999/02/22-rdf-syntax-ns#Statement>;
<http://www.w3.org/1999/02/22-rdf-syntax-ns#subject>         ?s;
<http://www.w3.org/1999/02/22-rdf-syntax-ns#predicate>         ?p;
<http://www.w3.org/1999/02/22-rdf-syntax-ns#object>                 ?o.        

#!meta(?id,?k,?v)!#

?id <http://sdw.aksw.org/metardf/hasSharedMeta> ?shared_2 .
?shared_2 ?k ?v .

#!meta2(?id,?k,?v)!#

?id <http://sdw.aksw.org/metardf/hasSharedMeta> ?shared_3 .
?dummyVar_3 a <http://www.w3.org/1999/02/22-rdf-syntax-ns#Statement>;
<http://www.w3.org/1999/02/22-rdf-syntax-ns#subject>         ?shared_3;
<http://www.w3.org/1999/02/22-rdf-syntax-ns#predicate>         ?k;
<http://www.w3.org/1999/02/22-rdf-syntax-ns#object>                 ?v.
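The translation shown above can be sketched in a few lines of Python. This is an illustrative approximation under stated assumptions, not the implementation of the meta-sparql tool: the function names are hypothetical, the numbering of helper variables by annotation position is inferred from the example (0 for #!data, 2 for #!meta, 3 for #!meta2), and literal arguments that contain commas are not handled.

```python
import re

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SHARED_META = "<http://sdw.aksw.org/metardf/hasSharedMeta>"

# One annotation, e.g. #!reif(?id,?s,?p,?o)!#
ANNOTATION = re.compile(r"#!(data|reif|meta2|meta)\((.*?)\)!#")
# One argument: an IRI in angle brackets (may contain commas) or a comma-free token.
ARGUMENT = re.compile(r"<[^>]*>|[^,]+")


def reified(stmt, s, p, o):
    """Standard reification pattern for the statement node stmt."""
    return (f"{stmt} a <{RDF}Statement>; <{RDF}subject> {s}; "
            f"<{RDF}predicate> {p}; <{RDF}object> {o} .")


def to_standard_reification(template):
    """Rewrite all annotations in template into standard reification patterns."""
    position = -1  # index of the annotation within the template (assumed scheme)

    def translate(match):
        nonlocal position
        position += 1
        kind = match.group(1)
        args = [a.strip() for a in ARGUMENT.findall(match.group(2))]
        if kind == "data":
            s, p, o = args
            return reified(f"?dummyVar_{position}", s, p, o)
        if kind == "reif":
            stmt, s, p, o = args
            return reified(stmt, s, p, o)
        stmt, key, value = args
        link = f"{stmt} {SHARED_META} ?shared_{position} ."
        if kind == "meta":
            return f"{link} ?shared_{position} {key} {value} ."
        # meta2: the metadata statement is itself reified
        return link + " " + reified(f"?dummyVar_{position}",
                                    f"?shared_{position}", key, value)

    return ANNOTATION.sub(translate, template)


print(to_standard_reification("#!reif(?id,?s,?p,?o)!# #!meta(?id,?k,?v)!#"))
```

Other MRMs would need their own pattern generators in place of `reified`; only the standard reification case is documented on this page.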

DATA queries

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%    DBQ-SIM-01

SELECT ?p ?o

WHERE {    

        #!data(<|person|>,?p,?o)!#

}

LIMIT 1000

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%    DBQ-SIM-02

PREFIX dbo: <http://dbpedia.org/ontology/>

SELECT DISTINCT ?city ?pop

WHERE {    

        #!data(?city,dbo:populationTotal,?pop)!#

        #!data(?city,dbo:country,<|country|>)!#

        FILTER(?pop>20000||?pop>"20000")

}

LIMIT 10000

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%    DBQ-MED-01

PREFIX owl: <http://www.w3.org/2002/07/owl#>

SELECT ?p ?o

WHERE {    

        #!data(<|e_en|>,owl:sameAs,?e_de)!#

          #!data(<|e_en|>,?p,?o)!#

          FILTER EXISTS { #!data(?e_de,?p,?o)!# }

          FILTER(?e_de!=<|e_en|>).

}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%    DBQ-MED-02

PREFIX dbo: <http://dbpedia.org/ontology/>

SELECT (count(distinct ?company) as ?c)

WHERE {    

        #!data(?company,dbo:locationCountry,<|country|>)!#

        #!data(?company,rdf:type,dbo:Company)!#

}

## %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%    DBQ-HAR-01 => original query, no longer used

PREFIX dbo: <http://dbpedia.org/ontology/>

SELECT  DISTINCT ?company1

WHERE { {

                #!data(?company1,rdf:type,dbo:Company)!#

                #!data(?company2,rdf:type,dbo:Company)!#

                #!data(?company1,dbo:industry,<|sector|>)!#

                #!data(?company2,dbo:industry,<|sector|>)!#

        } OPTIONAL{

                #!data(?company1,rdfs:label,?label1)!#

                #!data(?company2,rdfs:label,?label2)!#

               } OPTIONAL{

                #!data(?company1,dbo:locationCity,?city1)!#

                #!data(?company2,dbo:locationCity,?city2)!#

               } OPTIONAL{

                #!data(?company1,dbo:locationCountry,?country1)!#

                #!data(?company2,dbo:locationCountry,?country2)!#

               }

                FILTER(?company1!=?company2 && ( (?label1=?label2 && STRLEN(?label1)>3) || ?city1=?city2 || ?country1=?country2 )   )  

}

## %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%    DBQ-HAR-01 => rewritten query due to Stardog issues

PREFIX dbo: <http://dbpedia.org/ontology/>

SELECT  DISTINCT ?company1

WHERE {

        #!data(?company1,rdf:type,dbo:Company)!#

        #!data(?company2,rdf:type,dbo:Company)!#

        #!data(?company1,dbo:industry,<|sector|>)!#

        #!data(?company2,dbo:industry,<|sector|>)!#

        OPTIONAL {

                #!data(?company1,rdfs:label,?label1)!#

        } OPTIONAL {

                #!data(?company2,rdfs:label,?label2)!#

        } OPTIONAL {

                #!data(?company1,dbo:locationCity,?city1)!#

        } OPTIONAL {

                #!data(?company2,dbo:locationCity,?city2)!#

        } OPTIONAL {

                #!data(?company1,dbo:locationCountry,?country1)!#

        } OPTIONAL {

                #!data(?company2,dbo:locationCountry,?country2)!#

        }

        FILTER (?company1!=?company2)

        FILTER ( (?label1=?label2     && STRLEN(?label1)>3 )  ||

                 (?city1=?city2       && bound(?city1)     )  ||

                 (?country1=?country2 && bound(?country1)  )      )

}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%    DBQ-HAR-02

PREFIX dbo: <http://dbpedia.org/ontology/>

PREFIX owl: <http://www.w3.org/2002/07/owl#>

SELECT  DISTINCT ?person

WHERE {

        {#!data(?person,rdf:type,owl:Thing)!#}

        #OPTIONAL

        {#!data(?person,?p,?place)!# #!data(?place,rdf:type,dbo:Place)!# }

        OPTIONAL

        {#!data(?place,owl:sameAs,?place2)!#}

        FILTER ( (bound(?place2) && EXISTS{#!data(?place2,dbo:isPartOf,<|region|>)!#} )

                 || EXISTS{#!data(?place,dbo:isPartOf,<|region|>)!#} )

}

## %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%    DBQ-HAR-02 ⇒ original query, not used because of different result sets

PREFIX dbo: <http://dbpedia.org/ontology/>

PREFIX owl: <http://www.w3.org/2002/07/owl#>

SELECT  DISTINCT ?person

WHERE {

        {#!data(?person,rdf:type,owl:Thing)!#}

        #OPTIONAL

        {#!data(?person,?p,?place)!# #!data(?place,rdf:type,dbo:Place)!# }

        OPTIONAL

        {#!data(?place,owl:sameAs,?place2)!#}

        FILTER ( EXISTS{#!data(?place2,dbo:isPartOf,<|region|>)!#}

                 || EXISTS{#!data(?place,dbo:isPartOf,<|region|>)!#} )

}

MIXED queries

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%    DBM-SIM-01

SELECT ?p ?o ?date

WHERE {    

        #!reif(?id,<|person|>,?p,?o)!#

        #!meta2(?id,<http://purl.org/dc/element/1.1/created>,?date)!#

}

LIMIT 1000

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%    DBM-SIM-02

PREFIX dbo: <http://dbpedia.org/ontology/>

SELECT DISTINCT ?city ?pop ?provenance

WHERE {    

        #!reif(?id,?city,dbo:populationTotal,?pop)!#

        #!data(?city,dbo:country,<|country|>)!#

        #!meta(?id,<http://ns.inria.fr/dbpediafr/voc#hasMainRevision>,?provenance)!#

        FILTER(?pop>20000||?pop>"20000")

}

LIMIT 10000

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%    DBM-MED-01

PREFIX owl: <http://www.w3.org/2002/07/owl#>

SELECT ?p ?o ?confidence

WHERE {    

        #!reif(?id,<|e_en|>,owl:sameAs,?e_de)!#

          #!data(<|e_en|>,?p,?o)!#

          FILTER EXISTS { #!data(?e_de,?p,?o)!# }

          #!meta2(?id,<http://ns.inria.fr/dbpediafr/voc#uniqueContributorNb>,?confidence)!#

          FILTER(?e_de!=<|e_en|>).

}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%    DBM-MED-02

PREFIX dbo: <http://dbpedia.org/ontology/>

SELECT (count(distinct ?company) as ?c)

WHERE {    

        #!reif(?id2,?company,dbo:locationCountry,<|country|>)!#

        #!reif(?id,?company,rdf:type,dbo:Company)!#

        #!meta2(?id,<http://ns.inria.fr/dbpediafr/voc#uniqueContributorNb>,?cont)!#

        #!meta2(?id2,<http://ns.inria.fr/dbpediafr/voc#revPerYear2016>,?revs)!#

        FILTER(?revs >5 && ?cont>10)

}

## %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%    DBM-HAR-01 ⇒ original query, no longer used because of Stardog issues

PREFIX dbo: <http://dbpedia.org/ontology/>

SELECT  DISTINCT ?company1 ?mod1 ?mod2

WHERE { {

                #!data(?company1,rdf:type,dbo:Company)!#

                #!data(?company2,rdf:type,dbo:Company)!#

                #!reif(?id1,?company1,dbo:industry,<|sector|>)!#

                #!meta2(?id1,<http://purl.org/dc/element/1.1/modified>,?mod1)!#

                #!reif(?id2,?company2,dbo:industry,<|sector|>)!#

                #!meta2(?id2,<http://purl.org/dc/element/1.1/modified>,?mod2)!#

        } OPTIONAL{

                #!data(?company1,rdfs:label,?label1)!#

                #!data(?company2,rdfs:label,?label2)!#

               } OPTIONAL{

                #!data(?company1,dbo:locationCity,?city1)!#

                #!data(?company2,dbo:locationCity,?city2)!#

               } OPTIONAL{

                #!data(?company1,dbo:locationCountry,?country1)!#

                #!data(?company2,dbo:locationCountry,?country2)!#

               }

                FILTER(?company1!=?company2 && ( (?label1=?label2 && STRLEN(?label1)>3) || ?city1=?city2 || ?country1=?country2 )   )  

}

## %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%    DBM-HAR-01 => rewritten query due to Stardog issues

PREFIX dbo: <http://dbpedia.org/ontology/>

SELECT  DISTINCT ?company1 ?mod1 ?mod2

WHERE {

        #!data(?company1,rdf:type,dbo:Company)!#

        #!data(?company2,rdf:type,dbo:Company)!#

        #!reif(?id1,?company1,dbo:industry,<|sector|>)!#

        #!meta2(?id1,<http://purl.org/dc/element/1.1/modified>,?mod1)!#

        #!reif(?id2,?company2,dbo:industry,<|sector|>)!#

        #!meta2(?id2,<http://purl.org/dc/element/1.1/modified>,?mod2)!#

        OPTIONAL {

                #!data(?company1,rdfs:label,?label1)!#

        } OPTIONAL {

                #!data(?company2,rdfs:label,?label2)!#

        } OPTIONAL {

                #!data(?company1,dbo:locationCity,?city1)!#

        } OPTIONAL {

                #!data(?company2,dbo:locationCity,?city2)!#

        } OPTIONAL {

                #!data(?company1,dbo:locationCountry,?country1)!#

        } OPTIONAL {

                #!data(?company2,dbo:locationCountry,?country2)!#

        }

        FILTER (?company1!=?company2)

        FILTER ( (?label1=?label2     && STRLEN(?label1)>3 )  ||

                 (?city1=?city2       && bound(?city1)     )  ||

                 (?country1=?country2 && bound(?country1)  )      )

}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%    DBM-HAR-02

PREFIX dbo: <http://dbpedia.org/ontology/>

PREFIX owl: <http://www.w3.org/2002/07/owl#>

SELECT  DISTINCT ?person ?provenance

WHERE {

        {#!reif(?id,?person,rdf:type,owl:Thing)!# #!meta(?id,<http://ns.inria.fr/dbpediafr/voc#hasMainRevision>,?provenance)!#}

        #OPTIONAL

        {#!data(?person,?p,?place)!# #!data(?place,rdf:type,dbo:Place)!# }

        OPTIONAL

        {#!data(?place,owl:sameAs,?place2)!#}

        FILTER ( (bound(?place2) && EXISTS{#!data(?place2,dbo:isPartOf,<|region|>)!#} )

                 || EXISTS{#!data(?place,dbo:isPartOf,<|region|>)!#} )

}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%    DBM-HAR-02 ⇒ original query, not used because of different result sets

PREFIX dbo: <http://dbpedia.org/ontology/>

PREFIX owl: <http://www.w3.org/2002/07/owl#>

SELECT  DISTINCT ?person ?provenance

WHERE {

        {#!reif(?id,?person,rdf:type,owl:Thing)!# #!meta(?id,<http://ns.inria.fr/dbpediafr/voc#hasMainRevision>,?provenance)!#}

        #OPTIONAL

        {#!data(?person,?p,?place)!# #!data(?place,rdf:type,dbo:Place)!# }

        OPTIONAL

        {#!data(?place,owl:sameAs,?place2)!#}

        FILTER ( EXISTS{#!data(?place2,dbo:isPartOf,<|region|>)!#}

                 || EXISTS{#!data(?place,dbo:isPartOf,<|region|>)!#} )

}

Executing the experiments

Backend Setup

Stardog environment Setup

Blazegraph environment Setup

Virtuoso environment Setup

Wikidata dataset loading:

DBpedia dataset loading:

Blazegraph

Stardog

        //TODO prepare data folder for dbpedia

Run quins queries:

for every backend

Run sdw queries:

for every backend

Results

WIKIDATA LOADING RESULTS BLAZEGRAPH details

RDR:

Load: 563,676,547 stmts added in 282869.894 secs, rate= 1956, commitLatency=5179888ms, {failSet=0,goodSet=0}

Total elapsed=288053788ms => 80.01 hours

79064334336 (74G) bytes journal file

559,737,535 statements count in database

RDR fixed-date:

Load: 563,676,547 stmts added in 31349.234 secs, rate= 3284, commitLatency=140283294ms, {failSet=0,goodSet=0}

Total elapsed=171806338ms => 47.72 hours

79064334336 (74G) bytes journal file

> 800 GB IO written

RDR fixed-date-batch (swap disabled):

Load: 563,676,547 stmts added in 36163.168 secs, rate= 15587, commitLatency=0ms, {failSet=0,goodSet=627}

Total elapsed=37447229ms => 10.40 hours

71805173760 (67G) bytes journal file

> 387.3 GB IO written to import (before batch commit)

> 395.92 GB IO written overall

WARN : AbstractBTree.java:3758: wrote: name=kb.spo.POS, 2 records (#nodes=1, #leaves=1) in 7649ms : addrRoot=-757108429687874493

559,737,535 statements count in database

Singleton Property fixed-date-batch (swap disabled):

loading: 560,960,838 stmts added in 23846.394 secs, rate= 23523, commitLatency=0ms, {failSet=0,goodSet=623}

Load: 563,676,547 stmts added in 24000.387 secs, rate= 23486, commitLatency=0ms, {failSet=0,goodSet=627}

Total elapsed=25632544ms => 7.12 hours

65205960704 (60G) bytes

272.07 GB IO written overall?

559,760,916 statements count in database

Nary Relation fixed-date-batch (swap disabled)

loading: 562,047,938 stmts added in 50887.282 secs, rate= 11044, commitLatency=0ms, {failSet=0,goodSet=624}

Load: 563,678,588 stmts added in 51189.005 secs, rate= 11011, commitLatency=0ms, {failSet=0,goodSet=627}

Total elapsed=52469815ms => 14.57 hours

65205960704 (60G) bytes

559,762,957 statements count in database

Standard reification fixed-date-batch (swap disabled)

Load: 644,981,737 stmts added in 23903.623 secs, rate= 26982, commitLatency=0ms, {failSet=0,goodSet=627}

Total elapsed=25536461ms => 7.09 hours

65205960704 (60G) bytes

288.07 GB IO written overall

641,066,106 statements count in database

NGRAPHS:

Load: 482,371,357 stmts added in 269549.828 secs, rate= 1752, commitLatency=5717448ms, {failSet=0,goodSet=0}

Total elapsed=275831067ms => 76.61 hours

116122910720 (109G) bytes journal file

NGRAPHS fixed-date:

Load: 482,371,357 stmts added in 20609.1 secs, rate= 3079, commitLatency=136021807ms, {failSet=0,goodSet=0}

Total elapsed=156632191ms => 43.51 hours

116122910720 (109G) bytes journal file

482,370,426 statements count in database

NGRAPHS fixed-date-batch (swap disabled):

loading: 479031956 stmts added in 44504.696 secs, rate= 10763, commitLatency=0ms, {failSet=0,goodSet=621}

loading: 481729599 stmts added in 45242.674 secs, rate= 10647, commitLatency=0ms, {failSet=0,goodSet=625}

Load: 482371357 stmts added in 45325.73 secs, rate= 10642, commitLatency=0ms, {failSet=0,goodSet=627}

Total elapsed=46920179ms => 13.03 hours

105494806528 (109G) bytes journal file

482,370,426 statements count in database
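The hour figures and load rates in the Blazegraph listings above follow from the raw log values by simple arithmetic. The sketch below reproduces the numbers of the RDR and RDR fixed-date-batch runs. The function names are hypothetical, and the interpretation that the reported rate equals the number of statements divided by load time plus commit latency is inferred from the figures on this page, not taken from the Blazegraph documentation.

```python
def ms_to_hours(milliseconds: int) -> float:
    """Convert the 'Total elapsed' value from milliseconds to hours."""
    return milliseconds / 3_600_000


def load_rate(statements: int, load_secs: float, commit_latency_ms: int) -> int:
    """Statements per second over load time plus commit latency (assumed formula)."""
    total_secs = load_secs + commit_latency_ms / 1000
    return int(statements / total_secs)


# RDR run: Total elapsed=288053788ms
print(round(ms_to_hours(288_053_788), 2))               # 80.01
# RDR run: 563,676,547 stmts in 282869.894 secs, commitLatency=5179888ms
print(load_rate(563_676_547, 282_869.894, 5_179_888))   # 1956
# RDR fixed-date-batch run: commitLatency=0ms
print(load_rate(563_676_547, 36_163.168, 0))            # 15587
```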

WIKIDATA LOADING RESULTS VIRTUOSO details

NGRAPHS VIRTUOSO:

disable_auto_indexing.sql (ngraphs)

Elapsed time: 0.829115973 seconds

setup_list_ngraphs.sql (ngraphs)

Elapsed time: 0.783968777 seconds

load_data.sql (ngraphs)

Elapsed time: 9594.64527815 seconds (2.66 h)

enable_auto_indexing.sql (ngraphs)

Elapsed time: 1349.192160706 seconds (0.37 h)

# triples 482,371,357

50281316352 bytes db file

NARY VIRTUOSO:

Running disable_auto_indexing.sql (naryrel)

Elapsed time: 0.952281977 seconds

Running setup_list_naryrel.sql (naryrel)

Elapsed time: 0.966631676 seconds

Running load_data.sql (naryrel)

Elapsed time: 7956.271867824 seconds

Running enable_auto_indexing.sql (naryrel)

Elapsed time: 4247.081246498 seconds

49331306496 bytes db file

SGPROP VIRTUOSO:

Running disable_auto_indexing.sql (sgprop)

Elapsed time: 0.963152739 seconds

Running setup_list_sgprop.sql (sgprop)

Elapsed time: 0.84018891 seconds

Running load_data.sql (sgprop)

Elapsed time: 7341.24353104 seconds

Running enable_auto_indexing.sql (sgprop)

Elapsed time: 4233.850771077 seconds

49658462208 bytes db file

STDREIF VIRTUOSO:

Running disable_auto_indexing.sql (stdreif)

Elapsed time: 1.063498089 seconds

Running setup_list_stdreif.sql (stdreif)

Elapsed time: 0.945383899 seconds

Running load_data.sql (stdreif)

Elapsed time: 7586.993446389 seconds

Running enable_auto_indexing.sql (stdreif)

Elapsed time: 3741.900296941 seconds

48538583040 bytes db file

OTHER LOADING RESULTS

Please see the tables in the paper.

QUERY EXECUTION TIME RESULTS

The query execution times of the benchmark run for this paper can be downloaded as CSV files here.

The CSV columns are, in order: MRM format used; query template/pattern name; query instance number; execution time in seconds; number of results; HTTP return code of the store (if no client timeout occurred).
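As a minimal sketch of how these files could be analysed, the Python snippet below groups execution times by MRM and query template and computes the mean. It assumes comma-separated files without a header row and an empty last field when a client timeout occurred; neither is confirmed here. The column labels in `COLUMNS` and the sample rows are made up for illustration and are not benchmark results.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Hypothetical labels for the six columns, in the documented order.
COLUMNS = ["mrm", "template", "instance", "seconds", "results", "http_code"]


def mean_runtime(lines):
    """Return the mean execution time in seconds per (MRM, template) pair."""
    groups = defaultdict(list)
    for row in csv.DictReader(lines, fieldnames=COLUMNS):
        groups[(row["mrm"], row["template"])].append(float(row["seconds"]))
    return {key: mean(times) for key, times in groups.items()}


# Made-up sample rows; the last one mimics a client timeout (no HTTP code).
sample = io.StringIO(
    "stdreif,DBQ-SIM-01,1,0.5,1000,200\n"
    "stdreif,DBQ-SIM-01,2,1.5,1000,200\n"
    "ngraphs,DBQ-SIM-01,1,60.0,0,\n"
)
print(mean_runtime(sample))
```

To analyse a downloaded file, an open file object can be passed in place of the `io.StringIO` sample.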