Evaluation of Metadata Representations in RDF stores
This is the experiment description website for the paper under review “Evaluation of Metadata Representations in RDF stores” submitted for the Semantic Web Journal special issue on “Benchmarking Linked Data”. It provides links to tools and datasets, as well as some additional information about the results and the benchmark execution procedures, in order to allow verification and reproduction of this work.
Original Wikidata Experiment by Hernandez et al.
Benchmark execution and data transformation (Wikidata) framework
MRM query transformation tool (used for DBpedia queries)
MRM data transformation tool (used for DBpedia datasets)
DBpedia Historique Extractor adaptation (used to generate revision metadata)
DBpedia meta-rdf input format transformation (aggregated metadata and data)
The dataset files can be downloaded from the original Wikidata experiment (https://dx.doi.org/10.6084/m9.figshare.3208498.v1).
The final dataset files can be downloaded here. For every MRM, a version from the English and the German chapter exists. The experiments were run using both language versions.
Every MRM links to the same revision metadata, which needs to be loaded as well.
The final dataset used for the dataset size study can be downloaded here.
We reused the queries from the original experiment but added support for queries that use a FILTER EXISTS statement instead of a triple pattern (f(o)ngraphs,f(o)rdr
In order to translate the queries, we developed a generic tool for rewriting SPARQL queries for different MRMs.
The idea is that triple patterns within a query are replaced by special annotations, which are then translated into the format appropriate for the target MRM.
The tool supports the following annotations:
#!data(?s,?p,?o)!# | accessing a regular data triple (needed for regular data queries) |
#!reif(?id,?s,?p,?o)!# | analogous to #!data but retrieving statement id as well |
#!meta(?id,?k,?v)!# | retrieve metadata key and value given a statement id |
#!meta2(?id,?k,?v)!# | retrieve metadata key and value, which is itself reified (due to meta-metadata), given a statement id |
|x| | x denotes a template variable, which gets replaced by a specific constant to derive query instances from the template |
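As a rough illustration of the template variables, the following Python sketch derives a query instance from a template by substituting each `|x|` placeholder with a constant. The function name `instantiate` and the example IRI are hypothetical and not taken from the authors' tool; this is only a minimal sketch of the substitution idea.

```python
import re

# Matches a template variable such as |person| or |e_en|.
# The SPARQL operator "||" is not matched, because at least one
# word character is required between the two bars.
TEMPLATE_VAR = re.compile(r"\|(\w+)\|")


def instantiate(template: str, bindings: dict) -> str:
    """Replace every |name| placeholder with the constant bound to name.

    Raises KeyError if the template uses a variable without a binding.
    """
    return TEMPLATE_VAR.sub(lambda m: bindings[m.group(1)], template)


template = """SELECT ?p ?o
WHERE {
#!data(<|person|>,?p,?o)!#
}
LIMIT 1000"""

# The IRI below is only an illustrative constant (an assumption).
print(instantiate(template, {"person": "http://dbpedia.org/resource/Ada_Lovelace"}))
```

Running the substitution before the annotation rewriting keeps the two steps independent, although the order used by the actual tool is not stated here.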
To illustrate the different behaviour, an example translation of the annotations into the standard reification MRM is shown below:
#!data(?s,?p,?o)!#
?dummyVar_0 a <http://www.w3.org/1999/02/22-rdf-syntax-ns#Statement>; |
#!reif(?id,?s,?p,?o)!#
?id a <http://www.w3.org/1999/02/22-rdf-syntax-ns#Statement>; |
#!meta(?id,?k,?v)!#
?id <http://sdw.aksw.org/metardf/hasSharedMeta> ?shared_2 . ?shared_2 ?k ?v . |
#!meta2(?id,?k,?v)!#
?id <http://sdw.aksw.org/metardf/hasSharedMeta> ?shared_3 . ?dummyVar_3 a <http://www.w3.org/1999/02/22-rdf-syntax-ns#Statement>; |
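The rewriting idea can be sketched in Python with a regular expression that finds each annotation and replaces it with a graph pattern. This is a hedged sketch, not the authors' tool: the function name `rewrite_stdreif` is hypothetical, and because the translated patterns on this page are cut off after the `rdf:Statement` type triple, the continuation with `rdf:subject`, `rdf:predicate` and `rdf:object` is an assumption based on the standard RDF reification vocabulary. `#!meta2` is deliberately left unsupported, since its full translation is not visible here.

```python
import re

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SHARED = "http://sdw.aksw.org/metardf/hasSharedMeta"

# Matches annotations of the form #!name(arg1,arg2,...)!#
ANNOTATION = re.compile(r"#!(\w+)\((.*?)\)!#")


def rewrite_stdreif(query: str) -> str:
    """Rewrite #!data, #!reif and #!meta annotations (assumed patterns)."""
    counter = 0  # index of the annotation, used for helper variable names

    def translate(match):
        nonlocal counter
        kind = match.group(1)
        # Naive split: terms that contain a comma are not handled.
        args = [a.strip() for a in match.group(2).split(",")]
        n = counter
        counter += 1
        if kind == "data":
            # A plain data triple is matched through an anonymous statement id.
            kind, args = "reif", [f"?dummyVar_{n}"] + args
        if kind == "reif":
            sid, s, p, o = args
            return (f"{sid} a <{RDF}Statement> ; <{RDF}subject> {s} ; "
                    f"<{RDF}predicate> {p} ; <{RDF}object> {o} .")
        if kind == "meta":
            sid, k, v = args
            return f"{sid} <{SHARED}> ?shared_{n} . ?shared_{n} {k} {v} ."
        raise ValueError(f"unsupported annotation: {kind}")

    return ANNOTATION.sub(translate, query)


print(rewrite_stdreif("#!reif(?id,?s,?p,?o)!#\n#!meta(?id,?k,?v)!#"))
```

Numbering the helper variables by annotation position mirrors the `?dummyVar_0`, `?shared_2` and `?shared_3` names in the example above, which keeps helper variables from different annotations distinct.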
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% DBQ-SIM-01
SELECT ?p ?o
WHERE {
#!data(<|person|>,?p,?o)!#
}
LIMIT 1000
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% DBQ-SIM-02
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT DISTINCT ?city ?pop
WHERE {
#!data(?city,dbo:populationTotal,?pop)!#
#!data(?city,dbo:country,<|country|>)!#
FILTER(?pop>20000||?pop>"20000")
}
LIMIT 10000
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% DBQ-MED-01
PREFIX owl: <http://www.w3.org/2002/07/owl#>
SELECT ?p ?o
WHERE {
#!data(<|e_en|>,owl:sameAs,?e_de)!#
#!data(<|e_en|>,?p,?o)!#
FILTER EXISTS { #!data(?e_de,?p,?o)!# }
FILTER(?e_de!=<|e_en|>).
}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% DBQ-MED-02
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT (count(distinct ?company) as ?c)
WHERE {
#!data(?company,dbo:locationCountry,<|country|>)!#
#!data(?company,rdf:type,dbo:Company)!#
}
## %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% DBQ-HAR-01 => original query. not used anymore
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT DISTINCT ?company1
WHERE { {
#!data(?company1,rdf:type,dbo:Company)!#
#!data(?company2,rdf:type,dbo:Company)!#
#!data(?company1,dbo:industry,<|sector|>)!#
#!data(?company2,dbo:industry,<|sector|>)!#
} OPTIONAL{
#!data(?company1,rdfs:label,?label1)!#
#!data(?company2,rdfs:label,?label2)!#
} OPTIONAL{
#!data(?company1,dbo:locationCity,?city1)!#
#!data(?company2,dbo:locationCity,?city2)!#
} OPTIONAL{
#!data(?company1,dbo:locationCountry,?country1)!#
#!data(?company2,dbo:locationCountry,?country2)!#
}
FILTER(?company1!=?company2 && ( (?label1=?label2 && STRLEN(?label1)>3) || ?city1=?city2 || ?country1=?country2 ) )
}
## %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% DBQ-HAR-01 => rewritten query due to stardog issues
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT DISTINCT ?company1
WHERE {
#!data(?company1,rdf:type,dbo:Company)!#
#!data(?company2,rdf:type,dbo:Company)!#
#!data(?company1,dbo:industry,<|sector|>)!#
#!data(?company2,dbo:industry,<|sector|>)!#
OPTIONAL{
#!data(?company1,rdfs:label,?label1)!#
} OPTIONAL{
#!data(?company2,rdfs:label,?label2)!#
} OPTIONAL{
#!data(?company1,dbo:locationCity,?city1)!#
} OPTIONAL{
#!data(?company2,dbo:locationCity,?city2)!#
} OPTIONAL{
#!data(?company1,dbo:locationCountry,?country1)!#
} OPTIONAL{
#!data(?company2,dbo:locationCountry,?country2)!#
}
FILTER (?company1!=?company2)
FILTER ( (?label1=?label2 && STRLEN(?label1)>3 ) ||
(?city1=?city2 && bound(?city1) ) ||
(?country1=?country2 && bound(?country1) ) )
}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% DBQ-HAR-02
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
SELECT DISTINCT ?person
WHERE {
{#!data(?person,rdf:type,owl:Thing)!#}
#OPTIONAL
{#!data(?person,?p,?place)!# #!data(?place,rdf:type,dbo:Place)!# }
OPTIONAL
{#!data(?place,owl:sameAs,?place2)!#}
FILTER ( (bound(?place2) && EXISTS{#!data(?place2,dbo:isPartOf,<|region|>)!#} )
|| EXISTS{#!data(?place,dbo:isPartOf,<|region|>)!#} )
}
## %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% DBQ-HAR-02 ⇒ original query, not used because of different results sets
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
SELECT DISTINCT ?person
WHERE {
{#!data(?person,rdf:type,owl:Thing)!#}
#OPTIONAL
{#!data(?person,?p,?place)!# #!data(?place,rdf:type,dbo:Place)!# }
OPTIONAL
{#!data(?place,owl:sameAs,?place2)!#}
FILTER ( EXISTS{#!data(?place2,dbo:isPartOf,<|region|>)!#}
|| EXISTS{#!data(?place,dbo:isPartOf,<|region|>)!#} )
}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% DBM-SIM-01
SELECT ?p ?o ?date
WHERE {
#!reif(?id,<|person|>,?p,?o)!#
#!meta2(?id,<http://purl.org/dc/element/1.1/created>,?date)!#
}
LIMIT 1000
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% DBM-SIM-02
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT DISTINCT ?city ?pop ?provenance
WHERE {
#!reif(?id,?city,dbo:populationTotal,?pop)!#
#!data(?city,dbo:country,<|country|>)!#
#!meta(?id,<http://ns.inria.fr/dbpediafr/voc#hasMainRevision>,?provenance)!#
FILTER(?pop>20000||?pop>"20000")
}
LIMIT 10000
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% DBM-MED-01
PREFIX owl: <http://www.w3.org/2002/07/owl#>
SELECT ?p ?o ?confidence
WHERE {
#!reif(?id,<|e_en|>,owl:sameAs,?e_de)!#
#!data(<|e_en|>,?p,?o)!#
FILTER EXISTS { #!data(?e_de,?p,?o)!# }
#!meta2(?id,<http://ns.inria.fr/dbpediafr/voc#uniqueContributorNb>,?confidence)!#
FILTER(?e_de!=<|e_en|>).
}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% DBM-MED-02
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT (count(distinct ?company) as ?c)
WHERE {
#!reif(?id2,?company,dbo:locationCountry,<|country|>)!#
#!reif(?id,?company,rdf:type,dbo:Company)!#
#!meta2(?id,<http://ns.inria.fr/dbpediafr/voc#uniqueContributorNb>,?cont)!#
#!meta2(?id2,<http://ns.inria.fr/dbpediafr/voc#revPerYear2016>,?revs)!#
FILTER(?revs >5 && ?cont>10)
}
## %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% DBM-HAR-01 ⇒ original not used anymore because of stardog issues
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT DISTINCT ?company1 ?mod1 ?mod2
WHERE { {
#!data(?company1,rdf:type,dbo:Company)!#
#!data(?company2,rdf:type,dbo:Company)!#
#!reif(?id1,?company1,dbo:industry,<|sector|>)!#
#!meta2(?id1,<http://purl.org/dc/element/1.1/modified>,?mod1)!#
#!reif(?id2,?company2,dbo:industry,<|sector|>)!#
#!meta2(?id2,<http://purl.org/dc/element/1.1/modified>,?mod2)!#
} OPTIONAL{
#!data(?company1,rdfs:label,?label1)!#
#!data(?company2,rdfs:label,?label2)!#
} OPTIONAL{
#!data(?company1,dbo:locationCity,?city1)!#
#!data(?company2,dbo:locationCity,?city2)!#
} OPTIONAL{
#!data(?company1,dbo:locationCountry,?country1)!#
#!data(?company2,dbo:locationCountry,?country2)!#
}
FILTER(?company1!=?company2 && ( (?label1=?label2 && STRLEN(?label1)>3) || ?city1=?city2 || ?country1=?country2 ) )
}
## %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% DBM-HAR-01 => rewritten query due to stardog
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT DISTINCT ?company1 ?mod1 ?mod2
WHERE {
#!data(?company1,rdf:type,dbo:Company)!#
#!data(?company2,rdf:type,dbo:Company)!#
#!reif(?id1,?company1,dbo:industry,<|sector|>)!#
#!meta2(?id1,<http://purl.org/dc/element/1.1/modified>,?mod1)!#
#!reif(?id2,?company2,dbo:industry,<|sector|>)!#
#!meta2(?id2,<http://purl.org/dc/element/1.1/modified>,?mod2)!#
OPTIONAL{
#!data(?company1,rdfs:label,?label1)!#
} OPTIONAL{
#!data(?company2,rdfs:label,?label2)!#
} OPTIONAL{
#!data(?company1,dbo:locationCity,?city1)!#
} OPTIONAL{
#!data(?company2,dbo:locationCity,?city2)!#
} OPTIONAL{
#!data(?company1,dbo:locationCountry,?country1)!#
} OPTIONAL{
#!data(?company2,dbo:locationCountry,?country2)!#
}
FILTER (?company1!=?company2)
FILTER ( (?label1=?label2 && STRLEN(?label1)>3 ) ||
(?city1=?city2 && bound(?city1) ) ||
(?country1=?country2 && bound(?country1) ) )
}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% DBM-HAR-02
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
SELECT DISTINCT ?person ?provenance
WHERE {
{#!reif(?id,?person,rdf:type,owl:Thing)!# #!meta(?id,<http://ns.inria.fr/dbpediafr/voc#hasMainRevision>,?provenance)!#}
#OPTIONAL
{#!data(?person,?p,?place)!# #!data(?place,rdf:type,dbo:Place)!# }
OPTIONAL
{#!data(?place,owl:sameAs,?place2)!#}
FILTER ( (bound(?place2) && EXISTS{#!data(?place2,dbo:isPartOf,<|region|>)!#} )
|| EXISTS{#!data(?place,dbo:isPartOf,<|region|>)!#} )
}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% DBM-HAR-02 ⇒ original not used because of different results sets
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
SELECT DISTINCT ?person ?provenance
WHERE {
{#!reif(?id,?person,rdf:type,owl:Thing)!# #!meta(?id,<http://ns.inria.fr/dbpediafr/voc#hasMainRevision>,?provenance)!#}
#OPTIONAL
{#!data(?person,?p,?place)!# #!data(?place,rdf:type,dbo:Place)!# }
OPTIONAL
{#!data(?place,owl:sameAs,?place2)!#}
FILTER ( EXISTS{#!data(?place2,dbo:isPartOf,<|region|>)!#}
|| EXISTS{#!data(?place,dbo:isPartOf,<|region|>)!#} )
}
Blazegraph
Stardog
//TODO prepare data folder for dbpedia
for every backend
RDR:
Load: 563,676,547 stmts added in 282869.894 secs, rate= 1956, commitLatency=5179888ms, {failSet=0,goodSet=0}
Total elapsed=288053788ms => 80.01 hours
79064334336 (74G) bytes journal file
559,737,535 statements count in database
RDR fixed-date:
Load: 563,676,547 stmts added in 31349.234 secs, rate= 3284, commitLatency=140283294ms, {failSet=0,goodSet=0}
Total elapsed=171806338ms => 47.72 hours
79064334336 (74G) bytes journal file
> 800 GB IO written
RDR fixed-date-batch (swap disabled):
Load: 563,676,547 stmts added in 36163.168 secs, rate= 15587, commitLatency=0ms, {failSet=0,goodSet=627}
Total elapsed=37447229ms => 10.40 hours
71805173760 (67G) bytes journal file
> 387.3 GB IO written to import (before batch commit)
> 395.92 GB IO written overall
WARN : AbstractBTree.java:3758: wrote: name=kb.spo.POS, 2 records (#nodes=1, #leaves=1) in 7649ms : addrRoot=-757108429687874493
559,737,535 statements count in database
Singleton Property fixed-date-batch (swap disabled):
loading: 560,960,838 stmts added in 23846.394 secs, rate= 23523, commitLatency=0ms, {failSet=0,goodSet=623}
Load: 563,676,547 stmts added in 24000.387 secs, rate= 23486, commitLatency=0ms, {failSet=0,goodSet=627}
Total elapsed=25632544ms => 7.12 hours
65205960704 (60G) bytes
272.07 GB IO written overall?
559,760,916 statements count in database
Nary Relation fixed-date-batch (swap disabled)
loading: 562,047,938 stmts added in 50887.282 secs, rate= 11044, commitLatency=0ms, {failSet=0,goodSet=624}
Load: 563,678,588 stmts added in 51189.005 secs, rate= 11011, commitLatency=0ms, {failSet=0,goodSet=627}
Total elapsed=52469815ms => 14.57 hours
65205960704 (60G) bytes
559,762,957 statements count in database
Standard reification fixed-date-batch (swap disabled)
Load: 644,981,737 stmts added in 23903.623 secs, rate= 26982, commitLatency=0ms, {failSet=0,goodSet=627}
Total elapsed=25536461ms => 7.09 hours
65205960704 (60G) bytes
288.07 GB IO written overall
641,066,106 statements count in database
NGRAPHS:
Load: 482,371,357 stmts added in 269549.828 secs, rate= 1752, commitLatency=5717448ms, {failSet=0,goodSet=0}
Total elapsed=275831067ms => 76.61 hours
116122910720 (109G) bytes journal file
NGRAPHS fixed-date:
Load: 482,371,357 stmts added in 20609.1 secs, rate= 3079, commitLatency=136021807ms, {failSet=0,goodSet=0}
Total elapsed=156632191ms => 43.51 hours
116122910720 (109G) bytes journal file
482,370,426 statements count in database
NGRAPHS fixed-date-batch (swap disabled):
loading: 479,031,956 stmts added in 44504.696 secs, rate= 10763, commitLatency=0ms, {failSet=0,goodSet=621}
loading: 481,729,599 stmts added in 45242.674 secs, rate= 10647, commitLatency=0ms, {failSet=0,goodSet=625}
Load: 482,371,357 stmts added in 45325.73 secs, rate= 10642, commitLatency=0ms, {failSet=0,goodSet=627}
Total elapsed=46920179ms => 13.03 hours
105494806528 (109G) bytes journal file
482,370,426 statements count in database
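The hour figures next to each "Total elapsed" value follow from dividing the millisecond count by 3,600,000. The Python sketch below shows that conversion and also parses a "Load:" line to recompute the throughput. The helper names `parse_load_line` and `ms_to_hours` are hypothetical, and the regular expression only assumes the line layout visible in the logs above; it accepts either "." or "," as a thousands separator in the statement count.

```python
import re

# Layout assumed from the log lines above:
# "Load: <statements> stmts added in <seconds> secs, ..."
LOAD_LINE = re.compile(r"Load: ([\d.,]+) stmts added in ([\d.]+) secs")


def parse_load_line(line: str):
    """Return (statement count, load seconds) from a 'Load:' log line."""
    match = LOAD_LINE.search(line)
    if match is None:
        raise ValueError("not a load line")
    statements = int(re.sub(r"[.,]", "", match.group(1)))
    seconds = float(match.group(2))
    return statements, seconds


def ms_to_hours(milliseconds: int) -> float:
    """Convert a 'Total elapsed' value in milliseconds to hours."""
    return milliseconds / 3_600_000


line = ("Load: 563,676,547 stmts added in 36163.168 secs, rate= 15587, "
        "commitLatency=0ms, {failSet=0,goodSet=627}")
statements, seconds = parse_load_line(line)
print(int(statements / seconds))        # statements per second
print(round(ms_to_hours(37447229), 2))  # total elapsed time in hours
```

For the RDR fixed-date-batch run used in the example, the recomputed throughput agrees with the logged rate of 15587 statements per second.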
NGRAPHS VIRTUOSO: | |
disable_auto_indexing.sql (ngraphs) | Elapsed time: 0.829115973 seconds |
setup_list_ngraphs.sql (ngraphs) | Elapsed time: 0.783968777 seconds |
load_data.sql (ngraphs) | Elapsed time: 9594.64527815 seconds (2.66 h) |
enable_auto_indexing.sql (ngraphs) | Elapsed time: 1349.192160706 seconds (0.37 h) |
# triples 482,371,357 | |
50281316352 bytes db file | |
NARY VIRTUOSO: | |
Running disable_auto_indexing.sql (naryrel) | Elapsed time: 0.952281977 seconds |
Running setup_list_naryrel.sql (naryrel) | Elapsed time: 0.966631676 seconds |
Running load_data.sql (naryrel) | Elapsed time: 7956.271867824 seconds |
Running enable_auto_indexing.sql (naryrel) | Elapsed time: 4247.081246498 seconds |
49331306496 bytes db file | |
SGPROP VIRTUOSO: | |
Running disable_auto_indexing.sql (sgprop) | Elapsed time: 0.963152739 seconds |
Running setup_list_sgprop.sql (sgprop) | Elapsed time: 0.84018891 seconds |
Running load_data.sql (sgprop) | Elapsed time: 7341.24353104 seconds |
Running enable_auto_indexing.sql (sgprop) | Elapsed time: 4233.850771077 seconds |
49658462208 bytes db file | |
STDREIF VIRTUOSO: | |
Running disable_auto_indexing.sql (stdreif) | Elapsed time: 1.063498089 seconds |
Running setup_list_stdreif.sql (stdreif) | Elapsed time: 0.945383899 seconds |
Running load_data.sql (stdreif) | Elapsed time: 7586.993446389 seconds |
Running enable_auto_indexing.sql (stdreif) | Elapsed time: 3741.900296941 seconds |
48538583040 bytes db file |
Please see the tables in the paper.
The query execution times of the benchmark run for this paper can be downloaded as CSV files here.
The CSV columns correspond to: MRM format used; query template/pattern name; query instance number; execution time in seconds; number of results; HTTP return code of the store (if no client timeout occurred)
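A minimal Python sketch for working with these files is given below. It assumes comma-separated values without a header row, in the column order listed above; the column names in `COLUMNS`, the function name `mean_runtime` and the two sample rows are made up for illustration and are not taken from the published files.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Hypothetical names for the six columns described above.
COLUMNS = ["mrm", "query", "instance", "seconds", "results", "http_code"]


def mean_runtime(csv_text: str) -> dict:
    """Average execution time in seconds per (MRM, query template) pair."""
    times = defaultdict(list)
    reader = csv.DictReader(io.StringIO(csv_text), fieldnames=COLUMNS)
    for row in reader:
        times[(row["mrm"], row["query"])].append(float(row["seconds"]))
    return {key: mean(values) for key, values in times.items()}


# Made-up sample rows, only to show the expected shape of the input.
sample = (
    "stdreif,DBQ-SIM-01,1,0.25,1000,200\n"
    "stdreif,DBQ-SIM-01,2,0.75,1000,200\n"
)
print(mean_runtime(sample))
```

Rows that lack the HTTP return code because of a client timeout are still read, since `csv.DictReader` fills missing trailing fields with `None`.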