1 00:00:05,973 --> 00:00:07,908 Hi, guys! Can everybody hear me? 2 00:00:09,170 --> 00:00:11,898 So, hi! Nice to meet you all. I'm Erica Azzellini. 3 00:00:11,898 --> 00:00:14,606 I'm one of the Wikimovement Brazil's Liaison, 4 00:00:14,606 --> 00:00:17,829 and this is my first international Wikimedia event, 5 00:00:17,829 --> 00:00:21,023 so I'm super excited to be here and I hopefully, 6 00:00:21,023 --> 00:00:24,311 *will share something interesting for you all here on this lightning talk.* 7 00:00:25,247 --> 00:00:30,441 *So this work starts with research that I was developing in Brazil,* 8 00:00:30,441 --> 00:00:34,219 *Computational Journalism and Structured Narratives with Wikidata.* 9 00:00:34,276 --> 00:00:35,958 *So in journalism,* 10 00:00:35,958 --> 00:00:39,616 *they're using some natural language generation software* 11 00:00:39,616 --> 00:00:41,418 *for automating news* 12 00:00:41,418 --> 00:00:46,535 *for news that have quite similar narrative structure.* 13 00:00:46,535 --> 00:00:51,600 *And we developed this concept here of structured narratives,* 14 00:00:51,600 --> 00:00:54,548 *thinking about this practice on computational journalism,* 15 00:00:54,548 --> 00:00:58,361 *that is the development of verbal text, understandable by humans,* 16 00:00:58,361 --> 00:01:01,274 *automated from predetermined arrangements that process information* 17 00:01:01,274 --> 00:01:05,395 *from structured databases, which looks like that,* 18 00:01:05,395 --> 00:01:10,043 *the Wikimedia universe and on this tool that we developed.* 19 00:01:10,043 --> 00:01:13,555 *So, when I'm talking about verbal text understandable by humans,* 20 00:01:13,555 --> 00:01:15,808 *I'm talking about Wikipedia entries.* 21 00:01:15,808 --> 00:01:17,778 *When I'm talking about structured databases,* 22 00:01:17,778 --> 00:01:20,017 *of course, I'm talking about Wikidata here.* 23 00:01:20,017 --> 00:01:22,777 *And predetermined arrangement, I'm talking about Mbabel,* 24 00:01:22,777 -->
00:01:24,271 *that is this tool.* 25 00:01:25,467 --> 00:01:31,216 *The Mbabel tool was inspired by a template* *by user Pharos, right here in front of me,* 26 00:01:31,279 --> 00:01:33,356 *thank you very much,* 27 00:01:33,356 --> 00:01:39,114 *and it was developed with Ederporto that is right here too,* 28 00:01:39,114 --> 00:01:40,974 *the brilliant Ederporto.* 29 00:01:42,599 --> 00:01:44,498 *We developed this tool* 30 00:01:44,498 --> 00:01:47,780 *that automatically generates Wikipedia entries* 31 00:01:47,780 --> 00:01:50,600 *based on information from Wikidata.* 32 00:01:53,189 --> 00:01:58,130 *We actually do some thematic templates* 33 00:01:58,130 --> 00:02:01,152 *that are created on the Wikidata module,* 34 00:02:01,573 --> 00:02:03,716 *WikidataIB Module,* 35 00:02:03,716 --> 00:02:07,835 *and these templates are pre-determined, generic and editable templates* 36 00:02:07,835 --> 00:02:09,677 *for various article themes.* 37 00:02:09,677 --> 00:02:15,411 *We realized that many Wikipedia entries had a quite similar structured narrative* 38 00:02:15,411 --> 00:02:18,922 *so we could create a tool that automatically generates that* 39 00:02:18,922 --> 00:02:21,598 *for many Wikidata items.* 40 00:02:24,207 --> 00:02:28,571 *Until now we have templates for museums, works of art, books, films,* 41 00:02:28,571 --> 00:02:31,265 *journals, earthquakes, libraries, archives,* 42 00:02:31,265 --> 00:02:34,855 *and Brazilian municipal and state elections, and growing.* 43 00:02:34,855 --> 00:02:38,984 *So, everybody here is able to contribute and create new templates.* 44 00:02:38,984 --> 00:02:43,508 *Each narrative template includes an introduction, Wikidata infobox,* 45 00:02:43,508 --> 00:02:46,158 *section suggestions for the users,* 46 00:02:46,158 --> 00:02:50,499 *content tables or lists with Listeria, depending on the case,* 47 00:02:50,499 --> 00:02:53,713 *references and categories, and of course the sentences,* 48 00:02:53,713 --> 00:02:55,776 
*that are created with the Wikidata information.* 49 00:02:55,776 --> 00:02:58,642 *I'm gonna show you in a sec an example of that.* 50 00:03:00,137 --> 00:03:05,749 *It's an integration with Wikipedia, integration with Wikidata,* 51 00:03:05,749 --> 00:03:08,760 *so the more properties properly filled on Wikidata,* 52 00:03:08,760 --> 00:03:12,311 *the more text entries you'll get on your article stub.* 53 00:03:12,857 --> 00:03:15,623 *That's very important to highlight here.* 54 00:03:16,343 --> 00:03:18,969 *Structuring this Wikidata can get more complex* 55 00:03:18,969 --> 00:03:22,017 *as I'm going to show you on the election projects that we've made.* 56 00:03:22,017 --> 00:03:26,552 *So I'm going to let you hear this Wikidata Lab XIV for you* 57 00:03:26,552 --> 00:03:29,471 *after this lightning talk* 58 00:03:29,471 --> 00:03:32,259 *that is very brief, so you'll be able to choose* 59 00:03:32,259 --> 00:03:34,554 *on the work that we've been doing on structuring Wikidata* 60 00:03:34,554 --> 00:03:36,005 *for this purpose too.* 61 00:03:37,272 --> 00:03:39,725 *We have this challenge to build a narrative template* 62 00:03:39,725 --> 00:03:44,383 *that is generic enough to cover different Wikidata items* 63 00:03:44,383 --> 00:03:46,347 *and to suppress the gender* 64 00:03:46,347 --> 00:03:50,359 *and the number of difficulties of languages,* 65 00:03:52,054 --> 00:03:54,252 *and still sounding natural for the user* 66 00:03:54,252 --> 00:03:59,252 *because we don't want to sound like it doesn't click for the user* 67 00:03:59,252 --> 00:04:00,546 *to edit after that.* 68 00:04:01,956 --> 00:04:07,625 *This is how the Mbabel looks like on the bottom form.* 69 00:04:07,625 --> 00:04:14,507 *You just have to insert the item number there* *and call the desired template* 70 00:04:14,507 --> 00:04:21,673 *and then you have an article to edit and expand, and everything.* 71 00:04:22,135 --> 00:04:26,856 *So, more importantly, why we did it?
Not because it's cool to develop* 72 00:04:26,856 --> 00:04:30,922 *things here in Wikidata, we know, we all hear, know about it.* 73 00:04:30,922 --> 00:04:36,178 *But we are experimenting this integration from Wikidata to Wikipedia* 74 00:04:36,178 --> 00:04:39,226 *and we want to focus on meaningful individual contributions.* 75 00:04:39,226 --> 00:04:42,608 *So we've been working on education programs* 76 00:04:42,608 --> 00:04:45,067 *and we want the students to feel the value* 77 00:04:45,067 --> 00:04:47,280 *of their entries too, but not only--* 78 00:04:47,280 --> 00:04:49,405 *Oh, five minutes only, Geez, I'm gonna rush here.* 79 00:04:49,405 --> 00:04:50,599 (laughing) 80 00:04:50,794 --> 00:04:54,160 *And we want you all to make tasks for users in general,* 81 00:04:54,270 --> 00:04:57,801 *especially on tables and this kind of content* 82 00:04:57,801 --> 00:04:59,988 *that it's a bit of a rush to do.* 83 00:05:02,456 --> 00:05:05,523 *And we're working on this concept of abstract Wikipedia.* 84 00:05:05,523 --> 00:05:09,269 *Denny Vrandečić wrote an article super interesting about it* 85 00:05:09,269 --> 00:05:11,500 *so I linked here too.* 86 00:05:11,500 --> 00:05:14,792 *And we also want to now support small language communities* 87 00:05:14,792 --> 00:05:17,845 *to fill the lack of content there.* 88 00:05:18,784 --> 00:05:23,885 *This is an example of how we've been using* *this Mbabel tool for GLAM* 89 00:05:23,885 --> 00:05:25,748 *and education programs,* 90 00:05:25,748 --> 00:05:29,861 *and I showed you earlier the bottom form of the Mbabel tool* 91 00:05:29,861 --> 00:05:34,264 *but also we can make red links that aren't exactly empty.* 92 00:05:34,264 --> 00:05:35,931 *So you click on this red link* 93 00:05:35,931 --> 00:05:38,862 *and you automatically have this article draft* 94 00:05:38,862 --> 00:05:41,660 *on your user page to edit.* 95 00:05:42,964 --> 00:05:48,762 *And I'm going to briefly talk about it because I only have some 
minutes more.* 96 00:05:50,009 --> 00:05:51,356 *On educational projects,* 97 00:05:51,356 --> 00:05:56,799 *we've been doing this with elections in Brazil for journalism students.* 98 00:05:56,799 --> 00:06:01,993 *We have the experience with the [inaudible] students* 99 00:06:02,087 --> 00:06:05,314 *with user Joalpe-- he's not here right now,* 100 00:06:05,314 --> 00:06:07,867 *but we all know him, I think.* 101 00:06:07,867 --> 00:06:11,930 *And we realize that we have the data about Brazilian elections* 102 00:06:11,930 --> 00:06:14,748 *but we don't have media cover on it.* 103 00:06:15,049 --> 00:06:18,249 *So we were lacking also Wikipedia entries on it.* 104 00:06:19,029 --> 00:06:23,000 *How do we insert this meaningful information on Wikipedia* 105 00:06:23,000 --> 00:06:24,672 *that people really access?* 106 00:06:24,672 --> 00:06:27,989 *Next year we're going to have some election,* 107 00:06:27,989 --> 00:06:30,710 *people are going to look for this kind of information on Wikipedia* 108 00:06:30,710 --> 00:06:32,433 *and they simply won't find it.* 109 00:06:32,433 --> 00:06:35,726 *So this tool looks quite useful for this purpose* 110 00:06:35,726 --> 00:06:40,214 *and the students were introduced, not only to Wikipedia,* 111 00:06:40,214 --> 00:06:42,701 *but also to Wikidata.* 112 00:06:42,701 --> 00:06:46,575 *Actually, they were introduced to Wikipedia with Wikidata,* 113 00:06:46,575 --> 00:06:50,675 *which is an experience super interesting and we had a lot of fun,* 114 00:06:50,675 --> 00:06:52,823 *and it was quite challenging to organize all that.* 115 00:06:52,823 --> 00:06:54,513 *We can talk about it later too.* 116 00:06:54,979 --> 00:06:58,582 *And they also added the background and the analysis sections* 117 00:06:58,582 --> 00:07:01,663 *on these elections articles,* 118 00:07:01,663 --> 00:07:05,336 *because we don't want them* *to just simply automate the content there.* 119 00:07:05,336 --> 00:07:06,660 *We can do better.* 120 
00:07:06,660 --> 00:07:09,247 *So this is the example I'm going to show you.* 121 00:07:09,247 --> 00:07:13,106 *This is from a municipal election in Brazil.* 122 00:07:15,603 --> 00:07:17,121 *Two minutes... oh my!* 123 00:07:18,577 --> 00:07:23,268 *This example here was entirely created with the Mbabel tool.* 124 00:07:23,268 --> 00:07:29,496 *You have here this introduction text. It really sounds natural for the reader.* 125 00:07:29,496 --> 00:07:32,165 *The Wikidata infobox here--* 126 00:07:32,165 --> 00:07:34,907 *it's a masterpiece of Ederporto right there.* 127 00:07:34,907 --> 00:07:36,769 (laughter) 128 00:07:37,438 --> 00:07:42,456 *And we have here the tables with the election results for each position.* 129 00:07:42,456 --> 00:07:46,415 *And we also have these results here on the textual form too,* 130 00:07:46,415 --> 00:07:51,767 *so it really looks like an article that was made, that was handcrafted.* 131 00:07:53,893 --> 00:07:57,814 *The references here were also made with the Mbabel tool* 132 00:07:57,814 --> 00:08:01,393 *and we used identifiers to build these references here* 133 00:08:01,393 --> 00:08:03,167 *and the categories too.* 134 00:08:10,726 --> 00:08:14,999 *So, to wrap things up here, it is still a work in progress,* 135 00:08:14,999 --> 00:08:19,326 *and we have some challenges on outreach and technical* 136 00:08:19,326 --> 00:08:22,999 *to bring Mbabel to other language communities,* 137 00:08:22,999 --> 00:08:24,844 *especially the smaller ones,* 138 00:08:24,844 --> 00:08:27,210 *and how do we support those tools* 139 00:08:27,210 --> 00:08:29,819 *on lower resource language communities too.* 140 00:08:29,819 --> 00:08:33,991 *And finally, is it possible to create an Mbabel* 141 00:08:33,991 --> 00:08:36,261 *that overcomes language barriers?* 142 00:08:36,261 --> 00:08:39,740 *I think that's a question very interesting for the conference* 143 00:08:39,740 --> 00:08:43,835 *and hopefully we can figure that out together.* 144 
00:08:44,818 --> 00:08:49,799 *So, thank you very much, and look for the Mbabel poster downstairs* 145 00:08:49,799 --> 00:08:53,615 *if you like to have all this information wrapped up, okay?* 146 00:08:53,615 --> 00:08:55,038 Thank you. 147 00:08:55,288 --> 00:08:57,564 (audience clapping) 148 00:09:00,311 --> 00:09:02,778 (moderator) I'm afraid we're a little too short for questions 149 00:09:02,778 --> 00:09:05,783 but yes, Erica, as she said, has a poster and is very friendly. 150 00:09:05,783 --> 00:09:07,518 So I'm sure you can talk to her afterwards, 151 00:09:07,518 --> 00:09:09,389 and if there's time at the end, I'll allow it. 152 00:09:09,389 --> 00:09:12,131 But in the meantime, I'd like to bring up our next speaker... 153 00:09:12,237 --> 00:09:13,611 Thank you. 154 00:09:15,549 --> 00:09:17,140 (audience chattering) 155 00:09:23,058 --> 00:09:27,016 Next we've got Yolanda Gil, talking about Wikidata and Geosciences. 156 00:09:27,908 --> 00:09:29,031 Thank you. 157 00:09:29,031 --> 00:09:31,624 I come from the University of Southern California 158 00:09:31,624 --> 00:09:35,164 and I've been working with Semantic Technologies for a long time. 159 00:09:35,164 --> 00:09:37,894 I want to talk about geosciences in particular, 160 00:09:37,894 --> 00:09:41,225 where this idea of crowd-sourcing from the community is very important. 
161 00:09:41,791 --> 00:09:45,033 *So I'll give you a sense that individual scientists,* 162 00:09:45,033 --> 00:09:47,070 *most of them in colleges,* 163 00:09:47,070 --> 00:09:50,085 *collect their own data for their particular project.* 164 00:09:50,085 --> 00:09:51,932 *They describe it in their own way.* 165 00:09:51,932 --> 00:09:55,352 *They use their own properties, their own metadata characteristics.* 166 00:09:55,352 --> 00:09:58,560 *This is an example of some collaborators of mine* 167 00:09:58,560 --> 00:10:00,124 *that collect data from a river.* 168 00:10:00,124 --> 00:10:02,091 *They have their own sensors, their own robots,* 169 00:10:02,091 --> 00:10:05,339 *and they study the water quality.* 170 00:10:05,339 --> 00:10:11,423 *I'm going to talk today about an effort that we did to crowdsource metadata* 171 00:10:11,423 --> 00:10:14,712 *for a community that works in paleoclimate.* 172 00:10:14,712 --> 00:10:17,747 *The article just came out so it's in the slides if you're curious,* 173 00:10:17,747 --> 00:10:20,619 *but it's a pretty large community that work together* 174 00:10:20,619 --> 00:10:24,042 *to integrate data more efficiently through crowdsourcing.* 175 00:10:24,042 --> 00:10:28,631 *So, if you've heard of the hockey stick graphics for climate,* 176 00:10:28,631 --> 00:10:31,680 *this is the community that does this.* 177 00:10:31,680 --> 00:10:34,520 *This is a study for climate in the last 200 years,* 178 00:10:34,520 --> 00:10:38,188 *and it takes them literally many years to look at data* 179 00:10:38,188 --> 00:10:39,618 *from different parts of the globe.* 180 00:10:39,618 --> 00:10:42,607 *Each dataset is collected by a different investigator.* 181 00:10:42,699 --> 00:10:44,433 *The data is very, very different,* 182 00:10:44,433 --> 00:10:47,017 *so it takes them a long time to put together* 183 00:10:47,017 --> 00:10:49,230 *these global studies of climate,* 184 00:10:49,230 --> 00:10:51,665 *and our goal is to make that more 
efficient.* 185 00:10:51,665 --> 00:10:53,690 *So, I've done a lot of work over the years.* 186 00:10:53,690 --> 00:10:56,585 *Going back to 2005, we used to call it,* 187 00:10:56,585 --> 00:10:59,615 *"Knowledge Collection from Web Volunteers"* 188 00:10:59,615 --> 00:11:02,236 *or from netizens at that time.* 189 00:11:02,236 --> 00:11:04,267 *We had a system called "Learner."* 190 00:11:04,267 --> 00:11:07,048 *It collected 700,000 common sense,* 191 00:11:07,048 --> 00:11:09,368 *common knowledge statements about the world.* 192 00:11:09,368 --> 00:11:11,367 *We did a lot of different techniques.* 193 00:11:11,367 --> 00:11:15,333 *The forms that we did to extract knowledge from volunteers* 194 00:11:15,333 --> 00:11:19,136 *really fit the knowledge models, the data models that we used* 195 00:11:19,136 --> 00:11:21,381 *and the properties that we wanted to use.* 196 00:11:21,381 --> 00:11:25,051 *I worked with Denny in the system called "Shortipedia"* 197 00:11:25,051 --> 00:11:27,259 *when he was a Post Doc at ISI,* 198 00:11:27,259 --> 00:11:31,946 *looking at keeping track of the provenance of the assertions,* 199 00:11:31,946 --> 00:11:35,129 *and we started to build on Semantic MediaWiki software.* 200 00:11:35,129 --> 00:11:37,113 *So everything that I'm going to describe today* 201 00:11:37,113 --> 00:11:38,936 *builds on that software,* 202 00:11:38,936 --> 00:11:41,117 *but I think that now we have Wikibase,* 203 00:11:41,117 --> 00:11:43,676 *we'll be starting to work more on Wikibase.* 204 00:11:43,676 --> 00:11:48,935 *So the LinkedEarth is the project* *where we work with paleoclimate scientists* 205 00:11:48,935 --> 00:11:50,636 *to crowdsource the metadata,* 206 00:11:50,636 --> 00:11:54,328 *and seeing the title that we said, "controlled crowdsourcing."* 207 00:11:54,328 --> 00:11:57,101 *So we found a nice niche* 208 00:11:57,101 --> 00:12:00,538 *where we could let them create new properties* 209 00:12:00,538 --> 00:12:02,599 *but we had an
editorial process for it.* 210 00:12:02,599 --> 00:12:04,444 *So I'll describe to you how it works.* 211 00:12:04,444 --> 00:12:10,055 *For them, if you're looking at a sample from lake sediments from 200 years ago,* 212 00:12:10,055 --> 00:12:12,622 *you use different properties to describe it* 213 00:12:12,622 --> 00:12:15,692 *than if you have coral sediments that you're looking at* 214 00:12:15,692 --> 00:12:18,979 *or coral samples that you're looking at that you extract from the ocean.* 215 00:12:18,979 --> 00:12:23,532 *Palmyra is a coral atoll in the Pacific.* 216 00:12:23,532 --> 00:12:27,918 *So if you have coral, you care about the species and the genus,* 217 00:12:27,918 --> 00:12:31,691 *but if you're just looking at lake sand, you don't have that.* 218 00:12:31,691 --> 00:12:35,313 *So each type of sample has very different properties.* 219 00:12:35,313 --> 00:12:38,798 *In LinkedEarth, they're able to see in a map* 220 00:12:38,798 --> 00:12:40,264 *where the datasets are.* 221 00:12:40,264 --> 00:12:45,500 *They actually annotate their own datasets* *or the datasets of other researchers* 222 00:12:45,500 --> 00:12:46,787 *when they're using it.* 223 00:12:46,787 --> 00:12:50,254 *So they have a reason why they want certain properties* 224 00:12:50,254 --> 00:12:52,289 *to describe those datasets.* 225 00:12:52,289 --> 00:12:56,683 *Whenever there are disagreements, or whenever there are agreements,* 226 00:12:56,683 --> 00:12:58,595 *there's community discussions about them* 227 00:12:58,595 --> 00:13:02,894 *and they're also polls to decide on what properties to settle.* 228 00:13:02,894 --> 00:13:05,659 *So it's a nice ecosystem. 
I'll give you examples.* 229 00:13:05,659 --> 00:13:11,322 *You look at a particular dataset, in this case it's a lake in Africa.* 230 00:13:11,322 --> 00:13:14,241 *So you have the category of the page; it can be a dataset,* 231 00:13:14,241 --> 00:13:15,491 *it can be other things.* 232 00:13:15,491 --> 00:13:21,181 *You can download the dataset itself and you have kind of canonical properties* 233 00:13:21,181 --> 00:13:23,737 *that they have all agreed to have for datasets,* 234 00:13:23,737 --> 00:13:25,992 *and then under Extra Information,* 235 00:13:25,992 --> 00:13:29,369 *those are properties that the person describing this dataset* 236 00:13:29,369 --> 00:13:31,007 *added on their own accord.* 237 00:13:31,007 --> 00:13:32,628 *So these can be new properties.* 238 00:13:32,628 --> 00:13:36,730 *We call them "crowd properties," rather than "core properties."* 239 00:13:37,291 --> 00:13:41,319 *And then when you're describing your dataset,* 240 00:13:41,319 --> 00:13:43,774 *in this case it's an ice core that you got* 241 00:13:43,774 --> 00:13:45,716 *from a glacier dataset,* 242 00:13:45,765 --> 00:13:49,178 *and you're adding a dataset you want to talk about measurements,* 243 00:13:49,178 --> 00:13:54,073 *you have an offering of all the existing properties* 244 00:13:54,073 --> 00:13:55,278 *that match what you're saying.* 245 00:13:55,278 --> 00:13:58,409 *So we do this search completion so that you can adopt that.* 246 00:13:58,409 --> 00:14:00,140 *That promotes normalization.* 247 00:14:00,140 --> 00:14:04,260 *The core of the properties has been agreed by the community* 248 00:14:04,260 --> 00:14:06,220 *so we're really extending that core.* 249 00:14:06,220 --> 00:14:08,795 *And that core is very important because it gives structure* 250 00:14:08,795 --> 00:14:10,735 *to all the extensions.* 251 00:14:10,735 --> 00:14:14,382 *We engage the community through many different ways.* 252 00:14:14,382 --> 00:14:17,260 *We had one face-to-face meeting
at the beginning* 253 00:14:17,260 --> 00:14:21,611 *and after about a year and a half, we do have a new standard,* 254 00:14:21,611 --> 00:14:25,154 *and a new way for them to continue to evolve that standard.* 255 00:14:25,154 --> 00:14:30,569 *They have editors, very much in the Wikipedia style* 256 00:14:30,569 --> 00:14:31,582 *of editorial boards.* 257 00:14:31,582 --> 00:14:34,098 *They have working groups for different types of data.* 258 00:14:34,098 --> 00:14:36,090 *They do polls with the community,* 259 00:14:36,090 --> 00:14:40,879 *and they have pretty nice engagement of the community at large,* 260 00:14:40,879 --> 00:14:43,706 *even if they've never visited our Wiki.* 261 00:14:43,706 --> 00:14:46,183 *The metadata evolves* 262 00:14:46,183 --> 00:14:48,775 *so what we do is that people annotate their datasets,* 263 00:14:48,775 --> 00:14:52,321 *then the schema evolves, the properties evolve* 264 00:14:52,321 --> 00:14:55,379 *and we have an entire infrastructure and mechanisms* 265 00:14:55,379 --> 00:15:00,336 *to re-annotate the datasets with the new structure of the ontology* 266 00:15:00,336 --> 00:15:01,711 *and the new properties.* 267 00:15:01,711 --> 00:15:05,210 *This is described in the paper. 
I won't go into the details.* 268 00:15:05,210 --> 00:15:07,583 *But I think that having that kind of capability* 269 00:15:07,583 --> 00:15:10,342 *in Wikibase would be really interesting.* 270 00:15:10,342 --> 00:15:14,041 *We basically extended Semantic MediaWiki and MediaWiki* 271 00:15:14,041 --> 00:15:15,722 *to create our own infrastructure.* 272 00:15:15,722 --> 00:15:18,855 *I think a lot of this is now something that we find in Wikibase,* 273 00:15:18,961 --> 00:15:20,615 *but this is older than that.* 274 00:15:20,615 --> 00:15:24,999 *And in general, we have many projects where we look at crowdsourcing* 275 00:15:24,999 --> 00:15:29,885 *not just descriptions of datasets* *but also descriptions of hydrology models,* 276 00:15:29,885 --> 00:15:33,563 *descriptions of multi-step data analytic workflows* 277 00:15:33,563 --> 00:15:36,080 *and many other things in the sciences.* 278 00:15:36,080 --> 00:15:42,833 *So we are also interested in including in Wikidata additional things* 279 00:15:42,833 --> 00:15:46,250 *that are not just datasets or entities* 280 00:15:46,250 --> 00:15:48,512 *but also other things that have to do with science.* 281 00:15:48,512 --> 00:15:53,770 *I think Geosciences are more complex in this sense than Biology, for example.* 282 00:15:54,923 --> 00:15:56,233 *That's it.* 283 00:15:56,513 --> 00:15:57,885 Thank you. (audience clapping) 284 00:16:01,640 --> 00:16:03,772 - Do I have time for questions? - Yes. 285 00:16:03,772 --> 00:16:06,871 (moderator) We have time for just a couple of short questions. 286 00:16:07,751 --> 00:16:11,342 When answering, can you go back to the microphone? 287 00:16:12,529 --> 00:16:14,520 - Yes. - Hopefully, yeah. 288 00:16:21,314 --> 00:16:25,002 (audience 1) Does the structure allow tabular datasets to be described 289 00:16:25,002 --> 00:16:26,988 and can you talk a bit about that? 290 00:16:27,225 --> 00:16:32,667 Yes.
So the properties of the datasets talk more about who collected them, 291 00:16:32,667 --> 00:16:36,759 what kind of data was collected, what kind of sample it was, 292 00:16:36,759 --> 00:16:39,790 and then there's a separate standard which is called "LiPD" 293 00:16:39,790 --> 00:16:43,065 that's complementary and mapped to the properties 294 00:16:43,065 --> 00:16:46,994 that describes the format of the actual files 295 00:16:47,075 --> 00:16:49,343 and the actual structure of the data. 296 00:16:49,343 --> 00:16:53,631 So, you're right that there's both, "how do I find data about x" 297 00:16:53,631 --> 00:16:55,557 but also, "Now, how do I use it? 298 00:16:55,557 --> 00:17:00,211 How do I know where the temperature that I'm looking for 299 00:17:00,211 --> 00:17:03,013 is actually in the file?" 300 00:17:03,656 --> 00:17:05,394 (moderator) This will be the last. 301 00:17:06,887 --> 00:17:09,034 (audience 2) I'll have to make it relevant. 302 00:17:09,504 --> 00:17:15,667 So, you have shown this process of how users can suggest 303 00:17:15,667 --> 00:17:18,985 or like actually already put in properties, 304 00:17:18,985 --> 00:17:22,705 and I didn't fully understand how this thing works, 305 00:17:22,705 --> 00:17:24,027 or what's the process behind it. 306 00:17:24,027 --> 00:17:28,045 Is there some kind of folksonomy approach--obviously-- 307 00:17:28,045 --> 00:17:33,387 but how is it promoted into the core vocabulary 308 00:17:33,387 --> 00:17:36,255 if something is promoted? 309 00:17:36,255 --> 00:17:37,882 Yes, yes. It is. 310 00:17:37,882 --> 00:17:42,202 So what we do is we have a core ontology and the initial one was actually 311 00:17:42,202 --> 00:17:45,618 very thoughtfully put together through a lot of discussion 312 00:17:45,618 --> 00:17:47,964 by very few people.
313 00:17:47,964 --> 00:17:51,052 *And then the idea was the whole community can extend that* 314 00:17:51,052 --> 00:17:52,971 *or propose changes to that.* 315 00:17:52,971 --> 00:17:56,919 *So, as they are describing datasets, they can add new properties* 316 00:17:56,919 --> 00:17:59,526 *and those become "crowd properties."* 317 00:17:59,526 --> 00:18:02,941 *And every now and then, the Editorial Committee* 318 00:18:02,941 --> 00:18:04,367 *looks at all of those properties,* 319 00:18:04,367 --> 00:18:07,795 *the working groups look at all of those crowd properties,* 320 00:18:07,795 --> 00:18:11,714 *and decide whether to incorporate them into the main ontology.* 321 00:18:11,714 --> 00:18:15,804 *So it could be because they're used for a lot of dataset descriptions.* 322 00:18:15,804 --> 00:18:18,920 *It could be because they are proposed by somebody* 323 00:18:18,920 --> 00:18:23,339 *and they're found to be really interesting* *or key, or uncontroversial.* 324 00:18:23,339 --> 00:18:30,267 *So there's an entire editorial process to incorporate those new crowd properties* 325 00:18:30,267 --> 00:18:32,188 or the folksonomy part of it, 326 00:18:32,188 --> 00:18:36,308 but they are really built around the core of the ontology. 327 00:18:36,404 --> 00:18:40,280 The core ontology then grows with more crowd properties 328 00:18:40,280 --> 00:18:44,311 and then people propose additional crowd properties again. 329 00:18:44,311 --> 00:18:46,979 So we've gone through a couple of these iterations 330 00:18:46,979 --> 00:18:51,386 of rolling out a new core, and then extending it, 331 00:18:51,386 --> 00:18:55,570 and then rolling out a new core and then extending it. 332 00:18:55,570 --> 00:18:57,779 - (audience 2) Great. Thank you. - Thanks. 333 00:18:57,779 --> 00:19:00,437 (moderator) Thank you. (audience applauding) 334 00:19:02,295 --> 00:19:03,777 (moderator) Thank you, Yolanda. 
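The promotion cycle described in this answer can be sketched in a few lines of Python. This is only an illustration under assumptions, not LinkedEarth's actual code: the `Ontology` class, the property names, the `min_uses` threshold and the `endorsed` set are all hypothetical, standing in for "used for a lot of dataset descriptions" and "found to be really interesting or key" in the talk.

```python
from dataclasses import dataclass, field


@dataclass
class Ontology:
    """Hypothetical model of a core vocabulary plus crowd-added properties."""
    core: set[str]
    # crowd property name -> number of dataset annotations that use it
    crowd: dict[str, int] = field(default_factory=dict)

    def annotate(self, prop: str) -> str:
        """Record one annotation; unknown properties become crowd properties."""
        if prop in self.core:
            return "core"
        self.crowd[prop] = self.crowd.get(prop, 0) + 1
        return "crowd"

    def editorial_review(self, min_uses: int, endorsed: set[str]) -> list[str]:
        """Promote crowd properties that are widely used or endorsed by editors."""
        promoted = sorted(
            name for name, uses in self.crowd.items()
            if uses >= min_uses or name in endorsed
        )
        for name in promoted:
            self.core.add(name)
            del self.crowd[name]
        return promoted


# Hypothetical property names, chosen only for the example.
ontology = Ontology(core={"collectedBy", "archiveType"})
for prop in ["coralSpecies", "coralSpecies", "archiveType", "lakeDepth", "sensorBrand"]:
    ontology.annotate(prop)

# One review round: the assumed threshold is two uses; editors endorse "lakeDepth".
promoted = ontology.editorial_review(min_uses=2, endorsed={"lakeDepth"})
print(promoted)        # ['coralSpecies', 'lakeDepth']
print(ontology.crowd)  # {'sensorBrand': 1}
```

Calling `editorial_review` again after more annotations corresponds to the repeated "roll out a new core, then extend it" iterations mentioned in the answer; re-annotating existing datasets against the new core is left out of the sketch.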
335 00:19:03,777 --> 00:19:07,494 And now we have Adam Shorland with "Something About Wikibase," 336 00:19:07,599 --> 00:19:09,299 according to the title. 337 00:19:09,708 --> 00:19:12,956 Uh... where's the internet? There it is. 338 00:19:13,245 --> 00:19:18,925 So, I'm going to do a live demo, which is probably a bad idea 339 00:19:18,925 --> 00:19:21,362 *but I'm going to try and do it as the birthday present later* 340 00:19:21,362 --> 00:19:24,268 *so I figure I might as well try it here.* 341 00:19:24,292 --> 00:19:27,304 *And I also have some notes on my phone because I have no slides.* 342 00:19:29,349 --> 00:19:32,248 *So, two years ago, I made these Wikibase Docker images* 343 00:19:32,248 --> 00:19:34,052 *that quite a few people have tried out,* 344 00:19:34,052 --> 00:19:38,087 *and even before then, I was working on another project,* 345 00:19:38,087 --> 00:19:42,363 *which is kind of ready now, and here it is.* 346 00:19:43,690 --> 00:19:46,832 *It's a website that allows you to instantly create a Wikibase* 347 00:19:46,900 --> 00:19:48,930 with a query service and quick statements, 348 00:19:48,930 --> 00:19:51,616 without needing to know about any of the technical details, 349 00:19:51,616 --> 00:19:54,295 without needing to manage any of them either. 350 00:19:54,295 --> 00:19:57,054 There are still lots of features to go and there's still some bugs, 351 00:19:57,054 --> 00:19:59,348 but here goes the demo. 352 00:19:59,348 --> 00:20:02,628 Let me get my emails up ready... because I need them too... 353 00:20:03,315 --> 00:20:06,514 Da da da... Stopwatch. 354 00:20:07,272 --> 00:20:08,488 Okay. 355 00:20:08,829 --> 00:20:14,253 So it's as simple as... at the moment it's locked down behind... 356 00:20:14,337 --> 00:20:16,495 Oh no! German keyboard! 357 00:20:16,495 --> 00:20:18,703 (audience laughing) 358 00:20:22,556 --> 00:20:23,923 Foiled... okay. 359 00:20:24,955 --> 00:20:26,214 Okay.
360 00:20:26,634 --> 00:20:28,417 (audience continues to laugh) 361 00:20:30,434 --> 00:20:31,989 Aha! Okay. 362 00:20:32,950 --> 00:20:35,335 I'll remember that for later. (laughs) 363 00:20:36,911 --> 00:20:38,119 Yes. 364 00:20:39,438 --> 00:20:40,855 ♪ (humming) ♪ 365 00:20:40,961 --> 00:20:44,932 Oh my god... now it's American. 366 00:20:53,871 --> 00:20:56,131 All you have to do is create an account... 367 00:20:58,570 --> 00:21:00,007 da da da... 368 00:21:00,566 --> 00:21:02,432 Click this button up here... 369 00:21:02,478 --> 00:21:05,512 Come up with a name for Wiki-- "Demo1" 370 00:21:05,862 --> 00:21:07,299 "Demo1" 371 00:21:07,568 --> 00:21:09,135 "Demo user" 372 00:21:09,203 --> 00:21:11,864 Agree to the terms which don't really exist yet. 373 00:21:12,298 --> 00:21:14,247 (audience laughing) 374 00:21:15,264 --> 00:21:17,698 Click on this thing which isn't a link. 375 00:21:21,519 --> 00:21:23,886 And then you have your Wikibase. 376 00:21:23,886 --> 00:21:26,602 (audience cheers and claps) 377 00:21:28,554 --> 00:21:30,421 *Anmelden* in German. 378 00:21:30,421 --> 00:21:35,126 Demo... oh god! I'm learning lots about my demo later. 379 00:21:35,569 --> 00:21:40,069 1-6-1-4-S-G... 380 00:21:40,166 --> 00:21:42,567 - (audience 3) Y... - (Adam) It's random. 381 00:21:43,016 --> 00:21:44,567 (audience laughing) 382 00:21:46,237 --> 00:21:47,958 Oh, come on.... (audience laughing) 383 00:21:48,001 --> 00:21:50,543 Oh no. It's because this is a capital U... 384 00:21:51,333 --> 00:21:53,283 (audience chattering) 385 00:21:54,453 --> 00:21:56,545 6-1-4.... 386 00:21:57,465 --> 00:22:01,248 S-G-ENJ... 387 00:22:01,623 --> 00:22:03,794 Is J... oh no. That's... oh yeah. Okay. 388 00:22:03,843 --> 00:22:06,242 I'm really... I'm gonna have to look at the laptop 389 00:22:06,242 --> 00:22:07,836 that I'm doing this on later. 390 00:22:07,836 --> 00:22:09,129 Cool... 391 00:22:11,046 --> 00:22:13,709 Da da da da da... 
392 00:22:14,687 --> 00:22:17,040 Maybe I should have some things in my clipboard ready. 393 00:22:17,539 --> 00:22:19,093 Okay, so now I'm logged in. 394 00:22:22,631 --> 00:22:25,065 Oh... keyboards. 395 00:22:28,083 --> 00:22:30,012 So you can go and create an item... 396 00:22:36,194 --> 00:22:38,508 *Yeah, maybe I should make a video. It might be easier.* 397 00:22:38,927 --> 00:22:42,207 *So, yeah. You can make items, you have quick statements here* 398 00:22:42,207 --> 00:22:43,901 *that have... oh... it is all in German.* 399 00:22:43,901 --> 00:22:45,088 (audience laughing) 400 00:22:45,088 --> 00:22:46,297 (sighs) 401 00:22:46,926 --> 00:22:49,021 *Oh, log in? Log in?* 402 00:22:50,348 --> 00:22:52,088 *It has... Oh, set up ready.* 403 00:22:52,088 --> 00:22:53,482 *Da da da...* 404 00:22:55,965 --> 00:22:57,850 *It's as easy as...* 405 00:22:58,966 --> 00:23:01,350 *I learned how to use Quick Statements yesterday...* 406 00:23:01,350 --> 00:23:03,245 *that's what I know how to do.* 407 00:23:04,657 --> 00:23:07,089 *I can then go back to the Wiki...* 408 00:23:08,008 --> 00:23:09,804 *We can go and see in Recent Changes* 409 00:23:09,804 --> 00:23:11,942 *that there are now two items, the one that I made* 410 00:23:11,942 --> 00:23:13,759 *and the one from Quick Statements...* 411 00:23:13,759 --> 00:23:14,881 *and then you go to Quick...* 412 00:23:14,881 --> 00:23:16,511 ♪ (hums a tune) ♪ 413 00:23:17,637 --> 00:23:18,770 *Stop...no...* 414 00:23:18,927 --> 00:23:20,120 *No... * 415 00:23:20,454 --> 00:23:22,437 (audience laughing) 416 00:23:28,394 --> 00:23:30,006 *Oh god...* 417 00:23:30,061 --> 00:23:32,012 *I'm glad I tried this out in advance.* 418 00:23:33,464 --> 00:23:35,678 *There you go. 
And the query service is updated.* 419 00:23:35,830 --> 00:23:37,763 (audience clapping) 420 00:23:42,357 --> 00:23:45,359 *And the idea of this is it'll allow people to try out Wikibases.* 421 00:23:45,359 --> 00:23:48,493 *Hopefully, it'll even be able to allow people to...* 422 00:23:49,110 --> 00:23:50,945 *have their real Wikibases here.* 423 00:23:50,945 --> 00:23:53,783 At the moment you can create as many as you want 424 00:23:53,783 --> 00:23:55,653 and they all just appear in this lovely list. 425 00:23:55,653 --> 00:23:59,182 As I said, there's lots of bugs but it's all super quick. 426 00:23:59,914 --> 00:24:03,392 Exactly how this is going to continue in the future, we don't know yet 427 00:24:03,392 --> 00:24:05,757 because I only finished writing this in the last few days. 428 00:24:05,757 --> 00:24:09,286 It's currently behind an invitation code so that if you want to come try it out, 429 00:24:09,286 --> 00:24:10,888 come and talk to me. 430 00:24:11,645 --> 00:24:15,730 And if you have any other comments or thoughts, let me know. 431 00:24:15,861 --> 00:24:19,711 Oh, three minutes...40. That's... That's not that bad. 432 00:24:19,986 --> 00:24:21,022 Thanks. 433 00:24:21,022 --> 00:24:22,622 (audience clapping) 434 00:24:28,435 --> 00:24:30,006 Any questions? 435 00:24:31,020 --> 00:24:35,553 (audience 5) Does the Quick Statements and the Query Service 436 00:24:35,553 --> 00:24:38,602 are automatically updated? 437 00:24:39,553 --> 00:24:42,345 Yes. So the idea is that there will be somebody, 438 00:24:42,345 --> 00:24:43,500 at the moment, me, 439 00:24:43,500 --> 00:24:45,144 maintaining all of the horrible stuff 440 00:24:45,144 --> 00:24:47,290 that you don't have to behind the scenes. 441 00:24:47,657 --> 00:24:50,157 So kind of think of it like GitHub.com, 442 00:24:50,157 --> 00:24:54,058 but you don't have to know anything about Git to use it. It's just all there. 443 00:24:55,241 --> 00:24:56,886 - [inaudible] - Yeah, we'll get that. 
444 00:24:56,886 --> 00:25:00,247 But any of those big hosted solution things. 445 00:25:00,833 --> 00:25:03,263 - (audience 6) A feature request. - Yes. 446 00:25:03,263 --> 00:25:05,479 Is there any-- In Scope 447 00:25:05,479 --> 00:25:09,799 do you have plans on making it so you can easily import existing... 448 00:25:09,799 --> 00:25:12,549 - Wikidata... - I have loads of plans. 449 00:25:12,549 --> 00:25:14,909 Like I want there to be a button where you can just import 450 00:25:14,909 --> 00:25:17,348 another whole Wikibase and all of--yeah. 451 00:25:17,436 --> 00:25:20,723 There will, in the future list that's really long. Yeah. 452 00:25:24,454 --> 00:25:28,406 (audience 7) I understand that it's... you want to make it user-friendly 453 00:25:28,406 --> 00:25:32,242 but if I want to access to the machine itself, can I do that? 454 00:25:32,242 --> 00:25:34,673 Nope. (audience laughing) 455 00:25:37,006 --> 00:25:40,863 So again, like, in the longer term future, there are possib... 456 00:25:40,863 --> 00:25:43,810 Everything's possible, but at the moment, no. 457 00:25:45,156 --> 00:25:49,743 (audience 8) Two questions. Is there a plan to have export tools 458 00:25:49,743 --> 00:25:52,791 so that you can export it to your own Wikibase maybe at some point? 459 00:25:52,791 --> 00:25:53,824 - Yes. - Great. 460 00:25:53,824 --> 00:25:55,565 And is this a business? 461 00:25:56,003 --> 00:25:58,164 I have no idea. (audience laughing) 462 00:26:00,015 --> 00:26:01,545 Not currently. 463 00:26:05,754 --> 00:26:08,451 (audience 9) What if I stop using it tomorrow, 464 00:26:08,451 --> 00:26:11,096 how long will the data be there? 
465 00:26:11,181 --> 00:26:14,632 So my plan was at the end of WikidataCon I was going to delete all of the data 466 00:26:14,632 --> 00:26:18,060 and there's a Wikibase Workshop on a Sunday, 467 00:26:18,060 --> 00:26:21,671 and we will maybe be using this for the Wikibase workshop 468 00:26:21,671 --> 00:26:23,801 so that everyone can have their own Wikibase. 469 00:26:23,801 --> 00:26:27,366 And then, from that point, I probably won't be deleting the data 470 00:26:27,366 --> 00:26:29,008 so it will all just stay there. 471 00:26:31,763 --> 00:26:32,923 (moderator) Question. 472 00:26:34,524 --> 00:26:36,114 (audience 10) It's two minutes... 473 00:26:36,175 --> 00:26:39,505 Alright, fine. I'll allow two more questions if you talk quickly. 474 00:26:39,505 --> 00:26:41,550 (audience laughing) 475 00:26:47,370 --> 00:26:49,999 - Alright, good people. - Thank you, Adam. 476 00:26:49,999 --> 00:26:52,418 Thank you for letting me test my demo... I mean... 477 00:26:52,418 --> 00:26:54,640 I'm going to do it different. (audience clapping) 478 00:26:59,512 --> 00:27:00,753 (moderator) Thank you. 479 00:27:00,753 --> 00:27:03,869 Now we have Dennis Diefenbach presenting *QAnswer.* 480 00:27:04,489 --> 00:27:08,129 Hello, I'm Dennis Diefenbach, I would like to present *QAnswer* 481 00:27:08,129 --> 00:27:11,392 which is a question-answering system on top of Wikidata. 482 00:27:11,392 --> 00:27:16,203 So, what we need are some questions and this is the interface of QAnswer. 483 00:27:16,203 --> 00:27:23,460 For example, where is WikidataCon? 484 00:27:23,901 --> 00:27:25,975 *Alright, I think it's written like this.* 485 00:27:27,432 --> 00:27:32,432 *2019... And we get this response which is Berlin.* 486 00:27:32,458 --> 00:27:38,425 *So, other questions. 
For example, "When did Wikidata start?"* 487 00:27:38,430 --> 00:27:42,383 *It started on 30 October 2012 so its birthday is approaching.* 488 00:27:44,079 --> 00:27:48,014 *It is 6 years old, so it will be their 7th birthday.* 489 00:27:49,133 --> 00:27:51,583 *Who is developing Wikidata?* 490 00:27:51,583 --> 00:27:54,371 *The Wikimedia Foundation and Wikimedia Deutschland,* 491 00:27:54,371 --> 00:27:55,988 *so thank you very much to them.* 492 00:27:57,013 --> 00:28:02,947 *Something like museums in Berlin... I don't know why this is not so...* 493 00:28:05,494 --> 00:28:07,737 *Only one museum... no, yeah, a few more.* 494 00:28:09,167 --> 00:28:10,995 *So, when you ask something like this,* 495 00:28:10,995 --> 00:28:14,178 *we allow the user to explore the information* 496 00:28:14,178 --> 00:28:16,308 *with different aggregations.* 497 00:28:16,308 --> 00:28:18,953 *For example, if there are many geo coordinates* 498 00:28:18,953 --> 00:28:21,476 *attached to the entities, we will display a map.* 499 00:28:21,476 --> 00:28:26,357 *If there are many images attached to them,* *we will display the images,* 500 00:28:26,357 --> 00:28:29,057 *and otherwise there is a list where you can explore* 501 00:28:29,057 --> 00:28:30,855 *the different entities.* 502 00:28:33,236 --> 00:28:35,605 *You can ask something like "Who is the mayor of Berlin,"* 503 00:28:36,643 --> 00:28:40,201 *"Give me politicians born in Berlin," and things like this.* 504 00:28:40,201 --> 00:28:44,428 *So you can both ask keyword questions and full natural language questions.* 505 00:28:45,171 --> 00:28:48,604 *The whole data is coming from Wikidata* 506 00:28:48,604 --> 00:28:55,346 *so all entities which are in Wikidata are queryable by this service.* 507 00:28:55,869 --> 00:28:59,244 *And the data is really all from Wikidata* 508 00:28:59,244 --> 00:29:01,207 *in the sense, there are some Wikipedia snippets,* 509 00:29:01,207 --> 00:29:04,851 *there are images from Wikimedia 
Commons,* 510 00:29:04,851 --> 00:29:07,644 *but the rest is all Wikidata data.* 511 00:29:08,760 --> 00:29:11,678 *We can do this in several languages. This is now in Chinese.* 512 00:29:11,678 --> 00:29:15,441 *I don't know what is written there so do not ask me.* 513 00:29:15,441 --> 00:29:19,893 *We are currently supporting these languages* *with more or less good quality* 514 00:29:19,893 --> 00:29:22,094 *because... yeah.* 515 00:29:23,332 --> 00:29:27,563 *So, how can this be useful for the Wikidata community?* 516 00:29:27,968 --> 00:29:30,052 *I think there are different reasons.* 517 00:29:30,052 --> 00:29:33,786 *First of all, this thing helps you to generate SPARQL queries* 518 00:29:33,786 --> 00:29:37,043 *and I know there are even some workshops about how to use SPARQL.* 519 00:29:37,043 --> 00:29:39,444 *It's not a language that everyone speaks.* 520 00:29:39,444 --> 00:29:45,147 *So, if you ask something like "a philosopher born before 1908,"* 521 00:29:45,147 --> 00:29:48,697 *to figure out, to construct a SPARQL query like this could be tricky.* 522 00:29:50,001 --> 00:29:54,257 *In fact when you ask a question, we generate many SPARQL queries* 523 00:29:54,301 --> 00:29:57,486 *and the first one is always the thing, the SPARQL query where we think* 524 00:29:57,486 --> 00:29:59,008 *this is the good one.* 525 00:29:59,017 --> 00:30:02,651 *So, if you ask your question and then you go on SPARQL list,* 526 00:30:02,691 --> 00:30:06,468 *then there is this button for the Wikidata query service* 527 00:30:06,468 --> 00:30:11,811 *and you have the SPARQL query right there and you will get the same result* 528 00:30:11,811 --> 00:30:15,184 *as you would get in the interface.* 529 00:30:16,906 --> 00:30:19,289 *Another thing where it could be useful for* 530 00:30:19,289 --> 00:30:23,468 *is for finding missing contextual information.* 531 00:30:23,468 --> 00:30:27,057 *For example, if you ask for actors in "The Lord of the Rings,"* 532 
00:30:30,776 *most of these entities will have associated an image* 533 00:30:30,776 --> 00:30:32,490 *but not all of them.* 534 00:30:32,490 --> 00:30:37,861 *So here there is some missing metadata that could be added.* 535 00:30:37,861 --> 00:30:40,376 *You could go to this entity, add an image* 536 00:30:40,376 --> 00:30:45,462 *and then see first that there is an image missing and so on.* 537 00:30:46,457 --> 00:30:52,047 *Another thing is that you could find schema issues.* 538 00:30:52,047 --> 00:30:55,424 *For example, if you ask "books by Andrea Camilleri,"* 539 00:30:55,428 --> 00:30:57,711 *which is a famous Italian writer,* 540 00:30:57,711 --> 00:30:59,981 *you would currently get these three books.* 541 00:30:59,981 --> 00:31:02,681 *But he wrote many more. He wrote more than 50.* 542 00:31:02,681 --> 00:31:05,701 *And so the question is, are they not in Wikidata* 543 00:31:05,701 --> 00:31:09,704 *or is maybe my knowledge not correctly currently like it is.* 544 00:31:09,704 --> 00:31:12,804 *And in this case, I know there is another book from him,* 545 00:31:12,804 --> 00:31:14,737 *which is "Un mese con Montalbano."* 546 00:31:14,737 --> 00:31:18,207 *It has only an Italian label so you can only search it in Italian.* 547 00:31:18,207 --> 00:31:22,103 *And if you go to this entity, you will see that he has written it.* 548 00:31:22,103 --> 00:31:27,504 *It's a short story by Andrea Camilleri and it's an instance of literary work,* 549 00:31:27,504 --> 00:31:29,220 *but it's not an instance of book* 550 00:31:29,220 --> 00:31:31,338 *so that's the reason why it doesn't appear.* 551 00:31:31,338 --> 00:31:35,904 *This is a way to track where things are missing* 552 00:31:35,904 --> 00:31:37,499 *in the Wikidata model* 553 00:31:37,499 --> 00:31:39,539 *not as you would expect.* 554 00:31:40,794 --> 00:31:42,968 *Another reason is just to have fun.* 555 00:31:43,588 --> 00:31:47,546 *I imagine that many of you added many Wikidata entities* 556 
--> 00:31:50,776 *so just search for the ones that you care most* 557 00:31:50,776 --> 00:31:52,529 *or you have edited yourself.* 558 00:31:52,529 --> 00:31:56,893 *So in this case, who developed QAnswer, and that's it.* 559 00:31:56,893 --> 00:32:00,226 *For any other questions, go to www.QAnswer.eu/qa* 560 00:32:00,226 --> 00:32:03,575 *and hopefully we'll find an answer for you.* 561 00:32:03,782 --> 00:32:05,649 (audience clapping) 562 00:32:13,994 --> 00:32:17,040 - Sorry. - I'm just the dumbest person here. 563 00:32:17,530 --> 00:32:22,722 (audience 11) So I want to know how is this kind of agnostic 564 00:32:22,752 --> 00:32:25,104 to Wikibase instance, 565 00:32:25,104 --> 00:32:29,020 or has it been tied to the exact like property numbers 566 00:32:29,020 --> 00:32:31,054 and things in Wikidata? 567 00:32:31,054 --> 00:32:33,442 Has it learned in some way or how was it set up? 568 00:32:33,442 --> 00:32:36,456 There is training data and we rely on training data 569 00:32:36,456 --> 00:32:40,585 and this is also most of the cases why you will not get good results. 570 00:32:40,585 --> 00:32:44,881 But we're training the system by the simple yes and no answer. 571 00:32:44,881 --> 00:32:48,936 When you ask a question, and we ask always for feedback, yes or no, 572 00:32:48,936 --> 00:32:51,899 and this feedback is used by the machine learning algorithm. 573 00:32:51,899 --> 00:32:54,124 This is where machine learning comes into play. 574 00:32:54,124 --> 00:32:58,600 But basically, we put up separate Wikibase instances 575 00:32:58,600 --> 00:33:00,482 and we can plug this in. 576 00:33:00,482 --> 00:33:04,249 In fact, the system is agnostic in the sense that it only wants RDF. 577 00:33:04,249 --> 00:33:06,618 And RDF, you have in each Wikibase, 578 00:33:06,618 --> 00:33:08,059 there are some few configurations 579 00:33:08,059 --> 00:33:10,432 but you can have this on top of any Wikibase. 580 00:33:11,654 --> 00:33:13,039 (audience 11) Awesome. 
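The "philosopher born before 1908" example from the talk can be illustrated with a hand-written query. This is only a sketch of the kind of SPARQL such a question maps to, not the query QAnswer actually generates: the function name `build_query` and its parameters are hypothetical, and the choice of properties (`P31` instance of, `P106` occupation, `P569` date of birth) and of `Q4964182` for "philosopher" is an assumption about how the question would be modeled.

```python
def build_query(occupation_qid: str, before_year: int, limit: int = 20) -> str:
    """Build a SPARQL query for humans with a given occupation born before a year.

    Hypothetical helper for illustration; it only assembles a query string
    and does not contact any endpoint.
    """
    return f"""
SELECT ?person ?personLabel ?birth WHERE {{
  ?person wdt:P31 wd:Q5 ;                 # instance of: human
          wdt:P106 wd:{occupation_qid} ;  # occupation
          wdt:P569 ?birth .               # date of birth
  FILTER(?birth < "{before_year}-01-01T00:00:00Z"^^xsd:dateTime)
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}
LIMIT {limit}
""".strip()


PHILOSOPHER = "Q4964182"  # assumed Wikidata item for "philosopher"
query = build_query(PHILOSOPHER, 1908)
print(query)
```

The printed text can be pasted into the Wikidata Query Service to compare against the results shown in the QAnswer interface.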
581 00:33:23,573 --> 00:33:27,004 (audience 12) You mentioned that it's being trained by yes/no answers. 582 00:33:27,073 --> 00:33:32,662 So I guess this is assuming that the Wikidata instance is free of errors 583 00:33:32,722 --> 00:33:34,356 or is it also...? 584 00:33:34,356 --> 00:33:37,140 You assume that the Wikidata instances... 585 00:33:37,140 --> 00:33:40,731 (audience 12) I guess I'm asking, like, are you distinguishing 586 00:33:40,731 --> 00:33:46,289 between source level errors or misunderstanding the question 587 00:33:46,289 --> 00:33:50,856 versus a bad mapping, etc.? 588 00:33:51,706 --> 00:33:55,474 Generally, we assume that the data in Wikidata is true. 589 00:33:55,474 --> 00:33:59,172 So if you click "no" and the data in Wikidata would be false, 590 00:33:59,172 --> 00:34:03,023 then yeah... we would not catch this difference. 591 00:34:03,023 --> 00:34:05,081 But sincerely, Wikidata quality is very good, 592 00:34:05,081 --> 00:34:08,231 so I rarely have had this problem. 593 00:34:16,592 --> 00:34:22,068 (audience 12) Is this data available as a dataset by any chance, sir? 594 00:34:22,209 --> 00:34:27,218 - What is... direct service? - The... dataset of... 595 00:34:27,218 --> 00:34:30,803 "is this answer correct versus the query versus the answer?" 596 00:34:30,872 --> 00:34:33,340 Is that something you're publishing as part of this? 597 00:34:33,340 --> 00:34:38,040 - The training data that you've... - We published the training data. 598 00:34:38,040 --> 00:34:43,423 We published some old training data but no, just a-- 599 00:34:44,573 --> 00:34:47,313 There is a question there. I don't know if we have still time. 600 00:34:51,215 --> 00:34:55,104 (audience 13) Maybe I just missed this but is it running on a live, 601 00:34:55,104 --> 00:34:57,080 like the Live Query Service, 602 00:34:57,080 --> 00:34:59,393 or is it running on some static dump you loaded 603 00:34:59,393 --> 00:35:01,690 or where is the data source for Wikidata? 
604 00:35:01,784 --> 00:35:07,014 Yes. The problem is to apply this technology, 605 00:35:07,014 --> 00:35:08,414 you need a local dump. 606 00:35:08,414 --> 00:35:10,673 Because we do not rely only on the SPARQL endpoint, 607 00:35:10,673 --> 00:35:12,873 we rely on special indexes. 608 00:35:12,873 --> 00:35:16,192 So, we are currently loading the Wikidata dump. 609 00:35:16,192 --> 00:35:18,699 We are updating this every two weeks. 610 00:35:18,699 --> 00:35:20,756 We would like to do it more often, 611 00:35:20,756 --> 00:35:23,823 in fact we would like to get the diffs for each day, for example, 612 00:35:23,823 --> 00:35:25,271 to put them in our index. 613 00:35:25,271 --> 00:35:28,719 But unfortunately, right now, the Wikidata dumps are released 614 00:35:28,719 --> 00:35:31,753 only once every week. 615 00:35:31,753 --> 00:35:35,150 So, we cannot be faster than that and we also need some time 616 00:35:35,150 --> 00:35:39,073 to re-index the data, so it takes one or two days. 617 00:35:39,073 --> 00:35:41,833 So we are always behind. Yeah. 618 00:35:48,202 --> 00:35:49,780 (moderator) Any more? 619 00:35:50,430 --> 00:35:53,268 - Okay, thank you very much. - Thank you all very much. 620 00:35:53,547 --> 00:35:54,966 (audience clapping) 621 00:35:57,266 --> 00:36:00,165 (moderator) And now last, we have Eugene Alvin Villar, 622 00:36:00,165 --> 00:36:02,049 talking about Panandâ. 623 00:36:10,630 --> 00:36:12,637 Good afternoon, my name is Eugene Alvin Villar 624 00:36:12,637 --> 00:36:15,297 and I'm from the Philippines, and I'll be talking about Panandâ: 625 00:36:15,297 --> 00:36:18,185 a mobile app powered by Wikidata. 626 00:36:18,862 --> 00:36:21,678 This is a follow-up to my lightning talk that I presented two years ago 627 00:36:21,678 --> 00:36:25,004 at WikidataCon 2017 together with Carlo Moskito. 628 00:36:25,004 --> 00:36:26,557 You can download the slides 629 00:36:26,557 --> 00:36:28,727 and there's a link to that presentation there. 
630 00:36:28,727 --> 00:36:30,868 I'll give you a bit of a background. 631 00:36:30,868 --> 00:36:33,471 Wiki Society of the Philippines, formerly, Wikimedia Philippines, 632 00:36:33,471 --> 00:36:37,477 had a series of projects related to Philippine heritage and history. 633 00:36:37,477 --> 00:36:41,705 So we have the usual photo contests, *Wikipedia Takes Manila,* 634 00:36:41,705 --> 00:36:43,238 *Wiki Loves Monuments,* 635 00:36:43,238 --> 00:36:46,657 and then our media project was *Cultural Heritage Mapping Project* 636 00:36:46,657 --> 00:36:49,094 *back in 2014-2015.* 637 00:36:50,044 --> 00:36:53,039 *In that project, we trained volunteers to edit articles* 638 00:36:53,039 --> 00:36:54,389 *related to cultural heritage.* 639 00:36:54,914 --> 00:36:59,032 *This is our biggest, and most successful project that we had.* 640 00:36:59,032 --> 00:37:03,037 *794 articles were created or improved, including 37 "Did You Knows"* 641 00:37:03,037 --> 00:37:05,238 *and 4 "Good Articles,"* 642 00:37:05,308 --> 00:37:08,688 *and more than 5,000 images were uploaded to Commons.* 643 00:37:08,688 --> 00:37:11,039 *As a result of that, we then launched* 644 00:37:11,039 --> 00:37:13,689 *the* Encyclopedia of Philippine Heritage *program* 645 00:37:13,689 --> 00:37:18,444 *in order to expand the scope and also include Wikidata in the scope.* 646 00:37:18,444 --> 00:37:21,695 *Here's the Core Team: myself, Carlo and Roel.* 647 00:37:21,695 --> 00:37:26,870 *Our first pilot project was to document the country's historical markers* 648 00:37:26,870 --> 00:37:29,153 *in Wikidata and Commons,* 649 00:37:29,153 --> 00:37:34,053 *starting with those created by our historical national agency, NHCP.* 650 00:37:34,053 --> 00:37:38,904 *For example, they installed a marker for our national hero, here in Berlin,* 651 00:37:38,904 --> 00:37:41,421 *so there's now a Wikidata page for that marker* 652 00:37:41,421 --> 00:37:45,102 *and a collection of photos of that marker in Commons.* 
653 00:37:46,166 --> 00:37:50,397 *Unfortunately, the government agency does not keep a good database* 654 00:37:50,397 --> 00:37:53,480 *up-to-date or complete of their markers,* 655 00:37:53,480 --> 00:37:58,004 *so we have to painstakingly input these to Wikidata manually.* 656 00:37:58,004 --> 00:38:02,772 *After careful research and confirmation, here's a graph of the number of markers* 657 00:38:02,772 --> 00:38:07,466 *that we've added to Wikidata over time, over the past three years.* 658 00:38:07,466 --> 00:38:11,230 *And we've developed this Historical Markers Map web app* 659 00:38:11,230 --> 00:38:15,289 *that lets users view these markers on a map,* 660 00:38:15,289 --> 00:38:21,051 *so we can browse it as a list, view a good visualization of the markers* 661 00:38:21,051 --> 00:38:23,253 *with information and inscriptions.* 662 00:38:23,253 --> 00:38:28,885 *All of this is powered by Live Query from Wikidata Query Service.* 663 00:38:29,732 --> 00:38:32,005 *There's the link if you want to play around with it.* 664 00:38:33,349 --> 00:38:37,428 *And so we developed a mobile app for this one.* 665 00:38:37,428 --> 00:38:42,117 *To better publicize our project, I developed the* Panandâ 666 00:38:42,117 --> 00:38:45,434 *which is Tagalog for "marker", as an Android app,* 667 00:38:45,434 --> 00:38:48,393 *that was published back in 2018,* 668 00:38:48,393 --> 00:38:53,934 *and I'll publish the iOS version sometime in the future, hopefully.* 669 00:38:54,868 --> 00:38:57,892 *I'd like to demo the app but we have no time,* 670 00:38:57,892 --> 00:39:00,935 *so here are some of the features of the app.* 671 00:39:00,935 --> 00:39:04,586 *There's a Map and a List view, with text search,* 672 00:39:04,586 --> 00:39:07,452 *so you can drill down as needed.* 673 00:39:07,452 --> 00:39:10,169 *You can filter by region or by distance,* 674 00:39:10,169 --> 00:39:12,193 *and whether you have marked these markers,* 675 00:39:12,193 --> 00:39:15,499 *as either you 
have visited them or you'd like to bookmark them* 676 00:39:15,499 --> 00:39:16,949 *for future visits.* 677 00:39:16,949 --> 00:39:19,482 *Then you can use your GPS on your mobile phone* 678 00:39:19,482 --> 00:39:21,860 *to use for distance filtering.* 679 00:39:21,860 --> 00:39:26,765 *For example, if I want markers that are near me, you can do that.* 680 00:39:26,765 --> 00:39:30,918 *And when you click on the Details page, you can see the same thing,* 681 00:39:30,918 --> 00:39:35,850 *photos from Commons, inscription about the marker,* 682 00:39:35,850 --> 00:39:40,484 *how to find the marker, its location and address, etc.* 683 00:39:41,601 --> 00:39:45,993 *And one thing that's unique for this app is you can, again, visit * 684 00:39:46,011 --> 00:39:50,407 *or put a bookmark of these, so on the map or on the list,* 685 00:39:50,407 --> 00:39:51,692 *or on the Details page,* 686 00:39:51,692 --> 00:39:54,891 *you can just tap on those buttons and say that you've visited them,* 687 00:39:54,891 --> 00:39:58,520 *or you'd like to bookmark them for future visits.* 688 00:39:58,520 --> 00:40:03,527 *And my app has been covered by the press and given recognition,* 689 00:40:03,527 --> 00:40:06,743 *so plenty of local press articles.* 690 00:40:06,743 --> 00:40:11,281 *Recently, it was selected as one of the Top 5 finalists* 691 00:40:11,281 --> 00:40:15,247 *for the Android Masters competition in the App for Social Good category.* 692 00:40:15,247 --> 00:40:17,351 *The final event will be next month.* 693 00:40:17,351 --> 00:40:18,999 *Hopefully, we'll win.* 694 00:40:20,380 --> 00:40:22,378 *Okay, so some behind the scenes.* 695 00:40:22,378 --> 00:40:25,477 *How did I develop this app?* 696 00:40:25,477 --> 00:40:28,578 *Panandâ is actually a hybrid app, it's not native.* 697 00:40:28,578 --> 00:40:30,745 *Basically it's just a web app packaged as a mobile app* 698 00:40:30,745 --> 00:40:32,518 *using Apache Cordova.* 699 00:40:32,518 --> 00:40:34,026 *That 
reduces development time* 700 00:40:34,026 --> 00:40:36,181 *because I don't have to learn a different language.* 701 00:40:36,181 --> 00:40:37,769 *I know JavaScript, HTML.* 702 00:40:37,879 --> 00:40:42,131 *It's cross-platform, allows code reuse from the Historical Markers Map.* 703 00:40:42,385 --> 00:40:46,311 *And the app is also free and open source, under the MIT license.* 704 00:40:46,311 --> 00:40:49,429 *So there's the GitHub repository over there.* 705 00:40:50,469 --> 00:40:53,624 *The challenge is the app's data is not live.* 706 00:40:54,750 --> 00:40:56,820 *Because if you query the data live,* 707 00:40:56,843 --> 00:41:00,638 *it means you're pulling around half a megabyte of compressed JSON every time* 708 00:41:00,638 --> 00:41:03,594 *which is not friendly for those on mobile data,* 709 00:41:03,594 --> 00:41:06,723 *incurs too much delay when starting the app,* 710 00:41:06,723 --> 00:41:13,097 *and if there are any errors in Wikidata, that may result in poor user experience.* 711 00:41:14,253 --> 00:41:18,046 *So instead, what I did was the app is updated every few months* 712 00:41:18,046 --> 00:41:20,468 *with fresh data, compiled using a Perl script* 713 00:41:20,468 --> 00:41:23,037 *that queries Wikidata Query Service,* 714 00:41:23,037 --> 00:41:25,678 *and this script also does some data validation* 715 00:41:25,678 --> 00:41:30,944 *to highlight consistency or schema errors,* *so that allows fixes before updates* 716 00:41:30,944 --> 00:41:34,735 *in order to provide a good experience for the mobile user.* 717 00:41:35,174 --> 00:41:39,274 *And here's the... 
if you're tech-oriented,* *here's the more or less,* 718 00:41:39,274 --> 00:41:41,644 *the technologies that I'm using.* 719 00:41:41,644 --> 00:41:43,976 *So a bunch of JavaScript libraries.* 720 00:41:43,976 --> 00:41:46,287 *Here's the Perl script that queries Wikidata,* 721 00:41:46,287 --> 00:41:48,598 *some Cordova plug-ins,* 722 00:41:48,598 --> 00:41:53,035 *and building it using Cordova and then publishing this app.* 723 00:41:53,763 --> 00:41:55,586 *And that's it.* 724 00:41:55,748 --> 00:41:58,164 (audience clapping) 725 00:42:01,800 --> 00:42:04,072 (moderator) I hope you win. Alright, questions. 726 00:42:16,286 --> 00:42:17,990 (audience 14) Sorry if I missed this. 727 00:42:17,990 --> 00:42:21,317 Are you opening your code so the people can adapt your app 728 00:42:21,317 --> 00:42:24,501 and do it for other cities? 729 00:42:24,501 --> 00:42:28,516 Yes, as I've mentioned, the app is free and open source, 730 00:42:28,516 --> 00:42:31,095 - (audience 14) But where is it? - There's the GitHub repository. 731 00:42:31,095 --> 00:42:33,610 You can download the slides, and there's a link 732 00:42:33,610 --> 00:42:36,841 in one of the previous slides to the repository. 733 00:42:36,841 --> 00:42:38,732 (audience 14) Okay. Can you put it? 734 00:42:42,392 --> 00:42:43,747 Yeah, at the bottom. 735 00:42:46,577 --> 00:42:49,222 (audience 15) Hi. Sorry, maybe I also missed this, 736 00:42:49,222 --> 00:42:51,628 but how do you check for schema errors? 737 00:42:53,055 --> 00:42:56,007 Basically, we have a Wikiproject on Wikidata, 738 00:42:56,106 --> 00:43:02,425 so we try to put the other guidelines on how to model these markers correctly. 739 00:43:02,425 --> 00:43:05,190 Although it's not updated right now. 740 00:43:06,197 --> 00:43:09,023 As far as I know, we're the only country 741 00:43:09,023 --> 00:43:12,874 that's currently modeling these in Wikidata. 
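The build-time validation step described in the talk (checking the queried data for consistency or schema errors before bundling it into the app) might look roughly like the sketch below. The speaker's actual script is written in Perl and its checks are not shown in the talk, so everything here is an assumption for illustration: the field names (`qid`, `label`, `lat`, `lon`), the helper names, and the specific rules are all hypothetical.

```python
from typing import Any

# Hypothetical field names; the real app's data model is not shown in the talk.
REQUIRED_FIELDS = ("qid", "label", "lat", "lon")


def validate_marker(marker: dict[str, Any]) -> list[str]:
    """Return a list of problems found in one marker record (empty if none)."""
    problems = []
    for field in REQUIRED_FIELDS:
        if marker.get(field) in (None, ""):
            problems.append(f"missing {field}")
    lat, lon = marker.get("lat"), marker.get("lon")
    if isinstance(lat, (int, float)) and not -90 <= lat <= 90:
        problems.append("latitude out of range")
    if isinstance(lon, (int, float)) and not -180 <= lon <= 180:
        problems.append("longitude out of range")
    return problems


def split_valid(markers: list[dict[str, Any]]):
    """Separate clean records from ones that need fixing on Wikidata first."""
    valid, report = [], {}
    for marker in markers:
        problems = validate_marker(marker)
        if problems:
            report[marker.get("qid", "<unknown>")] = problems
        else:
            valid.append(marker)
    return valid, report


# Placeholder records, not real Wikidata items.
sample = [
    {"qid": "Q_EXAMPLE_1", "label": "Example marker", "lat": 14.6, "lon": 121.0},
    {"qid": "Q_EXAMPLE_2", "label": "", "lat": 95.0, "lon": 121.0},
]
valid, report = split_valid(sample)
print(len(valid), report)
```

Records listed in `report` would be fixed on Wikidata and re-queried before the next app update, so that only clean data ships to mobile users.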
742 00:43:13,930 --> 00:43:20,152 There's also an effort to add [inaudible] 743 00:43:20,161 --> 00:43:22,411 in Wikidata, 744 00:43:22,474 --> 00:43:25,705 but I think that's a different thing altogether. 745 00:43:34,056 --> 00:43:35,895 (audience 16) So I guess this may be part 746 00:43:35,895 --> 00:43:37,725 of this Wikiproject you just described, 747 00:43:37,725 --> 00:43:42,800 but for the consistency checks, have you considered moving those 748 00:43:42,800 --> 00:43:46,743 into like complex schema constraints that then can be flagged 749 00:43:46,743 --> 00:43:50,583 on the Wikidata side for what there is to fix on there? 750 00:43:52,930 --> 00:43:55,547 I'm actually interested in seeing if I can do, for example, 751 00:43:55,598 --> 00:44:00,296 shape expressions, so that, yeah, we can do those things. 752 00:44:04,256 --> 00:44:06,776 (moderator) At this point, we have quite a few minutes left. 753 00:44:06,776 --> 00:44:09,026 The speakers did very well, so if Erica is okay with it, 754 00:44:09,026 --> 00:44:11,238 I'm also going to allow some time for questions, 755 00:44:11,238 --> 00:44:13,407 still about this presentation, but also about Mbabel, 756 00:44:13,407 --> 00:44:15,498 if anyone wants to jump in with something there, 757 00:44:15,498 --> 00:44:17,318 either presentation is fair game. 758 00:44:22,790 --> 00:44:25,639 Unless like me, you're all so dazzled that you just want to go to snacks 759 00:44:25,639 --> 00:44:27,955 and think about it. (audience giggles) 760 00:44:29,308 --> 00:44:31,179 - (moderator) You know... - Yeah. 761 00:44:31,953 --> 00:44:34,491 (audience 17) I will always have questions about everything. 762 00:44:34,491 --> 00:44:37,642 So, I came in late for the Mbabel tool. 
763 00:44:37,642 --> 00:44:40,350 But I was looking through and I saw there's a number of templates, 764 00:44:40,350 --> 00:44:43,232 and I was wondering if there's a place to contribute 765 00:44:43,232 --> 00:44:45,564 to adding more templates for different types 766 00:44:45,564 --> 00:44:47,620 or different languages and the like? 767 00:44:50,497 --> 00:44:53,683 (Erica) So for now, we're developing those narrative templates 768 00:44:53,683 --> 00:44:55,566 on Portuguese Wikipedia. 769 00:44:55,566 --> 00:44:57,856 I can show you if you like. 770 00:44:57,856 --> 00:45:02,051 We're inserting those templates on English Wikipedia too. 771 00:45:02,051 --> 00:45:07,017 It's not complicated to do but we have to expand for other languages. 772 00:45:07,017 --> 00:45:08,236 - French? - French. 773 00:45:08,236 --> 00:45:10,465 - Yes. - French and German already have. 774 00:45:10,465 --> 00:45:11,465 (laughing) 775 00:45:12,002 --> 00:45:13,018 Yeah. 776 00:45:15,755 --> 00:45:18,287 (inaudible chatter) 777 00:45:21,756 --> 00:45:24,446 (audience 18) I also have a question about Mbabel, 778 00:45:24,446 --> 00:45:27,676 which is, is this really just templates? 779 00:45:27,676 --> 00:45:33,893 Is this based on the Lua scripting? Is that all? Wow. Okay. 780 00:45:33,956 --> 00:45:37,404 Yeah, so it's very deployable. Okay. Cool. 781 00:45:38,102 --> 00:45:40,199 (moderator) Just to catch that for the live stream, 782 00:45:40,199 --> 00:45:42,745 the answer was an emphatic nod of the head, and a yes. 783 00:45:42,915 --> 00:45:44,648 (audience laughing) 784 00:45:44,754 --> 00:45:47,203 - (Erica) Super simple. - (moderator) Super simple. 785 00:45:47,745 --> 00:45:49,819 (audience 19) Yeah. I would also like to ask. 786 00:45:49,819 --> 00:45:53,386 Sorry I haven't delved into Mbabel earlier. 787 00:45:53,386 --> 00:45:57,018 I'm wondering, you're working also with the links, the red links. 788 00:45:57,018 --> 00:46:00,052 Are you adding some code there? 
789 00:46:03,987 --> 00:46:07,970 - (Erica) For the lists? - Wherever the link comes from... 790 00:46:07,970 --> 00:46:11,595 (audience 19) The architecture. Maybe I will have to look into it. 791 00:46:11,595 --> 00:46:13,355 (Erica) I'll show you later. 792 00:46:20,506 --> 00:46:23,221 (moderator) Alright. You're all ready for snack break, I can tell. 793 00:46:23,221 --> 00:46:24,456 So let's wrap it up. 794 00:46:24,456 --> 00:46:26,429 But our kind speakers, I'm sure will stick around 795 00:46:26,429 --> 00:46:27,958 if you have questions for them. 796 00:46:27,958 --> 00:46:31,179 Please join me in giving... first of all we didn't give a round of applause yet. 797 00:46:31,179 --> 00:46:33,221 I can tell you're interested in doing so. 798 00:46:33,221 --> 00:46:34,886 (audience clapping)