1 00:00:07,278 --> 00:00:11,778 [inaudible] and I have an effort called WikiLoop, 2 00:00:11,778 --> 00:00:15,368 and this is what I'm going to introduce to you. 3 00:00:15,728 --> 00:00:22,604 We have presented WikiLoop, the idea, to several Wikimedia related conferences. 4 00:00:22,604 --> 00:00:25,017 How many of you have heard about WikiLoop before? 5 00:00:26,020 --> 00:00:27,040 Thanks. 6 00:00:27,040 --> 00:00:31,014 And how many of you have interacted with the datasets and toolings 7 00:00:31,014 --> 00:00:32,664 that we provided before? 8 00:00:33,308 --> 00:00:36,870 Okay, fairly new. So this will be mostly an introduction. 9 00:00:36,870 --> 00:00:42,008 So we would like to tell you why we started this initiative 10 00:00:42,008 --> 00:00:44,148 and what it intends to do, 11 00:00:44,148 --> 00:00:48,803 and how you can get involved or what it will go for. 12 00:00:50,390 --> 00:00:53,810 So, to begin with, we would like to give you an example. 13 00:00:53,810 --> 00:00:58,409 This is a vandalism that happened in Italian... 14 00:01:00,621 --> 00:01:03,623 that happened in Italian Wikipedia. 15 00:01:04,142 --> 00:01:06,935 I know that most people here are interested in Wikidata. 16 00:01:06,935 --> 00:01:09,780 I will tell you why this is relevant too. 17 00:01:10,137 --> 00:01:11,879 So basically what we found is 18 00:01:11,879 --> 00:01:15,970 that someone vandalized Wikipedia in Italian 19 00:01:15,970 --> 00:01:20,590 and says, "Bezos who cannot afford a car." 20 00:01:20,809 --> 00:01:22,666 And this is an interesting question, 21 00:01:23,799 --> 00:01:28,379 if you think about it, this is blatantly obvious vandalism 22 00:01:28,379 --> 00:01:33,412 but when it comes to machines and algorithms 23 00:01:33,412 --> 00:01:37,881 which try to detect vandalism and avoid serving users the information, 24 00:01:38,309 --> 00:01:41,989 how can a computer understand this kind of information, 25 00:01:41,989 --> 00:01:43,286 like it would be...
26 00:01:46,869 --> 00:01:49,180 we realize that sometimes there are limitations 27 00:01:49,180 --> 00:01:54,083 to how far algorithms and machines can go. 28 00:01:54,931 --> 00:01:57,666 Another example here is, let's say, 29 00:01:57,666 --> 00:02:02,044 there is a word or label, or a category on Wikipedia that says 30 00:02:02,044 --> 00:02:06,077 someone, a person, is a Christian scientist. 31 00:02:06,077 --> 00:02:09,627 Now, given this label, what facts do you come up with 32 00:02:09,627 --> 00:02:13,815 like what would you infer from this category? 33 00:02:14,205 --> 00:02:18,586 Do you think it would be a "Christian" or do you think it would be a "scientist"? 34 00:02:18,981 --> 00:02:21,621 In this specific case-- it does not apply everywhere-- 35 00:02:21,621 --> 00:02:23,481 but in this specific case, 36 00:02:23,481 --> 00:02:26,991 there is a religion called "Christian Science," 37 00:02:26,991 --> 00:02:30,199 and people who hold that belief are called "Christian Scientists." 38 00:02:31,549 --> 00:02:34,891 And, again, for machines, how can we know, like 39 00:02:36,272 --> 00:02:40,392 even if many people here are big [fan] 40 00:02:40,392 --> 00:02:45,242 that the better we make our data and knowledge machine-friendly 41 00:02:45,459 --> 00:02:51,709 the easier we can work and improve the overall knowledge accessibility 42 00:02:51,709 --> 00:02:54,139 and contribute together 43 00:02:54,139 --> 00:02:55,589 but there are always things 44 00:02:55,589 --> 00:02:58,449 where we believe that machines have restrictions.
45 00:03:00,136 --> 00:03:04,479 So all in all, we start to realize 46 00:03:04,479 --> 00:03:08,307 that coming from Internet companies 47 00:03:08,307 --> 00:03:10,690 who have a strong belief in our technology 48 00:03:10,690 --> 00:03:12,571 and what machines can do, 49 00:03:12,571 --> 00:03:16,222 there is always a gap or there is always something 50 00:03:16,222 --> 00:03:18,992 where we would need to rely on human beings 51 00:03:18,992 --> 00:03:22,442 and more, we would need to rely on communities 52 00:03:22,753 --> 00:03:28,383 who are actively contributing, who are doing the peer reviews to our... 53 00:03:28,383 --> 00:03:30,163 collaborating with each other. 54 00:03:30,163 --> 00:03:36,082 So this is a picture about the background effort of WikiLoop. 55 00:03:36,595 --> 00:03:39,945 For human beings, they have the knowledge, 56 00:03:40,485 --> 00:03:46,205 we have our domain expertise and we can crosscheck each other 57 00:03:46,205 --> 00:03:48,503 but we just don't have enough time. 58 00:03:49,333 --> 00:03:52,803 And there are many things that machines can empower here 59 00:03:52,803 --> 00:03:56,123 but there are restrictions there. 60 00:03:56,123 --> 00:03:58,643 So the goal is to empower 61 00:03:58,643 --> 00:04:03,039 or improve the productivity of human editors. 62 00:04:03,039 --> 00:04:08,633 But also the other side of the formula is we want to loop that back 63 00:04:08,634 --> 00:04:13,234 to the research and the academic efforts 64 00:04:13,234 --> 00:04:17,312 that improve how machines can help in these cases. 65 00:04:17,875 --> 00:04:22,580 So by a show of hands, how many of you have used Google? 66 00:04:23,870 --> 00:04:25,090 Thank you. 67 00:04:25,090 --> 00:04:26,380 And how many of you 68 00:04:26,900 --> 00:04:31,455 think that companies like Google and other big knowledge companies 69 00:04:31,455 --> 00:04:34,202 should contribute more to the knowledge world? 70 00:04:35,881 --> 00:04:37,707 So what happens is that...
71 00:04:37,707 --> 00:04:42,157 we all know that our mission at Google or other similar companies-- 72 00:04:42,157 --> 00:04:47,647 we have a strong background of leveraging the open knowledge world, 73 00:04:48,347 --> 00:04:50,107 like for Google's specific case 74 00:04:50,107 --> 00:04:52,740 it's like organize the world's information. 75 00:04:52,740 --> 00:04:55,059 So we help disseminate the information, 76 00:04:56,207 --> 00:04:59,996 which in one sense helps the mission of this movement. 77 00:04:59,996 --> 00:05:06,358 But only every once in a while we have sporadic help 78 00:05:07,864 --> 00:05:12,103 trying to donate knowledge and datasets, and tools, 79 00:05:12,103 --> 00:05:16,223 and we want to see if we can make this sustainable, 80 00:05:18,323 --> 00:05:21,424 both in the technical sense 81 00:05:21,424 --> 00:05:23,234 and also in the business sense. 82 00:05:24,943 --> 00:05:29,639 So this is like a one-sentence introduction. 83 00:05:29,639 --> 00:05:34,885 We want WikiLoop to become an umbrella program 84 00:05:34,885 --> 00:05:37,084 for a series of technical projects 85 00:05:37,084 --> 00:05:39,632 intended to contribute datasets and toolings 86 00:05:39,632 --> 00:05:44,734 and hopefully make this a community effort with participation of 87 00:05:44,734 --> 00:05:50,154 other like-minded people, partners and institutions 88 00:05:50,154 --> 00:05:52,410 to join this effort. 89 00:05:52,410 --> 00:05:56,204 There are several projects that we think would be a good fit, 90 00:05:56,204 --> 00:05:59,204 and these are the criteria.
91 00:05:59,204 --> 00:06:04,281 First of all, the idea is that it needs to be source improvements 92 00:06:04,281 --> 00:06:07,251 or source improvements by and large is a good fit, 93 00:06:07,251 --> 00:06:10,801 and also the second thing that companies like us 94 00:06:10,801 --> 00:06:13,941 really cannot do very well by ourselves 95 00:06:13,941 --> 00:06:17,691 is to maximize the neutrality, to avoid picking sides 96 00:06:17,691 --> 00:06:21,611 on the controversies, decisions or discussions 97 00:06:21,611 --> 00:06:26,945 and another thing is to make this sustainable in the long term 98 00:06:26,945 --> 00:06:31,705 and to keep it supported by this industry. 99 00:06:31,705 --> 00:06:35,017 We want to see the productivity, the scalability 100 00:06:35,017 --> 00:06:37,632 of our contribution and efforts. 101 00:06:38,444 --> 00:06:41,078 To explain a little bit more... 102 00:06:41,584 --> 00:06:43,570 We always look trying to extract... 103 00:06:43,570 --> 00:06:47,061 for example, we are trying to extract facts from Wikipedia. 104 00:06:47,417 --> 00:06:52,539 And while we can do several separations, 105 00:06:52,539 --> 00:06:55,704 we're labeling, fairly well, 106 00:06:56,315 --> 00:06:59,915 up to a certain point the bottleneck is no longer 107 00:06:59,915 --> 00:07:02,475 how good the machine, the algorithm can get 108 00:07:02,475 --> 00:07:06,117 but sometimes there is noise in the source, 109 00:07:06,117 --> 00:07:10,917 and if we do not remove the source 110 00:07:10,917 --> 00:07:13,624 or minimize the source noise there, 111 00:07:13,624 --> 00:07:15,634 that's how far the machine can go. 112 00:07:15,634 --> 00:07:18,024 So that's the first criterion. 113 00:07:18,024 --> 00:07:19,383 And the second criterion is, 114 00:07:19,383 --> 00:07:24,492 we don't want to be seen as biased or introduce potential biases.
115 00:07:24,492 --> 00:07:29,822 We want to rely on governance that is peer reviewed 116 00:07:29,822 --> 00:07:32,686 and that is done by the community 117 00:07:32,686 --> 00:07:36,570 so that we can avoid picking sides in the controversial questions. 118 00:07:37,319 --> 00:07:40,809 And the third thing, which is probably not so intuitive 119 00:07:40,809 --> 00:07:43,309 but this is the kind of... I would like... 120 00:07:43,309 --> 00:07:48,039 Let me give you an example of the projects we have in mind. 121 00:07:48,435 --> 00:07:51,665 Let's say there are smaller, minority languages there. 122 00:07:51,665 --> 00:07:55,940 I have heard a very good talk earlier this morning. 123 00:07:55,940 --> 00:07:58,460 But one idea we have here is, 124 00:07:58,460 --> 00:08:02,050 let's say you are a minority language contributor, very active, 125 00:08:02,050 --> 00:08:07,063 and you want to advocate for your culture and support your knowledge creation. 126 00:08:07,607 --> 00:08:11,747 But because companies like Google or other consumer companies, 127 00:08:11,747 --> 00:08:14,795 they have a bar for releasing a translation, 128 00:08:14,795 --> 00:08:16,165 to make it available. 129 00:08:16,165 --> 00:08:18,837 They want the precision to be high enough 130 00:08:18,837 --> 00:08:21,594 so that they can use it to serve users. 131 00:08:22,568 --> 00:08:26,568 But maybe internally they have AI models that are experimental, 132 00:08:26,568 --> 00:08:28,914 not good enough to meet the bar 133 00:08:28,914 --> 00:08:31,494 because of a lack of training data, 134 00:08:32,734 --> 00:08:34,834 so the translation is not available. 135 00:08:34,834 --> 00:08:38,080 But the community is doing the translation by hand anyway.
136 00:08:39,160 --> 00:08:41,170 Now, one of the things we are thinking of, 137 00:08:41,170 --> 00:08:45,170 if we can provide some of these experimental things 138 00:08:45,170 --> 00:08:47,660 that are not good enough to serve general user purposes 139 00:08:47,660 --> 00:08:50,350 but are still good for the community 140 00:08:50,350 --> 00:08:53,558 and somewhat improve the productivity, 141 00:08:53,811 --> 00:08:55,731 it would be able to 142 00:08:55,731 --> 00:09:01,381 one, improve the speed of how well a community can contribute, 143 00:09:01,381 --> 00:09:06,231 and second, what a community is creating anyway can come back as training data 144 00:09:06,231 --> 00:09:08,881 that keeps bootstrapping the machines. 145 00:09:10,376 --> 00:09:15,406 So over time by this effort we hope to generate a model 146 00:09:15,673 --> 00:09:19,463 that both helps the human beings, the editors, 147 00:09:19,463 --> 00:09:22,246 but also helps the research 148 00:09:22,246 --> 00:09:26,765 that improves the AI and other approaches. 149 00:09:28,489 --> 00:09:31,549 And this is a big overview of a few projects 150 00:09:31,549 --> 00:09:33,509 we are going to introduce. 151 00:09:33,509 --> 00:09:36,539 Due to the time limitation I will feature a few. 152 00:09:36,539 --> 00:09:41,492 The WikiLoop Game, which you can look up, 153 00:09:41,492 --> 00:09:46,732 is one where we leveraged a platform 154 00:09:46,732 --> 00:09:50,057 created by Magnus called Wikidata Game. 155 00:09:50,057 --> 00:09:54,847 We provide several datasets there to be played, to be introduced 156 00:09:54,847 --> 00:09:56,677 and committed to Wikidata 157 00:09:56,677 --> 00:09:58,867 but through human review. 158 00:09:59,727 --> 00:10:03,947 And Google doesn't get to contribute data directly 159 00:10:03,947 --> 00:10:06,257 to Wikipedia or Wikidata 160 00:10:06,257 --> 00:10:12,269 but relies on non-biased individuals who review it to do so.
161 00:10:12,550 --> 00:10:16,620 And the second one I'm going to feature is WikiLoop Battlefield, 162 00:10:16,620 --> 00:10:21,420 the one that you have seen just now as a counter-vandalism platform, 163 00:10:21,420 --> 00:10:25,629 and this one also features the same criteria 164 00:10:25,629 --> 00:10:28,029 of source improvements, 165 00:10:29,918 --> 00:10:33,328 of how it can empower machines 166 00:10:33,328 --> 00:10:38,794 by looping back to the training data 167 00:10:38,794 --> 00:10:43,064 and also how it keeps companies like us 168 00:10:43,064 --> 00:10:48,526 from picking sides, allowing us to rely on the community's assessment. 169 00:10:48,526 --> 00:10:53,517 And the third one is CitePool, which is creating... 170 00:10:53,517 --> 00:10:58,469 we're trying to help create a citation candidate pool 171 00:10:58,469 --> 00:11:02,731 to improve the productivity of people who want to add citations 172 00:11:02,731 --> 00:11:04,721 but also see if we can make that 173 00:11:04,721 --> 00:11:09,569 into training data accessible to researchers. 174 00:11:10,010 --> 00:11:13,120 So let me use WikiLoop Battlefield as an example. 175 00:11:13,120 --> 00:11:18,427 If you have... try it on your phone-- battlefield.wikiloop.org. 176 00:11:18,427 --> 00:11:21,575 By the way, I want to highlight, the name is subject to change 177 00:11:21,575 --> 00:11:25,870 because some friendly community members have come to me and suggested 178 00:11:25,870 --> 00:11:32,224 that Battlefield might not be the best name for a project 179 00:11:32,224 --> 00:11:34,653 serving the Wikimedia movement. 180 00:11:34,952 --> 00:11:39,542 So if you don't like this name, come join us in the discussion, 181 00:11:39,542 --> 00:11:40,984 provide your suggestion, 182 00:11:40,984 --> 00:11:44,499 we will be very happy to converge on a name 183 00:11:44,499 --> 00:11:48,111 that has community consensus and popularity. 184 00:11:48,244 --> 00:11:51,166 But let's use that as a placeholder here.
185 00:11:52,885 --> 00:11:56,500 I don't need to introduce this group of people 186 00:11:56,500 --> 00:11:59,097 to the typical vandalism workflow 187 00:11:59,820 --> 00:12:03,400 but if you have already... 188 00:12:04,934 --> 00:12:08,886 tried to conduct some counter-vandalism activity, 189 00:12:08,886 --> 00:12:11,566 you might know that it's not very trivial. 190 00:12:11,566 --> 00:12:16,413 How many of you have seen vandalism on Wikipedia and Wikidata? 191 00:12:16,992 --> 00:12:22,329 Okay, how many of you have reverted, by hand, some of them? 192 00:12:22,890 --> 00:12:27,680 How many of you have used certain tools or gone ahead and found certain tools 193 00:12:27,680 --> 00:12:30,875 to patrol or revert vandalism? 194 00:12:31,407 --> 00:12:32,497 Okay. 195 00:12:33,474 --> 00:12:36,124 Cool, this is the highest density of people 196 00:12:36,124 --> 00:12:41,264 who have tried to revert vandalism 197 00:12:41,264 --> 00:12:43,625 that I have ever spoken to. 198 00:12:44,336 --> 00:12:48,756 So maybe some of you have been very comfortably doing that 199 00:12:48,756 --> 00:12:52,966 but for me as someone who started editing actively 200 00:12:53,808 --> 00:12:57,348 only since like three years ago 201 00:12:57,562 --> 00:13:03,439 and who only started to be very serious about vandalism detection and patrolling 202 00:13:03,879 --> 00:13:06,191 only since about last year 203 00:13:06,428 --> 00:13:10,836 I found that doing so is not super easy 204 00:13:10,836 --> 00:13:14,161 in the world of the Wikimedia movement. 205 00:13:15,080 --> 00:13:21,761 If we look at the existing alternatives 206 00:13:21,761 --> 00:13:25,761 there are tools that are built for desktops, 207 00:13:25,761 --> 00:13:30,748 there are tools that rely on users who have rollback permissions, 208 00:13:30,748 --> 00:13:33,976 which itself is a big barrier to get.
209 00:13:35,248 --> 00:13:39,097 We want to make this a super easy-to-use platform 210 00:13:39,097 --> 00:13:41,637 for all three roles. 211 00:13:41,637 --> 00:13:46,017 The first one is the user, reviewer or editor, whatever you call it. 212 00:13:46,612 --> 00:13:48,460 The second one is the researcher 213 00:13:48,460 --> 00:13:52,982 who is trying to create vandalism detection algorithms or systems. 214 00:13:52,982 --> 00:13:54,732 And the third one is developers 215 00:13:54,732 --> 00:13:59,573 who are trying to improve this WikiLoop Battlefield tooling. 216 00:13:59,573 --> 00:14:02,241 We want it to be super easy for users to use. 217 00:14:02,241 --> 00:14:04,970 You can pull up your phone, you don't have to install it, 218 00:14:04,970 --> 00:14:07,168 you can do it on your laptop. 219 00:14:07,168 --> 00:14:10,170 And we also want to lower the barrier to review. 220 00:14:10,170 --> 00:14:16,650 The reason why other tools are trying to limit the access to the tool 221 00:14:16,650 --> 00:14:22,250 is because there needs to be a base trust level for people to use them. 222 00:14:22,250 --> 00:14:26,634 You don't want someone to come to a counter-vandalism tool 223 00:14:26,634 --> 00:14:28,226 to vandalize itself. 224 00:14:29,259 --> 00:14:32,479 So what we are trying to do is that, 225 00:14:32,479 --> 00:14:34,489 to begin with, we want to make it super easy 226 00:14:34,489 --> 00:14:39,522 but also we want to allow multiple people to label the same thing. 227 00:14:39,968 --> 00:14:42,258 Also we want to make it super convenient 228 00:14:42,258 --> 00:14:48,240 to see the [inaudible], to see other labels, and all in real time. 229 00:14:48,438 --> 00:14:52,317 We also want to make it super easy for researchers to use. 230 00:14:52,317 --> 00:14:55,227 With one click you can download the labels 231 00:14:55,227 --> 00:15:01,356 and maybe start playing with the data and see how it fits in your model.
232 00:15:01,502 --> 00:15:06,129 And we provide APIs that have access to real time data. 233 00:15:06,758 --> 00:15:10,448 And for the developer we make it very easy to pick up-- 234 00:15:10,448 --> 00:15:15,433 we have one click-- you can deploy your trial instances, 235 00:15:15,433 --> 00:15:16,726 things like that. 236 00:15:17,100 --> 00:15:20,820 This is an example of building projects 237 00:15:20,820 --> 00:15:23,191 under an umbrella like WikiLoop. 238 00:15:23,191 --> 00:15:27,637 We want to make sure the community trust comes first. 239 00:15:27,947 --> 00:15:31,336 We usually need to make it open source the best. 240 00:15:31,800 --> 00:15:37,478 And we want to avoid proprietary tech, we want to avoid tech lock-in, 241 00:15:37,778 --> 00:15:42,999 and we rely on community approval for certain features. 242 00:15:44,366 --> 00:15:49,474 And if you have seen this-- these are the components that we rely on-- 243 00:15:49,474 --> 00:15:56,207 still a very early stage but you get the principles behind the design. 244 00:15:56,438 --> 00:16:00,288 So what's next, we are trying to grow our usage. 245 00:16:00,288 --> 00:16:02,458 Hopefully you can try it out by yourself 246 00:16:02,458 --> 00:16:06,726 and promise me that you don't click on the login. 247 00:16:07,782 --> 00:16:09,132 There is a login button-- 248 00:16:09,132 --> 00:16:10,452 there will be some good features 249 00:16:10,452 --> 00:16:13,292 that make it super easy to even revert something. 250 00:16:13,292 --> 00:16:15,452 Currently it's still a jump to revert. 251 00:16:16,714 --> 00:16:18,444 But we are building features, 252 00:16:18,444 --> 00:16:23,954 and we are also trying to let you choose some categories 253 00:16:23,954 --> 00:16:26,656 or the watchlist that you will be watching 254 00:16:26,656 --> 00:16:31,366 and the one that you care about to patrol.
255 00:16:31,775 --> 00:16:38,069 And also if you are researchers doing related vandalism detection, 256 00:16:38,362 --> 00:16:41,580 try our data and give us feedback. 257 00:16:44,411 --> 00:16:47,181 And I will go quickly through a few other projects 258 00:16:47,181 --> 00:16:48,731 that we are featuring here 259 00:16:48,731 --> 00:16:52,171 and we will look for questions and feedback from you 260 00:16:52,171 --> 00:16:57,976 about what we think and what you think should be there 261 00:16:57,976 --> 00:17:01,550 or how we should fix things if it doesn't work right. 262 00:17:01,843 --> 00:17:06,163 Wikidata Game is a platform built by a community member, Magnus, 263 00:17:06,163 --> 00:17:08,913 a celebrity in this community, I think. 264 00:17:09,891 --> 00:17:13,371 And by showing this we are providing datasets 265 00:17:13,371 --> 00:17:19,748 but we also want to let people know that we are not reinventing the wheel, 266 00:17:19,748 --> 00:17:21,368 that we are not trying to... 267 00:17:21,368 --> 00:17:24,168 When we come up with some idea, we look into it with the community 268 00:17:24,168 --> 00:17:27,028 and see if there are existing tools there 269 00:17:27,028 --> 00:17:30,198 and how we can be a part of the ecosystem 270 00:17:30,198 --> 00:17:35,692 rather than building everything independently and everything separately. 271 00:17:36,661 --> 00:17:38,721 And this is the current status. 272 00:17:39,624 --> 00:17:42,668 By early results, we show that Wikidata... 273 00:17:44,945 --> 00:17:47,075 a few games that we released 274 00:17:47,075 --> 00:17:51,747 have triggered improved activity on the related entities 275 00:17:52,546 --> 00:17:54,646 and a few follow-ups.
276 00:17:54,646 --> 00:17:57,261 One thing that we have come up with, 277 00:17:57,261 --> 00:17:59,971 as I have talked to a few community members 278 00:17:59,971 --> 00:18:02,388 is the PreCheck idea 279 00:18:02,388 --> 00:18:09,088 that is basically providing a preliminary check of bulk uploads, 280 00:18:09,088 --> 00:18:12,268 a sampled preliminary check by community members 281 00:18:12,268 --> 00:18:14,478 and using that to generate a report, 282 00:18:14,478 --> 00:18:16,185 making it easier for discussions 283 00:18:16,185 --> 00:18:20,445 about whether this big block of Wikidata datasets 284 00:18:20,445 --> 00:18:25,095 should be included or uploaded to wikidata.org 285 00:18:25,095 --> 00:18:27,484 or it should be rechecked or fixed. 286 00:18:30,994 --> 00:18:35,884 And there is another project that is mostly a dataset project 287 00:18:35,884 --> 00:18:37,300 called CatFacts. 288 00:18:37,572 --> 00:18:42,642 CatFacts consists of datasets that we generate 289 00:18:42,642 --> 00:18:45,552 about facts from categories, 290 00:18:45,552 --> 00:18:50,231 the one that you saw, the Christian Scientist, just now 291 00:18:50,803 --> 00:18:56,495 is actually an interesting outlier of data points 292 00:18:56,495 --> 00:18:58,344 from this effort. 293 00:18:58,344 --> 00:19:01,861 The goal is to generate facts from categories, 294 00:19:01,861 --> 00:19:07,363 which we think have been very rich facts online that people... 295 00:19:07,731 --> 00:19:10,087 that have been underleveraged. 296 00:19:10,321 --> 00:19:13,621 But before it can be fully leveraged 297 00:19:13,621 --> 00:19:17,311 we need to make sure that quality is good enough as well 298 00:19:17,311 --> 00:19:22,261 and there are efforts of putting it onto Wikidata Game 299 00:19:22,261 --> 00:19:23,861 and there is an effort where we're thinking 300 00:19:23,861 --> 00:19:27,110 maybe building PreCheck would help as well. 301 00:19:27,611 --> 00:19:29,741 And it's still in an early stage.
302 00:19:29,741 --> 00:19:34,041 Feel free to come talk to us about other efforts, 303 00:19:34,041 --> 00:19:37,991 other ideas you think about datasets we could provide. 304 00:19:38,499 --> 00:19:41,539 The Bot, which is a communication tool. 305 00:19:41,539 --> 00:19:45,149 We know that Bot can do many things like writing Wikipedia articles 306 00:19:45,149 --> 00:19:49,841 but we promised that we don't write actual articles 307 00:19:49,841 --> 00:19:52,597 but we mostly use it 308 00:19:52,911 --> 00:19:58,329 as a way to communicate from, let's say, user talk 309 00:19:58,817 --> 00:20:04,397 to give us access to large scale conversations 310 00:20:04,397 --> 00:20:06,103 with the community members. 311 00:20:06,416 --> 00:20:09,686 Explorer is going to show all our datasets, 312 00:20:09,686 --> 00:20:11,879 our toolings, their stats 313 00:20:11,879 --> 00:20:15,491 and queries you can run on our things. 314 00:20:15,491 --> 00:20:18,238 Stay tuned, this one is releasing soon. 315 00:20:18,960 --> 00:20:20,933 And we have several other ideas 316 00:20:20,933 --> 00:20:24,003 but I would jump to this overall portfolio. 317 00:20:24,003 --> 00:20:28,443 It would be several projects to begin with datasets and tooling, 318 00:20:28,443 --> 00:20:30,338 and what we are doing currently 319 00:20:30,338 --> 00:20:33,190 is Explorer, Battlefield, CatFacts and PageRank, 320 00:20:33,190 --> 00:20:39,600 and there are some other upcoming ideas like PreCheck, CitePool and Bubbles. 321 00:20:41,294 --> 00:20:46,494 And this is one of the diagrams 322 00:20:46,494 --> 00:20:48,574 that I want to show you. 323 00:20:48,994 --> 00:20:53,385 We want to not only use one individual project 324 00:20:53,385 --> 00:20:54,734 to contribute to the community 325 00:20:54,734 --> 00:20:58,007 and also generate the training data for the research, academia, 326 00:20:58,007 --> 00:21:00,807 we also have an idea 327 00:21:00,807 --> 00:21:04,519 that these projects may work together.
328 00:21:05,676 --> 00:21:08,976 For example, the CitePool, the system that we want to build 329 00:21:08,976 --> 00:21:15,352 to allow people to more easily find citations for Wikipedia articles or Wikidata 330 00:21:15,887 --> 00:21:19,316 but also use the Explorer to display the result-- 331 00:21:19,499 --> 00:21:23,079 it depends on the PageRank scores of datasets 332 00:21:23,830 --> 00:21:30,284 to determine how to rank the citation pages that we will recommend 333 00:21:30,423 --> 00:21:35,630 and use the PreCheck to do quality, sanity checks 334 00:21:35,630 --> 00:21:40,235 and maybe create bulk batch reports by Bot 335 00:21:40,235 --> 00:21:44,255 and the PreCheck will depend on the Game as well. 336 00:21:50,727 --> 00:21:52,566 If some of our community friends 337 00:21:52,566 --> 00:21:55,476 have been following the progress of WikiLoop, 338 00:21:55,476 --> 00:21:59,005 we have been through an ice-breaking phase, 339 00:21:59,655 --> 00:22:02,335 we were trying to earn the community trust 340 00:22:02,335 --> 00:22:06,152 because we know how cautious we need to be 341 00:22:06,152 --> 00:22:09,575 coming to contribute to a movement 342 00:22:09,575 --> 00:22:14,704 that relies so much on the neutrality and non-bias policies. 343 00:22:14,999 --> 00:22:19,539 And we have gradually started to have ideas 344 00:22:19,539 --> 00:22:22,545 about tools and data and find the direction 345 00:22:22,545 --> 00:22:25,974 of how we can possibly make this sustainable. 346 00:22:26,231 --> 00:22:31,880 And we are looking into creating long-term sustainability, 347 00:22:31,880 --> 00:22:34,853 both internally and also externally, 348 00:22:35,160 --> 00:22:38,654 both in terms of getting resources and getting support, 349 00:22:39,024 --> 00:22:44,545 also externally of getting engagement, getting usage, and getting contributors, 350 00:22:45,568 --> 00:22:48,122 starting from next quarter.
351 00:22:49,364 --> 00:22:53,066 I want to quote Evan You, who is the creator 352 00:22:53,066 --> 00:22:58,588 of the popular frontend framework Vue.js, 353 00:22:58,588 --> 00:23:01,154 "Software development gets tremendously harder 354 00:23:01,154 --> 00:23:05,504 when you start to have to convince people instead of just writing the code." 355 00:23:05,504 --> 00:23:08,891 This applies to editing Wikipedia or Wikidata. 356 00:23:08,891 --> 00:23:13,261 It's very easy to click a button and add individual articles 357 00:23:13,261 --> 00:23:18,879 but it's very hard when you need to convince people. 358 00:23:23,330 --> 00:23:27,440 I hope to leave some time for questions, 359 00:23:27,440 --> 00:23:31,893 although we only have a few, probably one or two minutes. 360 00:23:33,229 --> 00:23:35,993 Yes, so we have about two minutes. 361 00:23:35,993 --> 00:23:39,085 So if people want to shout questions out, I'll bring the mic over. 362 00:23:40,539 --> 00:23:41,969 Hands up maybe. 363 00:23:45,433 --> 00:23:50,273 (person 1) So where would I go to at this moment if I would like to use this 364 00:23:50,273 --> 00:23:53,563 to solve some of the problems with chemicals, 365 00:23:53,563 --> 00:23:56,553 where some Wikipedia pages about chemicals, 366 00:23:56,553 --> 00:23:59,663 they have a chem box about a specific chemical 367 00:23:59,663 --> 00:24:03,523 but are otherwise about a class of chemicals. 368 00:24:03,523 --> 00:24:05,746 Is that something where WikiLoop could help? 369 00:24:07,750 --> 00:24:12,923 I think that's the individual domain expertise part, right? 370 00:24:12,923 --> 00:24:15,523 If you are talking about topics of articles 371 00:24:15,523 --> 00:24:18,701 that are associated with specific topics. 372 00:24:18,701 --> 00:24:21,131 We are trying to... we might be able to help 373 00:24:21,131 --> 00:24:26,301 but we are trying to tackle the problem that is like more general currently.
374 00:24:26,301 --> 00:24:32,531 And overall the goal is to find the possibility of 375 00:24:35,201 --> 00:24:39,354 empowering human beings' productivity 376 00:24:39,354 --> 00:24:42,204 and also trying to generate the knowledge 377 00:24:42,204 --> 00:24:44,469 that potentially helps... 378 00:24:44,469 --> 00:24:47,419 the training data that potentially helps the algorithms. 379 00:24:49,682 --> 00:24:52,231 (person 2) I think we have time for a very quick one. 380 00:24:55,292 --> 00:24:58,637 (person 3) Are you also going to do this for search of data on Commons? 381 00:24:59,522 --> 00:25:01,096 Yeah, we hope to... 382 00:25:01,096 --> 00:25:05,239 If you are referring to Battlefield or counter-vandalism tools, 383 00:25:06,451 --> 00:25:11,615 yeah, we are planning to expand it to other Wiki projects, 384 00:25:11,615 --> 00:25:14,032 including Commons and Wikidata. 385 00:25:15,280 --> 00:25:17,240 (person 2) I think that's all the questions we have time for 386 00:25:17,240 --> 00:25:19,800 but if you'd like to show your appreciation for [Victor.] 387 00:25:19,802 --> 00:25:20,932 Thank you. 388 00:25:20,932 --> 00:25:24,612 (applause)