*34C3 preroll music*

Herald: Hello fellow creatures. Welcome, and I want to start with a question. Another one: who do we trust? Do we trust the TrustZones on our smartphones? Well, Keegan Ryan, we're really fortunate to have him here. He was inspired by another talk from the CCC before, I think it was 29C3, and his research on smartphones and the systems on a chip used in smartphones will answer the question of whether you can trust those trusted execution environments. Please give a warm round of applause to Keegan and enjoy!

*Applause*

Keegan Ryan: All right, thank you! So I'm Keegan Ryan, I'm a consultant with NCC Group, and this is "Microarchitectural Attacks on Trusted Execution Environments". In order to understand what a Trusted Execution Environment is, we need to go back into processor security, specifically on x86. As many of you are probably aware, there are a couple of different modes we can execute code under in x86 processors: that includes ring 3, which is the user code and the applications, and also ring 0, which is the kernel code. Now, there's also a ring 1 and a ring 2 that are supposedly used for drivers or guest operating systems, but really it just boils down to ring 0 and ring 3.
And in this diagram we have here, we see that privilege increases as we go up the diagram, so ring 0 is the most privileged ring and ring 3 is the least privileged ring. So all of our secrets, all of our sensitive information, all of the attacker's goals are in ring 0, and the attacker is trying to access those from the unprivileged world of ring 3. Now you may have a question: what if I want to add a processor feature that I don't want ring 0 to be able to access? Well, then you add ring -1, which is often used for a hypervisor. Now the hypervisor has all the secrets, and the hypervisor can manage different guest operating systems, and each of these guest operating systems can execute in ring 0 without having any idea of the other operating systems. So this way the secrets are all in ring -1, and the attacker's goals have shifted from ring 0 to ring -1. The attacker has to attack ring -1 from a less privileged ring and try to access those secrets. But what if you want to add a processor feature that you don't want ring -1 to be able to access?
So you add ring -2, which is System Management Mode. That's capable of monitoring power, directly interfacing with firmware and other chips on a motherboard, and it's able to access and do a lot of things that the hypervisor is not able to. Now all of your secrets and all of your attacker goals are in ring -2, and the attacker has to attack those from a less privileged ring. Now maybe you want to add something to your processor that you don't want ring -2 to be able to access, so you add ring -3, and I think you get the picture now. We just keep on adding more and more privileged rings and keep putting our secrets and our attacker's goals in these higher and higher privileged rings. But what if we're thinking about it wrong? What if instead we want to put all the secrets in the least privileged ring? This is sort of the idea behind SGX, and it's useful for things like DRM, where you want to run ring 3 code but have sensitive secrets or signing capabilities protected within ring 3. But this picture is getting a little bit complicated, this diagram is a little bit complex, so let's simplify it a little bit. We'll only be looking at ring 0 through ring 3, which is the kernel, the userland, and the SGX enclave, which also executes in ring 3.
Now, when you're executing code in the SGX enclave, you first load the code into the enclave, and from that point on you trust the execution of whatever's going on in that enclave. You trust that the other elements, the kernel, the userland, the other rings, are not going to be able to access what's in that enclave, so you've made your Trusted Execution Environment. This is a bit of a weird model, because now your attacker is in the ring 0 kernel and your target victim is in ring 3. So instead of the attacker trying to move up the privilege chain, the attacker is trying to move down. Which is pretty strange, and you might have some questions, like "under this model, who handles memory management?", because traditionally that's something that ring 0 would manage, and ring 0 would be responsible for paging memory in and out for the different processes and different code executing in ring 3. But on the other hand, you don't want that to happen with the SGX enclave, because what if the malicious ring 0 adds a page to the enclave that the enclave doesn't expect? So in order to solve this problem, SGX does allow ring 0 to handle page faults. But simultaneously and in parallel, it verifies every memory load to make sure that no access violations are made, so that all the SGX memory is safe. So it allows ring 0 to do its job, but it sort of watches over it at the same time to make sure that nothing is messed up.
So it's a bit of a weird, convoluted solution to a strange, inverted problem, but it works, and that's essentially how SGX works and the idea behind SGX. Now we can look at x86 and see that ARMv8 is constructed in a similar way, but it improves on x86 in a couple of key ways. First of all, ARMv8 gets rid of ring 1 and ring 2, so you don't have to worry about those; it just has different privilege levels for userland and the kernel, and these privilege levels are called exception levels in the ARM terminology. The second thing that ARM gets right compared to x86 is that instead of starting at 3 and counting down as privilege goes up, ARM starts at 0 and counts up, so we don't have to worry about negative numbers anymore. Now, when we add the next privilege level, the hypervisor, we call it exception level 2, and the next one after that is the monitor in exception level 3. At this point we still want the ability to run trusted code in exception level 0, the least privileged level of the ARMv8 processor. So in order to support this, we need to separate this diagram into two different sections. In ARMv8 these are called the secure world and the non-secure world.
So we have the non-secure world on the left in blue, which consists of the userland, the kernel and the hypervisor, and we have the secure world on the right, which consists of the monitor in exception level 3, a trusted operating system in exception level 1, and trusted applications in exception level 0. The idea is that if you run anything in the secure world, it should not be accessible or modifiable by anything in the non-secure world. So that's what our attacker is trying to get at: the attacker has access to the non-secure kernel, which is often Linux, and they're trying to go after the trusted apps. Once again we have this weird inversion where we're trying to go from a more privileged level to a less privileged level and extract secrets that way. So the question that arises when using these Trusted Execution Environments, as implemented in SGX and in TrustZone on ARM, is: can we use these privileged modes and our privileged access in order to attack these Trusted Execution Environments? With that question in mind, we can start looking at a few different research papers. The first one I want to go into is called CLKSCREW, and it's an attack on TrustZone.
Throughout this presentation I'm going to go through a few different papers, and just to make it clear which papers have already been published and which ones are new, I'll include the citations in the upper right hand corner, so that way you can tell what's old and what's new. And as far as papers go, this CLKSCREW paper is relatively new; it was released in 2017. The way CLKSCREW works is it takes advantage of the energy management features of a processor. A non-secure operating system has the ability to manage the energy consumption of the different cores. So if a certain target core doesn't have much scheduled to do, then the operating system is able to scale back the voltage or dial down the frequency on that core, so that core uses less energy. That's a great thing for performance: it really extends battery life, it makes the cores last longer, and it gives better performance overall. But the problem here is: what if you have two separate cores, and one of your cores is running this non-trusted operating system and the other core is running code in the secure world? It's running that trusted code, those trusted applications, and that non-secure operating system can still dial down the voltage and it can still change the frequency, and those changes will affect the secure world code.
So what the CLKSCREW attack does is: the core running the non-secure operating system dials down the voltage and overclocks the frequency on the target secure world core in order to induce faults, to make the computation on that core fail in some way. When that computation fails, you get certain cryptographic errors that the attack can use to infer things like secret AES keys, and to bypass code signing implemented in the secure world. So it's a very powerful attack, made possible because the non-secure operating system is privileged enough to use these energy management features. Now, CLKSCREW is an example of an active attack, where the attacker is actively changing the outcome of the victim code, of that code in the secure world. But what about passive attacks? In a passive attack, the attacker does not modify the actual outcome of the process. The attacker just tries to monitor that process and infer what's going on, and that is the sort of attack that we'll be considering for the rest of the presentation. In a lot of SGX and TrustZone implementations, the trusted and the non-trusted code both share the same hardware, and this shared hardware could be a shared cache, it could be a branch predictor, it could be a TLB.
The point is that they share the same hardware, so the changes made by the secure code may be reflected in the behavior of the non-secure code. The trusted code might execute and change the state of that shared cache, for example, and then the untrusted code may be able to go in, see the changes in that cache, and infer information about the behavior of the secure code. That's essentially how our side channel attacks are going to work: the non-secure code is going to monitor these shared hardware resources for state changes that reflect the behavior of the secure code. Now, we've already talked about how Intel and SGX address the problem of memory management and who's responsible for making sure that those attacks don't work on SGX. So what do they have to say about how they protect against these side channel attacks, attacks on this shared cache hardware? They don't... at all. They essentially say: "We do not consider this part of our threat model. It is up to the developer to implement the protections needed to protect against these side-channel attacks." Which is great news for us, because these side channel attacks can be very powerful, and if there aren't any hardware features that are necessarily stopping us from being able to accomplish our goal, it makes us that much more likely to succeed.
So with that, we can sort of take a step back from TrustZone and SGX and just take a look at cache attacks, to make sure that we all have the same understanding of how the cache attacks will be applied to these Trusted Execution Environments. To start, let's go over a brief recap of how a cache works. Caches are necessary in processors because accessing the main memory is slow. When you try to access something from the main memory, it takes a while to be read into the processor. So the cache exists as sort of a layer to remember what that information is, so if the processor ever needs information from that same address, it just reloads it from the cache, and that access is going to be fast. So it really speeds up the memory access for repeated accesses to the same address. And then if we try to access a different address, that will also be read into the cache, slowly at first but then quickly for repeated accesses, and so on and so forth. Now, as you can probably tell from all of these examples, the memory blocks have been moving horizontally; they've always been staying in the same row. And that is reflective of the idea of sets in a cache. There are a number of different set IDs, and they correspond to the different rows in this diagram. In our example there are four different set IDs, and each address in main memory maps to a particular set ID.
So that address in main memory will only go into a location in the cache with the same set ID; it will only travel along those rows. That means if you have two different blocks of memory that map to different set IDs, they're not going to interfere with each other in the cache. But that raises the question: what about two memory blocks that do map to the same set ID? Well, if there's room in the cache, then the same thing happens as before: those memory contents are loaded into the cache and then retrieved from the cache for future accesses. The number of possible entries for a particular set ID within a cache is called the associativity, and in this diagram that's represented by the number of columns in the cache. So we would call the cache in this example a 2-way set-associative cache. Now, the next question is: what happens if you try to read a memory address that maps to the same set ID, but all of the entries for that set ID within the cache are full? Well, one of those entries is chosen, it's evicted from the cache, the new memory is read in, and then that's fed to the processor. It doesn't really matter how the evicted cache entry is chosen; for the purpose of this presentation you can just assume that it's random.
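To make the set mapping and eviction behavior concrete, here is a toy model of the 2-way set-associative cache from the example; the class name, parameters and random-eviction policy are illustrative choices, not something from the talk.

```python
import random

class SetAssociativeCache:
    """Toy model of a set-associative cache with random eviction."""

    def __init__(self, num_sets=4, ways=2, line_size=64):
        self.num_sets = num_sets
        self.ways = ways
        self.line_size = line_size
        # Each set holds up to `ways` memory block numbers.
        self.sets = [[] for _ in range(num_sets)]

    def set_id(self, addr):
        # The set ID comes from the address bits just above the line offset.
        return (addr // self.line_size) % self.num_sets

    def access(self, addr):
        """Access one address; return True on a hit, False on a miss."""
        block = addr // self.line_size
        s = self.sets[self.set_id(addr)]
        if block in s:
            return True            # fast: served from the cache
        if len(s) == self.ways:    # set is full: evict a random entry
            s.remove(random.choice(s))
        s.append(block)
        return False               # slow: loaded from main memory

cache = SetAssociativeCache()
assert cache.access(0x0) is False    # first access: miss
assert cache.access(0x0) is True     # repeated access: hit
# Blocks with different set IDs never interfere with each other:
assert cache.set_id(0x0) != cache.set_id(0x40)
```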
But the important thing is that if you try to access that same memory that was evicted before, you are going to have to wait for the time penalty again, for that memory to be reloaded into the cache and read into the processor. So those are caches in a nutshell, in particular set-associative caches, and now we can begin looking at the different types of cache attacks. For a cache attack we have two different processes, an attacker process and a victim process. For the first type of attack we're considering, both of them share the same underlying code, so they're trying to access the same resources, which could be the case if you have page deduplication in virtual machines, or if you have copy-on-write mechanisms for shared code and shared libraries. But the point is that they share the same underlying memory. Now, the Flush+Reload attack works in two stages for the attacker. The attacker first starts by flushing out the cache: they flush each and every address in the cache, so the cache is just empty. Then the attacker lets the victim execute for a small amount of time, so the victim might read an address from main memory, loading it into the cache. The second stage of the attack is the reload phase. In the reload phase, the attacker tries to load different memory addresses from main memory and sees whether those entries are in the cache or not.
Here the attacker will first try to load address 0, and see that because it takes a long time to read the contents of address 0, the attacker can infer that address 0 was not in the cache. That makes sense, because the attacker flushed it from the cache in the first stage. The attacker then tries to read the memory at address 1 and sees that this operation is fast, so the attacker infers that the contents of address 1 are in the cache. And because the attacker flushed everything from the cache before the victim executed, the attacker concludes that the victim is responsible for bringing address 1 into the cache. This Flush+Reload attack reveals which memory addresses the victim accessed during that small slice of time. Then, after the reload phase, the attack repeats: the attacker flushes again, lets the victim execute, reloads again, and so on. There's also a variant of the Flush+Reload attack called the Flush+Flush attack, which I'm not going to go into the details of, but it's essentially the same idea. Instead of using load instructions to determine whether or not a piece of memory is in the cache, it uses flush instructions, because flush instructions take longer if something is in the cache already.
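The two-stage Flush+Reload loop just described can be sketched as a small simulation; a Python set stands in for the shared cache, and set membership stands in for the fast/slow timing distinction a real attacker would measure with a cycle counter. The addresses are made up for illustration.

```python
# Simulated Flush+Reload. The shared cache is modeled as a set of
# cached addresses; membership replaces the timing measurement.
shared_cache = set()

def flush(addresses):
    # Stage 1: flush every monitored address out of the cache.
    for a in addresses:
        shared_cache.discard(a)

def victim_step():
    # The victim runs for a small time slice and touches one address.
    shared_cache.add(1)

def reload(addresses):
    # Stage 2: reload each address. A "fast" load (already cached)
    # means the victim must have brought it in during its time slice.
    accessed = [a for a in addresses if a in shared_cache]
    for a in addresses:
        shared_cache.add(a)   # the reload itself caches the line
    return accessed

monitored = [0, 1, 2, 3]
flush(monitored)
victim_step()
# The attacker learns the victim touched address 1 in this slice:
assert reload(monitored) == [1]
```

In a real attack the `reload` step would time each load with something like `rdtsc` and compare against a hit/miss threshold; this sketch only captures the logic of the two phases.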
The important thing is that both the Flush+Reload attack and the Flush+Flush attack rely on the attacker and the victim sharing the same memory. But this isn't always the case, so we need to consider what happens when the attacker and the victim do not share memory. For this we have the Prime+Probe attack. The Prime+Probe attack once again works in two separate stages. In the first stage, the attacker primes the cache by reading all of the attacker's memory into the cache, and then the attacker lets the victim execute for a small amount of time. So no matter what the victim accesses from main memory, since the cache is full of the attacker's data, one of those attacker entries will be replaced by a victim entry. Then, in the second phase of the attack, the probe phase, the attacker checks the cache entries for particular set IDs and sees if all of the attacker's entries are still in the cache. So maybe our attacker is curious about the last set ID, the bottom row. The attacker first tries to load the memory at address 3, and because this operation is fast, the attacker knows that address 3 is in the cache.
The attacker tries the same thing with address 7, sees that this operation is slow, and infers that at some point address 7 was evicted from the cache. So the attacker knows that something had to be evicted from the cache, and it had to be the victim's doing, so the attacker concludes that the victim accessed something in that last set ID, that bottom row. The attacker doesn't know if it was the contents of address 11 or the contents of address 15, or even what those contents are, but the attacker has a good idea of which set ID it was. So, the important things to remember about cache attacks: caches are crucial for performance on processors, they give a huge speed boost, and there's a huge time difference between having a cache and not having a cache for your executables. But the downside is that this big time difference also allows the attacker to infer information about how the victim is using the cache. We're able to use these cache attacks in two different scenarios: where memory is shared, in the case of the Flush+Reload and Flush+Flush attacks, and where memory is not shared, in the case of the Prime+Probe attack. And finally, the important thing to keep in mind is that, for these cache attacks, we know where the victim is looking, but we don't know what they see.
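The Prime+Probe sequence on that bottom set can likewise be sketched as a simulation; the block names A3, A7 and V11 echo the addresses in the example but are otherwise arbitrary, and the 2-way set with random eviction is an assumption matching the earlier diagram.

```python
import random

# Simulated Prime+Probe on one set of a 2-way cache. Attacker and
# victim share no memory: their blocks merely map to the same set ID.
WAYS = 2
cache_set = []        # contents of the one monitored set

def access(block):
    """Return True if the access hits; load (evicting randomly) if not."""
    if block in cache_set:
        return True
    if len(cache_set) == WAYS:
        cache_set.remove(random.choice(cache_set))
    cache_set.append(block)
    return False

# Prime: fill the set with attacker data.
access("A3"); access("A7")
# Victim time slice: the victim touches a block mapping to this set,
# so one attacker entry must be evicted (which one is random).
access("V11")
# Probe: any missing (slow-to-reload) attacker entry betrays the
# victim's set ID, even though the evicted contents stay unknown.
victim_detected = not all(b in cache_set for b in ("A3", "A7"))
assert victim_detected
```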
We don't know the contents of the memory that the victim is actually seeing; we just know the locations and the addresses. So, what does an example trace of these attacks look like? Well, there's an easy way to represent these as two-dimensional images. In this image, the horizontal axis is time, so each column in the image represents a different time slice, a different iteration of the prime, measure and probe steps. Then we also have the vertical axis, which is the different set IDs, the locations accessed by the victim process, and a pixel is white if the victim accessed that set ID during that time slice. So, as you look from left to right, as time moves forward, you can sort of see the changes in the patterns of the memory accesses made by the victim process. Now, for this particular example, the trace is captured on an execution of AES repeated several times, an AES encryption repeated about 20 times. And you can tell that this is a repeated action because you see the same repeated memory access patterns in the data, you see the same structures repeated over and over. So, you know that this is reflecting what's going on throughout time, but what does it have to do with AES itself?
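As a rough sketch of how such a two-dimensional trace image is assembled: rows are set IDs, columns are time slices, and a cell is set when the probe step saw activity in that set. The probe results below are invented for illustration; a real trace would come from the repeated prime-measure-probe iterations just described.

```python
NUM_SETS = 4
probe_results = [          # invented: "hot" set IDs seen per time slice
    [0, 2], [1], [0, 2], [1], [3],
]

# Build the image: trace[set_id][time_slice] is 1 if that set was hot.
trace = [[0] * len(probe_results) for _ in range(NUM_SETS)]
for t, hot_sets in enumerate(probe_results):
    for s in hot_sets:
        trace[s][t] = 1

for row in trace:
    print("".join("#" if px else "." for px in row))
# Prints one row per set ID:
# #.#..
# .#.#.
# #.#..
# ....#
```

Repeating access patterns, like the alternating columns here, show up as repeated vertical structures when read left to right.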
Well, if we take the same trace with the same settings but a different key, we see that there is a different memory access pattern, with different repetition within the trace. Only the key changed; the code didn't change. So, even though we're not able to read the contents of the key directly using this cache attack, we know that the key is changing these memory access patterns, and if we can see these memory access patterns, then we can infer the key. That's the essential idea: we want to make these images as clear and as descriptive as possible, so we have the best chance of learning what those secrets are. And we can define metrics for what makes these cache attacks powerful in a few different ways. The three we'll be looking at are spatial resolution, temporal resolution and noise. Spatial resolution refers to how accurately we can determine the "where". If we know that the victim accessed a memory address within 1,000 bytes, that's obviously not as powerful as knowing where they accessed within 512 bytes. Temporal resolution is similar: we want to know the order of the accesses the victim made. If the time slice during our attack is 1 millisecond, we're going to get much better ordering information on those memory accesses than we would get if we only saw all the memory accesses over the course of one second.
So the shorter that time slice, the better the temporal resolution, the longer our picture will be on the horizontal axis, and the clearer an image of the cache we'll see. The last metric to evaluate our attacks on is noise, and that reflects how accurately our measurements reflect the true state of the cache. Right now we've been using timing data to infer whether or not an item was in the cache, but this is a little bit noisy; it's possible that we'll have false positives or false negatives, so we want to keep that in mind as we look at the different attacks. So, that's cache attacks in a nutshell, and that's all you really need in order to understand these attacks as they've been implemented on Trusted Execution Environments. The first particular attack we're going to look at is called a Controlled-Channel Attack on SGX. This attack isn't necessarily a cache attack, but we can analyze it in the same way that we analyze the cache attacks, so it's still useful to look at. Now, if you remember how memory management works with SGX, we know that if a page fault occurs during SGX enclave code execution, that page fault is handled by the kernel. So, the kernel has to know which page the enclave needs paged in.
The kernel already gets some information about what the Enclave is 290 00:23:48,050 --> 00:23:54,789 looking at. Now, in the Controlled-Channel attack, what the attacker does 291 00:23:54,789 --> 00:23:59,839 from the non-trusted OS is page almost every other page of the 292 00:23:59,839 --> 00:24:05,260 Enclave out of memory. So no matter which page the Enclave tries to 293 00:24:05,260 --> 00:24:09,770 access, it's very likely to cause a page fault, which will be redirected to the 294 00:24:09,770 --> 00:24:14,150 non-trusted OS, where the non-trusted OS can record it, page out any other pages 295 00:24:14,150 --> 00:24:20,429 and continue execution. So, the OS essentially gets a list of sequential page 296 00:24:20,429 --> 00:24:26,259 accesses made by the SGX Enclave, all by capturing the page fault handler. This is 297 00:24:26,259 --> 00:24:29,669 a very general attack, you don't need to know what's going on in the Enclave in 298 00:24:29,669 --> 00:24:33,460 order to pull this off. You just load up an arbitrary Enclave and you're able to 299 00:24:33,460 --> 00:24:40,720 see which pages that Enclave is trying to access. So, how does it do on our metrics? 300 00:24:40,720 --> 00:24:44,270 First of all, the spatial resolution is not great. We can only see where the 301 00:24:44,270 --> 00:24:50,470 victim is accessing within 4096 bytes or the size of a full page because SGX 302 00:24:50,470 --> 00:24:55,519 obscures the offset into the page where the page fault occurs. The temporal 303 00:24:55,519 --> 00:24:58,760 resolution is good but not great, because even though we're able to see any 304 00:24:58,760 --> 00:25:04,450 sequential accesses to different pages we're not able to see sequential accesses 305 00:25:04,450 --> 00:25:09,970 to the same page because we need to keep that same page paged-in while we let our 306 00:25:09,970 --> 00:25:15,490 SGX Enclave run for that small time slice.
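The view the untrusted OS gets here can be sketched in a few lines — the enclave addresses below are invented for illustration: masking each address down to its 4096-byte page reproduces the page-fault trace, and back-to-back accesses to the same page collapse into a single observation, which is exactly the lost temporal detail described above.

```python
PAGE_SIZE = 4096

def page_fault_trace(accesses):
    """Sequential page numbers the untrusted OS observes via page faults.

    The offset within each page is hidden, and consecutive accesses to the
    same page stay resident, so they collapse into one fault."""
    trace = []
    for addr in accesses:
        page = addr // PAGE_SIZE          # only page granularity is visible
        if not trace or trace[-1] != page:
            trace.append(page)
    return trace

# Hypothetical enclave accesses (addresses invented for illustration):
accesses = [0x401000, 0x401040, 0x7FE010, 0x401080, 0x401084]
print(page_fault_trace(accesses))   # page numbers only, 4 KiB granularity
```
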
So temporal resolution is good but not 307 00:25:15,490 --> 00:25:22,440 perfect. But as for the noise, there is no noise in this attack because no matter 308 00:25:22,440 --> 00:25:26,149 where the page fault occurs, the untrusted operating system is going to capture that 309 00:25:26,149 --> 00:25:30,180 page fault and is going to handle it. So, it's very low noise, not great spatial 310 00:25:30,180 --> 00:25:37,490 resolution but overall still a powerful attack. But we still want to improve on 311 00:25:37,490 --> 00:25:40,700 that spatial resolution, we want to be able to see what the Enclave is doing at 312 00:25:40,700 --> 00:25:45,970 a finer resolution than one page of four kilobytes. So that's exactly what the 313 00:25:45,970 --> 00:25:50,179 CacheZoom paper does, and instead of interrupting the SGX Enclave execution 314 00:25:50,179 --> 00:25:55,370 with page faults, it uses timer interrupts. Because the untrusted 315 00:25:55,370 --> 00:25:59,280 operating system is able to schedule when timer interrupts occur, so it's able to 316 00:25:59,280 --> 00:26:03,320 schedule them at very tight intervals, so it's able to get that small and tight 317 00:26:03,320 --> 00:26:08,549 temporal resolution. And essentially what happens in between is this timer 318 00:26:08,549 --> 00:26:13,410 interrupt fires, the untrusted operating system runs the Prime+Probe attack code in 319 00:26:13,410 --> 00:26:18,240 this case, and resumes execution of the enclave process, and this repeats. So this 320 00:26:18,240 --> 00:26:24,549 is a Prime+Probe attack on the L1 data cache. So, this attack lets you see what 321 00:26:24,549 --> 00:26:30,529 data the Enclave is looking at. Now, this attack could be easily modified to use the 322 00:26:30,529 --> 00:26:36,000 L1 instruction cache, so in that case you learn which instructions the Enclave is 323 00:26:36,000 --> 00:26:41,419 executing.
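One round of that Prime+Probe loop can be modeled against a simulated cache — the geometry (a tiny 8-set direct-mapped cache) and the victim's addresses are invented for illustration and are not the real SGX or L1 parameters:

```python
# Simulated Prime+Probe round on a tiny direct-mapped cache.
# All sizes and the victim's accesses are hypothetical.
NUM_SETS, LINE_SIZE = 8, 64

def set_index(addr):
    return (addr // LINE_SIZE) % NUM_SETS

cache = {}  # set index -> owner of the resident line

def prime():
    for s in range(NUM_SETS):
        cache[s] = "attacker"            # fill every set with attacker data

def victim_runs(addresses):
    for a in addresses:
        cache[set_index(a)] = "victim"   # victim evicts attacker lines

def probe():
    # In the real attack a slow reload (a miss) reveals the evicted sets;
    # here we just read the simulated cache state directly.
    return sorted(s for s in range(NUM_SETS) if cache[s] != "attacker")

prime()
victim_runs([0x1040, 0x1180])            # maps to sets 1 and 6 in this toy geometry
print(probe())
```

Repeating this prime/victim/probe cycle on every timer interrupt is what produces one column of the trace images shown in the talk.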
And overall this is an even more powerful attack than the Controlled- 324 00:26:41,419 --> 00:26:46,429 Channel attack. If we look at the metrics, we can see that the spatial resolution is 325 00:26:46,429 --> 00:26:50,360 a lot better, now we're looking at spatial resolution of 64 bytes or the size of an 326 00:26:50,360 --> 00:26:55,370 individual cache line. The temporal resolution is very good, it's "almost unlimited", to 327 00:26:55,370 --> 00:27:00,250 quote the paper, because the untrusted operating system has the privilege to keep 328 00:27:00,250 --> 00:27:05,179 scheduling those timer interrupts closer and closer together until it's able to 329 00:27:05,179 --> 00:27:10,260 capture very small time slices of the victim process. And the noise itself is 330 00:27:10,260 --> 00:27:14,559 low, we're still using a cycle counter to measure the time it takes to load memory 331 00:27:14,559 --> 00:27:20,629 in and out of the cache, but it's useful; the chances of having a false 332 00:27:20,629 --> 00:27:26,809 positive or false negative are low, so the noise is low as well. Now, we can also 333 00:27:26,809 --> 00:27:31,129 look at TrustZone attacks, because so far the attacks that we've looked at, the 334 00:27:31,129 --> 00:27:35,130 passive attacks, have been against SGX and those attacks on SGX have been pretty 335 00:27:35,130 --> 00:27:40,669 powerful. So, what are the published attacks on TrustZone? Well, there's one 336 00:27:40,669 --> 00:27:44,990 called TruSpy, which is kind of similar in concept to the CacheZoom attack that we 337 00:27:44,990 --> 00:27:51,629 just looked at on SGX. It's once again a Prime+Probe attack on the L1 data cache, 338 00:27:51,629 --> 00:27:57,129 and the difference here is that instead of interrupting the victim code execution 339 00:27:57,129 --> 00:28:04,460 multiple times, the TruSpy attack does the prime step, does the full AES encryption, 340 00:28:04,460 --> 00:28:08,539 and then does the probe step.
And the reason they do this is because, as they 341 00:28:08,539 --> 00:28:13,330 say, the secure world is protected, and is not interruptible in the same way that SGX 342 00:28:13,330 --> 00:28:20,690 is interruptible. But even despite this, just having one measurement per execution, 343 00:28:20,690 --> 00:28:24,940 the TruSpy authors were able to use some statistics to still recover the AES key 344 00:28:24,940 --> 00:28:30,460 from that noise. And their methods were so powerful, they are able to do this from an 345 00:28:30,460 --> 00:28:34,539 unprivileged application in userland, so they don't even need to be running within 346 00:28:34,539 --> 00:28:39,820 the kernel in order to be able to pull off this attack. So, how does this attack 347 00:28:39,820 --> 00:28:43,360 measure up? The spatial resolution is once again 64 bytes because that's the size of 348 00:28:43,360 --> 00:28:48,559 a cache line on this processor, and the temporal resolution is pretty poor 349 00:28:48,559 --> 00:28:54,190 here, because we only get one measurement per execution of the AES encryption. This 350 00:28:54,190 --> 00:28:58,700 is also a particularly noisy attack because we're making the measurements from 351 00:28:58,700 --> 00:29:02,659 userland, but even if we make the measurements from the kernel, we're still 352 00:29:02,659 --> 00:29:05,789 going to have the same issues of false positives and false negatives associated 353 00:29:05,789 --> 00:29:12,470 with using a cycle counter to measure membership in a cache. So, we'd like to 354 00:29:12,470 --> 00:29:16,389 improve this a little bit. We'd like to improve the temporal resolution, so 355 00:29:16,389 --> 00:29:20,749 the power of the cache attack on TrustZone is a little bit closer to what it is 356 00:29:20,749 --> 00:29:27,149 on SGX. So, we want to improve that temporal resolution.
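The statistical recovery from one noisy measurement per run can be sketched like this — the noise model, set count, and signal size are invented for illustration, not the values from the TruSpy paper: each run gives one noisy probe timing per set, but averaging over many runs makes the set the victim always touches stand out.

```python
import random

random.seed(1)
NUM_SETS = 256
VICTIM_SET = 42     # the set a (simulated) AES lookup always touches

def one_noisy_run():
    """One probe timing per set; higher suggests 'evicted by the victim'."""
    times = [random.gauss(100, 30) for _ in range(NUM_SETS)]  # measurement noise
    times[VICTIM_SET] += 40          # small signal from the victim's access
    return times

def average_runs(n):
    totals = [0.0] * NUM_SETS
    for _ in range(n):
        for s, t in enumerate(one_noisy_run()):
            totals[s] += t
    return [t / n for t in totals]

avg = average_runs(2000)
# The victim's set has the largest average timing once the noise averages out.
print(max(range(NUM_SETS), key=lambda s: avg[s]))
```
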
Let's dig into that 357 00:29:27,149 --> 00:29:30,549 statement a little bit, that the secure world is protected and not interruptible. 358 00:29:30,549 --> 00:29:36,499 And to do this, we go back to this diagram of ARMv8 and how that TrustZone is set up. 359 00:29:36,499 --> 00:29:41,490 So, it is true that when an interrupt occurs, it is directed to the monitor and, 360 00:29:41,490 --> 00:29:45,530 because the monitor operates in the secure world, if we interrupt secure code that's 361 00:29:45,530 --> 00:29:49,081 running at exception level 0, we're just going to end up running secure code at 362 00:29:49,081 --> 00:29:54,239 exception level 3. So, this doesn't necessarily get us anything. I think 363 00:29:54,239 --> 00:29:57,880 that's what the authors mean by saying that it's protected against this. Just by 364 00:29:57,880 --> 00:30:02,780 sending an interrupt, we don't have a way to redirect our flow to the non- 365 00:30:02,780 --> 00:30:08,190 trusted code. At least that's how it works in theory. In practice, the Linux 366 00:30:08,190 --> 00:30:11,840 operating system, running in exception level 1 in the non-secure world, kind of 367 00:30:11,840 --> 00:30:15,299 needs interrupts in order to be able to work, so if an interrupt occurs and it's 368 00:30:15,299 --> 00:30:18,120 being sent to the monitor, the monitor will just forward it right to the non- 369 00:30:18,120 --> 00:30:22,500 secure operating system. So, we have interrupts just the same way as we did in 370 00:30:22,500 --> 00:30:28,930 CacheZoom.
And we can improve the TrustZone attacks by using this idea: We 371 00:30:28,930 --> 00:30:33,549 have two cores, where one core is running the secure code, the other core is running 372 00:30:33,549 --> 00:30:38,101 the non-secure code, and the non-secure code is sending interrupts to the secure- 373 00:30:38,101 --> 00:30:42,809 world core and that will give us that interleaving of attacker process and 374 00:30:42,809 --> 00:30:47,409 victim process that allows us to have a powerful prime-and-probe attack. So, what 375 00:30:47,409 --> 00:30:51,139 does this look like? We have the attack core and the victim core. The attack core 376 00:30:51,139 --> 00:30:54,909 sends an interrupt to the victim core. This interrupt is captured by the monitor, 377 00:30:54,909 --> 00:30:58,769 which passes it to the non-secure operating system. The non-secure operating 378 00:30:58,769 --> 00:31:02,979 system transfers this to our attack code, which runs the prime-and-probe attack. 379 00:31:02,979 --> 00:31:06,529 Then we leave the interrupt, execution of the victim code in the 380 00:31:06,529 --> 00:31:10,910 secure world resumes and we just repeat this over and over. So, now we have that 381 00:31:10,910 --> 00:31:16,690 interleaving of data... of the processes of the attacker and the victim. So, now, 382 00:31:16,690 --> 00:31:22,690 instead of having a temporal resolution of one measurement per execution, we once 383 00:31:22,690 --> 00:31:26,320 again have almost unlimited temporal resolution, because we can just schedule 384 00:31:26,320 --> 00:31:32,229 when we send those interrupts from the attacker core. Now, we'd also like to 385 00:31:32,229 --> 00:31:37,590 improve the noise of the measurements, because if we can improve the noise, we'll 386 00:31:37,590 --> 00:31:42,159 get clearer pictures and we'll be able to infer those secrets more clearly.
So, we 387 00:31:42,159 --> 00:31:45,720 can get some improvement by switching the measurements from userland and starting to 388 00:31:45,720 --> 00:31:50,830 do those in the kernel, but again we have the cycle counters. So, what if, instead 389 00:31:50,830 --> 00:31:54,330 of using the cycle counter to measure whether or not something is in the cache, 390 00:31:54,330 --> 00:32:00,070 we use the other performance counters? Because on ARMv8 platforms, there is a way 391 00:32:00,070 --> 00:32:03,769 to use performance counters to measure different events, such as cache hits and 392 00:32:03,769 --> 00:32:09,809 cache misses. So, these events and these performance monitors require privileged 393 00:32:09,809 --> 00:32:15,330 access to use, which, for this attack, we do have. Now, in a typical 394 00:32:15,330 --> 00:32:18,779 cache attack scenario we wouldn't have access to these performance monitors, 395 00:32:18,779 --> 00:32:22,259 which is why they haven't really been explored before, but in this weird 396 00:32:22,259 --> 00:32:25,250 scenario where we're attacking the less privileged code from the more privileged 397 00:32:25,250 --> 00:32:29,340 code, we do have access to these performance monitors and we can use these 398 00:32:29,340 --> 00:32:33,640 monitors during the probe step to get a very accurate count of whether or not a 399 00:32:33,640 --> 00:32:39,519 certain memory load caused a cache miss or a cache hit. So, we're able to essentially 400 00:32:39,519 --> 00:32:45,720 get rid of the different levels of noise. Now, one thing to point out is that maybe 401 00:32:45,720 --> 00:32:49,230 we'd like to use these ARMv8 performance counters in order to count the different 402 00:32:49,230 --> 00:32:53,729 events that are occurring in the secure world code.
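The gain from counters over timing can be sketched with a simulation — the timing distributions and threshold are invented for illustration: classifying hit versus miss by comparing a noisy cycle count to a threshold produces occasional errors, while a hardware miss counter (such as an ARMv8 PMU cache-refill event) reports the true outcome directly.

```python
import random

random.seed(7)

def timed_load(is_hit):
    """Simulated cycle count for one load; hit/miss distributions overlap."""
    return random.gauss(40, 15) if is_hit else random.gauss(90, 25)

THRESHOLD = 65  # classify as a hit at or below this many cycles

truth = [random.random() < 0.5 for _ in range(10_000)]   # True = cache hit
timing_guess = [timed_load(h) <= THRESHOLD for h in truth]
timing_errors = sum(g != t for g, t in zip(timing_guess, truth))
print(f"timing-based errors: {timing_errors} / {len(truth)}")

# A dedicated miss counter reports the true outcome, so in this model the
# counter-based classification has zero error.
counter_guess = list(truth)
print("counter-based errors:", sum(g != t for g, t in zip(counter_guess, truth)))
```
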
So, maybe we start the 403 00:32:53,729 --> 00:32:57,909 performance counters from the non-secure world, let the secure world run and then, 404 00:32:57,909 --> 00:33:01,669 when the secure world exits, we use the non-secure world to read these performance 405 00:33:01,669 --> 00:33:05,440 counters and maybe we'd like to see how many instructions the secure world 406 00:33:05,440 --> 00:33:09,019 executed or how many branch instructions or how many arithmetic instructions or how 407 00:33:09,019 --> 00:33:13,179 many cache misses there were. But unfortunately, ARMv8 took this into 408 00:33:13,179 --> 00:33:17,350 account and by default, performance counters that are started in the non- 409 00:33:17,350 --> 00:33:20,769 secure world will not measure events that happen in the secure world, which is 410 00:33:20,769 --> 00:33:24,570 smart; which is how it should be. And the only reason I bring this up is because 411 00:33:24,570 --> 00:33:29,320 that's not how it is on ARMv7. We could go into a whole different talk with that, 412 00:33:29,320 --> 00:33:33,909 just exploring the different implications of what that means, but I want to focus on 413 00:33:33,909 --> 00:33:39,230 ARMv8, because that's the newest of the new. So, we'll keep looking at that. 414 00:33:39,230 --> 00:33:42,540 So, we instrument the Prime+Probe attack to use these performance counters, so we 415 00:33:42,540 --> 00:33:46,509 can get a clear picture of what is and what is not in the cache. And instead of 416 00:33:46,509 --> 00:33:52,399 having noisy measurements based on time, we have virtually no noise at all, because 417 00:33:52,399 --> 00:33:55,919 we get the truth straight from the processor itself, whether or not we 418 00:33:55,919 --> 00:34:01,660 experience a cache miss. So, how do we implement these attacks, where do we go 419 00:34:01,660 --> 00:34:05,549 from here?
We have all these ideas; we have ways to make these TrustZone attacks 420 00:34:05,549 --> 00:34:11,840 more powerful, but that's not worthwhile, unless we actually implement them. So, the 421 00:34:11,840 --> 00:34:16,510 goal here is to implement these attacks on TrustZone and since typically the non- 422 00:34:16,510 --> 00:34:20,960 secure world operating system is based on Linux, we'll take that into account when 423 00:34:20,960 --> 00:34:25,360 making our implementation. So, we'll write a kernel module that uses these 424 00:34:25,360 --> 00:34:29,340 performance counters and these inter-processor interrupts, in order to actually 425 00:34:29,340 --> 00:34:33,179 accomplish these attacks; and we'll write it in such a way that it's very 426 00:34:33,179 --> 00:34:37,300 generalizable. So you can take this kernel module that was written for one device 427 00:34:37,300 --> 00:34:41,650 -- in my case I focused most of my attention on the Nexus 5X -- and it's very easy to 428 00:34:41,650 --> 00:34:46,739 transfer this module to any other Linux- based device that has a TrustZone and 429 00:34:46,739 --> 00:34:52,139 these shared caches, so it should be very easy to port this over and to perform 430 00:34:52,139 --> 00:34:57,810 these same powerful cache attacks on different platforms. We can also do clever 431 00:34:57,810 --> 00:35:01,500 things based on the Linux operating system, so that we limit that collection 432 00:35:01,500 --> 00:35:05,500 window to just when we're executing within the secure world, so we can align our 433 00:35:05,500 --> 00:35:10,580 traces a lot more easily that way. And the end result is having a synchronized trace 434 00:35:10,580 --> 00:35:14,930 for each different attack, because, since we've written it in a modular way, we're able 435 00:35:14,930 --> 00:35:19,440 to run different attacks simultaneously.
So, maybe we're running one prime-and- 436 00:35:19,440 --> 00:35:23,050 probe attack on the L1 data cache, to learn where the victim is accessing 437 00:35:23,050 --> 00:35:27,050 memory, and we're simultaneously running an attack on the L1 instruction cache, so 438 00:35:27,050 --> 00:35:33,910 we can see what instructions the victim is executing. And these can be aligned. So, 439 00:35:33,910 --> 00:35:37,080 the tool that I've written is a combination of a kernel module which 440 00:35:37,080 --> 00:35:41,580 actually performs this attack, a userland binary which schedules these processes to 441 00:35:41,580 --> 00:35:45,860 different cores, and a GUI that will allow you to interact with this kernel module 442 00:35:45,860 --> 00:35:49,710 and rapidly start doing these cache attacks for yourself and perform them 443 00:35:49,710 --> 00:35:56,860 against different processes and secure-world code. So, the 444 00:35:56,860 --> 00:36:02,820 intention behind this tool is to be very generalizable to make it very easy to use 445 00:36:02,820 --> 00:36:08,430 this platform for different devices and to allow people a way to, once again, quickly 446 00:36:08,430 --> 00:36:12,360 develop these attacks; and also to see if their own code is vulnerable to these 447 00:36:12,360 --> 00:36:18,490 cache attacks, to see if their code has these secret-dependent memory accesses. 448 00:36:18,490 --> 00:36:25,349 So, can we get even better... spatial resolution? Right now, we're down to 64 449 00:36:25,349 --> 00:36:30,320 bytes and that's the size of a cache line, which is the size of our shared hardware. 450 00:36:30,320 --> 00:36:35,510 And on SGX, we actually can get better than 64 bytes, based on something called a 451 00:36:35,510 --> 00:36:39,160 branch-shadowing attack. So, a branch- shadowing attack takes advantage of 452 00:36:39,160 --> 00:36:42,730 something called the branch target buffer.
And the branch target buffer is a 453 00:36:42,730 --> 00:36:48,490 structure that's used for branch prediction. It's similar to a cache, but 454 00:36:48,490 --> 00:36:51,740 there's a key difference where the branch target buffer doesn't compare the full 455 00:36:51,740 --> 00:36:54,770 address, when seeing if something is already in the cache or not: It doesn't 456 00:36:54,770 --> 00:36:59,701 compare all of the upper-level bits. So, that means that it's possible that two 457 00:36:59,701 --> 00:37:04,140 different addresses will experience a collision, and the same entry from that 458 00:37:04,140 --> 00:37:08,870 BTB cache will be read out for an improper address. Now, since this is just for 459 00:37:08,870 --> 00:37:12,090 branch prediction, the worst that can happen is, you'll get a misprediction and 460 00:37:12,090 --> 00:37:18,070 a small time penalty, but that's about it. The idea behind the branch-shadowing 461 00:37:18,070 --> 00:37:22,440 attack is leveraging the small difference in this overlapping and this collision of 462 00:37:22,440 --> 00:37:28,540 addresses in order to execute a sort of shared-code, flush-and-reload-style attack 463 00:37:28,540 --> 00:37:35,330 on the branch target buffer. So, here what goes on is, during the attack the attacker 464 00:37:35,330 --> 00:37:39,650 modifies the SGX Enclave to make sure that the branches that are within the Enclave 465 00:37:39,650 --> 00:37:44,340 will collide with branches that are not in the Enclave. The attacker executes the 466 00:37:44,340 --> 00:37:50,440 Enclave code and then the attacker executes their own code and based on the 467 00:37:50,440 --> 00:37:55,460 outcome of the victim code in that cache, the attacker code may or may not 468 00:37:55,460 --> 00:37:59,210 experience a branch misprediction.
So, the attacker is able to tell the outcome of a 469 00:37:59,210 --> 00:38:03,310 branch, because of this overlap in this collision, like it would be in a flush-and- 470 00:38:03,310 --> 00:38:06,570 reload attack, where the memory overlaps between the attacker and the 471 00:38:06,570 --> 00:38:14,020 victim. So here, our spatial resolution is fantastic: We can tell down to individual 472 00:38:14,020 --> 00:38:19,440 branch instructions in SGX; we can tell exactly which branches were executed and 473 00:38:19,440 --> 00:38:25,010 which direction they were taken, in the case of conditional branches. The temporal 474 00:38:25,010 --> 00:38:29,720 resolution is also, once again, almost unlimited, because we can use the same 475 00:38:29,720 --> 00:38:33,880 timer interrupts in order to schedule our process, our attacker process. And the 476 00:38:33,880 --> 00:38:39,120 noise is, once again, very low, because we can, once again, use the same sort of 477 00:38:39,120 --> 00:38:43,980 branch misprediction counters, that exist in the Intel world, in order to make 478 00:38:43,980 --> 00:38:51,510 these measurements. So, does any of that apply to the TrustZone attacks? Well, in 479 00:38:51,510 --> 00:38:55,040 this case the victim and attacker don't share entries in the branch target buffer, 480 00:38:55,040 --> 00:39:01,610 because the attacker is not able to map the virtual address of the victim process. 481 00:39:01,610 --> 00:39:05,340 But this is kind of reminiscent of our earlier cache attacks, so our flush-and- 482 00:39:05,340 --> 00:39:10,100 reload attack only worked when the attacker and the victim shared that memory, but we 483 00:39:10,100 --> 00:39:13,930 still have the prime-and-probe attack for when they don't. So, what if we use a 484 00:39:13,930 --> 00:39:21,380 prime-and-probe-style attack on the branch target buffer cache in ARM processors?
So, 485 00:39:21,380 --> 00:39:25,320 essentially what we do here is, we prime the branch target buffer by executing many 486 00:39:25,320 --> 00:39:29,531 attacker branches to sort of fill up this BTB cache with the attacker branch 487 00:39:29,531 --> 00:39:34,770 prediction data; we let the victim execute a branch which will evict an attacker BTB 488 00:39:34,770 --> 00:39:39,120 entry; and then we have the attacker re- execute those branches and see if there 489 00:39:39,120 --> 00:39:45,120 have been any mispredictions. So now, the cool thing about this attack is, the 490 00:39:45,120 --> 00:39:50,320 structure of the BTB cache is different from that of the L1 caches. So, instead of 491 00:39:50,320 --> 00:39:59,750 having 256 different sets in the L1 cache, the BTB cache has 2048 different sets, so 492 00:39:59,750 --> 00:40:06,380 we can tell which branch the victim took, based on which one of 2048 different set IDs 493 00:40:06,380 --> 00:40:11,230 it falls into. And even more than that, on the ARM platform, at least 494 00:40:11,230 --> 00:40:15,730 on the Nexus 5X that I was working with, the granularity is no longer 64 bytes, 495 00:40:15,730 --> 00:40:21,830 which is the size of the line, it's now 16 bytes. So, we can see which branches 496 00:40:21,830 --> 00:40:27,620 the trusted code within TrustZone is executing within 16 bytes. So, what does 497 00:40:27,620 --> 00:40:31,820 this look like? So, previously with the TruSpy attack, this is sort of the 498 00:40:31,820 --> 00:40:37,410 outcome of our prime-and-probe attack: We get one measurement for those 256 different 499 00:40:37,410 --> 00:40:43,420 set IDs. When we added those interrupts, we're able to get that time resolution, 500 00:40:43,420 --> 00:40:48,090 and it looks something like this.
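The numbers quoted here translate into a simple index function — assuming, as a simplification of the real Nexus 5X hardware, that the BTB set index is taken from low address bits at 16-byte granularity, and using invented branch addresses: two branches that agree in those bits collide in the same set, no matter how different their upper bits are.

```python
BTB_SETS = 2048      # sets in the branch target buffer
GRANULARITY = 16     # bytes per BTB index step in this simplified model

def btb_set(addr):
    """Set ID a branch at `addr` maps to; upper address bits are ignored."""
    return (addr // GRANULARITY) % BTB_SETS

secure_branch = 0x0F001230   # hypothetical secure-world branch address
shadow_branch = 0x40001230   # attacker branch chosen to share the low bits
other_branch  = 0x40001240   # 16 bytes away: lands in a different set

# The first two collide despite different upper bits; the third does not.
print(btb_set(secure_branch), btb_set(shadow_branch), btb_set(other_branch))
```

This collision is what lets the attacker's re-executed branches reveal, set by set, which secure-world branches ran.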
Now, maybe you can see a little bit at the top 501 00:40:48,090 --> 00:40:52,660 of the screen, how there are these repeated sections of little white blocks, and you 502 00:40:52,660 --> 00:40:56,720 can sort of use that to infer, maybe the same cache lines and cached 503 00:40:56,720 --> 00:41:00,870 instructions are being called over and over. So, just looking at this L1-I cache 504 00:41:00,870 --> 00:41:06,920 attack, you can tell some information about how the process went. Now, let's 505 00:41:06,920 --> 00:41:11,870 compare that to the BTB attack. And I don't know if you can see too clearly -- 506 00:41:11,870 --> 00:41:17,190 it's a bit too high of a resolution right now -- so let's just focus in on one 507 00:41:17,190 --> 00:41:22,580 small part of this overall trace. And this is what it looks like. So, each of those 508 00:41:22,580 --> 00:41:27,720 white pixels represents a branch that was taken by that secure-world code and we can 509 00:41:27,720 --> 00:41:31,070 see repeated patterns, we can see maybe different functions that were called, we 510 00:41:31,070 --> 00:41:35,310 can see different loops. And just by looking at this one trace, we can infer a 511 00:41:35,310 --> 00:41:40,110 lot of information on how that secure world executed. So, it's incredibly 512 00:41:40,110 --> 00:41:44,230 powerful and all of those secrets are just waiting to be uncovered using these new 513 00:41:44,230 --> 00:41:52,890 tools. So, where do we go from here? What sort of countermeasures do we have? Well, 514 00:41:52,890 --> 00:41:56,690 first of all I think the long-term solution is going to be moving to no more 515 00:41:56,690 --> 00:42:00,200 shared hardware. We need to have separate hardware and no more shared caches in 516 00:42:00,200 --> 00:42:05,750 order to fully get rid of these different cache attacks. And we've already seen this 517 00:42:05,750 --> 00:42:11,420 trend in different cell phones.
So, for example, in Apple SoCs for a long time now 518 00:42:11,420 --> 00:42:15,521 -- I think since the Apple A7 -- the secure Enclave, which runs the secure 519 00:42:15,521 --> 00:42:21,000 code, has its own cache. So, these cache attacks can't be accomplished from code 520 00:42:21,000 --> 00:42:27,400 outside of that secure Enclave. So, just by using that separate hardware, it knocks 521 00:42:27,400 --> 00:42:30,970 out a whole class of different potential side-channel and microarchitectural 522 00:42:30,970 --> 00:42:35,610 attacks. And just recently, the Pixel 2 is moving in the same direction. The Pixel 2 523 00:42:35,610 --> 00:42:40,540 now includes a hardware security module that handles cryptographic operations; 524 00:42:40,540 --> 00:42:45,890 and that chip also has its own memory and its own caches, so now we can no longer 525 00:42:45,890 --> 00:42:51,270 use this attack to extract information about what's going on in this external 526 00:42:51,270 --> 00:42:56,530 hardware security module. But even then, using this separate hardware, that doesn't 527 00:42:56,530 --> 00:43:00,800 solve all of our problems. Because we still have the question of "What do we 528 00:43:00,800 --> 00:43:05,900 include in this separate hardware?" On the one hand, we want to include more code in 529 00:43:05,900 --> 00:43:11,370 that separate hardware, so we're less vulnerable to these side-channel attacks, 530 00:43:11,370 --> 00:43:16,490 but on the other hand, we don't want to expand the attack surface anymore. Because 531 00:43:16,490 --> 00:43:19,060 the more code we include in these secure environments, the more likely that a 532 00:43:19,060 --> 00:43:22,600 vulnerability will be found and the attacker will be able to get a foothold 533 00:43:22,600 --> 00:43:26,470 within the secure, trusted environment. So, there's going to be a balance between 534 00:43:26,470 --> 00:43:30,270 what you choose to include in the separate hardware and what you don't.
So, 535 00:43:30,270 --> 00:43:35,220 do you include DRM code? Do you include cryptographic code? It's still an open 536 00:43:35,220 --> 00:43:41,800 question. And that's sort of the long-term approach. In the short term, you just kind 537 00:43:41,800 --> 00:43:46,370 of have to write side-channel-free software: Just be very careful about what 538 00:43:46,370 --> 00:43:50,811 your process does, if there are any secret-dependent memory accesses or 539 00:43:50,811 --> 00:43:55,310 secret-dependent branching or secret-dependent function calls, because any of 540 00:43:55,310 --> 00:44:00,010 those can leak the secrets out of your trusted execution environment. So, here 541 00:44:00,010 --> 00:44:03,460 are the things that, if you are a developer of trusted execution environment 542 00:44:03,460 --> 00:44:08,150 code, that I want you to keep in mind: First of all, performance is very often at 543 00:44:08,150 --> 00:44:13,130 odds with security. We've seen over and over that the performance enhancements to 544 00:44:13,130 --> 00:44:18,880 these processors open up the ability for these microarchitectural attacks to be 545 00:44:18,880 --> 00:44:23,750 more efficient. Additionally, these trusted execution environments don't 546 00:44:23,750 --> 00:44:27,160 protect against everything; there are still these side-channel attacks and these 547 00:44:27,160 --> 00:44:32,310 microarchitectural attacks that these systems are vulnerable to. These attacks 548 00:44:32,310 --> 00:44:37,650 are very powerful; they can be accomplished simply; and with the 549 00:44:37,650 --> 00:44:41,770 publication of the code that I've written, it should be very simple to get set up and 550 00:44:41,770 --> 00:44:46,070 to analyze your own code to see "Am I vulnerable, do I expose information in the 551 00:44:46,070 --> 00:44:52,760 same way?"
And lastly, it only takes one small error, one tiny leak from your trusted 552 00:44:52,760 --> 00:44:56,670 and secure code, in order to extract the entire secret, in order to bring the whole 553 00:44:56,670 --> 00:45:03,920 thing down. So, what I want to leave you with is: I want you to remember that you 554 00:45:03,920 --> 00:45:08,520 are responsible for making sure that your program is not vulnerable to these 555 00:45:08,520 --> 00:45:13,110 microarchitectural attacks, because if you do not take responsibility for this, who 556 00:45:13,110 --> 00:45:16,645 will? Thank you! 557 00:45:16,645 --> 00:45:25,040 *Applause* 558 00:45:25,040 --> 00:45:29,821 Herald: Thank you very much. Please, if you want to leave the hall, please do it 559 00:45:29,821 --> 00:45:35,000 quietly and take all your belongings with you and respect the speaker. We have 560 00:45:35,000 --> 00:45:43,230 plenty of time, 16, 17 minutes for Q&A, so please line up on the microphones. No 561 00:45:43,230 --> 00:45:50,650 questions from the signal angel, all right. So, we can start with microphone 6, 562 00:45:50,650 --> 00:45:54,770 please. Mic 6: Okay. There was a symbol of secure 563 00:45:54,770 --> 00:46:01,160 OSes in the ARM TrustZone diagram. What is the idea of them, if the non-secure OS gets all the 564 00:46:01,160 --> 00:46:04,210 interrupts? What is the secure OS for? 565 00:46:04,210 --> 00:46:08,880 Keegan: Yeah so, in the ARMv8 there are a couple different kinds of interrupts. So, 566 00:46:08,880 --> 00:46:11,760 I think -- if I'm remembering the terminology correctly -- there is an IRQ 567 00:46:11,760 --> 00:46:16,800 and an FIQ interrupt. So, the non-secure mode handles the IRQ interrupts and the 568 00:46:16,800 --> 00:46:20,440 secure mode handles the FIQ interrupts. So, which one you send 569 00:46:20,440 --> 00:46:24,840 will determine which direction the monitor directs that interrupt. 570 00:46:29,640 --> 00:46:32,010 Mic 6: Thank you.
Herald: Okay, thank you. Microphone number 571 00:46:32,010 --> 00:46:37,930 7, please. Mic 7: Do any of your presented attacks on 572 00:46:37,930 --> 00:46:45,290 TrustZone also apply to the AMD implementation of TrustZone or are you 573 00:46:45,290 --> 00:46:48,380 looking into it? Keegan: I haven't looked into AMD too 574 00:46:48,380 --> 00:46:54,011 much, because, as far as I can tell, that's not used as commonly, but there are 575 00:46:54,011 --> 00:46:57,490 many different types of trusted execution environments. The two that I focused on were 576 00:46:57,490 --> 00:47:04,760 SGX and TrustZone, because those are the most common examples that I've seen. 577 00:47:04,760 --> 00:47:09,250 Herald: Thank you. Microphone number 8, please. 578 00:47:09,250 --> 00:47:20,370 Mic 8: When TrustZone is moved to dedicated hardware, dedicated memory, 579 00:47:20,370 --> 00:47:27,780 couldn't you replicate the userspace attacks by loading your own trusted 580 00:47:27,780 --> 00:47:32,210 userspace app and use it as an oracle of some sorts? 581 00:47:32,210 --> 00:47:35,760 Keegan: If you can load your own trusted code, then yes, you could do that. But in 582 00:47:35,760 --> 00:47:39,650 many of the models I've seen today, that's not possible. So, that's why you have 583 00:47:39,650 --> 00:47:44,250 things like code signing, which prevent the arbitrary user from running their own 584 00:47:44,250 --> 00:47:50,310 code in the trusted OS... or in the trusted environment. 585 00:47:50,310 --> 00:47:55,010 Herald: All right. Microphone number 1. Mic 1: So, these attacks are more powerful 586 00:47:55,010 --> 00:48:00,720 against code that's running in trusted execution environments than similar 587 00:48:00,720 --> 00:48:07,100 attacks would be against ring-3 code, or, in general, trusted code.
Does that mean 588 00:48:07,100 --> 00:48:10,910 that trusted execution environments are basically an attractive nuisance that we 589 00:48:10,910 --> 00:48:15,080 shouldn't use? Keegan: There's still a large benefit to 590 00:48:15,080 --> 00:48:17,600 using these trusted execution environments. The point I want to get 591 00:48:17,600 --> 00:48:21,390 across is that, although they add a lot of features, they don't protect against 592 00:48:21,390 --> 00:48:25,450 everything, so you should keep in mind that these side-channel attacks do still 593 00:48:25,450 --> 00:48:28,820 exist and you still need to protect against them. But overall, these are 594 00:48:28,820 --> 00:48:35,930 beneficial things and worthwhile to include. Herald: Thank you. Microphone number 1 595 00:48:35,930 --> 00:48:41,580 again, please. Mic 1: So, AMD is doing something with 596 00:48:41,580 --> 00:48:47,780 encrypting memory and I'm not sure if they encrypt addresses, too, but would that 597 00:48:47,780 --> 00:48:53,090 be a defense against such attacks? Keegan: So, I'm not too familiar with AMD, 598 00:48:53,090 --> 00:48:57,690 but SGX also encrypts memory. It encrypts it in between the lowest-level cache and 599 00:48:57,690 --> 00:49:02,170 the main memory. But that doesn't really have an impact on the actual operation, 600 00:49:02,170 --> 00:49:06,220 because the memory is encrypted at the cache-line level and, as the attacker, we don't 601 00:49:06,220 --> 00:49:10,380 care what the data is within that cache line, we only care which cache line is 602 00:49:10,380 --> 00:49:16,150 being accessed. Mic 1: If you encrypt addresses, wouldn't 603 00:49:16,150 --> 00:49:20,551 that help against that? Keegan: I'm not sure how you would 604 00:49:20,551 --> 00:49:25,070 encrypt the addresses yourself.
As long as those addresses map into the same set IDs 605 00:49:25,070 --> 00:49:30,200 that the victim maps into, then the attacker could still pull off the same style 606 00:49:30,200 --> 00:49:35,030 of attacks. Herald: Great. We have a question from the 607 00:49:35,030 --> 00:49:38,200 internet, please. Signal Angel: The question is "Does the 608 00:49:38,200 --> 00:49:42,410 secure enclave on the Samsung Exynos distinguish the receiver of the message, so 609 00:49:42,410 --> 00:49:46,830 that if the user application asks to decode an AES message, can one sniff on 610 00:49:46,830 --> 00:49:52,220 the value that the secure enclave returns?" 611 00:49:52,220 --> 00:49:56,680 Keegan: So, that sounds like it's asking about the TruSpy-style attack, where 612 00:49:56,680 --> 00:50:01,270 it's calling to the secure world to encrypt something with AES. I think that 613 00:50:01,270 --> 00:50:04,830 would all depend on the particular implementation: As long as it's encrypting 614 00:50:04,830 --> 00:50:09,790 with a certain key and it's able to do that repeatably, then the attack, 615 00:50:09,790 --> 00:50:16,290 assuming a vulnerable AES implementation, would be able to extract that key. 616 00:50:16,290 --> 00:50:20,750 Herald: Cool. Microphone number 2, please. Mic 2: Do you recommend a reference to 617 00:50:20,750 --> 00:50:25,350 understand how these cache-line attacks and branch oracles actually lead to key 618 00:50:25,350 --> 00:50:29,540 recovery? Keegan: Yeah. So, I will flip through 619 00:50:29,540 --> 00:50:33,620 these pages, which include a lot of the references for the attacks that I've 620 00:50:33,620 --> 00:50:38,030 mentioned, so if you're watching the video, you can see these right away or 621 00:50:38,030 --> 00:50:43,200 just access the slides. And a lot of these contain good starting points.
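Keegan's point about set IDs can be sketched with a toy model (not from the talk; `LINE_SIZE` and `NUM_SETS` are illustrative values): in a set-associative cache, the set an address maps to is a pure function of the address bits, so encrypting the data inside a line does nothing to hide which set is touched.

```python
# Toy sketch (illustrative parameters, not any specific CPU): how a
# set-associative cache derives the set index from address bits alone.
LINE_SIZE = 64    # bytes per cache line
NUM_SETS = 1024   # number of sets in the cache

def set_index(addr: int) -> int:
    """Drop the line-offset bits, then take the next bits as the set ID."""
    return (addr // LINE_SIZE) % NUM_SETS

# Two addresses LINE_SIZE * NUM_SETS apart collide in the same set, so an
# attacker who can map memory at the right offsets contends with the
# victim's lines no matter what (encrypted) data those lines hold.
assert set_index(0x12345000) == set_index(0x12345000 + LINE_SIZE * NUM_SETS)
```

This is why encrypting the contents (as SGX's memory encryption does) leaves the side channel intact: the set index above never looks at the data, only at the address.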
So, I didn't 622 00:50:43,200 --> 00:50:46,340 go into a lot of the details on how, for example, the TruSpy attack recovered 623 00:50:46,340 --> 00:50:53,090 that AES key, but that paper does have a lot of good links on how those leaks can 624 00:50:53,090 --> 00:50:56,350 lead to key recovery. Same thing with the CLKSCREW attack, how the different fault 625 00:50:56,350 --> 00:51:03,070 injections can lead to key recovery. Herald: Microphone number 6, please. 626 00:51:03,070 --> 00:51:07,900 Mic 6: I think my question might have been almost the same thing: How hard is 627 00:51:07,900 --> 00:51:11,920 it actually to recover the keys? Is this like a massive machine-learning problem or 628 00:51:11,920 --> 00:51:18,500 is this something that you can do practically on a single machine? 629 00:51:18,500 --> 00:51:21,640 Keegan: It varies entirely by the implementation. So, for all of these attacks 630 00:51:21,640 --> 00:51:25,750 to work, you need to have some sort of vulnerable implementation, and some 631 00:51:25,750 --> 00:51:29,010 implementations leak more data than others. In the case of a lot of the AES 632 00:51:29,010 --> 00:51:33,880 attacks, where you're doing the passive attacks, those are very easy to do on just 633 00:51:33,880 --> 00:51:37,630 your own computer. The AES fault injection attack, I think, 634 00:51:37,630 --> 00:51:42,340 required more brute force in the CLKSCREW paper, so that one required more computing 635 00:51:42,340 --> 00:51:49,780 resources, but still, it was entirely practical to do in a realistic setting. 636 00:51:49,780 --> 00:51:53,770 Herald: Cool, thank you. So, we have one more: Microphone number 1, please.
637 00:51:53,770 --> 00:51:59,080 Mic 1: So, I hope it's not too naive a question, but I was wondering, since all 638 00:51:59,080 --> 00:52:04,730 these attacks are based on cache hits and misses, isn't it possible to forcibly 639 00:52:04,730 --> 00:52:11,280 flush or invalidate or insert noise into the cache after each operation in this trusted 640 00:52:11,280 --> 00:52:23,520 environment, in order to mess up the guesswork of the attacker? So, discarding 641 00:52:23,520 --> 00:52:29,180 optimization and performance for additional security benefits. 642 00:52:29,180 --> 00:52:32,420 Keegan: Yeah, and that is absolutely possible and you are absolutely right: It 643 00:52:32,420 --> 00:52:36,300 does lead to a performance degradation, because if you always flush the entire 644 00:52:36,300 --> 00:52:41,190 cache every time you do a context switch, that will be a huge performance hit. So 645 00:52:41,190 --> 00:52:45,190 again, that comes down to the question of the performance and security trade-off: 646 00:52:45,190 --> 00:52:49,540 Which one do you end up going with? And it seems historically the choice has been 647 00:52:49,540 --> 00:52:54,000 more in the direction of performance. Mic 1: Thank you. 648 00:52:54,000 --> 00:52:56,920 Herald: But we have one more: Microphone number 1, please. 649 00:52:56,920 --> 00:53:01,500 Mic 1: So, I have more of a moral question: How well should we really 650 00:53:01,500 --> 00:53:07,720 protect against attacks which need some ring-0 cooperation? Because, basically, 651 00:53:07,720 --> 00:53:14,350 when we use TrustZone for purposes we would see as clear, like protecting the 652 00:53:14,350 --> 00:53:20,250 browser from interacting with the outside world, then we are basically using the 653 00:53:20,250 --> 00:53:27,280 secure execution environment for sandboxing the process.
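The flush-on-context-switch defense Keegan discusses can be illustrated with a toy prime+probe simulation (a sketch under a deliberately simplistic cache model, not code from the talk): if every set is emptied before the attacker probes, the victim's access no longer stands out.

```python
# Toy prime+probe simulation (simplistic model, for illustration only).
# The "cache" is just the set of set-IDs currently holding the attacker's
# primed lines; a victim access to a set evicts the attacker's line there.
NUM_SETS = 8

def probe(cache: set) -> list:
    """Per set, report whether the attacker's primed line was evicted."""
    return [s not in cache for s in range(NUM_SETS)]

# Attacker primes every set, then the victim touches set 3.
cache = set(range(NUM_SETS))
cache.discard(3)          # victim access evicts the attacker's line in set 3
print(probe(cache))       # signal: exactly one set shows a miss

# With flush-on-context-switch, every set is empty at probe time, so the
# probe sees misses everywhere and the victim's access is hidden.
cache.clear()
print(probe(cache))       # all misses: no usable signal
```

The security gain is exactly what the performance cost buys: after the flush, the attacker's refill misses carry no information, but every legitimate process also restarts with a cold cache.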
But once we need some 654 00:53:27,280 --> 00:53:32,281 cooperation from the kernel, some of those attacks, in fact, empower the user 655 00:53:32,281 --> 00:53:36,320 instead of the hardware producer. Keegan: Yeah, and you're right. It 656 00:53:36,320 --> 00:53:39,210 depends entirely on what your application is and what your threat model is that 657 00:53:39,210 --> 00:53:43,020 you're looking at. So, if you're using these trusted execution environments to do 658 00:53:43,020 --> 00:53:48,430 DRM, for example, then maybe you would be worried about that ring-0 attack or 659 00:53:48,430 --> 00:53:51,620 that privileged attacker who has their phone rooted and is trying to recover 660 00:53:51,620 --> 00:53:56,740 these media encryption keys from this execution environment. But maybe there are 661 00:53:56,740 --> 00:54:01,230 other scenarios where you're not as worried about having an attack with a 662 00:54:01,230 --> 00:54:05,580 compromised ring 0. So, it entirely depends on context. 663 00:54:05,580 --> 00:54:09,000 Herald: Alright, thank you. So, we have one more: Microphone number 1, again. 664 00:54:09,000 --> 00:54:10,990 Mic 1: Hey there. Great talk, thank you very much. 665 00:54:10,990 --> 00:54:13,040 Keegan: Thank you. Mic 1: Just a short question: Do you have 666 00:54:13,040 --> 00:54:16,980 any success stories about attacking the TrustZone and the different 667 00:54:16,980 --> 00:54:24,010 implementations of TEEs with some vendors like some OEMs creating phones and stuff? 668 00:54:24,010 --> 00:54:29,750 Keegan: Not that I'm announcing at this time. 669 00:54:29,750 --> 00:54:35,584 Herald: So, thank you very much. Please, again a warm round of applause for Keegan! 670 00:54:35,584 --> 00:54:39,998 *Applause* 671 00:54:39,998 --> 00:54:45,489 *34c3 postroll music* 672 00:54:45,489 --> 00:55:02,000 subtitles created by c3subtitles.de in the year 2018. Join, and help us!