(piano music)

- [Voiceover] In the near future, every object on earth will be generating data, including our homes, our cars, even our bodies.

- Do you see it? Yeah, right up there.

- [Voiceover] Almost everything we do today leaves a trail of digital exhaust, a perpetual stream of texts, location data, and other information that will live on well after each of us is long gone.

We are now being exposed to as much information in a single day as our 15th century ancestors were exposed to in their entire lifetime.

But we need to be very careful, because in this vast ocean of data there's a frighteningly complete picture of us: where we live, where we go, what we buy, what we say. It's all being recorded and stored forever.

This is the story of an extraordinary revolution that's sweeping almost invisibly through our lives, and about how our planet is beginning to develop a nervous system, with each of us acting as human sensors.

This is the human face of big data.

- All these devices and machines and everything we're building these days, whether it's phones or computers or cars or refrigerators, are throwing off data.

- Information is being extracted out of toll booths, out of parking spaces, out of Internet searches, out of Facebook, out of your phone, tablets, photographs, videos.

- Every single thing that you do leaves a digital trace.

- It's the exhaust, or evidence, of humans interacting with technology and the side effects that has, and it's literally just this massive amount of data.

- What we're doing is we're measuring things more than we ever have. It's that active measurement that produces data.

- If you were some omniscient god and you could look at the footprints of electronic devices, you could kind of see the world. If the whole world is being recorded in real time, you could see everything that is going on in the world through the footprints.
I think it's a lot like written language, right? At some point, it just wouldn't work unless you started writing stuff down. We've reached the same point with data: it ain't gonna work unless we write all the data down and then look at it.

- And all that data coming in is big data.

- We estimate that by 2020, data volumes will be at about 40 zettabytes. Just to put it in perspective, if you were to add up every single grain of sand on the planet and multiply that by 75, that would be 40 zettabytes of information.

- All the data processing we did in the last two years is more than all the data processing we did in the last 3,000 years.

- And so the more information we get, the larger the problems will be that we solve.

- Every powerful tool has a dark side, every last one. Anything that's going to change the world, by definition, has to be able to change it for the worse as much as for the better. It doesn't work one way without the other.

- When it comes to big data, a lot of people are very nervous. Data can be used in any number of ways that you're either aware of or you're not, and the less aware you are of how your data is used, the less power you have in the coming society we're going to live in.

- We're sort of just at the beginning of this big data thing. You don't know how it's going to change things, but you just know it will.

(dramatic music)

- The first real data set to change everything in the world was the astronomical data set, meticulously collected over decades by Copernicus, that ultimately revealed that even though the sun seemed to be moving across the sky every morning and every night, the sun is not moving; it is we who are moving, it is we who are spinning.
It happened again when we suddenly could see beneath the visible level, when the microscope in the 1650s and '60s opened up the invisible world and we for the first time were seeing cells and bacteria and creatures that we couldn't imagine were there. It happened again when we revealed the atomic world, when we said wait a second, there's a level below the optical microscope where we could begin to see things at billionths of a meter, at a nanometer scale, where we imagined the atom and the nucleus and the electron, where we understood that light is electromagnetic frequencies. But now there's actually a supervisible world coming into play. Ironically, big data is a microscope. We're now collecting exabytes and petabytes of data, and we're looking through that microscope using incredibly powerful algorithms to see what we could never see before.

- Before, what we did was we thought of things and then we wrote them down, and that became knowledge. Big data is kind of the opposite. You have a pile of data that isn't really knowledge until you start looking at it and noticing, wait, maybe if you shift it this way and you shift it that way, it turns into this interesting piece of information.

- I think that the BDAD moment, you know, the before-data, after-data moment, is really Search.

(tapping)

That was the moment at which we got a tool that was used by hundreds of millions of people within a few years, where we could navigate an incredible amount of information. We took all of human knowledge that was in text, right, and we put it on the web, and we thought to ourselves, "Well, we're done. Wow, that was hard." And now we realize that was the first minute of the first inning of the game, right, because that was just the knowledge we already had, and the knowledge that we continue to add to the web at a relatively slow pace, you know.
But there is so much more information that we have not digitized, and so much more information that we're about to take advantage of.

(piano music)

- [Voiceover] In recent years, our technology has allowed us to store and process mass quantities of data. Visualizing that data will allow us to see complex systems function, to see patterns and meaning in ways that were previously impossible. Almost everything is measurable and quantifiable.

- So when I look at data, what's exciting to me is kind of recontextualizing that data, taking it and putting it back into a form that we can perceive, understand, talk about, think about.

- [Voiceover] This is the data for airplane traffic over North America for a 24-hour period. When it's visualized, you see everything start to fade to black as everyone goes to sleep, then on the West Coast, planes start moving across on red-eye flights to the east, and you see everyone waking up on the East Coast, followed by European flights in the upper right-hand corner. I think it's one thing to say that there are 140,000 planes being monitored by the federal government at any one time, and it's another thing to see that system as it ebbs and flows in front of you.

These are text messages being sent in the city of Amsterdam on December 31st. You're seeing the daily flow of text messages from different parts of the city until we approach midnight, where everyone says--

- [Voiceover] Happy New Year!

- It takes people or programs or algorithms to connect it all together to make sense of it, and that's what's important. Every single action that we do in this world is triggering off some amount of data, and most of that data is meaningless until someone adds some interpretation to it, someone adds a narrative around it.
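The flight animation described above boils down to replaying timestamped positions frame by frame. Here is a minimal sketch of that idea, assuming a hypothetical flights.csv of hourly position reports; the file, its columns, and the half-second frame rate are illustrative stand-ins, not the filmmakers' actual pipeline:

```python
# Toy replay of timestamped position reports as animation frames.
# Assumes a hypothetical flights.csv with columns: hour, lat, lon.
import csv
from collections import defaultdict

import matplotlib.pyplot as plt

frames = defaultdict(list)  # hour -> list of (lon, lat) points
with open("flights.csv") as f:
    for row in csv.DictReader(f):
        frames[int(row["hour"])].append((float(row["lon"]), float(row["lat"])))

fig, ax = plt.subplots(facecolor="black")
for hour in sorted(frames):
    ax.clear()
    ax.set_facecolor("black")
    lons, lats = zip(*frames[hour])
    ax.scatter(lons, lats, s=1, c="white")  # each dot is one aircraft position
    ax.set_title(f"Hour {hour:02d}:00", color="white")
    plt.pause(0.5)  # redraw; the ebb and flow emerges from the replay
plt.show()
```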
- Often we sort of think of data as stranded numbers, but they're tethered to things, and if we follow those tethers in the right ways, then we can find the real-world objects and the real-world stories that were there. So a lot of the work is that kind of work. It's almost investigative work of trying to follow that trail from the data to what actually happened.

- Sometimes the power of large data sets isn't immediately obvious. Google Flu Trends is a great example of taking a look at a massive corpus of data and deriving somewhat tangential information that can actually be really valuable.

- [Voiceover] Until recently, the only way to detect a flu epidemic was by accumulating information submitted by doctors about patient visits, a process that took about two weeks to reach the CDC. So the researchers turned it around. They asked themselves if they could predict a flu outbreak in real time simply using data from online searches. So they set out to do the near impossible: searching the searches, billions of them, spanning five years, to see if user queries could tell them something.

- When we do searches on Google, we all think of it as a one-way street, that we're going into Google and extracting information from Google, but one of the things we don't really think about very much is that we're actually contributing information back simply by doing the search.

- [Voiceover] And that's where the breakthrough occurred. In looking at all the data, they saw that not only did the number of flu-related searches correlate with the number of people who had the flu, but they could also identify the search terms that let them accurately predict flu outbreaks up to two weeks before the CDC.

- The CDC system takes about a week or two for the numbers to sort of fully flow in. What Google could do is say, based on our model, we'll have it on the spot. We'll just run the algorithm based on how people are searching right now.
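The core of that approach is screening candidate queries for correlation with the official flu signal, then fitting a model on the best ones. Here is a minimal sketch of just the screening step, with made-up weekly counts standing in for the billions of real queries; the published system aggregated roughly the top 45 queries into a regression, while this toy version only ranks terms by Pearson correlation:

```python
# Rank candidate search terms by how well their weekly volume tracks
# the CDC's reported flu activity. All data here is illustrative.
from statistics import mean

def pearson(xs, ys):
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs) ** 0.5
    vy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (vx * vy)

# Hypothetical weekly CDC flu rates and per-term search volumes.
cdc_flu = [1.2, 1.5, 2.1, 3.8, 5.0, 4.2, 2.9, 1.8]
queries = {
    "flu symptoms":   [110, 150, 240, 400, 520, 430, 300, 170],
    "fever remedies": [90, 120, 200, 350, 470, 380, 260, 150],
    "march madness":  [20, 25, 600, 800, 300, 40, 30, 25],  # spurious spike
}

# Keep only terms whose history correlates strongly with the flu signal;
# an aggregate of those terms becomes the real-time estimator.
ranked = sorted(queries, key=lambda t: pearson(queries[t], cdc_flu), reverse=True)
for term in ranked:
    print(f"{term:15s} r = {pearson(queries[term], cdc_flu):+.2f}")
```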
- And now we have, for the first time, this real-time feedback loop where we can see in real time what's going on and respond to it.

- Now there is a flip side to this, though, and that is that there was a big story this year, a lot of media attention about what an intense flu season this was. And so what did that do? That drove up search. It drove people who were more interested in what was going on with this flu, or it might have made more people think, I must have it, and so they were off; they got it way wrong.

- So, you know, one way to think about big data, and all of the computational tools that we wrap around that big data to let us discover patterns that are in the data, is what happens when we point all that machinery at ourselves.

- [Voiceover] At MIT, Deb Roy and his colleagues wanted to see if they could understand how children acquire language.

- And we realized that no one really knew, for a simple reason: there was no data.

- [Voiceover] After he and his wife Rupal brought their newborn son home from the hospital, they did what every normal parent would do: mount a camera in the ceiling of each room in their home and record every moment of their lives for two years, a mere 200 gigabytes of data recorded every day.

- [Deb] We ended up transcribing somewhere between eight and nine million words of speech.

- [Voiceover] Ga ga ga.

- And as soon as we had that, we could go and identify the exact moment where my son first said a new word.

- [Deb] We started calling them births.

- We took this idea of a word birth and we started thinking about why don't we trace back in time and look at the gestation period for that word. One example of this was water. So we looked at every time my son heard the word water: what was happening, where in the house were they, how were they moving about, and we used that visual information to capture something about the context within which the words are used. We call them wordscapes.
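One simple way to make a wordscape concrete is to treat each word's hearing events as a distribution over locations and score how spread out that distribution is. A toy sketch under that assumption follows; the rooms and events are invented, and Roy's actual wordscapes were built from rich video and audio context, not just room labels:

```python
# Score each word by how many distinct contexts it is heard in, using
# normalized entropy of its room distribution as "spatial distinctness".
import math
from collections import Counter

# Hypothetical hearing events: (word, room where the child heard it).
events = [
    ("water", "kitchen"), ("water", "bathroom"), ("water", "hallway"),
    ("water", "living room"), ("ball", "living room"), ("ball", "living room"),
    ("ball", "living room"), ("ball", "nursery"),
]

def wordscape_stats(word):
    rooms = Counter(room for w, room in events if w == word)
    total = sum(rooms.values())
    # Entropy of the room distribution, normalized to [0, 1].
    entropy = -sum((n / total) * math.log2(n / total) for n in rooms.values())
    max_entropy = math.log2(len(rooms)) if len(rooms) > 1 else 1.0
    return total, entropy / max_entropy

# Per the finding described below, distinctness predicts learning
# better than raw frequency does.
for word in sorted({w for w, _ in events}):
    freq, distinctness = wordscape_stats(word)
    print(f"{word}: heard {freq}x, spatial distinctness {distinctness:.2f}")
```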
Then we could ask the question: how does the wordscape associated with a word predict when my son will actually start using that word?

- [Voiceover] What they learned from watching Deb's son was that the texture of the wordscapes had predictive power. Where most of the previous research had indicated that language was learned through repetition, this analysis of the data showed that it wasn't actually repetition that generated learning, but context. Words with more distinct wordscapes, that is, words heard in many varied locations, would be learned first.

- Not only is that true, but the wordscapes are far more predictive of when a word will be learned than the frequency, the number of times it's actually heard. It's like we're building a new kind of instrument, like we're building a microscope, and we're able to examine something that is around us, but it has a structure and patterns and beauty that are invisible without the right instruments, and all of this data is opening up our ability to perceive things around us.

(giggling)

- He's walking.

(beeping)

- A lot of people don't realize that when a baby is born premature, it can develop infection in the hospital, and that infection can kill them.

In our research, we started to just look at infection. By the time a baby is physically showing signs of having an infection, they are very, very unwell.

So the very first time that I went into a neonatal intensive care unit, I was amazed by the sights, the sounds, the smell, just the whole environment, but mainly, for me, the data.

What shocked me was the amount of data lost. They showed me the paper chart that the information is recorded onto: one number every hour for the baby's heart rate, the respiration, the blood oxygen.
Now in that time, the baby's heart has beaten more than 7,000 times, they have breathed more than 2,000 times, and the monitor showing the blood oxygen level has shown that more than three and a half thousand times. I said, "Well, where's all the data going that's in those machines?" And they said, "Oh, it scrolls out of the memory." So we have an enormous amount of data lost.

So we're trying to gather that information and use it over a longer time, in much more complex ways than before, and we try to write computing code to look at the trends in the monitors and the trends in the data, to see how that can tell us when a baby's becoming unwell.

- [Voiceover] So Dr. McGregor did what data scientists do: she looked for the invisible. She and her team analyzed the data from thousands of heartbeats, and what they discovered were minute fluctuations that could predict the onset of life-threatening infections long before physical symptoms appeared.

- When the body first starts dealing with infection, there are these subtle changes, and that's why we have to watch every single heartbeat. And what we're finding is that when you're starting to become unwell, the heart's ability to react, to speed up and slow down, gets subdued.

The human body has always been exhibiting these certain things. The difference is we've started to gather more information about the body now, so that we can build this virtual person. The better the virtual representation we have, the better we can start to understand what will happen to them in the future.

Back in 1999, I was pregnant with my first child. She was born premature and she passed away. There was no other viable outcome for her. But there are so many others who have just been born early, and they just need that opportunity to grow and develop.
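The signal Dr. McGregor describes, variability that gets subdued before infection shows, can be illustrated with a rolling check on beat-to-beat intervals. Here is a toy sketch with invented numbers and a made-up threshold; her team's actual system ran far more sophisticated stream analytics over every heartbeat:

```python
# Flag windows where beat-to-beat variability drops well below the
# baby's own baseline: the "subdued" heart-rate pattern described above.
from statistics import stdev

def variability(intervals):
    """Spread of beat-to-beat intervals (ms) in one window."""
    return stdev(intervals)

def watch(windows, baseline, threshold=0.5):
    # threshold is an invented fraction of baseline variability.
    for minute, window in enumerate(windows):
        if variability(window) < threshold * baseline:
            yield minute  # candidate early-warning moment for clinicians

# Hypothetical beat intervals; variability fades in the last window.
windows = [
    [410, 440, 395, 455, 420, 465],   # healthy: speeds up and slows down
    [415, 435, 400, 450, 425, 460],
    [428, 432, 430, 429, 431, 433],   # subdued: nearly metronomic
]
baseline = variability(windows[0])
print("alerts at minutes:", list(watch(windows, baseline)))
```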
We want to let the computers monitor a baby as it breathes, as its heart beats, as it sleeps, so that these algorithms are watching for certain behaviors, and if something starts to go wrong for that baby, we have the ability to intervene.

If we can just save one life, then for me, personally, it's already worthwhile.

- Everybody understands what it takes to digitize photography, a movie, a magazine, a newspaper, but they haven't yet grasped what it means to digitize the medical essence of a human being.

Everything about us now that's medically relevant can be captured. With sensors, we can digitize all of our metrics; with imaging, we can digitize our anatomy; and with the sequence of our DNA, we can digitize our biology.

- The data story in the genome is the fact that we have six billion data points sitting in our genomes that we've never had access to before.

When you sequence a person's genome, there are known differences in the human genome that can predict a risk for a disease, or that you're a carrier for a disease, or that you have a certain ancestry. There's a lot of information packed in the genome that we're starting to learn more and more about.

Getting your own personal information through your genome would not have been possible even 10 years ago because of cost. The cost of the technologies that have enabled this has dropped precipitously, and now we're able to get a really good look at your genome for under $500.

- And when it becomes 100 bucks or 10 bucks, we're going to have everyone's genome as data.

- The results came back on Tuesday; it was October 2nd, 1996. I was diagnosed that day with breast cancer. A year out of treatment, I found a lump on the other breast, in the exact same position, and I went in and they told me that I had breast cancer again.
Sedona's known about me being tested for the BRCA gene, she's known my sister has tested, she knows my other sister tested and was negative for the gene mutation, and so she actually told me, "When I'm 18, I want to test, you know, and see if I have this gene mutation or not."

I am gonna be completely distraught if I hand this gene down to my kid.

- Do you know what your chances are of having the mutation that your mom has?

- I'd say 50/50.

- You're exactly right. BRCA2 is a gene that we all have; it's called a tumor suppressor gene. But if you have a mutation in the gene, it causes the gene not to function like it should, so for women, the risk, mainly of breast and ovarian cancer, is a lot higher than in the general population.

- An average woman would have a 12% risk of getting breast cancer in a lifetime, and most women aren't going out there getting preventive mastectomies, but when you're faced with an 87% risk of getting breast cancer in your lifetime, it kind of makes that a possible choice.

- [Voiceover] You'll need to swish this mouthwash for 30 seconds.

- We are definitely moving into a world where the patient, the person, is at the center of things and hopefully also at the controls.

People will have access to the data that is informative around the type of disease they have, and that data then can point much more directly to proper treatments. But the data can also say that a treatment works for a person or doesn't work for a person based on their genetic profile, and we're gonna start moving more and more into this notion of personalized medicine as we learn more about the genome and the study of pharmacogenetics, which is how our genes influence the drugs we take.

Ultimately, instead of treating disease, is there data that could really help us move away from contracting these illnesses to begin with and go more toward a preventive model?
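The numbers above, a 12% baseline lifetime risk versus 87% with a harmful BRCA mutation, are exactly the kind of lookup a genome report automates: match a person's variants against a table of known risk associations. A toy sketch with an invented variant table follows; real reports draw on curated clinical databases and far more nuanced models:

```python
# Match a person's known variants against a toy risk table. The variant
# IDs and carrier entry are illustrative placeholders; only the 12% vs.
# 87% breast-cancer figures come from the film.
BASELINE_RISK = {"breast cancer": 0.12}  # average lifetime risk

RISK_TABLE = {
    # variant id: (condition, lifetime risk for carriers)
    "BRCA2:c.5946delT": ("breast cancer", 0.87),
    "HBB:c.20A>T":      ("sickle cell carrier", None),  # carrier status only
}

def annotate(variants):
    for v in variants:
        if v in RISK_TABLE:
            condition, risk = RISK_TABLE[v]
            if risk is None:
                yield f"{v}: carrier finding for {condition}"
            else:
                base = BASELINE_RISK.get(condition, 0.0)
                yield (f"{v}: {condition} lifetime risk {risk:.0%} "
                       f"vs. {base:.0%} population baseline")

for line in annotate(["BRCA2:c.5946delT"]):
    print(line)
```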
(mellow music)

- Now you can't talk about information separate from health. How you feel is information, how you respond to a drug is information, your genetic code is information. What's really happening is that when we start collecting it, we're going to start seeing it, and we're going to start interpreting it.

We're beginning the age of collecting information from sensors that are cheap and ubiquitous, that we can process continuously, and we can actually start knowing things.

- If we monitored our health throughout the day, continuously, every second, what would that really enable?

- And there's now a lot of really great technology coming out around this sense of tracking and monitoring, and we have all kinds of sensor companies and devices.

- We're actually collecting a lot of physiological information, you know, heart rate, breathing, in real time, you know, every minute, every second.

- [Linda] People want to measure their daily activities: being able to track your own sleep, being able to watch and monitor your own food intake, being able to track your own movement.

- It's almost like looking down at our lives from 30,000 feet. There's a company right now in Boston that can actually predict that you're going to get depressed two days before you get depressed, and the gentleman who created it said, if you actually watch any one of us, most people have a very discernible pattern of behavior. For the first week, our software basically determines what your normal pattern is, and then, two days before you're showing any outward signs of depression, the number of tweets and emails that you're sending goes down, your radius of travel starts shrinking, and the amount of time that you spend at home goes up.

- You can look to see if how you exercise changes your social behavior, if what you eat changes how you sleep, and how that impacts your medical claims.

- All kinds of data and information are sitting inside the things you do every day.
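A toy version of that baseline-then-deviation idea: learn a week of "normal" from simple daily features, then flag days when the pattern shifts in the directions just described (fewer messages, smaller travel radius, more time at home). Everything here, the features, the data, and the two-sigma rule, is an invented stand-in for the company's actual model:

```python
# Learn a one-week behavioral baseline, then flag days that deviate
# in the depression-associated directions described above.
from statistics import mean, stdev

# Hypothetical daily features: (messages sent, travel radius km, hours home).
days = [
    (42, 11.0, 13), (38, 9.5, 14), (45, 12.0, 12), (40, 10.0, 13),
    (44, 11.5, 12), (39, 10.5, 14), (41, 9.0, 13),   # baseline week
    (40, 10.0, 13), (22, 3.0, 19), (18, 2.5, 20),    # pattern shift
]
baseline, recent = days[:7], days[7:]
cols = list(zip(*baseline))  # per-feature baseline history

def flags(day):
    msgs, radius, home = day
    return [
        msgs < mean(cols[0]) - 2 * stdev(cols[0]),    # messaging drops
        radius < mean(cols[1]) - 2 * stdev(cols[1]),  # travel shrinks
        home > mean(cols[2]) + 2 * stdev(cols[2]),    # time at home grows
    ]

for i, day in enumerate(recent, start=8):
    if all(flags(day)):
        print(f"day {i}: shift on all three signals, candidate early warning")
```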
- Now, with all these devices, we have real-time information, real-time understanding.

- Now that might sound interesting; it might help you shed a few pounds, realize you're eating too many potato chips and sitting around too much, perhaps, and that's useful to you individually. But if hundreds of millions of people do that, you have a big cloud of data about people's behavior that can be crawled through by pattern recognition algorithms.

And doctors and health policy officials can start to see patterns that change the way, collectively as a society, we understand not just our health, but every single area where data can be applied, because we start to understand how we might, collectively as a culture, change our behavior.

- And if you look at the future of this, we're gonna be embedded in a sea of information services that are connected to massive databases in the cloud.

(rhythmic electronic music)

- If you take a look at everything that you touch in everyday life, the majority of these things were invented many, many, many years ago, and they're ripe for reinvention. And when they get reinvented, they're gonna be connected, they're gonna be connected in some way, so that the data that comes off of these devices that you touch is gonna be collected and stored in a central location, and people are gonna run big data algorithms on this data, and then you're gonna get the feedback of the collective whole rather than the individual.

- So it's taking people who are already out there, who already have these devices, and turning all these people into contributors of information back to the system.

You become one of the nodes on the network.

I think the Internet, as wondrous as it's been over the last 20 years, was like a layer that needed to be in place for all these sensors and devices to be able to communicate with each other.
- You know, we're building this global brain that has these new functions, and we're accessing them primarily now through our mobile devices, or obviously also on our desktops, but increasingly mobile.

- I think this data revolution has a strange impact, really, of people feeling like there's somebody listening to them, and that could mean listening in the sense of Big Brother, someone's listening in, or it could be someone's really hearing me. This device in my hand knows who I am; it can somewhat anticipate what I want or where I'm going and react to that.

The implications of that are huge for the decisions that we make and for the systems that we're part of.

I think about living in a city and what your experience of living in that city would be in 10 or 15 years. You've got places like Chicago where they're being hugely innovative: they're taking massive data sets, combining them in interesting ways, running interesting algorithms on them, and figuring out ways that they can intervene in the system, to sort of see patterns and be able to react to those patterns.

When you take in data, it affects you as an individual, and then you affect the system, and that affects the data again, and this round trip that you start to see yourself part of makes me understand that I'm an actor in a larger system.

For instance, if you know by looking at the data, and you have to put different data sets together to be able to see this, that when some of the street lights go out, they cause higher crime in that particular block,

(siren blares)

you start to see ways that, if you can query that data in intelligent ways, you can prioritize the limited resources that you have in a city to take care of the things that have, you know, follow-on effects and follow-on costs.
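That streetlight example is, at bottom, a join of two city data sets keyed by block, followed by a ranking. A minimal sketch with invented records; real open-data work would pull from the city's actual outage and crime feeds and control for many confounders:

```python
# Join streetlight outages with crime reports by block and rank repairs
# by the crime increase observed while the light was out. Toy data.
from collections import Counter

outages = {"block-12", "block-47"}          # blocks with a light out
crimes_before = Counter({"block-12": 2, "block-47": 3, "block-03": 4})
crimes_during = Counter({"block-12": 7, "block-47": 4, "block-03": 4})

def repair_priority():
    scored = []
    for block in outages:
        uplift = crimes_during[block] - crimes_before[block]
        scored.append((uplift, block))
    # Fix the lights whose outage coincides with the largest crime jump first.
    return [block for uplift, block in sorted(scored, reverse=True)]

print("repair order:", repair_priority())  # block-12 before block-47
```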
- In the end, you know, you're going to hope that this is just our reaction as a species to this scale problem, right? How do you get another, you know, two billion people on the planet? You can't do it unless you start instrumenting every little thing and dialing it in just right.

- And you know, right now you wait for the bus because the bus is coming on a particular schedule, and it's great, we're now at the point where your phone will tell you when the bus is really coming, not just when the bus is scheduled to come.

You know, take that a little bit forward. What about when there's more use on one line than the other? Well, instead of sticking with the schedule, does the system start to understand that maybe this route doesn't need 10 buses today, and automatically shift those resources over to the lines where the buses are full?

- Boston just created a new smartphone app which uses the accelerometer in your phone. So if you're driving through the streets of south Boston and all of a sudden there's a big dip in the street, the phone realizes it. So anybody in the city of Boston that has this up and running is feeding real-time data on the quality of the roads to the city of Boston.

- Then you start to feel that your city is sort of a responsive organism, just like your body puts your blood where it needs it.

Think about ways that we could live in cities when they're that responsive to our needs, and think about the implications of that for the planet, because cities are really also how we're going to survive the 21st century. You can live in a city with a far smaller footprint than anywhere else in the world, and I think data and these sorts of responsive systems will play an enormous role in that.
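The Boston app described above amounts to watching the phone's vertical accelerometer for sudden spikes and tagging them with GPS. A toy sketch of that detection step, with made-up readings and an invented threshold; the real app also had to filter out speed bumps, slammed doors, and the like:

```python
# Detect road "dips" as spikes in vertical acceleration while driving,
# and report them with the location where they occurred. Toy values.

# Hypothetical samples: (vertical acceleration in g, (lat, lon)).
samples = [
    (1.02, (42.333, -71.049)),
    (0.98, (42.334, -71.050)),
    (2.75, (42.335, -71.051)),   # sharp jolt: likely pothole
    (1.01, (42.336, -71.052)),
]

SPIKE_G = 2.0  # invented threshold separating jolts from normal vibration

def pothole_reports(samples):
    for accel, location in samples:
        if abs(accel) > SPIKE_G:
            yield {"lat": location[0], "lon": location[1], "accel_g": accel}

# Each report would be sent on to the city's road-quality feed.
for report in pothole_reports(samples):
    print("report pothole:", report)
```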
- I think one of the most exciting things about data is that, you know, it's giving us extra senses. It's expanding upon, you know, our ability to perceive the world, and it actually ends up giving us the opportunity to make things tangible again, and to actually get a perspective on ourselves, both as individuals and also as a society.

- And there's always that moment in data visualization when you're looking at, you know, tons and tons and tons of data. The point is not to look at the tons and tons and tons of data, but to see what stories emerge out of it.

- We said, look, give me the home street address of everyone who entered New York State prison last year and the home street address of everyone who left New York State prison last year, and let's get the numbers, put it on a map, and actually show it to people.

And when we first produced our Brooklyn map, which was the first one we did, jaws hit the floor, not because nobody knew this. You know, everyone knew anecdotally how concentrated the effect of incarceration was, but no one had actually seen it based on actual data.

We started to show these remarkably intense concentrations of people going in and out of prison, highly disproportionately located in very small areas around the city.

- [Voiceover] And what we found is that the home addresses of incarcerated people correlate very highly with poverty and with people of color.

- You have a justice system which, by all accounts, is supposed to be essentially based on case-by-case, individual decisions of justice. Well, when you looked at the map over time, what you really were seeing was this mass population movement out and mass population resettlement back, this cyclical movement of people.

- So once we had mapped the data, we quantified it in terms of how much it cost to house those same people in prison.

- And that's where we started to think about million dollar blocks.
We found over 35 individual city blocks in Brooklyn alone for which the state was spending more than a million dollars every year to remove and return people to prison.

We needed to reframe that conversation, and what immediately emerged out of this was this idea of justice reinvestment. We weren't building anything in those places for those dollars. How can we demand more equity for that investment, to extract those neighborhoods from what decades of criminalization has done? And that shift had to come from the data and a new way of thinking about information.

These maps did that.

- The amount of data that now is being collected about those areas that are stuck in cycles of poverty, cycles of famine, cycles of war, gives people, or governments and NGOs, an opportunity to do good. Understanding on the ground, information on the ground, data on the ground can change the way people apply resources which are intended to try to help.

- We really fundamentally believe that data has intrinsic value, and we also fundamentally believe that the individuals who create that data should be able to benefit from it.

We're working with one of the big mobile phone operators in Kenya, looking at the dynamics of these mobile phone subscribers. Millions of phones in Kenya. We're looking at how the population moves over the country, and we're overlaying that movement data with data about parasite prevalence from household surveys and data from hospitals. We can start identifying these malaria hot spots, regions within Kenya that desperately need the eradication dollars.
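The overlay they describe can be thought of as combining, per region, how prevalent the parasite already is with how much traffic arrives from high-prevalence places. A toy sketch of that scoring, with invented regions and numbers; the real work modeled millions of subscribers' movements against survey and hospital data:

```python
# Score regions for malaria-control priority by combining local parasite
# prevalence with travel arriving from high-prevalence regions. Toy data.
prevalence = {"coast": 0.38, "highlands": 0.05, "lake": 0.30}

# travel[a][b]: share of region a's phones observed moving to region b.
travel = {
    "coast":     {"highlands": 0.10, "lake": 0.05},
    "lake":      {"highlands": 0.15, "coast": 0.02},
    "highlands": {"coast": 0.01, "lake": 0.01},
}

def priority(region):
    # Imported risk: traffic arriving from every other region,
    # weighted by how prevalent the parasite is where it came from.
    imported = sum(flows.get(region, 0.0) * prevalence[src]
                   for src, flows in travel.items() if src != region)
    return prevalence[region] + imported

for region in sorted(prevalence, key=priority, reverse=True):
    print(f"{region:10s} priority {priority(region):.3f}")
```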
It's fascinating to start extracting models and plotting graphs of the behavior of tens of millions of people in Kenya, but it's meaningful when you can make those insights count, when you can take the insights that you've gleaned, put them into practice, measure what the impact was, and hopefully make the lives of the people who are generating this data better.

(children yelling)

(siren blaring)

- That afternoon when the earthquake struck in January, I was watching CNN and saw the breaking news. My wife was in Port-au-Prince at the time, and for the better part of 12 hours I had no idea whether any one of my friends was alive or dead.

- [Voiceover] Meier was a Tufts University PhD student and directed crisis mapping for Ushahidi, a nonprofit that collects, visualizes, and then maps crisis data.

- And so I went on social media and I found dozens and dozens of Haitians tweeting live about the damage, and a lot of the time they were sharing where this damage was happening. So they would say the church on the corner of X and Y has been destroyed or has collapsed, and they would refer to street names and so on. So it's about really becoming a digital detective and then trying to understand where on the map this was.

- [Voiceover] So he called everyone he knew and put together a mostly volunteer team in Boston to prioritize the most life-and-death tweets and map them for rescue workers.

- For the first time, it wasn't the government emergency management organization that had the best data on what was happening; it was legions of volunteers who came together and crowdmapped the locations of buildings that had collapsed, people who were trapped in rubble, locations where water was needed, where physicians were needed, and the like.

- I think we've seen, not only in Haiti but in almost every disaster since Haiti, just an explosion of social media content.
- [Voiceover] Disaster mapping groups like Meier's realized that there was so much at stake, and so much raw data coming from social media during natural disasters, that they needed to come up with new algorithms to sort through the flood of information.

- We are drawing on artificial intelligence and machine learning, working with data scientists to develop semi-automated ways to extract relevant, informative, and actionable information from social media during disasters. So one of our projects is called Artificial Intelligence for Disaster Response.

During Hurricane Sandy, we collected five million tweets during the first few days. With the Sandy data, we've been able to show empirically that we can automatically identify whether or not a tweet has been written by an eyewitness. So if somebody is writing something saying the bridge is down, we can say, with a degree of accuracy of about 80% and higher, whether that tweet has actually been posted by an eyewitness, which is really important for disaster response.

I think that goes to the heart of why something like social media and Twitter is so important: having these millions of eyes and ears on the ground. It's about empowering the crowd, it's about empowering those who are affected and those who want to help.

These are real lives that we're capturing. This is not abstract information. These are real people who are affected by disasters, who are trying to either help or seek help. It doesn't get more real than this.
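A classifier like the one just described, eyewitness or not, can be sketched with a standard bag-of-words model over labeled tweets. Here is a minimal illustration with a handful of invented examples; the production AIDR system trained on far larger, crowd-labeled sets to reach the roughly 80% accuracy quoted above:

```python
# Train a tiny bag-of-words classifier to separate eyewitness reports
# from secondhand chatter. The labeled tweets here are invented examples.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

tweets = [
    ("the bridge on 5th is down, water everywhere outside my window", 1),
    ("we just felt the whole building shake, glass in the street", 1),
    ("power is out on our block, trees across the road", 1),
    ("praying for everyone on the coast tonight", 0),
    ("news says the storm may hit the city tomorrow", 0),
    ("donate here to support the relief effort", 0),
]
texts, labels = zip(*tweets)

model = make_pipeline(CountVectorizer(ngram_range=(1, 2)), MultinomialNB())
model.fit(texts, labels)  # 1 = eyewitness, 0 = not

# Likely eyewitness reports would be routed to responders first.
for tweet in ["flood water is coming into our house right now",
              "thoughts are with the responders out there"]:
    label = model.predict([tweet])[0]
    print(f"{'EYEWITNESS' if label else 'other':10s} | {tweet}")
```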
781 00:36:48,788 --> 00:36:51,170 - Today, technology, 782 00:36:51,170 --> 00:36:53,624 in a lot of our communication tools, 783 00:36:53,624 --> 00:36:56,052 allows an idea to be spread instantly 784 00:36:56,052 --> 00:37:00,023 and with the original source of truth. 785 00:37:00,023 --> 00:37:02,894 I can have an idea and I can decide that 786 00:37:02,894 --> 00:37:04,324 I want to bring this around the world 787 00:37:04,324 --> 00:37:07,999 and I can do it almost instantaneously. 788 00:37:09,795 --> 00:37:11,666 - Tunisia's a great example. 789 00:37:11,666 --> 00:37:15,175 There were little uprisings happening all over Tunisia 790 00:37:15,175 --> 00:37:17,431 and each one was brutally squashed 791 00:37:17,431 --> 00:37:19,650 and there was no media attention, 792 00:37:19,650 --> 00:37:24,276 so no one knew that any other little village had an issue. 793 00:37:24,276 --> 00:37:27,157 But what happened was in one village 794 00:37:27,157 --> 00:37:30,214 there was a man who self-immolated in protest 795 00:37:30,214 --> 00:37:33,165 and the images were put online 796 00:37:33,165 --> 00:37:37,977 by a dissident group onto Facebook 797 00:37:37,977 --> 00:37:39,883 and then Al Jazeera picked it up 798 00:37:39,883 --> 00:37:42,696 and broadcast the image across the region 799 00:37:42,696 --> 00:37:44,857 and then all of Tunisia realized, 800 00:37:44,857 --> 00:37:47,287 wait a second, we're about to have an uprising, 801 00:37:47,287 --> 00:37:48,449 and it just went. 802 00:37:48,449 --> 00:37:53,454 (yelling) 803 00:37:55,095 --> 00:37:58,513 So Tunisia was really activists on the ground, 804 00:37:58,513 --> 00:38:02,395 social media and mainstream media working together, 805 00:38:02,395 --> 00:38:05,486 spreading across Tunisia this idea that 806 00:38:05,486 --> 00:38:07,031 you're not the only ones, 807 00:38:07,031 --> 00:38:10,745 and it gave everyone the courage to do the uprising. 808 00:38:12,412 --> 00:38:14,713 Technology has fundamentally changed 809 00:38:14,713 --> 00:38:17,167 the way people interact with government. 810 00:38:17,167 --> 00:38:19,432 That's another layer of the stack 811 00:38:19,432 --> 00:38:21,012 that's sort of being opened up. 812 00:38:21,012 --> 00:38:23,104 I think that's one of the key challenges: big data 813 00:38:23,104 --> 00:38:26,067 has so much opportunity for good 814 00:38:26,067 --> 00:38:28,287 and also for really screwing up our system. 815 00:38:28,287 --> 00:38:30,414 - You can't talk about data without talking about people 816 00:38:30,414 --> 00:38:31,959 because people create the data 817 00:38:31,959 --> 00:38:33,923 and people utilize the data. 818 00:38:33,923 --> 00:38:38,171 (whirring) 819 00:38:44,360 --> 00:38:47,265 - So a handful of years ago there's a guy named Andrew Pole, 820 00:38:47,265 --> 00:38:50,077 a statistician who gets hired by Target. 821 00:38:50,077 --> 00:38:51,496 He's sitting at his desk and some guys 822 00:38:51,496 --> 00:38:53,076 from the marketing department come by and they say, 823 00:38:53,076 --> 00:38:55,505 "Look, if we wanted to figure out 824 00:38:55,505 --> 00:38:58,003 "which of our customers are pregnant, 825 00:38:58,003 --> 00:39:00,049 "could you tell us that?" 826 00:39:00,049 --> 00:39:01,722 So what Andrew Pole started doing is he said, 827 00:39:01,722 --> 00:39:05,569 the women who had signed up for the baby registry, 828 00:39:05,569 --> 00:39:07,477 let's track what they're buying 829 00:39:07,477 --> 00:39:09,602 and see if there are any patterns. 830 00:39:09,602 --> 00:39:11,531 I mean, obviously if someone starts buying a crib 831 00:39:11,531 --> 00:39:13,332 or a stroller, you know they're pregnant. 832 00:39:13,332 --> 00:39:15,622 But by using all of this data they had collected, 833 00:39:15,622 --> 00:39:18,388 they were able to start seeing these patterns 834 00:39:18,388 --> 00:39:21,331 that you couldn't actually guess at.
835 00:39:22,394 --> 00:39:25,526 When women were in their second trimester, 836 00:39:25,526 --> 00:39:28,524 they suddenly stopped buying scented lotion 837 00:39:28,524 --> 00:39:30,697 and started buying unscented lotion, 838 00:39:30,697 --> 00:39:32,777 and at about the end of their second trimester, 839 00:39:32,777 --> 00:39:35,078 the beginning of their third trimester, they would start 840 00:39:35,078 --> 00:39:38,887 buying a lot of cotton balls and washcloths. 841 00:39:38,887 --> 00:39:42,850 - And then they could start to subtly send you coupons 842 00:39:42,850 --> 00:39:45,901 for things that might be related to your pregnancy. 843 00:39:46,720 --> 00:39:48,114 - They decided to do a little test case. 844 00:39:48,114 --> 00:39:50,648 So they send out some of these ads to a local community 845 00:39:50,648 --> 00:39:52,787 and a couple weeks later this father comes in 846 00:39:52,787 --> 00:39:55,704 to one of the stores and he's furious 847 00:39:55,704 --> 00:39:58,923 and he's got a flyer in his hand that was sent to his house 848 00:39:58,923 --> 00:40:02,049 and he finds the manager and he says to the manager, 849 00:40:02,049 --> 00:40:03,932 he says, "Look, I'm so upset. 850 00:40:03,932 --> 00:40:07,313 "You know, my daughter is 18 years old. 851 00:40:07,313 --> 00:40:10,277 "I don't know what you're doing sending her this trash. 852 00:40:10,277 --> 00:40:12,497 "You sent her these coupons for diapers 853 00:40:12,497 --> 00:40:15,077 "and for cribs and for nursing equipment. 854 00:40:15,077 --> 00:40:16,623 "She's 18 years old 855 00:40:16,623 --> 00:40:18,877 "and it's like you're encouraging her to get pregnant." 856 00:40:18,877 --> 00:40:21,004 Now the manager, who has no idea what's going on 857 00:40:21,004 --> 00:40:23,839 with the pregnancy prediction machine 858 00:40:23,839 --> 00:40:25,222 that Andrew Pole built, 859 00:40:25,222 --> 00:40:26,896 says, "Look, I'm so sorry. 860 00:40:26,896 --> 00:40:30,231 "I apologize, it's not going to happen again." 861 00:40:30,231 --> 00:40:32,568 And a couple days later the guy feels so bad about this 862 00:40:32,568 --> 00:40:35,159 that he calls the father at home and he says to the father, 863 00:40:35,159 --> 00:40:36,879 "I just wanted to apologize again. 864 00:40:36,879 --> 00:40:38,622 "I'm so sorry this happened." 865 00:40:38,622 --> 00:40:40,167 And the father kind of pauses for a moment. 866 00:40:40,167 --> 00:40:42,597 He says, "Well, I want you to know 867 00:40:42,597 --> 00:40:44,305 "I had a conversation with my daughter 868 00:40:44,305 --> 00:40:47,106 "and there have been some activities in my household 869 00:40:47,106 --> 00:40:49,023 "that I haven't been aware of 870 00:40:49,023 --> 00:40:50,778 "and she's due in August. 871 00:40:50,778 --> 00:40:53,777 "So I owe you an apology." 872 00:40:53,777 --> 00:40:55,368 And when I asked Andrew Pole about this, 873 00:40:55,368 --> 00:40:56,996 before he stopped talking to me, 874 00:40:56,996 --> 00:41:00,122 before Target told him that he couldn't talk to me anymore, 875 00:41:00,122 --> 00:41:03,469 he said, "Oh look, like you gotta understand, 876 00:41:03,469 --> 00:41:05,305 "like this science is just at the beginning, 877 00:41:05,305 --> 00:41:07,257 "like we're still playing with what we can figure out 878 00:41:07,257 --> 00:41:08,598 "about your life."
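Here is a toy version of the pattern-to-score idea in this story: weight a handful of signal products and sum the weights over a shopper's recent purchases. The products, weights, and threshold below are invented, since the real model was never published.

```python
# Toy "pregnancy score" in the spirit of the story above.
# Products, weights, and the threshold are invented for illustration;
# the real Target model was never made public.
SIGNALS = {
    "unscented lotion": 2.0,
    "cotton balls": 1.5,
    "washcloths": 1.5,
    "zinc supplement": 1.0,
    "large tote bag": 0.5,
}
THRESHOLD = 3.0  # score at which a shopper would get baby-related coupons

def pregnancy_score(purchases):
    """Sum the weights of any signal products in recent purchases."""
    return sum(SIGNALS.get(item, 0.0) for item in purchases)

basket = ["unscented lotion", "cotton balls", "washcloths", "bread"]
score = pregnancy_score(basket)
print(score, "-> send coupons" if score >= THRESHOLD else "-> no action")
# 5.0 -> send coupons
```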
879 00:41:08,598 --> 00:41:13,603 (mellow electronic music) 880 00:41:18,126 --> 00:41:19,962 - Everybody who's on Facebook is involved 881 00:41:19,962 --> 00:41:22,298 in a transaction in which they're donating their data 882 00:41:22,298 --> 00:41:24,262 to Facebook, who then sells their data, 883 00:41:24,262 --> 00:41:26,040 and in return they get this service 884 00:41:26,040 --> 00:41:27,388 which allows them to post pictures 885 00:41:27,388 --> 00:41:28,353 and connect to their friends 886 00:41:28,353 --> 00:41:30,608 and so on and so on and so on and so on. 887 00:41:30,608 --> 00:41:32,153 That's the transaction, 888 00:41:32,153 --> 00:41:34,442 but nobody knows that's the transaction. 889 00:41:34,442 --> 00:41:36,395 Most people, I think, don't understand that. 890 00:41:36,395 --> 00:41:39,626 They just literally think they're getting Facebook for free 891 00:41:39,626 --> 00:41:41,125 and it's not a free thing; 892 00:41:41,125 --> 00:41:46,130 we're paying for it by allowing them access to our data. 893 00:41:48,377 --> 00:41:51,143 - There are a lot of people on Facebook who don't know, 894 00:41:51,143 --> 00:41:54,653 for example, how much information is really out there 895 00:41:54,653 --> 00:41:57,453 about themselves, and apparently don't care 896 00:41:57,453 --> 00:42:00,286 as long as they can put up pictures of their cats. 897 00:42:00,286 --> 00:42:04,005 I think most people, when they think about privacy, 898 00:42:04,005 --> 00:42:06,338 don't seem to connect 899 00:42:06,338 --> 00:42:09,647 their willingness to share their personal information 900 00:42:09,647 --> 00:42:12,553 with the world, whether through social media 901 00:42:12,553 --> 00:42:14,900 or shopping online or anything else, 902 00:42:14,900 --> 00:42:18,614 with surveillance. 903 00:42:21,083 --> 00:42:24,593 - Every time I receive a text message, 904 00:42:24,593 --> 00:42:26,544 every time I make a phone call, 905 00:42:26,544 --> 00:42:28,473 my location is being recorded. 906 00:42:28,473 --> 00:42:32,518 That data about me is being pushed off to a server 907 00:42:32,518 --> 00:42:35,319 that is owned by my mobile operator. 908 00:42:35,319 --> 00:42:36,865 If I call that mobile phone operator and say, 909 00:42:36,865 --> 00:42:39,619 "Hey, I'd like to have my data, please. 910 00:42:39,619 --> 00:42:40,735 "At the minimum, share it with me. 911 00:42:40,735 --> 00:42:45,337 "I'd like to see my locations over time," 912 00:42:45,337 --> 00:42:47,798 they won't give it to me. 913 00:42:47,798 --> 00:42:50,854 - The increased ability of these devices that we have 914 00:42:50,854 --> 00:42:53,387 to become recording and sensing objects, 915 00:42:53,387 --> 00:42:55,435 essentially data collection devices, 916 00:42:55,435 --> 00:42:59,658 in public space changes a lot of things. 917 00:43:00,186 --> 00:43:02,359 - Even if the phone company took away 918 00:43:02,359 --> 00:43:04,207 all of your personal identifying information, 919 00:43:04,207 --> 00:43:06,625 it would know within about 30 centimeters 920 00:43:06,625 --> 00:43:08,135 where you woke up every morning 921 00:43:08,135 --> 00:43:09,553 and where you went to work every day 922 00:43:09,553 --> 00:43:10,762 and the path that you took 923 00:43:10,762 --> 00:43:12,145 and who you were walking with, 924 00:43:12,145 --> 00:43:14,016 so even if they didn't know your name, 925 00:43:14,016 --> 00:43:16,021 they'd know who you are.
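A rough sketch of why a "de-identified" trace still identifies its owner, as described above: take the most frequent nighttime position in the trace as home. The coordinates, night hours, and grid size are made up for illustration.

```python
# Sketch: infer "home" from an anonymized location trace as the most
# frequent nighttime grid cell. All values here are illustrative.
from collections import Counter
from datetime import datetime

trace = [  # (timestamp, lat, lon) pings from one "anonymous" device
    (datetime(2015, 3, 2, 1, 14), 40.7301, -73.9952),
    (datetime(2015, 3, 2, 23, 40), 40.7302, -73.9950),
    (datetime(2015, 3, 3, 0, 5), 40.7300, -73.9951),
    (datetime(2015, 3, 3, 14, 30), 40.7527, -73.9772),  # midtown, workday
]

def infer_home(trace, grid=0.001):
    """Most common rounded position between 10pm and 6am."""
    nights = [
        (round(lat / grid) * grid, round(lon / grid) * grid)
        for t, lat, lon in trace
        if t.hour >= 22 or t.hour < 6
    ]
    return Counter(nights).most_common(1)[0][0]

print(infer_home(trace))  # the device has no name, but it has an address
```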
926 00:43:16,724 --> 00:43:20,280 What I'm really worried about is the cost to democracy. 927 00:43:20,280 --> 00:43:23,744 Now, today, it's nearly impossible to be truly anonymous, 928 00:43:23,744 --> 00:43:27,742 and so there's the ability for everything to be connected to you, 929 00:43:27,742 --> 00:43:29,426 everything you do in the real world 930 00:43:29,426 --> 00:43:30,879 and everything you're doing 931 00:43:30,879 --> 00:43:33,308 in cyberspace, and then the ability for 932 00:43:33,308 --> 00:43:35,399 whoever it is to take that, put it together, 933 00:43:35,399 --> 00:43:37,236 and turn it into a story. 934 00:43:37,236 --> 00:43:40,689 My fear really is that once there's so much data out there 935 00:43:40,689 --> 00:43:42,489 and once governments and companies 936 00:43:42,489 --> 00:43:45,836 start to be able to use that data to profile people, 937 00:43:45,836 --> 00:43:48,871 to filter them out, everybody is going to start to worry 938 00:43:48,871 --> 00:43:51,886 about their activities. 939 00:43:52,390 --> 00:43:56,854 - We're at a very, very important point 940 00:43:56,854 --> 00:44:01,536 where I think our society has come to realize this fact 941 00:44:01,536 --> 00:44:06,505 and just begun in earnest to debate the implications of it. 942 00:44:07,254 --> 00:44:11,055 - You have, I think, an attitude in the NSA 943 00:44:11,055 --> 00:44:14,436 that they have a right to every bit of information 944 00:44:14,436 --> 00:44:16,305 they can collect. 945 00:44:16,305 --> 00:44:20,523 We have constructed a world where 946 00:44:20,523 --> 00:44:22,943 the government is secretly collecting 947 00:44:22,943 --> 00:44:25,870 all of the data it can on each individual citizen, 948 00:44:25,870 --> 00:44:29,783 whether that individual citizen has done anything or not. 949 00:44:29,783 --> 00:44:32,932 They have been collecting massive amounts of data 950 00:44:32,932 --> 00:44:36,012 through cell phone providers, Internet providers, 951 00:44:36,012 --> 00:44:38,801 that is then sifted through secretly 952 00:44:38,801 --> 00:44:42,577 by people over whom no democratic institution 953 00:44:42,577 --> 00:44:44,440 has effective control. 954 00:44:45,665 --> 00:44:47,955 There's a feeling that if you're not 955 00:44:47,955 --> 00:44:49,033 communing with terrorists, 956 00:44:49,033 --> 00:44:51,334 what do you care if the government gathers your information? 957 00:44:51,334 --> 00:44:53,345 This is probably the most pernicious, 958 00:44:53,345 --> 00:44:56,053 anti-Bill of Rights line of thought that there is, 959 00:44:56,053 --> 00:44:57,970 because these are rights we hold in common. 960 00:44:57,970 --> 00:44:59,481 Every violation of somebody else's rights 961 00:44:59,481 --> 00:45:01,612 is a violation of yours.
962 00:45:02,408 --> 00:45:04,116 - What's going to happen, I think, is that we now 963 00:45:04,116 --> 00:45:06,580 have so much information out there about ourselves, 964 00:45:06,580 --> 00:45:08,544 and the ability for people to abuse it, 965 00:45:08,544 --> 00:45:09,962 that people are going to get hurt, 966 00:45:09,962 --> 00:45:11,124 people are going to lose their jobs, 967 00:45:11,124 --> 00:45:12,751 people are going to get divorced, 968 00:45:12,751 --> 00:45:14,598 people are going to get killed, 969 00:45:14,598 --> 00:45:16,307 and it's going to become really painful, 970 00:45:16,307 --> 00:45:17,544 and everyone's going to realize 971 00:45:17,544 --> 00:45:19,439 we have to do something about this, 972 00:45:19,439 --> 00:45:21,055 and then we're going to start to change. 973 00:45:21,055 --> 00:45:23,483 Now the question is how bad is it. 974 00:45:23,483 --> 00:45:26,447 - [Voiceover] You can't have a secret operation 975 00:45:26,447 --> 00:45:29,957 validated by a secret court based on secret evidence 976 00:45:29,957 --> 00:45:31,165 in a democratic republic. 977 00:45:31,165 --> 00:45:33,920 So the system closes and no information gets out 978 00:45:33,920 --> 00:45:37,684 except when it gets leaked or dumped on the world 979 00:45:37,684 --> 00:45:39,521 by outside actors, whether that's WikiLeaks, 980 00:45:39,521 --> 00:45:40,812 or whether that's Bradley Manning, 981 00:45:40,812 --> 00:45:42,275 or whether that's Edward Snowden. 982 00:45:42,275 --> 00:45:43,856 That's the way that people find out 983 00:45:43,856 --> 00:45:46,029 what their government is up to. 984 00:45:46,029 --> 00:45:47,412 We're living in a future where we've lost 985 00:45:47,412 --> 00:45:48,574 our right to privacy. 986 00:45:48,574 --> 00:45:49,992 We've given it away for convenience's sake 987 00:45:49,992 --> 00:45:51,840 in our economic and social lives 988 00:45:51,840 --> 00:45:55,415 and we've lost it for fear's sake vis-à-vis our government. 989 00:45:58,270 --> 00:46:01,068 - Any time you're looking at an ability to segment 990 00:46:01,068 --> 00:46:04,734 and analyze, you've got to think about both sides. 991 00:46:05,240 --> 00:46:06,995 But there's so much good here, 992 00:46:06,995 --> 00:46:10,167 there's so much chance to improve the quality of life, 993 00:46:10,167 --> 00:46:12,213 that to basically close the box and say, 994 00:46:12,213 --> 00:46:13,084 "You know what, we're not going to look 995 00:46:13,084 --> 00:46:15,304 "at all this information, we're not going to collect it," 996 00:46:15,304 --> 00:46:16,756 that's not practical. 997 00:46:16,756 --> 00:46:20,098 What we're going to have to do is think as a community. 998 00:46:20,557 --> 00:46:22,940 - We have cultures that have never been in dialogue 999 00:46:22,940 --> 00:46:26,031 with more than 100 or 200 or 400 people 1000 00:46:26,031 --> 00:46:29,366 now connected to three billion. 1001 00:46:29,366 --> 00:46:34,371 (mellow music) 1002 00:46:35,918 --> 00:46:38,347 The phone is the on-ramp to the information network. 1003 00:46:38,347 --> 00:46:40,218 Once you're on the information network, 1004 00:46:40,218 --> 00:46:42,689 you're in, everybody's in.
1005 00:46:42,689 --> 00:46:44,479 - Billions and billions of people 1006 00:46:44,479 --> 00:46:46,909 who have been excluded from the discussion, 1007 00:46:46,909 --> 00:46:48,949 who couldn't afford to step into the world 1008 00:46:48,949 --> 00:46:50,158 of being connected, 1009 00:46:50,158 --> 00:46:51,738 step into the world of information, 1010 00:46:51,738 --> 00:46:54,992 step into the world of being able to learn things 1011 00:46:54,992 --> 00:46:58,380 they could never learn, are suddenly on the network. 1012 00:47:00,303 --> 00:47:01,186 - [Voiceover] The world of the Internet, 1013 00:47:01,186 --> 00:47:02,430 from an innovation perspective, 1014 00:47:02,430 --> 00:47:05,149 is pushing innovation out of large institutions 1015 00:47:05,149 --> 00:47:07,700 to people on the edges. 1016 00:47:09,821 --> 00:47:13,377 - [Voiceover] I suspect as we equip these next billion 1017 00:47:13,377 --> 00:47:17,793 consumers with these devices that connect them 1018 00:47:17,793 --> 00:47:21,141 with the rest of the world and with the Internet, 1019 00:47:21,141 --> 00:47:24,855 we'll have a lot to learn about how they use them. 1020 00:47:26,730 --> 00:47:28,276 - All of these people in these countries 1021 00:47:28,276 --> 00:47:29,869 are now connecting with each other, 1022 00:47:29,869 --> 00:47:33,750 sharing data about prices of crops, prices of parts. 1023 00:47:33,750 --> 00:47:35,621 The Africans are talking to the Chinese, 1024 00:47:35,621 --> 00:47:37,504 who are talking to the Indians, 1025 00:47:37,504 --> 00:47:41,335 and the world is connected in its nooks and crannies. 1026 00:47:43,931 --> 00:47:46,777 - The person in Rwanda 1027 00:47:46,777 --> 00:47:50,659 who has their first phone and now has access 1028 00:47:50,659 --> 00:47:53,077 to an education system 1029 00:47:53,077 --> 00:47:55,831 that they never could have dreamed of before 1030 00:47:55,831 --> 00:47:58,632 can start finding solutions 1031 00:47:58,632 --> 00:48:02,467 for his or her little town, 1032 00:48:02,467 --> 00:48:04,762 his or her village. 1033 00:48:05,942 --> 00:48:09,277 - Once we have that ability to connect people 1034 00:48:09,277 --> 00:48:10,486 and they are able to be connected, 1035 00:48:10,486 --> 00:48:12,531 there's gonna be some genius, you know, 1036 00:48:12,531 --> 00:48:14,751 in some remote location who would never 1037 00:48:14,751 --> 00:48:16,332 have been discovered, who would never have had 1038 00:48:16,332 --> 00:48:19,423 the capability to get to the education, 1039 00:48:19,423 --> 00:48:23,974 to get to the resources that he or she needs and... 1040 00:48:26,059 --> 00:48:30,313 that young woman is going to change the world 1041 00:48:30,313 --> 00:48:33,073 rather than just changing her village. 1042 00:48:33,694 --> 00:48:38,668 - The idea that that genius will be able to find 1043 00:48:38,668 --> 00:48:41,423 his or her way into the greater culture 1044 00:48:41,423 --> 00:48:44,630 through the tiny, little two-by-two window 1045 00:48:44,630 --> 00:48:48,010 of a feature phone is very exciting. 1046 00:48:48,010 --> 00:48:51,221 - A billion people in India, a billion people in China, 1047 00:48:51,221 --> 00:48:52,534 you're talking, you know, 1048 00:48:52,534 --> 00:48:54,451 500 million to a billion in Africa. 1049 00:48:54,451 --> 00:48:56,997 Suddenly the world has a lot more minds 1050 00:48:56,997 --> 00:49:01,575 connected in the simplest, least expensive possible way 1051 00:49:01,575 --> 00:49:03,511 to make the world better.
1052 00:49:04,342 --> 00:49:05,969 - So you look at the Agricultural Revolution 1053 00:49:05,969 --> 00:49:07,514 and the Industrial Revolution. 1054 00:49:07,514 --> 00:49:09,676 Is the Internet and then the data revolution 1055 00:49:09,676 --> 00:49:11,605 associated with it of that scale? 1056 00:49:11,605 --> 00:49:13,197 It's certainly possible. 1057 00:49:13,197 --> 00:49:14,684 - I don't think there's any question that 1058 00:49:14,684 --> 00:49:16,289 we're at a moment in human history 1059 00:49:16,289 --> 00:49:19,043 that we will look back on in 50 or 100 years 1060 00:49:19,043 --> 00:49:24,048 and say, right around 2000 or so, it all changed. 1061 00:49:25,970 --> 00:49:27,724 And I do think we will date 1062 00:49:27,724 --> 00:49:32,323 before the explosion of data and after. 1063 00:49:32,323 --> 00:49:33,903 I don't think it's an issue of climate change 1064 00:49:33,903 --> 00:49:36,414 or health or jobs, I think it's all issues. 1065 00:49:36,414 --> 00:49:39,737 Everything has information at its core, everything. 1066 00:49:39,737 --> 00:49:43,259 So if information matters, then reorganizing 1067 00:49:43,259 --> 00:49:45,386 the entire information network of the planet 1068 00:49:45,386 --> 00:49:47,721 is like wiring up the brain of a two-year-old child. 1069 00:49:47,721 --> 00:49:49,186 Suddenly that child can talk 1070 00:49:49,186 --> 00:49:51,568 and think and act and behave, right. 1071 00:49:51,568 --> 00:49:55,032 The world is wiring up a cerebral cortex, if you will, 1072 00:49:55,032 --> 00:49:57,228 of billions of connected elements 1073 00:49:57,228 --> 00:49:59,494 that are going to exchange billions of ideas, 1074 00:49:59,494 --> 00:50:01,087 billions of points of knowledge, 1075 00:50:01,087 --> 00:50:04,047 and billions of ways of working together. 1076 00:50:04,047 --> 00:50:07,591 - Together, there becomes an enormous wave of change 1077 00:50:07,591 --> 00:50:09,892 and that wave of change is going to take us 1078 00:50:09,892 --> 00:50:13,861 in directions that we can't begin to imagine. 1079 00:50:14,355 --> 00:50:18,690 - The ability to turn that data into actionable insight 1080 00:50:18,690 --> 00:50:20,538 is what computers are very good at; 1081 00:50:20,538 --> 00:50:23,048 the ability to take action is what we're really good at, 1082 00:50:23,048 --> 00:50:26,244 and I think it's really important to separate those two, 1083 00:50:26,244 --> 00:50:28,928 because people conflate them and get scared 1084 00:50:28,928 --> 00:50:31,056 and think the computers are taking over. 1085 00:50:31,056 --> 00:50:33,565 The computers are this extraordinary tool 1086 00:50:33,565 --> 00:50:37,610 that we have at our disposal to accelerate our ability 1087 00:50:37,610 --> 00:50:38,946 to solve the problems that, frankly, 1088 00:50:38,946 --> 00:50:40,457 we've gotten ourselves into. 1089 00:50:40,457 --> 00:50:42,247 - I am fundamentally optimistic, 1090 00:50:42,247 --> 00:50:46,257 but I'm not blindly, foolishly optimistic. 1091 00:50:46,257 --> 00:50:48,964 You've got to remember, the financial crisis was brought to us 1092 00:50:48,964 --> 00:50:52,020 by big data people as well, because 1093 00:50:52,020 --> 00:50:53,856 they weren't actually thinking very hard 1094 00:50:53,856 --> 00:50:55,853 about how to create value for the world. 1095 00:50:55,853 --> 00:50:57,074 They were just thinking about 1096 00:50:57,074 --> 00:51:00,110 how to create value for themselves.
1097 00:51:00,661 --> 00:51:02,660 You know, we have a fair amount of literature, 1098 00:51:02,660 --> 00:51:04,624 a fair amount of understanding, that if you take 1099 00:51:04,624 --> 00:51:07,538 more out of the ecosystem than you put back in, 1100 00:51:07,538 --> 00:51:09,595 the whole thing breaks down. 1101 00:51:09,595 --> 00:51:13,733 That's why I think we have to actually earn our future. 1102 00:51:13,733 --> 00:51:16,104 We can't just sort of pat ourselves on the back 1103 00:51:16,104 --> 00:51:18,323 and think it's just going to fall into our laps. 1104 00:51:18,323 --> 00:51:22,194 We have to care about what kind of future we're making, 1105 00:51:22,194 --> 00:51:23,959 and we have to invest in that future, 1106 00:51:23,959 --> 00:51:26,168 and we have to make the right choices. 1107 00:51:26,168 --> 00:51:30,057 - It is, to me, paramount that a culture understands, 1108 00:51:30,057 --> 00:51:31,975 our culture understands, 1109 00:51:31,975 --> 00:51:36,980 that we must take this data thing as ours, 1110 00:51:38,102 --> 00:51:40,075 that we are the platform for it, 1111 00:51:40,075 --> 00:51:42,400 humans, individuals are the platform for it, 1112 00:51:42,400 --> 00:51:45,083 that it is not something done to us, 1113 00:51:45,083 --> 00:51:49,012 but rather it is ours to do with as we wish. 1114 00:51:52,644 --> 00:51:54,842 When I was young, we landed on the moon, 1115 00:51:54,842 --> 00:51:58,931 and so the future to me meant going further than that. 1116 00:51:58,931 --> 00:52:00,697 We looked outward. 1117 00:52:00,697 --> 00:52:04,032 Today, I think there's a new energy around 1118 00:52:04,032 --> 00:52:06,497 the future, and it has much more to do 1119 00:52:06,497 --> 00:52:08,960 with looking at where we are now 1120 00:52:08,960 --> 00:52:11,714 and the globe we stand on 1121 00:52:11,714 --> 00:52:14,977 and solving for that. 1122 00:52:14,977 --> 00:52:17,859 The tools that are in our hands now 1123 00:52:17,859 --> 00:52:20,483 are going to allow us to do that. 1124 00:52:20,483 --> 00:52:23,574 Now it's like, no wait a minute, this is our place 1125 00:52:23,574 --> 00:52:27,792 and we're going to figure out how to make it blossom. 1126 00:52:27,792 --> 00:52:32,797 (dramatic music) 1127 00:53:15,806 --> 00:53:20,811 (mid tempo orchestral music)