Tuesday, April 26, 2022

Anticipating others’ behavior on the road

A new machine-learning system may someday help driverless cars predict the next moves of nearby drivers, cyclists, and pedestrians in real time.

Humans may be one of the biggest roadblocks keeping fully autonomous vehicles off city streets.

If a robot is going to navigate a vehicle safely through downtown Boston, it must be able to predict what nearby drivers, cyclists, and pedestrians are going to do next.

Behavior prediction is a tough problem, however, and current artificial intelligence solutions are either too simplistic (they may assume pedestrians always walk in a straight line), too conservative (to avoid pedestrians, the robot just leaves the car in park), or able to forecast the next moves of only one agent (roads typically carry many users at once).

MIT researchers have devised a deceptively simple solution to this complicated challenge. They break a multiagent behavior prediction problem into smaller pieces and tackle each one individually, so a computer can solve this complex task in real time.

Their behavior-prediction framework first guesses the relationship between two road users — which car, cyclist, or pedestrian has the right of way, and which agent will yield — and then uses those pairwise relationships to predict future trajectories for multiple agents.

When compared with real traffic flow in an enormous dataset compiled by the autonomous driving company Waymo, these estimated trajectories were more accurate than those from other machine-learning models. The MIT technique even outperformed Waymo’s recently published model. And because the researchers broke the problem into simpler pieces, their technique used less memory.

“This is a very intuitive idea, but no one has fully explored it before, and it works quite well. The simplicity is definitely a plus. We are comparing our model with other state-of-the-art models in the field, including the one from Waymo, the leading company in this area, and our model achieves top performance on this challenging benchmark. This has a lot of potential for the future,” says co-lead author Xin “Cyrus” Huang, a graduate student in the Department of Aeronautics and Astronautics and a research assistant in the lab of Brian Williams, professor of aeronautics and astronautics and a member of the Computer Science and Artificial Intelligence Laboratory (CSAIL).

Joining Huang and Williams on the paper are three researchers from Tsinghua University in China: co-lead author Qiao Sun, a research assistant; Junru Gu, a graduate student; and senior author Hang Zhao PhD ’19, an assistant professor. The research will be presented at the Conference on Computer Vision and Pattern Recognition.

Multiple small models

The researchers’ machine-learning method, called M2I, takes two inputs: past trajectories of the cars, cyclists, and pedestrians interacting in a traffic setting such as a four-way intersection, and a map with street locations, lane configurations, etc.
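
To make those two inputs concrete, here is a minimal sketch, in Python, of how such a scene might be represented. The class and field names are illustrative assumptions for this article, not the authors' code.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class Scene:
    # Past trajectories: one (T, 2) array of x/y positions per agent,
    # sampled at a fixed rate over the last T timesteps.
    histories: list
    # Map context: lane centerlines and boundaries as polylines,
    # each an (N, 2) array of x/y points.
    map_polylines: list

# Example: two agents observed for 10 timesteps near an intersection.
scene = Scene(
    histories=[np.zeros((10, 2)), np.ones((10, 2))],
    map_polylines=[np.array([[0.0, -20.0], [0.0, 20.0]])],
)
```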

Using this information, a relation predictor infers which of two agents has the right of way first, classifying one as a passer and one as a yielder. Then a prediction model, known as a marginal predictor, guesses the trajectory for the passing agent, since this agent behaves independently.

A second prediction model, known as a conditional predictor, then guesses what the yielding agent will do based on the actions of the passing agent. The system predicts a number of different trajectories for the yielder and passer, computes the probability of each one individually, and then selects the six joint results with the highest likelihood of occurring.
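
Putting the three stages together, the sketch below shows the overall flow with stub functions standing in for the learned networks. The function names, sample counts, and probabilities are illustrative assumptions based on the description above, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
K, T = 4, 80  # candidate futures per agent; 8 seconds of future at 10 Hz

def relation_predictor(hist_a, hist_b, map_polylines):
    # Stub: probability that agent A passes first (the real model is learned).
    return 0.7

def marginal_predictor(hist, map_polylines, k=K):
    # Stub: k candidate futures for the passer, each with a probability.
    trajs = rng.normal(scale=0.5, size=(k, T, 2)).cumsum(axis=1)
    return trajs, np.full(k, 1.0 / k)

def conditional_predictor(hist, passer_traj, map_polylines, k=K):
    # Stub: k futures for the yielder, conditioned on one passer future.
    trajs = passer_traj + rng.normal(scale=0.5, size=(k, T, 2)).cumsum(axis=1)
    return trajs, np.full(k, 1.0 / k)

def m2i_predict(hist_a, hist_b, map_polylines, top_n=6):
    # 1) Decide who passes and who yields.
    a_first = relation_predictor(hist_a, hist_b, map_polylines) >= 0.5
    passer, yielder = (hist_a, hist_b) if a_first else (hist_b, hist_a)
    # 2) Predict the passer independently of the yielder.
    p_trajs, p_probs = marginal_predictor(passer, map_polylines)
    # 3) Predict the yielder conditioned on each passer future, score each
    #    joint sample, and keep the top_n most likely pairs.
    joint = []
    for i, p_traj in enumerate(p_trajs):
        y_trajs, y_probs = conditional_predictor(yielder, p_traj, map_polylines)
        for j, y_traj in enumerate(y_trajs):
            joint.append((p_probs[i] * y_probs[j], p_traj, y_traj))
    joint.sort(key=lambda item: item[0], reverse=True)
    return joint[:top_n]

predictions = m2i_predict(np.zeros((10, 2)), np.ones((10, 2)), map_polylines=[])
```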

M2I outputs a prediction of how these agents will move through traffic for the next eight seconds. In one example, their method caused a vehicle to slow down so a pedestrian could cross the street, then speed up once the pedestrian had cleared the intersection. In another example, the vehicle waited until several cars had passed before turning from a side street onto a busy main road.

While this initial research focuses on interactions between two agents, M2I could infer relationships among many agents and then guess their trajectories by linking multiple marginal and conditional predictors.

Thursday, April 7, 2022

An optimized solution for face recognition

When artificial intelligence is tasked with visually identifying objects and faces, it assigns specific components of its network to face recognition — just like the human brain.

The human brain seems to care a lot about faces. It’s dedicated a specific area to identifying them, and the neurons there are so good at their job that most of us can readily recognize thousands of individuals. With artificial intelligence, computers can now recognize faces with a similar efficiency — and neuroscientists at MIT’s McGovern Institute for Brain Research have found that a computational network trained to identify faces and other objects discovers a surprisingly brain-like strategy to sort them all out.

The finding, reported March 16 in Science Advances, suggests that the millions of years of evolution that have shaped circuits in the human brain have optimized our system for facial recognition.

“The human brain’s solution is to segregate the processing of faces from the processing of objects,” explains Katharina Dobs, who led the study as a postdoc in the lab of McGovern investigator Nancy Kanwisher, the Walter A. Rosenblith Professor of Cognitive Neuroscience at MIT. The artificial network that she trained did the same. “And that’s the same solution that we hypothesize any system that’s trained to recognize faces and to categorize objects would find,” she adds.

“These two completely different systems have figured out what a — if not the — good solution is. And that feels very profound,” says Kanwisher.

Functionally specific brain regions

More than 20 years ago, Kanwisher and her colleagues discovered a small spot in the brain’s temporal lobe that responds specifically to faces. This region, which they named the fusiform face area, is one of many brain regions Kanwisher and others have found that are dedicated to specific tasks, such as the detection of written words, the perception of vocal songs, and understanding language.

Kanwisher says that as she has explored how the human brain is organized, she has always been curious about the reasons for that organization. Does the brain really need special machinery for facial recognition and other functions? “‘Why questions’ are very difficult in science,” she says. But with a sophisticated type of machine learning called a deep neural network, her team could at least find out how a different system would handle a similar task.

Dobs, who is now a research group leader at Justus Liebig University Giessen in Germany, assembled hundreds of thousands of images with which to train a deep neural network in face and object recognition. The collection included the faces of more than 1,700 different people and hundreds of different kinds of objects, from chairs to cheeseburgers. All of these were presented to the network, with no clues about which was which. “We never told the system that some of those are faces, and some of those are objects. So it’s basically just one big task,” Dobs says. “It needs to recognize a face identity, as well as a bike or a pen.”
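
In training terms, "one big task" means a single network with a single combined label space: face identities and object categories are simply different classes of the same classifier. The tiny PyTorch sketch below illustrates that setup; the architecture and the object-class count are placeholder assumptions (the study used a far larger network), not the paper's code.

```python
import torch
from torch import nn

# One network, one label space: 1,700 face identities plus object
# categories are just different classes in the same softmax.
NUM_FACE_IDS, NUM_OBJECT_CLASSES = 1700, 300  # object count illustrative

model = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(64, NUM_FACE_IDS + NUM_OBJECT_CLASSES),
)
loss_fn = nn.CrossEntropyLoss()

# A dummy mini-batch: the network is never told which images are faces.
images = torch.randn(8, 3, 224, 224)
labels = torch.randint(0, NUM_FACE_IDS + NUM_OBJECT_CLASSES, (8,))
loss = loss_fn(model(images), labels)  # one loss covers faces and objects alike
loss.backward()
```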

As the program learned to identify the objects and faces, it organized itself into an information-processing network that included units specifically dedicated to face recognition. As in the brain, this specialization occurred during the later stages of image processing. In both the brain and the artificial network, early steps in facial recognition involve more general vision-processing machinery, and final stages rely on face-dedicated components.
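
One common way to detect that kind of specialization is to compare each unit's average response to faces versus objects. The sketch below computes a generic selectivity index over hypothetical activations; the metric and threshold are standard illustrations, not necessarily the paper's exact analysis.

```python
import numpy as np

def face_selectivity(face_acts, object_acts):
    # Per-unit contrast index: (mean face response - mean object response)
    # normalized by their sum. Values near +1 mark face-dedicated units.
    mf, mo = face_acts.mean(axis=0), object_acts.mean(axis=0)
    return (mf - mo) / (mf + mo + 1e-8)

rng = np.random.default_rng(1)
# Hypothetical activations from one layer: 500 face images and 500 object
# images, each eliciting responses from 256 units.
face_acts = rng.gamma(2.0, size=(500, 256))
object_acts = rng.gamma(2.0, size=(500, 256))

selectivity = face_selectivity(face_acts, object_acts)
face_units = np.where(selectivity > 0.85)[0]  # candidate "face units"
```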

It’s not known how face-processing machinery arises in a developing brain, but based on their findings, Kanwisher and Dobs say networks don’t necessarily require an innate face-processing mechanism to acquire that specialization. “We didn’t build anything face-ish into our network,” Kanwisher says. “The networks managed to segregate themselves without being given a face-specific nudge.”

Kanwisher says it was thrilling to see the deep neural network segregate itself into separate parts for face and object recognition. “That’s what we’ve been looking at in the brain for 20-some years,” she says. “Why do we have a separate system for face recognition in the brain? This tells me it is because that is what an optimized solution looks like.”

Now, she is eager to use deep neural nets to ask similar questions about why other brain functions are organized the way they are. “We have a new way to ask why the brain is organized the way it is,” she says. “How much of the structure we see in human brains will arise spontaneously by training networks to do comparable tasks?”

Dan Huttenlocher ponders our human future in an age of artificial intelligence

What does it mean to be human in an age where artificial intelligence agents make decisions that shape human actions? That’s a deep question with no easy answers, and it’s been on the mind of Dan Huttenlocher SM ’84, PhD ’88, dean of the MIT Schwarzman College of Computing, for the past few years.

“Advances in AI are going to happen, but the destination that we get to with those advances is up to us, and it is far from certain,” says Huttenlocher, who is also the Henry Ellis Warren Professor in the Department of Electrical Engineering and Computer Science.

Along with former Google CEO Eric Schmidt and elder statesman Henry Kissinger, Huttenlocher recently explored some of the quandaries posed by the rise of AI in the book “The Age of AI: And Our Human Future.” For Huttenlocher and his co-authors, steering toward a good destination begins with conversation: “Our belief is that, to get there, we need much more informed dialogue and much more multilateral dialogue. Our hope is that the book will get people interested in doing that from a broad range of places,” he says.

Now, nearly two and a half years into his tenure as dean, Huttenlocher doesn’t just talk the talk when it comes to interdisciplinarity. He is leading the college as it incorporates computer science into all fields of study at MIT while teaching students to use formidable tools like artificial intelligence ethically and responsibly.

That mission is being accomplished, in part, through two campus-wide initiatives that Huttenlocher is especially excited about: the Common Ground for Computing Education and Social and Ethical Responsibilities of Computing (SERC). The Common Ground supports the development of cross-disciplinary courses that integrate computing into other fields of study, while the SERC initiative provides tools that help researchers, educators, and students understand how to conceptualize issues about the impacts of computing early in the research process. SERC is complemented by numerous research and scholarly activities, such as AI for Health Care Equity and the Research Initiative for Combatting Systemic Racism.

“When I was a grad student, you worked on computer vision assuming that it was going to be a research problem for the rest of your lifetime,” he says. “Now, research problems have practical applications almost overnight in computing-related disciplines. The social impacts and ethical implications around computing are things that need to be considered from the very beginning, not after the fact.”

Budding interest in a nascent field

A deep thinker from an early age, Huttenlocher began pondering questions at the intersection of human intelligence and computing when he was a teenager.

With a mind for math, the Chicago native learned how to program before he entered high school, which was a rare thing in the 1970s. His parents, both academics who studied aspects of the human mind, influenced the path he would follow. His father was a neurologist at the University of Chicago Medical School who studied brain development, while his mother was a professor of cognitive psychology at the same institution.

Huttenlocher pursued a joint major in computer science and cognitive psychology as an undergraduate at the University of Michigan in an effort to bring those two disciplines together. When it came time to apply to graduate school, he found the perfect fit for his dual interests in the nascent field of AI, and enrolled at MIT.

While earning his master’s degree and PhD (in 1984 and 1988, respectively), he researched speech recognition, object recognition, and computer vision. He became fascinated by how machines can directly perceive the world around them. Huttenlocher was also drawn in by the entrepreneurial activity that was then ramping up around Cambridge. He spent summers interning at Silicon Valley startups and small tech companies in the Boston area, which piqued his interest in industry.

“I grew up in an academic household and had a healthy skepticism of following in my parents’ footsteps. So when I graduated, I wasn’t quite sure if I wanted an academic path or not. And to be honest, I’ve been a little bit ambivalent about it ever since. For better or worse, I’ve often ended up doing both at the same time,” he says.

Big problems, direct impact

Huttenlocher joined the computer science faculty at Cornell University in 1988 and also took a position at the Xerox Palo Alto Research Center (PARC), where he had interned as a graduate student. He taught computer science courses and worked on academic research projects when Cornell was in session, and spent his summers at Xerox PARC, as well as one day a week consulting remotely. (Long before Zoom, remote connectivity was “still pretty sketchy” in those days, he says.)

“I’ve long wanted to pair the deeper, bigger problems that we tend to try to make progress on in academia with a more direct and immediate impact on people, so spending time at Xerox PARC and at Cornell was a good way to do that,” he says.

Early in his research career, Huttenlocher took a more algorithmic approach to solving computer vision problems, rather than taking the generic optimization approaches that were more common at the time. Some of the techniques he and his collaborators developed, such as using a graph-based representation of an image, are still being used more than 20 years later.
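
As a rough illustration of what a graph-based representation of an image looks like: treat pixels as nodes and connect neighbors with edges weighted by intensity difference, the usual starting point for classic graph-based segmentation methods. The sketch below is a generic example of the idea, not Huttenlocher's original algorithm.

```python
import numpy as np

def pixel_graph(image):
    # Nodes are pixels (indexed row * width + col); edges connect each pixel
    # to its right and bottom neighbors, weighted by intensity difference.
    h, w = image.shape
    edges = []
    for r in range(h):
        for c in range(w):
            if c + 1 < w:
                weight = abs(float(image[r, c]) - float(image[r, c + 1]))
                edges.append((weight, r * w + c, r * w + c + 1))
            if r + 1 < h:
                weight = abs(float(image[r, c]) - float(image[r + 1, c]))
                edges.append((weight, r * w + c, (r + 1) * w + c))
    # Sorting by weight is the usual first step in graph-based segmentation,
    # which merges pixels across the cheapest edges first.
    return sorted(edges)

img = np.array([[0, 0, 9], [0, 1, 9], [9, 9, 9]], dtype=np.uint8)
edges = pixel_graph(img)
```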

Later, he and his colleagues conducted some of the first studies on how communities come together on social networks. In those pre-Facebook days, they studied LiveJournal, a social networking site that was popular in the early 2000s. Their work revealed that a person’s tendency to join an online community is not only influenced by the number of friends they have in that community, but also by how those friends are connected to one another.
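
Those two signals are simple to state as features. The sketch below, with hypothetical people and friendships, computes for one candidate member the number of friends already in a community and how interconnected those friends are; it illustrates the finding rather than reproducing the study's code.

```python
# Hypothetical friendship graph and community membership.
friends = {
    "ana": {"bo", "cy", "di"},
    "bo": {"ana", "cy"},
    "cy": {"ana", "bo"},
    "di": {"ana"},
}
community = {"bo", "cy", "di"}

def join_features(person):
    # Signal 1: how many of this person's friends are already members.
    inside = friends[person] & community
    k = len(inside)
    # Signal 2: how connected those member-friends are to one another,
    # as a fraction of the possible ties among them.
    links = sum(1 for a in inside for b in inside if a < b and b in friends[a])
    possible = k * (k - 1) // 2
    return k, (links / possible if possible else 0.0)

k, connectedness = join_features("ana")  # 3 member-friends; 1 of 3 possible ties
```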

In addition to research, Huttenlocher was passionate about bridging gaps between disciplines. He was named dean of the interdisciplinary Faculty of Computing and Information Science at Cornell in 2009. Three years later, he took his bridge-building skills to New York City when he became the founding dean of Cornell Tech, a new graduate school being established on Roosevelt Island.

That role was a tremendous challenge but also an extraordinary opportunity to create a campus that combined academia in computing-related disciplines with the growing tech community in New York, he says.

In a way, the role prepared him well to be founding dean of the MIT Schwarzman College of Computing, whose launch represented the most significant structural change to the Institute since the early 1950s.

“I think this place is very special. MIT has its own culture. It is a distinctive place in the positive sense of the word ‘distinctive.’ People are insanely curious here and very collaborative when it comes to solving problems. Just the opportunity to help build something new at MIT, something that will be important for the Institute but also for the country and the world, is amazing,” he says.

Making connections

While Huttenlocher was overseeing the creation of Cornell Tech, he was also forging connections around New York City. Before the Roosevelt Island campus was built, the school rented space in Google’s Eighth Avenue building, which is how he met then-Google CEO Eric Schmidt. The two enjoyed talking about (and sometimes arguing about) the promises and perils of artificial intelligence. At the same time, Schmidt was discussing AI with Henry Kissinger, whom he had befriended at a conference. By happenstance, the three got together and started talking about AI, which led to an article in The Atlantic and, eventually, the book.

“What we realized when we started talking about these questions is that the broader historical and philosophical context for an AI age is not something that has been looked at very much. When people are looking at social and ethical issues around computing, it is usually focused on the current problem, which is vital, but we think this broader framing is also important,” he says.

And when it comes to questions about AI, Huttenlocher feels a sense of urgency.

Advancements are happening so rapidly that there is immense pressure for educational institutions to keep up. Academic courses need to have computing woven through them as part of their intellectual fabric, especially as AI continues to become a larger part of everyday life, he says. This underscores the important work the college is doing, and the challenge it faces moving forward.

For Huttenlocher, who has found himself working in the center of a veritable Venn diagram of disciplines since his days as an undergraduate, it is a challenge he has fully embraced.

“It should not just be computer scientists or engineers looking at these problems. But it should not just be social scientists or humanists looking at them either,” he says. “We really need to bring different groups together.”
