Author: Eugene Charniak
Publisher: MIT Press
Size: 16 Mb
Content: Your author is a long-time artificial-intelligence researcher whose field of expertise, natural-language processing, has been revolutionized by deep learning. Unfortunately, it took him (me) a long time to catch on to this fact. I can rationalize this since this is the third time neural networks have threatened a revolution but only the first time they have delivered.
Nevertheless, I suddenly found myself way behind the times and struggling to catch up. So I did what any self-respecting professor would do: scheduled myself to teach the stuff, started a crash course by surfing the web, and got my students to teach it to me. (This last is not a joke. In particular, the head undergraduate teaching assistant for the course, Siddarth (Sidd) Karramcheti, deserves special mention.)
This explains several prominent features of this book. First, it is short. I am a slow learner. Second, it is very much project driven. Many texts, particularly in computer science, have a constant tension between topic organization and organizing material around specific projects. Splitting the difference is often a good idea, but I find I learn computer science material best by sitting down and writing programs, so my book largely reflects my learning habits. It was the most convenient way to put it down, and I am hoping many in the expected audience will find it helpful as well.
Which brings up the question of the expected audience. While I hope many CS practitioners will find this book useful for the same reason I wrote it, as a teacher my first loyalty is to my students, so this book is primarily intended as a textbook for a course on deep learning.
The course I teach at Brown is for both graduate and undergraduate students and covers all the material herein, plus some “culture” lectures (for graduate credit a student must add a significant final project). Both linear algebra and multivariate calculus are required. While the actual quantity of linear algebra material is not that great, students have told me that without it they would have found thinking about multilevel networks, and the tensors they require, quite difficult.
Multivariate calculus, however, was a much closer call. It appears explicitly only in Chapter 1, when we build up to back-propagation from scratch, and I would not be surprised if an extra lecture on partial derivatives would do. Last, there is a probability and statistics prerequisite. This simplifies the exposition, and I certainly want to encourage students to take such a course. I also assume a rudimentary knowledge of programming in Python. I do not include this in the text, but my course has an extra “lab” on basic Python.
That your author was playing catch-up when writing this book also explains the fact that in almost every chapter’s section on further reading you will find, beyond the usual references to important research papers, many references to secondary sources: others’ educational writings. I would never have learned this material without them.