# News

### Stanford CS229 Lecture Notes

These notes accompany Machine Learning (CS 229), taught in the Stanford Computer Science department by Professor Andrew Ng, with later parts updated by Tengyu Ma. So far, the notes have mainly discussed learning algorithms that model p(y | x; θ), the conditional distribution of y given x; Part IV turns to generative learning algorithms, Part V presents the Support Vector Machine (SVM) learning algorithm, and Part XV (by Tengyu Ma) presents REINFORCE, a model-free policy gradient algorithm that does not require the notion of value functions and Q-functions. A note on notation: we write "a := b" for the operation that overwrites a with the value of b, and "a = b" when asserting a statement of fact, that the value of a is equal to the value of b. Along the way you will also learn about convolutional networks, RNNs, LSTMs, Adam, Dropout, BatchNorm, Xavier/He initialization, generalized linear models, and more.
Let's start by talking about a few examples of supervised learning problems. Suppose we have a dataset giving the living areas and prices of 47 houses from Portland, Oregon:

| Living area (ft²) | Price ($1000s) |
| --- | --- |
| 2104 | 400 |
| 3000 | 540 |
| ... | ... |

A pair (x^(i), y^(i)) is called a training example, and the learned hypothesis h takes a living area x and produces a predicted y, the predicted price of the house. Under the probabilistic interpretation of linear regression, the distribution of y^(i) is y^(i) | x^(i); θ ∼ N(θᵀx^(i), σ²). Gradient descent is an algorithm that starts with some "initial guess" for θ and repeatedly updates it so as to reduce the cost J(θ); alternatively, setting the derivatives of J(θ) to zero yields the normal equations, whose solution gives the value of θ that minimizes J(θ) in closed form. For classification, logistic regression models p(y | x; θ) as h_θ(x) = g(θᵀx), where g is the sigmoid function. Newton's method also generalizes to this multidimensional setting (where it is called the Newton-Raphson method); each iteration requires inverting a d-by-d Hessian, but so long as d is not too large, it is usually much faster than batch gradient descent. (Course note, 2020-01-02: the final project information has been posted.)
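As a concrete illustration of the gradient descent loop just described, here is a minimal pure-Python sketch on made-up data; the dataset, learning rate, and function names are all hypothetical, not from the course materials.

```python
# A minimal sketch of batch gradient descent for linear regression.
# The toy data below is hypothetical (it lies exactly on y = 2x + 1),
# not the 47-house Portland dataset from the notes.

def predict(theta, x):
    """Hypothesis h_theta(x) = theta[0] + theta[1] * x."""
    return theta[0] + theta[1] * x

def batch_gradient_descent(xs, ys, alpha=0.01, iters=5000):
    theta = [0.0, 0.0]  # initial guess
    m = len(xs)
    for _ in range(iters):
        # Gradient of the average of (1/2) * (h(x) - y)^2 over the set
        g0 = sum(predict(theta, x) - y for x, y in zip(xs, ys))
        g1 = sum((predict(theta, x) - y) * x for x, y in zip(xs, ys))
        # Simultaneous update of every theta_j
        theta = [theta[0] - alpha * g0 / m, theta[1] - alpha * g1 / m]
    return theta

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
theta = batch_gradient_descent(xs, ys)  # approaches [1, 2]
```

Each step uses every training example (hence "batch"); the normal equations would give the same answer in closed form.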
To establish notation for future use, we'll use x^(i) to denote the "input" variables (living area in this example), also called input features, and y^(i) to denote the "output" or target variable that we are trying to predict (price). The linear model assumes y^(i) = θᵀx^(i) + ε^(i), and we can write the noise assumption as ε^(i) ∼ N(0, σ²). The quantity p(y | X; θ) is typically viewed as a function of y (and perhaps X); when we instead view it as a function of θ, we call it the likelihood function. By the independence assumption on the ε^(i)'s, it factors over the training examples, and we should choose θ to maximize it; in logistic regression, for instance, P(y = 1 | x; θ) = h_θ(x). When the training set is large, stochastic gradient descent (also called incremental gradient descent), which updates the parameters on each training example in turn, is an alternative to batch gradient descent that also works very well, although it can be susceptible to oscillating around the minimum. A later set of notes discusses mixtures of Gaussians and the EM (Expectation-Maximization) algorithm for density estimation, giving a broader view of the EM algorithm and showing how it can be applied more widely. This professional online course, based on the on-campus Stanford graduate course CS229, features:

1. Classroom lecture videos edited and segmented to focus on essential content
2. Coding assignments enhanced with added inline support and milestone code checks
3. Office hours and support from Stanford-affiliated Course Assistants

Topics include supervised learning, learning theory, unsupervised learning, and reinforcement learning.
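The stochastic variant described above can be sketched in a few lines. This is an illustrative pure-Python toy with hypothetical data lying exactly on the line y = 2x + 1, not course code:

```python
# A minimal sketch of stochastic (incremental) gradient descent for
# linear regression: the parameters are updated after every single
# training example instead of after a full pass over the set.

def sgd(xs, ys, alpha=0.01, epochs=2000):
    t0, t1 = 0.0, 0.0  # initial guess for theta
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            err = (t0 + t1 * x) - y  # h_theta(x) - y on this example
            # LMS (Widrow-Hoff) style update on this one example
            t0, t1 = t0 - alpha * err, t1 - alpha * err * x
    return t0, t1

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
t0, t1 = sgd(xs, ys)  # approaches theta0 = 1, theta1 = 2
```

With a fixed learning rate the parameters may oscillate around the minimum on noisy data; here the data is noise-free, so the iterates settle on the exact line.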
For a function f mapping n-by-d matrices to the real numbers, we define the derivative of f with respect to A so that the gradient ∇_A f(A) is itself an n-by-d matrix whose (i, j) element is ∂f/∂A_{ij}; here A_{ij} denotes the (i, j) entry of the matrix A. Part VI, on learning theory, opens with the bias/variance tradeoff: when talking about linear regression, we discussed the problem of whether to fit a "simple" model such as the linear y = θ₀ + θ₁x, or a more "complex" model such as the polynomial y = θ₀ + θ₁x + ⋯ + θ₅x⁵. A related point of notation: we should not condition on θ, i.e. we should not write "p(y^(i) | x^(i), θ)", since θ is not a random variable. From the course schedule: live lecture notes and Weak Supervision slides [pdf] are posted, and the midterm (10/29) details are TBD.
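The matrix-gradient definition can be checked numerically. The example below uses an assumed test function f(A) = Σᵢⱼ A²ᵢⱼ, whose gradient is 2A, with hypothetical helper names, and central finite differences:

```python
# Numerically illustrating the matrix gradient: for f(A) = sum_ij A_ij^2,
# the (i, j) entry of grad_A f(A) is df/dA_ij = 2 * A_ij.

def f(A):
    return sum(a * a for row in A for a in row)

def numeric_grad(f, A, eps=1e-6):
    """Central-difference estimate of the matrix gradient of f at A."""
    grad = [[0.0] * len(A[0]) for _ in A]
    for i in range(len(A)):
        for j in range(len(A[0])):
            A[i][j] += eps
            up = f(A)
            A[i][j] -= 2 * eps
            down = f(A)
            A[i][j] += eps  # restore the original entry
            grad[i][j] = (up - down) / (2 * eps)
    return grad

A = [[1.0, 2.0], [3.0, 4.0]]
G = numeric_grad(f, A)  # close to [[2, 4], [6, 8]]
```

The numeric gradient is itself an n-by-d matrix, matching the definition above.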
In locally weighted linear regression, the weight w^(i) depends on the distance of x^(i) from the query point x, and τ is called the bandwidth parameter. If x is vector-valued, the weights generalize to w^(i) = exp(−(x^(i) − x)ᵀ(x^(i) − x)/(2τ²)). Digression: consider modifying the logistic regression method to "force" it to output values that are exactly 0 or 1; this leads to the perceptron. For background, see the "Linear Algebra Review and Reference" notes (Sections 1-3, with Section 4.3 covering matrix derivatives in more detail), the additional linear algebra note, the Lecture 2 review of matrix calculus, and the Stanford Math 51 course text. Assignment (4/15): Problem Set 1, due 6/29 at 11:59 pm.
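A one-dimensional sketch of locally weighted regression, solving the 2x2 weighted normal equations directly; the data and function names are hypothetical, and the weight formula is the bandwidth-τ Gaussian above:

```python
import math

# Locally weighted linear regression in one dimension. Weights
# w_i = exp(-(x_i - x)^2 / (2 * tau^2)) emphasize training points
# near the query x; tau is the bandwidth parameter.

def lwr_predict(xs, ys, x, tau=1.0):
    w = [math.exp(-((xi - x) ** 2) / (2 * tau * tau)) for xi in xs]
    # Weighted normal equations for theta = (theta0, theta1),
    # solved directly as a 2x2 linear system (Cramer's rule).
    s_w = sum(w)
    s_wx = sum(wi * xi for wi, xi in zip(w, xs))
    s_wxx = sum(wi * xi * xi for wi, xi in zip(w, xs))
    s_wy = sum(wi * yi for wi, yi in zip(w, ys))
    s_wxy = sum(wi * xi * yi for wi, xi, yi in zip(w, xs, ys))
    det = s_w * s_wxx - s_wx * s_wx
    theta0 = (s_wxx * s_wy - s_wx * s_wxy) / det
    theta1 = (s_w * s_wxy - s_wx * s_wy) / det
    return theta0 + theta1 * x

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]  # exactly on y = 2x + 1
pred = lwr_predict(xs, ys, 2.5)
```

A fresh weighted fit is performed for every query point, which is why the entire training set must be kept around.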
If we want to maximize rather than minimize a function, gradient ascent performs the update θ := θ + α∇_θ ℓ(θ), where α is called the learning rate; this is a very natural algorithm that repeatedly takes a step in the direction of steepest increase of ℓ. For a single training example, this yields the LMS update rule (LMS stands for "least mean squares"), also known as the Widrow-Hoff learning rule. The rule has several properties that seem natural and intuitive: the magnitude of the update is proportional to the error term, so if h_θ(x^(i)) nearly matches the actual value of y^(i) there is little need to change the parameters, while a larger change is made if the prediction has a large error. The maxima of ℓ correspond to points where its first derivative ℓ′(θ) is zero, which is the basic idea behind Newton's method; in the worked example in the notes, one more iteration updates θ to about 1.8, and after a few more iterations we rapidly approach the maximum. (Something to think about: how would this change if we wanted to use the method to minimize, rather than maximize, a function?)
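Newton's update θ := θ − ℓ′(θ)/ℓ″(θ) is easy to demonstrate on a made-up concave function; ℓ(θ) = log θ − θ, maximized at θ = 1, is an illustrative choice rather than an example from the notes:

```python
# One-variable Newton's method for maximizing a concave function:
# repeatedly set theta := theta - l'(theta) / l''(theta), seeking
# the point where l'(theta) = 0.

def newton_maximize(dl, ddl, theta, iters=10):
    for _ in range(iters):
        theta = theta - dl(theta) / ddl(theta)
    return theta

# l(theta) = log(theta) - theta, so:
dl = lambda t: 1.0 / t - 1.0      # l'(theta)
ddl = lambda t: -1.0 / (t * t)    # l''(theta)

theta = newton_maximize(dl, ddl, 0.5)  # converges toward 1.0
```

Convergence is quadratic: the error roughly squares on each iteration, which is why so few iterations are needed compared with gradient methods.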
A class of distributions is in the exponential family if it can be written in the form p(y; η) = b(y) exp(ηᵀT(y) − a(η)), where a(η) is the log partition function. As we vary φ, the Bernoulli distributions p(y = 1; φ) = φ and p(y = 0; φ) = 1 − φ give distributions with different means, and there is a choice of T, a, and b so that the Bernoulli density becomes exactly this form. To work our way up to GLMs, we begin by defining exponential family distributions; least squares and logistic regression turn out to have the same form of update rule, and one may ask: is this coincidence, or is there a deeper reason behind this? The question is answered when we get to GLM models. SVMs are among the best (and many believe are indeed the best) "off-the-shelf" supervised learning algorithms. For neural networks, the notes start small and slowly build up a network, step by step. Course logistics: lectures are Mon/Wed 10:00-11:20 AM on Zoom; lecture notes are uploaded a few days after most lectures, and slides are posted before each lecture; expect to watch around 10 videos (more or less 10 minutes each) every week, which helps plan the time ahead; separate sign-up forms are provided for SCPD and non-SCPD students; please maintain the Honor Code and keep learning.
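The Bernoulli claim can be verified directly: with T(y) = y, b(y) = 1, natural parameter η = log(φ/(1 − φ)), and log partition function a(η) = log(1 + e^η), the exponential-family form reproduces p(y; φ) for both outcomes. A small check with hypothetical function names:

```python
import math

# Check that the Bernoulli distribution p(y = 1; phi) = phi is in the
# exponential family p(y; eta) = b(y) * exp(eta * T(y) - a(eta)) with
# T(y) = y, b(y) = 1, eta = log(phi / (1 - phi)),
# and a(eta) = log(1 + exp(eta)).

def bernoulli(y, phi):
    return phi if y == 1 else 1 - phi

def exp_family(y, phi):
    eta = math.log(phi / (1 - phi))
    a = math.log(1 + math.exp(eta))
    return 1.0 * math.exp(eta * y - a)  # b(y) = 1, T(y) = y

phi = 0.3
# The two parameterizations agree for both outcomes y = 0 and y = 1.
```

Algebraically, exp(η − a(η)) = e^η/(1 + e^η) = φ and exp(−a(η)) = 1/(1 + e^η) = 1 − φ, so the sigmoid appears naturally as the inverse of the natural-parameter map.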
We will also use X to denote the space of input values and Y the space of output values; for binary classification, Y = {0, 1}. As usual, {x^(1), ..., x^(m)} denotes the inputs in the training set. With the number of bedrooms included as one of the input features, performing the minimization of J(θ) explicitly and without resorting to an iterative algorithm (via the normal equations) gives θ₀ = 89.60, θ₁ = 0.1392, θ₂ = −8.738. The likelihood of the data is given by p(y | X; θ); instead of maximizing L(θ) itself, we can also maximize any strictly increasing function of it, and in particular the log-likelihood ℓ(θ). Gradient descent that looks at every example in the entire training set on every step is called batch gradient descent. Course logistics: the class uses Piazza for all communications, all official announcements happen over Piazza, and an access code will be sent out through Canvas; you can also subscribe to the guest mailing list to get updates from the course.
One reasonable method seems to be to make h(x) close to y, at least for the training examples we have. To formalize this, we define a cost function J(θ) that measures, for each value of the θ's, how close the h(x^(i))'s are to the corresponding y^(i)'s, and we then choose θ so as to minimize J(θ). One way is to minimize J by explicitly taking its derivatives with respect to the θⱼ's and setting them to zero. In the logistic regression setting, θ is vector-valued, so we need to generalize Newton's method to this setting; weighing the pros and cons, each Newton iteration is more expensive than a gradient step, but far fewer iterations are typically needed. Having discussed how to fit parameters, we can also make predictions using locally weighted linear regression.
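For the binary-output setting, a logistic regression model fit by gradient ascent on the log-likelihood can be sketched as follows; this is pure Python on made-up separable data, and the function names are hypothetical:

```python
import math

# Logistic regression trained by gradient ascent on the log-likelihood.
# h_theta(x) = g(theta0 + theta1 * x) with the sigmoid
# g(z) = 1 / (1 + exp(-z)); the per-parameter update has the same
# form as the LMS rule, with h_theta in place of the linear hypothesis.

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(xs, ys, alpha=0.1, iters=2000):
    t0, t1 = 0.0, 0.0
    for _ in range(iters):
        # Gradient of the log-likelihood: sum of (y - h_theta(x)) * x_j
        g0 = sum(y - sigmoid(t0 + t1 * x) for x, y in zip(xs, ys))
        g1 = sum((y - sigmoid(t0 + t1 * x)) * x for x, y in zip(xs, ys))
        t0, t1 = t0 + alpha * g0, t1 + alpha * g1
    return t0, t1

# Hypothetical linearly separable toy data: label 1 iff x > 0.
xs = [-2.0, -1.0, 1.0, 2.0]
ys = [0, 0, 1, 1]
t0, t1 = train_logistic(xs, ys)
```

After training, sigmoid(t0 + t1 * x) is close to 1 for positive x and close to 0 for negative x on this toy set. Newton's method would reach a comparable fit in far fewer iterations, at the cost of a matrix inversion per step.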
The publicly available 2008 version of the lectures (CS229 on YouTube) is great as well, and its notes are mostly similar. For these reasons, particularly when the training set is large, stochastic gradient descent is often preferred over batch gradient descent, and it typically gets θ "close" to the minimum much faster. Locally weighted linear regression is the first example we're seeing of a non-parametric algorithm: whereas linear regression has a fixed, finite number of parameters (the θᵢ's) that are fit to the data, locally weighted regression needs to keep the entire training set around in order to make predictions. Finally, due to high enrollment, the course staff cannot grade the work of any students who are not officially enrolled in the class.