boostcamp AI tech

(73)

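Maximum Likelihood Estimation 1. Parameter estimation - The goal of statistical modeling is to estimate the probability distribution of a random variable X. - Parametric methods: first assume that X follows some probability distribution, then estimate the parameters that determine that distribution. - Nonparametric methods: do not assume a specific distribution; the structure of the model and the number of parameters change with the data. - If we assume a random variable X follows the normal distribution N(μ, σ²), the population mean and variance can be inferred as follows. - Mean of the sample means => population mean; mean of the sample variances => population variance. - Suppose we draw 200 samples of size n from the population. Those 200 samples give 200 sample means. This statis..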
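
The 200-sample experiment described in this excerpt can be sketched as a small simulation. This is only an illustration under assumptions that are not in the original post: the values `mu=5.0`, `sigma=2.0`, `n=30`, the seed, and the function name `sampling_experiment` are all made up for the example, and the sample variance is taken with the n-1 denominator.

```python
import random
import statistics


def sampling_experiment(mu=5.0, sigma=2.0, n=30, num_samples=200, seed=0):
    """Draw num_samples samples of size n from N(mu, sigma^2) and
    return (mean of the sample means, mean of the sample variances)."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    means, variances = [], []
    for _ in range(num_samples):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        means.append(statistics.fmean(sample))
        # statistics.variance divides by n-1, i.e. the unbiased estimator
        variances.append(statistics.variance(sample))
    return statistics.fmean(means), statistics.fmean(variances)


mean_of_means, mean_of_variances = sampling_experiment()
print(mean_of_means, mean_of_variances)  # should land near mu and sigma**2
```

With these assumed settings the two printed values should sit close to 5.0 and 4.0, which is the sense in which the sample statistics estimate the population parameters.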

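Bayes Theorem - Bayes' theorem: the probability that B occurs given A can be computed from the probability that A occurs given B. - When new data arrives, the previously computed posterior can be used as the prior to compute a new posterior. - Even if P(B|A), the probability that B occurs given A, is high, it can be dangerous to infer causality and conclude that A is the cause of B. - Correct causal inference requires removing the confounding factor Z, which influences both the treatment and the cure. - According to the patient data, treatment b may appear to have the higher cure rate. - However, when the cure rate is broken down by kidney stone size, treatment a is much higher. - Removing Z's intervention on T..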
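
The posterior-as-prior update mentioned in this excerpt can be sketched as follows. The numbers (a 10% prior, a 99% true-positive rate, a 10% false-positive rate) and the name `bayes_update` are assumptions chosen for illustration, and the second update additionally assumes the two observations are conditionally independent given the hypothesis.

```python
def bayes_update(prior, p_obs_given_h, p_obs_given_not_h):
    """Return P(H | observation) from P(H), P(obs | H) and P(obs | not H)."""
    # Evidence P(obs), via the law of total probability
    evidence = p_obs_given_h * prior + p_obs_given_not_h * (1.0 - prior)
    return p_obs_given_h * prior / evidence


# First observation: start from the assumed prior P(H) = 0.1
first_posterior = bayes_update(0.1, 0.99, 0.1)

# Second observation: the previous posterior now plays the role of the prior
second_posterior = bayes_update(first_posterior, 0.99, 0.1)

print(first_posterior, second_posterior)
```

Each additional observation of the same kind pushes the posterior further in the same direction, which is why the second value is larger than the first.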

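Boostcamp AI - week 1 In the first week, before studying AI itself, we covered the fundamentals. Most of the concepts were already familiar, but I tried to find the parts I only knew superficially. For Python library usage, I have a habit of googling whenever I need something. That habit quietly eats up a lot of time during development. Even if I don't memorize every function signature, I should at least know what each function does. Call it an inventory of the tools I can use. I never want to get confused, hesitate, or look the same thing up again. As for math, I haven't built the habit of writing things out by hand. I always read, understood, and moved on, so when I derive a formula myself I stall several times. I think this habit is what ultimately builds the fundamentals. Master C..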

BPTT for RNN Back Propagation Through Time (BPTT) States computed in the forward pass must be stored until they are reused during the backward pass, so the memory cost is also O(τ). Loss function: assume we are using cross-entropy loss. Derivative of the loss with respect to W_yh: the derivative of the cross-entropy loss w.r.t. o_t is (ŷ_t - y_t). Derivative of the cross-entropy loss w.r.t. softmax and derivative of output(t)..
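
The claim that the cross-entropy gradient with respect to the pre-softmax output equals the prediction minus the target can be checked numerically. The sketch below assumes the prediction is the softmax of o_t and the target is one-hot; the example vectors and the helper names are hypothetical.

```python
import math


def softmax(o):
    """Numerically stable softmax of a list of logits."""
    m = max(o)
    exps = [math.exp(v - m) for v in o]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(o, y):
    """Cross-entropy between target y and softmax(o)."""
    return -sum(t * math.log(p) for t, p in zip(y, softmax(o)))


def numerical_grad(o, y, eps=1e-6):
    """Central-difference estimate of d(loss)/d(o_i) for every i."""
    grad = []
    for i in range(len(o)):
        up = o[:i] + [o[i] + eps] + o[i + 1:]
        down = o[:i] + [o[i] - eps] + o[i + 1:]
        grad.append((cross_entropy(up, y) - cross_entropy(down, y)) / (2 * eps))
    return grad


o_t = [1.0, -0.5, 2.0]   # assumed logits at one time step
y_t = [0.0, 0.0, 1.0]    # assumed one-hot target

analytic = [p - t for p, t in zip(softmax(o_t), y_t)]  # y_hat - y
numeric = numerical_grad(o_t, y_t)
print(analytic)
print(numeric)
```

The two printed vectors should agree to several decimal places, which is the usual sanity check before using the closed form inside a backward pass.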

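Recurrent Neural Networks - In sequence data (audio, text strings, stock prices, etc.) the order matters. - Sequence data easily violates the i.i.d. (independent and identically distributed) assumption. - Merely reordering the words of a sentence such as "The dog bit the person." changes the probability distribution of the sequence data. * i.i.d.: independently and identically distributed - Roll a die 20 times: {x1, x2, ..., x20}. - The random variables xi are mutually independent. - Each random variable xi follows the same distribution (the marginal distributions are identical). - By Bayes' rule, conditional probability gives the probability of the next value Xt given (x1, ..., xt-1). - ..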
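
The conditional-probability statement in this excerpt appears to rely on the standard factorization of a joint distribution. As a sketch, with the index s introduced here only for the illustration:

```latex
P(X_1, \dots, X_t) = \prod_{s=1}^{t} P(X_s \mid X_{s-1}, \dots, X_1),
\qquad
\text{and, if the } X_s \text{ are i.i.d.,} \quad
P(X_1, \dots, X_t) = \prod_{s=1}^{t} P(X_s).
```

For the die example every factor in the second product is the same marginal, whereas for sequence data the conditioning on the past generally cannot be dropped.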

Convolutional Neural Network After the filter operation, each portion of the image is stored in a single cell. The cells are regarded as neurons and, when combined, form a 2D matrix called an Activation Map / Feature Map. The neighborhood that a single neuron points to is called its Local Receptive Field. The weights between the neurons (the filter weights) act as parameters that are tweaked while training..
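
How a filter turns image patches into the cells of a feature map can be sketched in plain Python. This is a minimal illustration assuming stride 1, no padding, and no bias or activation; the 4x4 `image`, the 2x2 `kernel`, and the function name `feature_map` are made up for the example.

```python
def feature_map(image, kernel):
    """Slide kernel over image (stride 1, no padding) and return the 2D map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # The kh x kw patch starting at (i, j) is this cell's
            # local receptive field; the cell stores its weighted sum.
            row.append(sum(image[i + di][j + dj] * kernel[di][dj]
                           for di in range(kh) for dj in range(kw)))
        out.append(row)
    return out


image = [[1, 2, 3, 0],
         [0, 1, 2, 3],
         [3, 0, 1, 2],
         [2, 3, 0, 1]]
kernel = [[1, 0],
          [0, 1]]

fmap = feature_map(image, kernel)
print(fmap)  # [[2, 4, 6], [0, 2, 4], [6, 0, 2]]
```

The kernel entries are the weights that training would adjust; here they are fixed so the output can be checked by hand.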

Autoencoder study note of LINK 1. Autoencoder The Encoder generally uses a series of Dense and/or Convolutional layers to encode an image into a fixed-length vector that represents the image in a compact form, while the Decoder uses Dense and/or Convolutional layers to convert the latent representation vector back into that same image or another modified image. Practical applications of an Autoencoder network ..

Cost functions and Gradient descent 1. Cost functions The loss function computes the error for a single training example. The cost function computes the average of the loss over all the examples. 2. Gradient descent ∇F(a_n): the gradient points in the direction of steepest ascent, so the next position is taken along -∇F(a_n), the direction of steepest descent. Mini-batch gradient descent: it combines concepts from the first two. It divides the dataset wit..
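
The update rule behind gradient descent, stepping against the gradient at each position, can be sketched on a one-dimensional cost. The cost F(a) = (a - 3)^2, the learning rate, the step count, and the function name are assumptions picked for the illustration, not values from the post.

```python
def gradient_descent(grad, start, learning_rate=0.1, steps=100):
    """Repeat a_{n+1} = a_n - learning_rate * grad(a_n) and return the result."""
    a = start
    for _ in range(steps):
        a = a - learning_rate * grad(a)  # move opposite to the gradient
    return a


# Assumed cost F(a) = (a - 3)^2, whose gradient is 2 * (a - 3)
minimum = gradient_descent(lambda a: 2.0 * (a - 3.0), start=0.0)
print(minimum)  # approaches the minimizer a = 3
```

On this cost each step shrinks the distance to the minimizer by a constant factor, so the iterate converges quickly; on less well-behaved costs the learning rate usually needs tuning.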
