Probability Density Functions Worth Knowing for Machine Learning and Statistics
- You met the Bernoulli distribution above, over two discrete outcomes—tails or heads. Think of it, however, as a distribution over 0 and 1, over 0 heads (i.e. tails) or 1 heads. Above, both outcomes were equally likely, and that’s what’s illustrated in the diagram. The Bernoulli PDF has two lines of equal height, representing the two equally-probable outcomes of 0 and 1 at either end. The Bernoulli distribution could represent outcomes that aren’t equally likely, like the result of an unfair coin toss. Then, the probability of heads is not 0.5, but some other value p, and the probability of tails is 1-p. Like many distributions, it’s actually a family of distributions defined by parameters, like p here. When you think “Bernoulli,” just think “(possibly unfair) coin toss.”
- This is the distribution of an event that can only come out as 0 or 1, like the heads or tails of a coin. A fair coin is 0.5/0.5, but other probabilities are possible too.
- The uniform distribution is characterized by its flat PDF. Imagine rolling a fair die. The outcomes 1 to 6 are equally likely. It can be defined for any number of outcomes n or even as a continuous distribution. Associate the uniform distribution with “rolling a fair die.”
- Like a die, this is the distribution in which every outcome has the same probability.
- The binomial distribution may be thought of as the sum of outcomes of things that follow a Bernoulli distribution. Toss a fair coin 20 times; how many times does it come up heads? This count is an outcome that follows the binomial distribution. Its parameters are n, the number of trials, and p, the probability of a “success” (here: heads, or 1). Each flip is a Bernoulli-distributed outcome, or trial. Reach for the binomial distribution when counting the number of successes in things that act like a coin flip, where each flip is independent and has the same probability of success.
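- If you toss a coin n times, what is the probability that heads comes up exactly k times? The binomial is the distribution of how many times 1 occurs across a series of such 0-or-1 events (each one a Bernoulli trial with success probability p).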
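As a minimal sketch of the binomial probability described above, the PMF can be written with the standard library alone. The function name `binomial_pmf` and the 20-toss example values are chosen here for illustration.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p): k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Probability of exactly 10 heads in 20 tosses of a fair coin.
p_ten_heads = binomial_pmf(10, 20, 0.5)

# The probabilities over every possible count 0..20 add up to 1.
total = sum(binomial_pmf(k, 20, 0.5) for k in range(21))

print(f"P(10 heads in 20 tosses) = {p_ten_heads:.4f}")
```

Even the single most likely count, 10 heads, happens less than a fifth of the time.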
- What about the count of customers calling a support hotline each minute? That’s an outcome whose distribution sounds binomial, if you think of each second as a Bernoulli trial in which a customer doesn’t call (0) or does (1). However, as the power company knows, when the power goes out, 2 or even hundreds of people can call in the same second. Viewing it as 60,000 millisecond-sized trials still doesn’t get around the problem—many more trials, much smaller probability of 1 call, let alone 2 or more, but, still not technically a Bernoulli trial. However, taking this to its infinite, logical conclusion works. Let n go to infinity and let p go to 0 to match so that np stays the same. This is like heading towards infinitely many infinitesimally small time slices in which the probability of a call is infinitesimal. The limiting result is the Poisson distribution.
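- Say 10 phone calls arrive per hour on average. What is the probability that 12 calls arrive in one hour? That is exactly the Poisson probability. You could picture it as 48 failures (0) and 12 successes (1) across 60 one-minute slices. Or, slicing finer than minutes, as 988 failures and 12 successes across 1,000 slices. A binomial distribution with a large number of trials and a small per-trial probability, with the expected count np held fixed, converges to the Poisson distribution. (That is also why the Poisson is used as an approximation to the binomial.)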
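The convergence can be checked numerically. The sketch below reuses the 10-calls-per-hour example and compares the binomial probability of 12 calls against the Poisson value as the hour is cut into more slices. The function names and the slice counts (60, 1,000, 100,000) are illustrative choices.

```python
from math import comb, exp, factorial


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for X ~ Poisson(lam), where lam is the average count."""
    return exp(-lam) * lam**k / factorial(k)


lam, k = 10.0, 12  # 10 calls per hour on average; probability of exactly 12
exact = poisson_pmf(k, lam)

# Slice the hour into n pieces, each a Bernoulli trial with p = lam / n,
# so that the expected count n * p stays at 10.
gaps = []
for n in (60, 1_000, 100_000):
    approx = binomial_pmf(k, n, lam / n)
    gaps.append(abs(approx - exact))
    print(f"n = {n:>7}: binomial = {approx:.6f}, Poisson = {exact:.6f}")
```

The gap between the two values shrinks as n grows, which is the limit described above.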
- Imagining this odd situation (drawing balls from an urn with replacement and counting how many are, say, black) has a point, because it makes it simple to explain the hypergeometric distribution. This is the distribution of that same count if the balls were drawn without replacement instead. Undeniably it’s a cousin to the binomial distribution, but not the same, because the probability of success changes as balls are removed. If the number of balls is large relative to the number of draws, the distributions are similar because the chance of success changes less with each draw.
- When people talk about picking balls from urns without replacement, it’s almost always safe to interject, “the hypergeometric distribution, yes,” because I have never met anyone who actually filled urns with balls and then picked them out, and replaced them or otherwise, in real life. (I don’t even know anyone who owns an urn.) More broadly, it should come to mind when picking out a significant subset of a population as a sample.
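- Suppose half the balls are black and half are white, and you draw several times. That's the same as the binomial, right? Nope. If you don't put each ball back after drawing it, the probabilities for the balls that remain change. Unlike the binomial case, the hypergeometric distribution is what you get when replacement is not allowed.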
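To see how much replacement matters, the following sketch compares the chance of drawing exactly 5 black balls in 10 draws from a small urn, from a large urn, and under the binomial model. The urn sizes (20 and 2,000 balls) and the function names are assumptions made for illustration.

```python
from math import comb


def hypergeom_pmf(k: int, total: int, successes: int, draws: int) -> float:
    """P(k successes) when drawing `draws` balls WITHOUT replacement from
    `total` balls, of which `successes` count as a success."""
    return (
        comb(successes, k)
        * comb(total - successes, draws - k)
        / comb(total, draws)
    )


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(k successes) in n draws WITH replacement, success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Exactly 5 black balls in 10 draws, half of the urn being black.
small_urn = hypergeom_pmf(5, 20, 10, 10)       # 10 draws use up half the urn
large_urn = hypergeom_pmf(5, 2000, 1000, 10)   # 10 draws barely dent the urn
with_replacement = binomial_pmf(5, 10, 0.5)

print(f"small urn:        {small_urn:.4f}")
print(f"large urn:        {large_urn:.4f}")
print(f"with replacement: {with_replacement:.4f}")
```

With the small urn the two models disagree noticeably. With the large urn they nearly coincide, as the text says they should.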
- From simple Bernoulli trials arises another distribution. How many times does a flipped coin come up tails before it first comes up heads? This count of tails follows a geometric distribution. Like the Bernoulli distribution, it’s parameterized by p, the probability of that final success. It’s not parameterized by n, a number of trials or flips, because the number of failure trials is the outcome itself.
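- When you roll a die, what is the probability of getting a 6 on the first roll? On the second roll? On the third, the fourth…? The geometric distribution is the distribution of the number of trials it takes until some event happens. The fun part is that, whatever the event's probability, the event is always most likely to happen on the very first trial.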
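The "first trial is always the most likely" claim follows from the PMF, which shrinks by a factor of (1 - p) with every extra trial. Here is a small sketch using the die example. It counts trials up to and including the first success, and the name `geometric_pmf` is illustrative.

```python
def geometric_pmf(k: int, p: float) -> float:
    """P(first success happens on trial k), for k = 1, 2, 3, ...

    The first k - 1 trials must fail and trial k must succeed.
    """
    return (1 - p) ** (k - 1) * p


p_six = 1 / 6  # probability of rolling a 6 on a fair die

# Probability that the first 6 appears on roll 1, 2, ..., 10.
probs = [geometric_pmf(k, p_six) for k in range(1, 11)]
most_likely_roll = 1 + probs.index(max(probs))

for roll, prob in enumerate(probs, start=1):
    print(f"first 6 on roll {roll:>2}: {prob:.4f}")
```

Note that this counts rolls, while the English description above counts the failures before the first success. The two conventions differ by exactly one.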
- The negative binomial distribution is a simple generalization. It’s the number of failures until r successes have occurred, not just 1. It’s therefore parameterized also by r. Sometimes it’s described as the number of successes until r failures. As my life coach says, success and failure are what you define them to be, so these are equivalent, as long as you keep straight whether p is the probability of success or failure.
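- If the geometric is the distribution of how long it takes to get one success, the negative binomial is the distribution of how long it takes to get r successes. (Why didn't they give it a name that sounds like "geometric"?)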
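A minimal sketch of the negative binomial PMF, using the "number of failures before the r-th success" convention from the English description, with p as the probability of success. The function name and the example parameters are illustrative.

```python
from math import comb


def neg_binomial_pmf(k: int, r: int, p: float) -> float:
    """P(exactly k failures occur before the r-th success).

    The last trial is the r-th success; the k failures can fall anywhere
    among the k + r - 1 trials that come before it.
    """
    return comb(k + r - 1, k) * p**r * (1 - p) ** k


# With r = 1 this reduces to the geometric case: 3 tails, then the first head.
three_tails_then_head = neg_binomial_pmf(3, 1, 0.5)

# Probability of seeing exactly 4 tails before the 3rd head.
four_tails_before_third_head = neg_binomial_pmf(4, 3, 0.5)

# Summing over enough failure counts recovers (almost) all the probability.
total = sum(neg_binomial_pmf(k, 3, 0.5) for k in range(200))
```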
- You get the exponential distribution, which accurately describes the distribution of time until a call. It’s a continuous distribution, the first encountered here, because the outcome time need not be whole seconds. Like the Poisson distribution, it is parameterized by a rate λ.
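- Just as the Poisson is the limit of the binomial as the time slices shrink, the exponential distribution is the continuous-time counterpart of the geometric. In other words: "If a call arrives every 5 minutes on average, what is the probability that the next call takes more than 7 minutes to arrive?"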
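The waiting-time question above has a closed-form answer, since the chance of waiting longer than t is exp(-λt). The sketch below assumes an average wait of 5 minutes, which gives a rate of λ = 1/5 per minute. The function name is illustrative.

```python
from math import exp


def exponential_survival(t: float, rate: float) -> float:
    """P(T > t) for T ~ Exponential(rate): no event has happened by time t."""
    return exp(-rate * t)


rate = 1 / 5  # one call every 5 minutes on average

p_longer_than_7 = exponential_survival(7, rate)
p_within_7 = 1 - p_longer_than_7

# Memorylessness: having already waited 3 minutes, the chance of waiting
# at least 7 more is the same as the chance of waiting 7 from scratch.
p_7_more_after_3 = exponential_survival(10, rate) / exponential_survival(3, rate)

print(f"P(wait > 7 min)  = {p_longer_than_7:.4f}")
print(f"P(wait <= 7 min) = {p_within_7:.4f}")
```

Because the distribution is continuous, only intervals such as "within 7 minutes" or "longer than 7 minutes" have non-zero probability.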
- The exponential distribution also models “time until failure.” In fact, this is so important that more general distributions exist to describe time-to-failure, like the Weibull distribution. Whereas the exponential distribution is appropriate when the rate (of wear, or failure for instance) is constant, the Weibull distribution can model increasing (or decreasing) rates of failure over time. The exponential is merely a special case.
- Where the exponential describes the waiting time until an event that occurs at a constant rate, the Weibull describes the time until failure when the failure rate can rise or fall over time. The exponential is the special case in which the Weibull's shape parameter equals 1.
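The difference between the two shows up in the hazard (instantaneous failure) rate. The sketch below uses the common shape/scale parameterization of the Weibull, h(t) = (k/λ)(t/λ)^(k-1). The function name and the parameter values are illustrative.

```python
def weibull_hazard(t: float, shape: float, scale: float) -> float:
    """Instantaneous failure rate at time t for a Weibull(shape, scale)."""
    return (shape / scale) * (t / scale) ** (shape - 1)


times = [1.0, 5.0, 10.0]

# shape == 1: constant rate 1/scale, i.e. the exponential distribution.
constant = [weibull_hazard(t, 1.0, 5.0) for t in times]

# shape > 1: failure rate grows with age (wear-out).
wearing_out = [weibull_hazard(t, 2.0, 5.0) for t in times]

# shape < 1: failure rate falls with age (early failures dominate).
burning_in = [weibull_hazard(t, 0.5, 5.0) for t in times]

print("shape 1.0:", constant)
print("shape 2.0:", wearing_out)
print("shape 0.5:", burning_in)
```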
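- The normal (Gaussian) distribution is the famous one. In particular, the central limit theorem says that the arithmetic mean of a very large number of independent samples from the same distribution converges to a Gaussian, whatever distribution those samples follow (binomial, exponential, or anything else with finite variance). That is so useful that this distribution shows up in a great many places.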
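The central limit theorem can be watched in a small simulation. The sketch below averages draws from a strongly skewed exponential distribution and checks that the averages behave like a Gaussian. The sample sizes and the seed are arbitrary choices, so the printed numbers will vary if they change.

```python
import random
from statistics import mean, stdev

random.seed(0)  # fixed seed so the run is repeatable

n = 50          # draws averaged into each sample mean
trials = 2000   # number of sample means collected

# Exponential(rate=1) has mean 1 and standard deviation 1, and is skewed.
sample_means = [
    sum(random.expovariate(1.0) for _ in range(n)) / n
    for _ in range(trials)
]

center = mean(sample_means)
spread = stdev(sample_means)

# Share of sample means within one standard deviation of the center;
# a Gaussian would put roughly 68% there.
within_one_sd = sum(abs(m - center) <= spread for m in sample_means) / trials

print(f"mean of sample means: {center:.3f} (theory: 1)")
print(f"sd of sample means:   {spread:.3f} (theory: {1 / n ** 0.5:.3f})")
print(f"share within 1 sd:    {within_one_sd:.3f}")
```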
- An outcome that follows a log-normal distribution takes on values whose logarithm is normally distributed. Or: the exponentiation of a normally-distributed value is log-normally distributed. If sums of things are normally distributed, then remember that products of things are log-normally distributed.
- This is the distribution of a variable whose log is Gaussian. Put another way, it is what you get by exponentiating a Gaussian random variable.
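As a quick simulated check of that relationship, the sketch below exponentiates Gaussian draws and then takes logs again to recover the original parameters. The values mu = 0 and sigma = 0.5, the sample size, and the seed are arbitrary illustrative choices.

```python
import random
from math import exp, log
from statistics import mean, stdev

random.seed(1)  # fixed seed so the run is repeatable

mu, sigma = 0.0, 0.5

# Exponentiating Gaussian draws gives log-normal draws (always positive).
draws = [exp(random.gauss(mu, sigma)) for _ in range(20_000)]

# Taking logs recovers the Gaussian parameters.
log_mean = mean(log(x) for x in draws)
log_sd = stdev(log(x) for x in draws)

# The log-normal's own mean is exp(mu + sigma^2 / 2), not exp(mu).
raw_mean = mean(draws)
theory_mean = exp(mu + sigma**2 / 2)

print(f"mean of logs: {log_mean:.3f}, sd of logs: {log_sd:.3f}")
print(f"mean of draws: {raw_mean:.3f} (theory: {theory_mean:.3f})")
```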
- Student’s t-distribution is the basis of the t-test that many non-statisticians learn in other sciences. It’s used in reasoning about the mean of a normal distribution, and also approaches the normal distribution as its parameter, the degrees of freedom, increases. The distinguishing feature of the t-distribution is its tails, which are fatter than the normal distribution’s.
- This is the distribution used when making judgments about the mean of a normal distribution.
- The chi-squared distribution is the distribution of the sum of squares of normally-distributed values. It’s the distribution underpinning the chi-squared test, which is itself based on the sum of squares of differences, which are supposed to be normally distributed.
- This is the distribution of the sum of the squares of Gaussian random variables. For example, the chi-squared with k degrees of freedom is the distribution of the sum of the squares of k independent standard Gaussians.
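The definition can be checked by simulation. A chi-squared variable with k degrees of freedom has mean k and variance 2k, so sums of k squared standard-normal draws should land near those values. The choice of k = 3, the sample size, and the seed are arbitrary.

```python
import random
from statistics import mean, variance

random.seed(2)  # fixed seed so the run is repeatable

k = 3  # degrees of freedom


def chi_squared_draw(dof: int) -> float:
    """One draw: the sum of `dof` squared independent standard normals."""
    return sum(random.gauss(0.0, 1.0) ** 2 for _ in range(dof))


draws = [chi_squared_draw(k) for _ in range(20_000)]

sample_mean = mean(draws)
sample_var = variance(draws)

print(f"sample mean:     {sample_mean:.3f} (theory: {k})")
print(f"sample variance: {sample_var:.3f} (theory: {2 * k})")
```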