Statistics in data analysis

What is Statistics?

Distribution

Statistical methods help ensure that your data are interpreted correctly and that apparent relationships are genuinely significant rather than the product of chance. In short, statistical analysis helps find meaning in otherwise meaningless numbers. One important concept in statistics is the distribution. This article gives some examples of applying distributions in data analytics.

Uniform distribution

A distribution in statistics is a function that shows the possible values of a variable and how often they occur. One of the most basic is the uniform distribution, which describes events that all have the same probability of happening. For instance, if your company runs a marketing campaign in which one of the first 100 customers wins an iPhone Pro Max, each customer has the same 1% probability of winning the prize.
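The raffle example can be checked with a short simulation. The sketch below is illustrative only: it assumes exactly one winner is drawn per raffle, and the number of repeated raffles and the seed are hypothetical values chosen for the demonstration.

```python
import random
from collections import Counter

NUM_CUSTOMERS = 100   # the first 100 customers from the example
NUM_DRAWS = 100_000   # hypothetical number of repeated raffles


def simulate_raffles(num_draws: int, seed: int = 0) -> Counter:
    """Count how often each customer ID (1..100) wins across repeated raffles."""
    rng = random.Random(seed)
    # randint is inclusive on both ends, so every ID is equally likely.
    return Counter(rng.randint(1, NUM_CUSTOMERS) for _ in range(num_draws))


theoretical_p = 1 / NUM_CUSTOMERS          # 1% under a uniform distribution
wins = simulate_raffles(NUM_DRAWS)
empirical_p = wins[1] / NUM_DRAWS          # observed win rate of customer 1

print(f"theoretical: {theoretical_p:.2%}")
print(f"customer 1 observed: {empirical_p:.2%}")
```

With enough repetitions, the observed win rate of any single customer should settle close to the theoretical 1%.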

Normal distribution

Another distribution is the normal distribution, in which most values cluster in the middle of the range and the rest taper off symmetrically toward either extreme. A typical example is the height or weight of all the people in a nation. However, the normal distribution can be a trap, because data that do not follow it are easily misread as if they did. For instance, when people analyze salaries in an industry or company, they often say that the average salary is around 10,000 USD per year and is rising quickly year over year. Yet some people earn more than 1,000,000 USD per year while other employees earn around 1,000 USD per year, so the average describes almost no one. A company may boast that it doubled the average salary of its employees when it actually increased its leaders' salaries tenfold and raised everyone else's by only 1%.
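The salary trap is easier to see with numbers. The sketch below uses made-up figures, not real data: one leader earning 123,750 USD and 100 employees earning 10,000 USD each, chosen so that a tenfold raise for the leader and a 1% raise for everyone else exactly doubles the mean. Comparing the mean with the median shows how little the typical employee gained.

```python
from statistics import mean, median

# Hypothetical payroll (assumed figures, for illustration only).
leader_before = 123_750
employees_before = [10_000] * 100

# The leader's salary grows 10x; every other salary grows by 1%.
leader_after = leader_before * 10
employees_after = [salary * 1.01 for salary in employees_before]

before = [leader_before] + employees_before
after = [leader_after] + employees_after

mean_growth = mean(after) / mean(before)        # what the company advertises
median_growth = median(after) / median(before)  # what a typical employee sees

print(f"mean salary grew by a factor of {mean_growth:.2f}")
print(f"median salary grew by a factor of {median_growth:.2f}")
```

Because the median ignores the single extreme value, it is usually the safer summary when the data are skewed rather than normally distributed.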

Pareto distribution

Another type of distribution applied in data analytics is the Pareto distribution. It describes cases in which low-value events occur frequently and high-value events occur rarely. Marketing analysts can use it for customer segmentation, which lets them focus on the most important customers. Another application is risk modeling, where risk modelers need to estimate the probability of, and run stress tests for, low-probability events that can seriously affect a system.
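For the customer segmentation case, a rough sketch can show how concentrated value becomes under a Pareto distribution. Everything here is an assumption made for illustration: the customer spend is synthetic, the helper `top_share` is hypothetical, and the shape parameter 1.16 is the value commonly associated with the 80/20 rule.

```python
import random


def top_share(values, fraction=0.2):
    """Return the share of the total contributed by the top `fraction` of values."""
    ordered = sorted(values, reverse=True)
    k = max(1, int(len(ordered) * fraction))  # size of the top segment
    return sum(ordered[:k]) / sum(ordered)


ALPHA = 1.16                 # assumed Pareto shape parameter
rng = random.Random(42)      # fixed seed so the run is repeatable

# Synthetic spend per customer: many small values, a few very large ones.
spend = [rng.paretovariate(ALPHA) for _ in range(10_000)]

share = top_share(spend)
print(f"top 20% of customers account for {share:.0%} of total spend")
```

On real data, the same helper could be applied to revenue per customer to decide which segment deserves the most attention. The exact share varies from sample to sample because the tail is heavy.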

Conclusion

From my point of view, mastering distributions brings a huge benefit in understanding data, which in turn supports better decisions for your business.