Descriptive statistics is the application of statistical techniques to describe, organize and summarize large volumes of data, enabling their use in major decisions, projects and training, among other applications.
In this article, whether you are just starting out in your data career or looking to deepen your knowledge, we will introduce you to the basic concepts of descriptive statistics and their practical applications. Enjoy your reading!
What is descriptive statistics?
Descriptive statistics is the initial stage of data analysis, used to summarize and understand data. Technological advances have brought a significant increase in both the amount of data available and the efficiency of computational methods, which has raised the prominence of this sub-area of statistics.
In addition, descriptive statistics can be used in data analysis when applying the Lean Six Sigma methodology. This is an operational excellence methodology that measures and analyzes data to solve highly complex problems related to waste and process variability.
When is the best time to use descriptive statistics?
Descriptive statistics is widely used when the analyst is faced with a large amount of data to evaluate and needs to summarize it to facilitate interpretation. This can be done using the mean, median, mode and standard deviation, among other measures, which will be explained later.
Interestingly, despite companies dealing with large amounts of data on a daily basis, whether from employees or consumers, many still do not know how to use it to their advantage.
For example, according to a survey conducted by TOTVS, 42% of companies report a lack of qualified professionals to interpret data. Do you know what this means?
It means these companies miss the opportunity to reach more advanced levels of maturity in the digitalization of processes. This situation could be resolved with strategic partnerships and, above all, by training their employees.
To that end, PM3 relies on the Data Sprints methodology, which prepares professionals to organize, analyze and interpret data, obtaining insights and solving problems effectively.
Measures of central tendency or measures of position
Within descriptive statistics, both measures of central tendency and measures of position are applied to identify the location of data. Let's take a look!
Average
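The average (or arithmetic mean) is the sum of all the values in the data set divided by the total number of elements. The formula is:
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$
where $x_i$ are the values and $n$ is the total number of elements.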
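As a quick illustration of the definition, here is a minimal Python sketch. The `values` list is a made-up sample, not data from the article, and the standard library's `statistics.mean` is shown only as a cross-check.

```python
from statistics import mean

# Hypothetical sample (illustrative values only).
values = [12, 15, 11, 18, 14]

# Sum of all values divided by the number of elements.
manual_mean = sum(values) / len(values)

# The standard library computes the same quantity.
library_mean = mean(values)

print(manual_mean, library_mean)
```

Both approaches agree; the manual version simply makes the "sum divided by count" step explicit.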
Weighted average
Here, each value is given a specific weight and multiplied by it. The sum of these products is then divided by the sum of the weights. The formula is:
$$\bar{x}_w = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}$$
where $w_i$ is the weight assigned to the value $x_i$.
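To make the weighting step concrete, here is a small Python sketch. The `grades` and `weights` lists are hypothetical and chosen only for illustration.

```python
# Hypothetical grades and the weight assigned to each one (illustrative only).
grades = [7.0, 8.0, 10.0]
weights = [2, 3, 5]

# Multiply each value by its weight and add up the products.
weighted_sum = sum(g * w for g, w in zip(grades, weights))

# Divide by the total of the weights.
weighted_mean = weighted_sum / sum(weights)

print(weighted_mean)
```

Because the highest grade carries the largest weight, the weighted average ends up above the simple average of the three grades.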
Mode
The mode is the most frequent value in a data set. If no value repeats, the data set has no mode.
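One possible way to express this rule in Python is sketched below. The helper name `find_modes` is an assumption made for this example. It returns every value tied for the highest frequency, and an empty list when no value repeats.

```python
from collections import Counter


def find_modes(values):
    """Return the most frequent value(s), or [] when no value repeats."""
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    if highest == 1:
        # Every value appears exactly once, so there is no mode.
        return []
    return [value for value, count in counts.items() if count == highest]


print(find_modes([2, 3, 3, 5, 7, 3, 5]))
print(find_modes([1, 2, 3]))
```

Returning a list is a design choice for this sketch: it also covers data sets where two or more values share the top frequency.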
Median
The median is a measure of the central position of the data. It is the central value of a data set sorted in ascending or descending order.
If the number of values sorted is odd, the median is the number that is exactly in the middle of the list. If the number of values sorted is even, the median is calculated as the average of the two middle values.
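The odd and even cases can be sketched in Python as follows. The function name `median_of` is hypothetical; the standard library offers `statistics.median` for the same purpose.

```python
def median_of(values):
    """Return the median of a non-empty list of numbers."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median_of() needs at least one value")
    middle = n // 2
    if n % 2 == 1:
        # Odd count: the value exactly in the middle of the sorted list.
        return ordered[middle]
    # Even count: the average of the two middle values.
    return (ordered[middle - 1] + ordered[middle]) / 2


print(median_of([7, 1, 5]))
print(median_of([7, 1, 5, 3]))
```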
Percentiles
Within descriptive statistics, percentiles are measures that divide the sample, sorted in ascending order, into 100 equal parts. Thus:
The 1st percentile represents the value below which 1% of the data lies;
The 50th percentile is the median, where 50% of the data is below this value;
The 98th percentile indicates the value below which 98% of the data lies.
Its formula is:
[Formula image: position K of the i-th percentile, in terms of i and n]
Where:
K = the position where the percentile will be in the data;
i = the desired percentile number;
n = number of samples.
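Conventions for locating a percentile differ between textbooks and tools, so the sketch below does not assume any particular position formula. It relies on Python's `statistics.quantiles`, whose `method` argument selects between two common conventions ("exclusive" and "inclusive"). The `percentile` helper and the `sample` data are hypothetical names and values used only for illustration.

```python
from statistics import median, quantiles


def percentile(values, i, method="exclusive"):
    """Return the i-th percentile (1 <= i <= 99) of values."""
    if not 1 <= i <= 99:
        raise ValueError("i must be between 1 and 99")
    # quantiles(..., n=100) returns the 99 cut points that split the
    # sorted data into 100 parts; the i-th percentile sits at index i - 1.
    cut_points = quantiles(values, n=100, method=method)
    return cut_points[i - 1]


# Hypothetical sample (illustrative values only).
sample = [15, 20, 35, 40, 50, 55, 60, 70, 85, 90]

p50 = percentile(sample, 50)
print(p50, median(sample))
```

In this sample, the 50th percentile matches the median, as described above.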
Quartiles
Finally, quartiles are values that divide the ordered data into four equal parts.
Using quartiles, it is possible to quickly assess both the dispersion and the central tendency of a set of samples, essential steps for understanding your data.
Its formula is:
[Formula image: position Q of the i-th quartile, in terms of i and n]
In which:
Q = the position where the quartile will be in the data;
i = the quartile we want to find;
n = number of samples.
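As a rough illustration, the three quartiles can be obtained in Python with `statistics.quantiles` and `n=4`. Note that this function uses its own default convention ("exclusive") for positioning the cut points, which may differ slightly from other formulas; the `sample` values are made up for this example.

```python
from statistics import median, quantiles

# Hypothetical sample (illustrative values only).
sample = [15, 20, 35, 40, 50, 55, 60, 70, 85, 90]

# n=4 asks for the three cut points that split the sorted data into four parts.
q1, q2, q3 = quantiles(sample, n=4)

print(q1, q2, q3)
print(q2 == median(sample))
```

The second quartile coincides with the median, while the first and third quartiles give a quick view of how far the data spreads on either side of it.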
Dispersion measures
Now that you know the measures of central tendency, let's look at the measures of dispersion, which assess how spread out the data is around a reference value.
The goal is to find a value that summarizes the variability of a specific data set. Let's explore!
Range
Range reveals how spread out the sample data is. It is one of the simplest and most practical ways to assess data dispersion.
To calculate the range of a sample set, simply subtract the smallest value from the largest value. A high range means the data is spread over a wide interval; a low range means the values are close together.
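The calculation is short enough to sketch in a few lines of Python. The two lists below are hypothetical samples chosen to contrast a wide spread with a narrow one.

```python
# Hypothetical samples (illustrative values only).
wide_sample = [3, 18, 42, 7, 95]
narrow_sample = [20, 22, 21, 23, 20]

# Range = largest value minus smallest value.
wide_range = max(wide_sample) - min(wide_sample)
narrow_range = max(narrow_sample) - min(narrow_sample)

print(wide_range, narrow_range)
```

Keep in mind that the range depends only on the two extreme values, so a single outlier can change it considerably.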
Interquartile range
The interquartile range, which is part of descriptive statistics, is used to measure the degree of dispersion in relation to the central measure of the data.