Showing posts with label Basic Statistics. Show all posts

Five Number Summary

 


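The five-number summary of a data set consists of the following five values:

  • The minimum value (Xmin) is the least value in the set.
  • The first quartile (Q1) is the value below which about 25% of the ordered data lie.
  • The median, or second quartile (Q2), is the middle value of the ordered data.
  • The third quartile (Q3) is the value below which about 75% of the ordered data lie.
  • The maximum value (Xmax) is the greatest value in the set.

Two common measures of spread follow directly from these values:

  • The range of a data set is the distance between the maximum and minimum value. To compute the range of a data set, we subtract the minimum from the maximum:
    range = maximum – minimum.
  • The interquartile range of a data set is the distance between the two quartiles Q1 and Q3.
    Interquartile range = Q3 – Q1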

It provides a way of determining the shape of the distribution, i.e., whether the data are symmetric or skewed.

For Symmetry:

a.     Difference between Second Quartile and Minimum Value is equal to difference between Maximum Value and Second Quartile (Q2-Xmin=Xmax-Q2)

b.     Difference between First Quartile and Minimum Value is equal to difference between Maximum value and Third quartile. (Q1-Xmin=Xmax-Q3)

For Right Skewed:

a.      Difference between Maximum value and Second quartile is greater than difference between Second Quartile and Minimum value. (Xmax-Q2>Q2-Xmin)

b.     Difference between Maximum value and Third Quartile is greater than difference between First Quartile and Minimum Value. (Xmax-Q3>Q1-Xmin)

For Left Skewed:

a.      Difference between Second Quartile and Minimum Value is greater than difference between Maximum value and Second Quartile. (Q2-Xmin>Xmax-Q2)

b.     Difference between First Quartile and Minimum Value is greater than difference between Maximum value and Third Quartile. (Q1-Xmin>Xmax-Q3)
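The rules above can be applied mechanically. A minimal Python sketch, assuming the five values are already known; the function names are hypothetical, and exact equality of the gaps is rarely met by real data, so "symmetric" is best read as "approximately equal gaps".

```python
def compare_gaps(lower_gap, upper_gap):
    """Label the shape implied by a gap below the centre and a gap above it."""
    if lower_gap == upper_gap:
        return "symmetric"
    return "right skewed" if upper_gap > lower_gap else "left skewed"


def shape_rules(xmin, q1, q2, q3, xmax):
    """Apply rules (a) and (b) above to a five-number summary."""
    rule_a = compare_gaps(q2 - xmin, xmax - q2)  # (a): Q2 - Xmin versus Xmax - Q2
    rule_b = compare_gaps(q1 - xmin, xmax - q3)  # (b): Q1 - Xmin versus Xmax - Q3
    return rule_a, rule_b


print(shape_rules(3, 6, 12, 16, 21))  # ('symmetric', 'right skewed')
```

With the summary of Example 1 below (3, 6, 12, 16, 21), rule (a) reports symmetric while rule (b) reports right skewed, so the two rules can disagree on small data sets and a shape verdict is most convincing when both agree.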

Example 1: Find the five-number summary for the data set {3, 7, 8, 5, 12, 14, 21, 13, 18}.

Minimum: 3         Q1 : 6            Median: 12            Q3 : 16           Maximum: 21

Example 2: Find the five-number summary for the data set {3, 7, 8, 5, 12, 14, 21, 15, 18, 14}.

Minimum: 3         Q1 : 7            Median: 13            Q3 : 15           Maximum: 21  
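The quartiles in both examples match the "median of each half" rule: sort the data, take the median, then take Q1 and Q3 as the medians of the lower and upper halves, leaving the overall median out of both halves when n is odd. A minimal Python sketch under that assumption (other conventions, such as the interpolation used by many software packages, can give slightly different quartiles):

```python
def median(sorted_values):
    """Median of a list that is already sorted in ascending order."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2 == 1:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2


def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum) using the median-of-halves rule.

    Requires at least two values.
    """
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    lower = xs[:half]                                   # values below the median
    upper = xs[half + 1:] if n % 2 == 1 else xs[half:]  # values above the median
    return xs[0], median(lower), median(xs), median(upper), xs[-1]


for data in ([3, 7, 8, 5, 12, 14, 21, 13, 18],
             [3, 7, 8, 5, 12, 14, 21, 15, 18, 14]):
    xmin, q1, q2, q3, xmax = five_number_summary(data)
    print(f"min={xmin} Q1={q1} median={q2} Q3={q3} max={xmax} "
          f"range={xmax - xmin} IQR={q3 - q1}")
# min=3 Q1=6.0 median=12 Q3=16.0 max=21 range=18 IQR=10.0
# min=3 Q1=7 median=13.0 Q3=15 max=21 range=18 IQR=8
```

The range and interquartile range are printed alongside because they follow directly from the five values.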

 

Methods of Data Collection

 


a.      Methods of Primary Data Collection

i. Observation

It is the process of recognizing and noting people, objects and occurrences rather than asking for information. There is no communication with the people being studied in this method. It allows the researcher to study people in their natural setting without influencing their behavior. Observational data consist of detailed information about groups or situations.

Methods of Observation:

1. Covert and Overt Observation

Covert Observation: The researcher's identity is not revealed, so the subjects' behavior is not influenced by his or her presence. The researcher observes the situation from a distance.

Overt Observation: The researcher's identity is revealed, and the researcher explains the purpose of the observation. The problem with this method is that subjects tend to modify their behavior when they know they are being watched.

2. Structured and Unstructured Observation

Structured Observation: It is a systematic and highly predetermined method of data collection. The main purpose of this observation is to quantify behavior. It does not give the complete picture of the situation or behavior under study.

Unstructured Observation: It is a holistic way to observe and record behavior without the use of a predetermined guide. It attempts to provide as complete and non-selective a description as possible.

Advantages of Observation:

·        It is free from subjective bias.

·        Data is not affected by past behavior or future intentions.

·        Natural behavior of the group can be observed.

Disadvantages of Observation:

·        It is expensive.

·        Obtained information is limited.

·        Unforeseen events may interfere with the observational task.

ii. Interview

It is a scientific investigation technique based on verbal communication between two persons in order to collect information. Interview is a method of data collection that involves two groups of people: the interviewer (the researcher(s) asking questions and collecting data) and the interviewee (the subject or respondent who is being asked questions). Interviews can be carried out in the following ways:

a. Direct Personal Interview:

Direct Personal Interview requires an interviewer or a group of interviewers to ask the interviewee questions face to face.

It can be direct or indirect, structured or unstructured, focused or unfocused, etc. Tools used in carrying out in-person interviews include a notepad or recording device to keep a record of the conversation, which is very important because people tend to forget details. Non-verbal communication such as gestures and facial expressions gives meaning to the respondent's answers.

b.     Indirect Oral interview

In this method, the interviewer collects information from third persons, known as witnesses, who are directly or indirectly concerned with the events. This method is used when the informants hesitate to give the information directly. The information obtained from this method cannot be fully relied upon because there is no direct contact with the informants.

c.      Telephone interview

The interviewer contacts respondents by telephone. This method uses a structured interview schedule.

d.     Focus Group Interview

It generally involves 6-10 persons, who are brought together at one place to discuss the topic of interest. The inner feelings and emotional attitudes of the interviewees with respect to a given problem or situation are studied. The interviewer does not interfere during the discussion but brings it back to the main issues when it strays from the theme.

Advantages of Interview:

·        More information can be obtained.

·        Sample can be controlled.

·        It has greater flexibility.

·        Personal information can also be obtained.

·        Misinterpretation can be avoided by using an unstructured format.

Disadvantages of Interview:

·        It is expensive.

·        There is a chance of bias on the part of the interviewer or the respondent.

·        It is more time-consuming.

·        There is a greater possibility of imaginary information and less frank responses.

·        It needs highly skilled interviewers.

iii. Questionnaire

It is a formal list of questions designed to gather responses from respondents on a given topic. It is an efficient data collection mechanism, since the researcher knows exactly what is required and how to measure the variables of interest. It involves several steps, including writing question items, organizing the items on a questionnaire, administering the questionnaire and so on.

Characteristics of a Good Questionnaire

-It should be short and simple

-Questions should proceed in a logical sequence

-Technical terms and vague expressions must be avoided.

-Control questions to check the reliability of the respondent must be present.

-Brief directions for filling in the questionnaire must be provided.

-The physical appearance (quality of paper, colour, etc.) must be good enough to attract the attention of the respondent.

Types of Questions in Questionnaire:

a.       Open Questions

Open questions allow respondents to express what they think in their own words and to answer in as much detail as they like. For example: “Can you tell me how happy you feel right now?” Open questions are often used for complex questions that cannot be answered in a few simple categories but require more detail and discussion. Rich qualitative data are obtained because open questions allow the respondent to elaborate on their answer.

b.      Closed Questions

Closed questions structure the answer by only allowing responses that fit into pre-decided categories. Data that can be placed into a category are called nominal data. The categories can be restricted to as few as two options, i.e., dichotomous (e.g., 'yes' or 'no,' 'male' or 'female'), or include quite complex lists of alternatives from which the respondent can choose (polytomous). Closed questions can also provide ordinal data (which can be ranked). This often involves using a rating scale to measure the strength of attitudes or emotions, for example: strongly agree / agree / neutral / disagree / strongly disagree / unable to answer. Closed questions are cheap to administer.
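The difference between nominal and ordinal answers shows up when responses are summarized. A minimal Python sketch, assuming hypothetical response lists and a five-point scale taken from the example above ('unable to answer' is left out because it has no rank): nominal answers can only be counted, while ordinal answers can also be ranked to find a median category.

```python
from collections import Counter

# Five-point rating scale from the example, listed from lowest to highest.
SCALE = ["strongly disagree", "disagree", "neutral", "agree", "strongly agree"]


def summarize_nominal(responses):
    """Nominal data: categories have no order, so report counts and the mode."""
    counts = Counter(responses)
    return counts, counts.most_common(1)[0][0]


def median_rating(responses, scale=SCALE):
    """Ordinal data: categories can be ranked, so a median category is meaningful.

    Responses outside the scale (e.g. 'unable to answer') must be removed first.
    """
    ranks = sorted(scale.index(r) for r in responses)
    return scale[ranks[(len(ranks) - 1) // 2]]  # lower median keeps a real category


print(summarize_nominal(["yes", "no", "yes", "yes"]))
print(median_rating(["agree", "neutral", "strongly agree", "agree", "disagree"]))  # agree
```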

 

Types of Questionnaire:

a.      Self Administered

In this method, the respondents complete the questionnaires themselves.

·        Online Questionnaire: It is administered using email, the internet or a website.

·        Mail Questionnaire: The questionnaires are posted to respondents, who return them by post.

·        Delivery and Collection Questionnaire: The questionnaires are delivered by hand to each respondent and collected later.

b.     Interviewer Administered Questionnaire

It is administered by the researcher himself or herself, or by another interviewer.

·        Telephone Questionnaire: The researcher contacts the respondents and administers the questionnaire by telephone. Accurate information and responses are essential conditions for a good telephone questionnaire. The respondents selected for the telephone questionnaire need to be informed about the study beforehand, by email or telephone, or by fixing an appointment.

·        Interview Schedule: It is administered by an interviewer who physically meets the respondent and asks the questions face to face. It uses a schedule, which is a set of questions. It gives the researcher an opportunity to build rapport with the respondents.

Advantages of Questionnaire:

Free from bias of interviewer

Respondents have adequate time to give answers

Respondents are easily and conveniently approachable

Large samples can be used, making the results more reliable

Disadvantages of Questionnaire:

Low rate of return of duly filled questionnaires

Control over questions is lost once it is sent

It is inflexible once sent

Possibility of ambiguous replies or omission of replies

Time-consuming and slow process

 

b.      Methods of Secondary Data Collection

A researcher can obtain secondary data from various sources. Secondary data may be either published or unpublished. Published data are available in:

a. publications of the government

b. technical and trade journals

c. reports of various businesses, banks, etc.

d. public records

e. statistical or historical documents.

Unpublished data may be found in letters, diaries, unpublished biographies or other unpublished works.

Before using secondary data, it must be checked for the following characteristics:

1. Reliability of data – Who collected the data? From what source? Which methods? Time? Possibility of bias? Accuracy?

2. Suitability of data – The object, scope and nature of the original enquiry must be studied, and the data must then be carefully scrutinized for suitability.

3. Adequacy – The data are considered inadequate if the level of accuracy achieved is found inadequate, or if they relate to an area that is either narrower or wider than the area of the present enquiry.

Some parts are adapted from:

a.      https://bbamantra.com/methods-of-data-collection-primary-and-secondary-data/

b.     https://www.formpl.us/blog/primary-data

c.      https://www.simplypsychology.org/questionnaires.html

Stem and Leaf Display

 


It is a valuable tool for organizing a set of data and understanding the distribution of the data. It splits each value into two parts: a leading part (the stem) and a trailing part (the leaf). Each leaf is a single digit, and if the leaf unit is not stated, it is assumed to be one.

A good stem and leaf plot :

  • shows the first digits of the number (thousands, hundreds or tens) as the stem and shows the last digit (ones) as the leaf.
  • usually uses whole numbers. Anything that has a decimal point is rounded to the nearest whole number. Typical data include test results, speeds, heights and weights.
  • looks like a bar graph when it is turned on its side.
  • shows how the data are spread, that is, the highest number, lowest number, most common number and outliers (numbers that lie outside the main group of numbers).

 

For Example:

56, 78, 82, 82, 90, 94, 93, 67, 67, 69, 74, 77, 92, 88, 81, 83, 84, 77, 72

Arranging the data in the ascending order:

56, 67, 67, 69, 72, 74, 77, 77, 78, 81, 82, 82, 83, 84, 88, 90, 92, 93, 94
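A minimal Python sketch of building the display for this data, assuming non-negative whole numbers with the tens digit as the stem and the ones digit as the leaf (leaf unit = 1); the function name is hypothetical.

```python
from collections import defaultdict


def stem_and_leaf(data):
    """Return a stem-and-leaf display as text.

    Assumes non-negative whole numbers: the stem is value // 10 (the tens and
    higher digits) and the leaf is value % 10 (the ones digit).
    """
    stems = defaultdict(list)
    for value in sorted(data):
        stems[value // 10].append(value % 10)
    rows = []
    # include empty stems between the smallest and largest so gaps stay visible
    for stem in range(min(stems), max(stems) + 1):
        leaves = " ".join(str(leaf) for leaf in stems.get(stem, []))
        rows.append(f"{stem} | {leaves}".rstrip())
    return "\n".join(rows)


scores = [56, 78, 82, 82, 90, 94, 93, 67, 67, 69, 74, 77, 92, 88, 81, 83, 84, 77, 72]
print(stem_and_leaf(scores))
# 5 | 6
# 6 | 7 7 9
# 7 | 2 4 7 7 8
# 8 | 1 2 2 3 4 8
# 9 | 0 2 3 4
```

Read sideways, the row lengths (1, 3, 5, 6, 4) act like the bars of a bar graph, showing that most values fall in the 70s and 80s.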



Box and Whisker Plot

 


It is a visual way to show the five-number summary: a graphical representation of the data based on the minimum value, lower quartile, median, upper quartile and maximum value. The left vertical line of the box marks the first quartile (Q1) and the right vertical line marks the third quartile (Q3), so the box contains the middle 50% of the values. The lower 25% of the data is represented by the line, i.e. the whisker, connecting the minimum value and the first quartile (Q1). The upper 25% of the data is represented by the whisker connecting the maximum value and the third quartile (Q3). The middle vertical line of the box marks the median. If it lies closer to the left vertical line (Q1), the data are right skewed; if it lies closer to the right vertical line (Q3), the data are left skewed; if it is in the middle of the box, the data are symmetric.

Example 1: Draw a box-and-whisker plot for the data set {3, 7, 8, 5, 12, 14, 21, 13, 18}.

Minimum: 3, Q1 : 6, Median: 12, Q3 : 16, and Maximum: 21.



Example 2: Draw a box-and-whisker plot for the data set {3, 7, 8, 5, 12, 14, 21, 15, 18, 14}.

Minimum: 3, Q1: 7, Median: 13, Q3: 15, and Maximum: 21.
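Without a plotting library, a box-and-whisker plot can be sketched as a single line of text. A minimal Python sketch, assuming the five values from the examples above are passed in directly; the function name and the 37-character width are hypothetical choices.

```python
def ascii_boxplot(xmin, q1, median, q3, xmax, width=37):
    """Draw a horizontal box-and-whisker plot as one line of text.

    Values are mapped linearly onto `width` character positions:
    '|' at each end marks the minimum and maximum, '-' the whiskers,
    '[' and ']' the quartiles Q1 and Q3, '=' the box, and the inner '|' the median.
    Assumes xmax > xmin.
    """
    def pos(x):
        return round((x - xmin) * (width - 1) / (xmax - xmin))

    p_q1, p_med, p_q3 = pos(q1), pos(median), pos(q3)
    chars = ["=" if p_q1 <= i <= p_q3 else "-" for i in range(width)]
    chars[0] = chars[-1] = "|"          # minimum and maximum
    chars[p_q1], chars[p_q3] = "[", "]"  # first and third quartiles
    chars[p_med] = "|"                  # median line inside the box
    return "".join(chars)


print(ascii_boxplot(3, 6, 12, 16, 21))  # Example 1
print(ascii_boxplot(3, 7, 13, 15, 21))  # Example 2
# |-----[===========|=======]---------|
# |-------[===========|===]-----------|
```

Because the same scale (minimum 3, maximum 21) is used for both examples, the two lines can be compared directly, just as two box plots drawn on a common axis.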



 

Sampling and its types

 


Sampling

It is the method or process of data collection in which data are collected from a representative part of the whole population. It is the selection of a sample from the whole population in order to estimate the characteristics of the population.

For example:

a.      A cook tastes a spoonful of rice or vegetables to judge whether the whole dish is properly cooked.

b.     A pathologist or doctor examines a few drops of blood to draw conclusions about the blood in the whole body.

c.      A businessman places an order for commodities after examining only a small sample of the same commodity.

Advantages of Sampling:

a.      The cost of sampling is low compared with a complete enumeration of the population.

b.     It takes less time in collecting, editing, classifying, analyzing and interpreting data.

c.      More trained and skilled manpower can be used to collect accurate information.

d.     It is applicable in the case of a large population.

e.      It is applicable when the elements are destroyed in the course of testing.

Disadvantages of Sampling:

a.      Wrong and unreliable conclusion may be obtained.

b.     It cannot give accurate results if the sample survey is conducted by unskilled, untrained and illiterate person.

c.      If the population is too heterogeneous, it may be impossible to use the sampling technique.

d.     It may give wrong conclusions if the sample selected from the population is not representative.

Methods of Sampling:

The important methods of sampling are given below:

a.      Probability Sampling

b.     Non-probability Sampling

A. Probability Sampling

Probability sampling is a sampling technique in which a researcher sets a few selection criteria and chooses members of a population randomly. Every member has a known, non-zero chance of being selected; in simple random sampling, all members have an equal chance. It is mainly used in quantitative research. If you want to produce results that are representative of the whole population, you need to use a probability sampling technique. In this method, units of the population are selected according to the laws of probability.

There are four main types of probability sample.

a.      Simple Random Sampling

Simple random sampling is one of the best probability sampling techniques for saving time and resources. It is the simplest and most common method of sampling. It is a reliable method of obtaining information in which every member of the population is chosen randomly, merely by chance, and each individual has the same probability of being chosen as part of the sample.
For example, in a school of 500 students, if the teacher decides to conduct team-building activities, they might pick chits out of a bowl. In this case, each of the 500 students has an equal opportunity of being selected.
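The chits-from-a-bowl example can be sketched in Python with the standard library's `random.Random.sample`, which draws without replacement so no student is picked twice. The student names, the sample size of 10 and the seed are hypothetical.

```python
import random


def simple_random_sample(population, n, seed=None):
    """Draw n distinct members so that every member has the same chance of selection.

    random.Random.sample draws without replacement, like picking chits from a bowl.
    """
    rng = random.Random(seed)  # a seed only makes the draw reproducible
    return rng.sample(population, n)


students = [f"student_{i}" for i in range(1, 501)]  # hypothetical roster of 500
team = simple_random_sample(students, 10, seed=42)
print(team)
```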

b.     Systematic Random Sampling

Researchers use the systematic sampling method to choose the sample members of a population at regular intervals. This method is used when:

i. A complete list of the population from which the sample is drawn is available.

ii. The population is large, scattered and non-homogeneous.

It requires the selection of a starting point and a sampling interval that is repeated throughout the list. Because the selection follows a predefined interval, this sampling technique is relatively quick to carry out.
For example, a researcher intends to collect a systematic sample of 500 people in a population of 5000. He/she numbers each element of the population from 1 to 5000 and chooses every 10th individual to be a part of the sample (sampling interval = population size / sample size = 5000/500 = 10).
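A minimal Python sketch of the 5000/500 example, assuming the population is already a numbered list. Choosing the start at random within the first interval is a common refinement and an assumption here; the text only says every 10th individual is chosen. Names are hypothetical.

```python
import random


def systematic_sample(population, sample_size, start=None, seed=None):
    """Select every k-th member after a starting point, with k = N // n."""
    k = len(population) // sample_size            # sampling interval, 5000 // 500 = 10
    if start is None:
        start = random.Random(seed).randrange(k)  # random start in the first interval
    return population[start::k][:sample_size]


people = list(range(1, 5001))                     # members numbered 1 to 5000
sample = systematic_sample(people, 500, start=9)  # start at the 10th person
print(len(sample), sample[:3], sample[-1])        # 500 [10, 20, 30] 5000
```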

c.      Stratified Random Sampling

Stratified random sampling is a method in which the researcher divides the population into smaller groups that do not overlap but together represent the entire population. The population is first divided into these subgroups (or strata), whose members share a similar characteristic, and a sample is then drawn from each group separately. It is used for heterogeneous populations, when we might reasonably expect the measurement of interest to vary between the subgroups and we want to ensure representation from all of them. It improves the accuracy and representativeness of the results by reducing sampling bias. However, it requires knowledge of the appropriate characteristics of the sampling frame (the details of which are not always available), and it can be difficult to decide which characteristic(s) to stratify by.
For example, a researcher looking to analyze the characteristics of people belonging to different annual income divisions will create strata (groups) according to annual family income, e.g. less than $20,000, $21,000 – $30,000, $31,000 to $40,000, $41,000 to $50,000, etc. By doing this, the researcher can draw conclusions about the characteristics of people belonging to different income groups. Marketers can analyze which income groups to target and which ones to eliminate, creating a roadmap that is more likely to bear fruitful results.
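A minimal Python sketch of stratified sampling by income band. The bands (simplified from the example), the household data and the 10% sampling fraction are hypothetical; proportional allocation, where each stratum is sampled in proportion to its size, is one common choice and is assumed here rather than taken from the text.

```python
import random
from collections import defaultdict


def income_band(income):
    """Hypothetical income strata, loosely following the bands in the text."""
    if income < 20_000:
        return "under $20,000"
    if income < 30_000:
        return "$20,000-$30,000"
    if income < 40_000:
        return "$30,000-$40,000"
    return "$40,000 and above"


def stratified_sample(units, stratum_of, fraction, seed=None):
    """Split units into non-overlapping strata, then draw a simple random
    sample from each stratum in proportion to its size."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)
    return {name: rng.sample(members, max(1, round(len(members) * fraction)))
            for name, members in strata.items()}


# hypothetical households as (id, annual income) pairs, 250 in each band
households = [(i, 15_000 + (i % 4) * 10_000) for i in range(1000)]
sample = stratified_sample(households, lambda h: income_band(h[1]), 0.1, seed=1)
for band, members in sample.items():
    print(band, len(members))
```

Every band is guaranteed a share of the sample, which is the representation property the paragraph above describes.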

d. Cluster Sampling

Cluster sampling is a method in which the researcher divides the entire population into sections or clusters that represent the population. Clusters are identified and included in a sample based on demographic parameters like age, sex, location, etc. This makes it very simple for a survey creator to derive effective inferences from the feedback. Cluster sampling can be more efficient than simple random sampling, especially when a study takes place over a wide geographical region.
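A minimal Python sketch of one-stage cluster sampling, a common variant in which whole clusters are chosen at random and every member of a chosen cluster is included. The districts, residents and number of clusters are hypothetical, and choosing the clusters at random is an assumption; the text above describes identifying clusters by parameters such as location.

```python
import random
from collections import defaultdict


def cluster_sample(units, cluster_of, n_clusters, seed=None):
    """Pick whole clusters at random and keep all of their members."""
    rng = random.Random(seed)
    clusters = defaultdict(list)
    for unit in units:
        clusters[cluster_of(unit)].append(unit)
    chosen = rng.sample(sorted(clusters), n_clusters)  # random choice of clusters
    return {name: clusters[name] for name in chosen}


# hypothetical population: 8 districts with 50 residents each
residents = [(district, i) for district in "ABCDEFGH" for i in range(50)]
sample = cluster_sample(residents, lambda r: r[0], n_clusters=3, seed=7)
print(sorted(sample), sum(len(m) for m in sample.values()))
```

Only the three chosen districts need to be visited, which is where the saving over simple random sampling across a wide region comes from.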

 

B. Non-Probability Sampling

In non-probability sampling, the researcher chooses members for research arbitrarily rather than through a random selection process. This sampling method does not follow a fixed or predefined selection process, which makes it difficult for all elements of a population to have equal opportunities to be included in a sample. The units of the population are not selected according to the rules of probability. Non-probability sampling techniques are often appropriate for exploratory and qualitative research, where the aim is not to test a hypothesis about a broad population but to develop an initial understanding of a small or under-researched population.

a.      Convenience Sampling

A convenience sample simply includes the individuals who happen to be most accessible to the researcher. The investigator selects the sample elements on the basis of his or her convenience. It is also known as accidental sampling because the sample is chosen accidentally: the investigator chooses the closest persons as respondents. It is not a scientific plan and has no definite plan, so the selection is highly prone to bias.

b.     Purposive or Judgement Sampling

Also known as selective or subjective sampling, this technique relies on the judgement of the researcher when choosing whom to ask to participate. Researchers may thus implicitly choose a “representative” sample to suit their needs, or specifically approach individuals with certain characteristics. This approach is often used by the media when canvassing the public for opinions, and in qualitative research, where the researcher wants to gain detailed knowledge about a specific phenomenon rather than make statistical inferences. It is useful in situations where we need to reach a targeted sample quickly and proportional sampling is not a primary concern.

Judgement sampling has the advantage of being time- and cost-effective to perform whilst resulting in a range of responses (particularly useful in qualitative research). However, in addition to volunteer bias, it is also prone to errors of judgement by the researcher, and the findings, whilst being potentially broad, will not necessarily be representative.

c.      Quota Sampling

In this method, the sample is selected according to some fixed quota. It is similar to stratified random sampling, but sample items are chosen by convenience rather than at random. This method of sampling is often used by market researchers. Interviewers are given a quota of subjects of a specified type to attempt to recruit. For example, an interviewer might be told to go out and select 20 adult men, 20 adult women, 10 teenage girls and 10 teenage boys so that they could interview them about their television viewing. Ideally, the quotas chosen would proportionally represent the characteristics of the underlying population.
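A minimal Python sketch of the interviewer's quota example, assuming people are met one after another and each can be assigned to one group; the stream of passers-by is hypothetical. Within each group, whoever turns up first is taken, which is what separates quota sampling from stratified random sampling.

```python
from collections import Counter


def quota_sample(arrivals, category_of, quotas):
    """Take people in the order they are met until each group's quota is full."""
    remaining = dict(quotas)
    sample = []
    for person in arrivals:
        group = category_of(person)
        if remaining.get(group, 0) > 0:   # still room in this person's group
            sample.append(person)
            remaining[group] -= 1
            if not any(remaining.values()):  # every quota filled
                break
    return sample


quotas = {"adult man": 20, "adult woman": 20, "teenage girl": 10, "teenage boy": 10}
groups = list(quotas)
# hypothetical stream of passers-by as (id, group) pairs
passers_by = [(i, groups[i % 4]) for i in range(200)]
chosen = quota_sample(passers_by, lambda p: p[1], quotas)
print(Counter(group for _, group in chosen))
```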

d. Snowball Sampling

This method is commonly used in social sciences when investigating hard-to-reach groups. Existing subjects are asked to nominate further subjects known to them, so the sample increases in size like a rolling snowball. For example, when carrying out a survey of risk behaviours amongst intravenous drug users, participants may be asked to nominate other users to be interviewed.

Snowball sampling can be effective when a sampling frame is difficult to identify. However, by selecting friends and acquaintances of subjects already investigated, there is a significant risk of selection bias (choosing a large number of people with similar characteristics or views to the initial individual identified).
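A minimal Python sketch, assuming each subject's nominations are recorded in a dictionary (a hypothetical data layout and network). Nominees are followed in the order they are named and nobody is added twice; since everyone reached is connected to the initial subjects, the sketch also shows where the selection bias comes from.

```python
from collections import deque


def snowball_sample(referrals, seeds, target_size):
    """Grow a sample from initial subjects by following their nominations.

    `referrals` maps each person to the people they nominate; nominees are
    visited breadth-first, and nobody is added twice.
    """
    sample, seen = [], set(seeds)
    queue = deque(seeds)
    while queue and len(sample) < target_size:
        person = queue.popleft()
        sample.append(person)
        for nominee in referrals.get(person, []):
            if nominee not in seen:
                seen.add(nominee)
                queue.append(nominee)
    return sample


# hypothetical nomination network
referrals = {"A": ["B", "C"], "B": ["D"], "C": ["D", "E"], "E": ["F"]}
print(snowball_sample(referrals, ["A"], target_size=10))  # ['A', 'B', 'C', 'D', 'E', 'F']
print(snowball_sample(referrals, ["A"], target_size=4))   # ['A', 'B', 'C', 'D']
```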
 

Adapted from:

a.    https://www.scribbr.com/methodology/sampling-methods/

b.    https://www.questionpro.com/blog/types-of-sampling-for-social-research/

c.    https://www.healthknowledge.org.uk/public-health-textbook/research-methods/1a-epidemiology/methods-of-sampling-population