STATISTICIAN II
INTERVIEW PREPARATION
1.
What is the mean of 2, 4, 6, 8?
A.
4
B.
5
C.
6
D.
7 → B
2.
Which measure of central tendency is most
affected by outliers?
A.
Median
B.
Mode
C.
Mean
D.
Range → C
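To see why the mean is the answer to question 2, compare the mean and the median before and after adding one extreme value. This is a minimal sketch using Python's standard `statistics` module. The value 100 is a made-up outlier and is not part of the original question.

```python
import statistics

data = [2, 4, 6, 8]             # the values from question 1
with_outlier = data + [100]     # hypothetical extreme value appended

print(statistics.mean(data), statistics.median(data))                  # mean 5, median 5
print(statistics.mean(with_outlier), statistics.median(with_outlier))  # mean 24, median 6
```

The mean jumps from 5 to 24, while the median only moves from 5 to 6.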
3.
The median of a dataset with an odd number
of values is the:
A.
Average of all values
B.
Middle value
C.
Most frequent value
D.
Smallest value → B
4.
Standard deviation measures:
A.
Central tendency
B.
Dispersion
C.
Skewness
D.
Probability → B
5.
Which of these distributions is always symmetric?
A.
Normal
B.
Exponential
C.
Poisson
D.
Binomial → A
6.
Mode is:
A.
Average
B.
Most frequent value
C.
Middle value
D.
Range → B
7.
Variance is the square of:
A.
Mean
B.
Median
C.
Standard deviation
D.
Mode → C
8.
Range is calculated as:
A.
Max − Min
B.
Mean − Median
C.
Median − Mode
D.
Sum of values → A
9.
A dataset with no variability has
standard deviation:
A.
0
B.
1
C.
∞
D.
-1 → A
10.
Skewness measures:
A.
Spread
B.
Shape of distribution
C.
Center
D.
Size → B
Probability
12.
Probability values lie between:
A.
-1 and 1
B.
0 and 1
C.
1 and 10
D.
0 and 10 → B
13.
Probability of a sure event is:
A.
0
B.
0.5
C.
1
D.
2 → C
14.
If events A and B are independent:
A.
P(A∩B)=P(A)+P(B)
B.
P(A∩B)=P(A)P(B)
C.
P(A|B)=0
D.
P(A)=P(B) → B
15.
The probability of the complement of event A is:
A.
P(A)
B.
1 − P(A)
C.
P(A)²
D.
0 → B
16.
Conditional probability is:
A.
P(A)
B.
P(A|B)
C.
P(B|A)
D.
Both B and C → D
17.
Binomial distribution requires:
A.
Continuous data
B.
A fixed number of trials
C.
Infinite trials
D.
Negative values → B
18.
Expected value of a fair die:
A.
2.5
B.
3
C.
3.5
D.
4 → C
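The 3.5 in question 18 comes from E[X] = Σx·P(x), where each face has probability 1/6. The small sketch below uses exact fractions from the Python standard library so that the result is not rounded.

```python
from fractions import Fraction

faces = range(1, 7)          # a fair six-sided die
p = Fraction(1, 6)           # each face is equally likely
expected = sum(x * p for x in faces)
print(expected)              # 7/2, i.e. 3.5
```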
19.
Law of large numbers states:
A.
Sample mean → population mean
B.
Sample decreases
C.
Variance increases
D.
Data disappears → A
20.
Random variable can be:
A.
Only discrete
B.
Only continuous
C.
Both
D.
None → C
21.
Poisson distribution models:
A.
Continuous data
B.
Rare events
C.
Large values
D.
Negative values → B
Inferential Statistics
23.
Null hypothesis represents:
A.
Alternative claim
B.
No effect
C.
Strong effect
D.
Prediction → B
24.
p-value measures:
A.
Mean
B.
Evidence against H₀
C.
Sample size
D.
Variance → B
25.
If p-value < 0.05 (with α = 0.05):
A.
Accept H₀
B.
Reject H₀
C.
Ignore
D.
Increase sample → B
26.
Type I error is:
A.
False negative
B.
False positive
C.
True positive
D.
True negative → B
27.
Type II error is:
A.
Reject true H₀
B.
Accept false H₀
C.
Correct decision
D.
None → B
28.
Confidence interval provides:
A.
Exact value
B.
Range estimate
C.
Mean only
D.
Error only → B
29.
Larger sample size leads to:
A.
Larger error
B.
Smaller error
C.
No change
D.
Infinite error → B
30.
t-test is used when:
A.
Large sample
B.
Unknown variance
C.
Known variance
D.
Infinite sample → B
31.
Z-test requires:
A.
Small sample
B.
Known variance
C.
Unknown variance
D.
No data → B
32.
ANOVA compares:
A.
Two means
B.
Multiple means
C.
Variance only
D.
Probabilities → B
Regression & Data Analysis
34.
Regression analysis studies:
A.
Distribution
B.
Relationship between variables
C.
Mean
D.
Variance → B
35.
Dependent variable is:
A.
Output
B.
Input
C.
Constant
D.
Random → A
36.
Independent variable is:
A.
Output
B.
Predictor
C.
Result
D.
Error → B
37.
R² measures:
A.
Error
B.
Fit of model
C.
Mean
D.
Probability → B
38.
Correlation ranges between:
A.
0 and 1
B.
-1 and 1
C.
1 and 10
D.
-10 and 10 → B
39.
Perfect positive correlation is:
A.
0
B.
-1
C.
1
D.
2 → C
40.
Multicollinearity affects:
A.
Accuracy
B.
Predictor independence
C.
Mean
D.
Variance → B
41.
Residual is:
A.
Observed − Predicted
B.
Predicted − Observed
C.
Mean − Median
D.
Max − Min → A
42.
Linear regression assumes:
A.
Non-linearity
B.
Linearity
C.
Randomness only
D.
Discrete values → B
43.
Overfitting occurs when:
A.
Model too simple
B.
Model too complex
C.
No data
D.
Small variance → B
Scenario & Practical Questions
45.
Data has extreme outliers. Best measure?
A.
Mean
B.
Median
C.
Variance
D.
Range → B
46.
Small sample size analysis uses:
A.
Z-test
B.
t-test
C.
ANOVA
D.
Regression → B
47.
Comparing 3 groups’ means:
A.
t-test
B.
Z-test
C.
ANOVA
D.
Chi-square → C
48.
Categorical data test:
A.
t-test
B.
Z-test
C.
Chi-square
D.
Regression → C
49.
Checking model accuracy:
A.
R²
B.
Mean
C.
Mode
D.
Range → A
50.
Missing data handling:
A.
Ignore
B.
Impute
C.
Delete always
D.
Random guess → B
51.
Time series data requires:
A.
Regression
B.
Trend analysis
C.
ANOVA
D.
Chi-square → B
52.
Sampling bias affects:
A.
Accuracy
B.
Mean only
C.
Variance only
D.
Nothing → A
53.
Large variance indicates:
A.
Low spread
B.
High spread
C.
No spread
D.
Constant data → B
54.
Best visualization for distribution:
A.
Pie chart
B.
Histogram
C.
Table
D.
Text → B
55.
What does a histogram display?
A.
Relationship between variables
B.
Frequency distribution
C.
Correlation
D.
Regression → B
56.
A scatter plot is used to show:
A.
Distribution
B.
Relationship between two variables
C.
Frequency
D.
Mean → B
57.
If two variables are perfectly
negatively correlated, r =
A.
1
B.
0
C.
-1
D.
0.5 → C
58.
Sampling error occurs due to:
A.
Bias
B.
Random variation
C.
Calculation mistake
D.
Data entry error → B
59.
Non-sampling error includes:
A.
Random error
B.
Sampling fluctuation
C.
Measurement error
D.
Sample size → C
60.
A census studies:
A.
Sample
B.
Population
C.
Subset
D.
Variable → B
61.
Stratified sampling divides population
into:
A.
Equal parts
B.
Random parts
C.
Homogeneous groups
D.
Heterogeneous groups → C
62.
Cluster sampling selects:
A.
Individuals
B.
Groups
C.
Variables
D.
Means → B
63.
Systematic sampling selects every:
A.
Random item
B.
nth item
C.
First item
D.
Last item → B
64.
A parameter is usually denoted by:
A.
Greek letters
B.
Numbers
C.
Roman letters
D.
Symbols only → A
65.
A statistic is usually denoted by:
A.
Greek letters
B.
Roman letters
C.
Symbols only
D.
Numbers only → B
66.
Degrees of freedom for the sample variance:
A.
n
B.
n − 1
C.
n + 1
D.
n − 2 → B
67.
A two-tailed test checks:
A.
One direction
B.
Both directions
C.
No direction
D.
Mean only → B
68.
A one-tailed test checks:
A.
Both sides
B.
One direction
C.
Mean only
D.
Variance only → B
69.
Significance level is denoted by:
A.
β
B.
α
C.
μ
D.
σ → B
70.
If α = 0.01, confidence level is:
A.
90%
B.
95%
C.
99%
D.
100% → C
71.
Normal distribution has mean = median =
A.
Mode
B.
Variance
C.
Range
D.
Skewness → A
72.
In a normal distribution, about 68% of the data
lies within:
A.
1 SD
B.
2 SD
C.
3 SD
D.
4 SD → A
73.
In a normal distribution, about 95% of the data
lies within:
A.
1 SD
B.
2 SD
C.
3 SD
D.
4 SD → B
74.
Z-score measures:
A.
Raw value
B.
Standardized value
C.
Mean
D.
Variance → B
75.
Z-score formula includes:
A.
Mean and variance
B.
Mean and standard deviation
C.
Median and mode
D.
Range and IQR → B
76.
If Z = 0, value equals:
A.
Mean
B.
Median
C.
Mode
D.
Range → A
77.
Sampling distribution refers to:
A.
Population data
B.
Distribution of sample statistic
C.
Raw data
D.
Mean only → B
78.
A large sample size reduces:
A.
Bias
B.
Standard error
C.
Mean
D.
Variance → B
79.
A biased estimator:
A.
Equals parameter
B.
Deviates systematically
C.
Is always correct
D.
Has zero variance → B
80.
Efficiency of estimator relates to:
A.
Bias
B.
Variance
C.
Mean
D.
Sample size → B
81.
Consistency means estimator:
A.
Changes
B.
Converges to true value
C.
Is biased
D.
Is random → B
82.
A likelihood ratio test compares:
A.
Means
B.
Models
C.
Variances
D.
Probabilities → B
83.
p-hacking refers to:
A.
Correct testing
B.
Manipulating results
C.
Data cleaning
D.
Sampling → B
84.
A control group is used to:
A.
Compare results
B.
Increase bias
C.
Reduce data
D.
Ignore treatment → A
85.
Experimental design aims to:
A.
Increase bias
B.
Reduce bias
C.
Ignore variables
D.
Increase variance → B
86.
Randomization helps:
A.
Increase error
B.
Reduce bias
C.
Increase bias
D.
Ignore data → B
87.
Blocking is used to:
A.
Ignore variables
B.
Control variability
C.
Increase randomness
D.
Reduce sample → B
88.
Factorial design studies:
A.
One factor
B.
Multiple factors
C.
No factors
D.
Random data → B
89.
Interaction effect means:
A.
No effect
B.
Combined effect
C.
Single effect
D.
Random effect → B
90.
Residual plot helps detect:
A.
Mean
B.
Model assumptions
C.
Variance only
D.
Sample size → B
91.
Leverage points influence:
A.
Mean
B.
Regression line
C.
Variance
D.
Median → B
92.
Influential points affect:
A.
Model strongly
B.
Mean only
C.
Variance only
D.
Nothing → A
93.
Data transformation helps:
A.
Normalize data
B.
Increase bias
C.
Reduce sample
D.
Ignore outliers → A
94.
Log transformation is used for:
A.
Skewed data
B.
Normal data
C.
Small data
D.
Categorical data → A
95.
Scaling data helps:
A.
Model performance
B.
Increase error
C.
Reduce variables
D.
Ignore data → A
96.
Standardization results in:
A.
Mean 1, SD 0
B.
Mean 0, SD 1
C.
Mean 1, SD 1
D.
Mean 0, SD 0 → B
97.
Min-max normalization scales data to the range:
A.
-1 to 1
B.
0 to 1
C.
1 to 10
D.
-10 to 10 → B
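Questions 96 and 97 describe two different rescalings. The sketch below uses made-up values and only the Python standard library. It applies z-score standardization, assuming the population SD here (some tools use the sample SD instead), and min-max normalization. It then checks the properties the answers refer to.

```python
import statistics

data = [2.0, 4.0, 6.0, 8.0]   # illustrative values

# Standardization (z-scores): z = (x - mean) / SD, using the population SD here
mu = statistics.fmean(data)
sigma = statistics.pstdev(data)
z = [(x - mu) / sigma for x in data]

# Min-max normalization: x' = (x - min) / (max - min), which maps values into [0, 1]
lo, hi = min(data), max(data)
scaled = [(x - lo) / (hi - lo) for x in data]

print([round(v, 3) for v in z])   # [-1.342, -0.447, 0.447, 1.342]: mean 0, SD 1
print(scaled)                     # smallest value -> 0.0, largest -> 1.0
```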
98.
PCA reduces:
A.
Sample size
B.
Dimensions
C.
Mean
D.
Variance → B
99.
In PCA, eigenvalues measure:
A.
Variance explained
B.
Mean
C.
Probability
D.
Error → A
100.
In PCA, eigenvectors represent:
A.
Direction
B.
Magnitude
C.
Mean
D.
Variance → A
101.
K-means clustering requires:
A.
Labels
B.
No labels
C.
Mean only
D.
Variance only → B
102.
Number of clusters in K-means is:
A.
Fixed
B.
Unknown
C.
Predefined
D.
Infinite → C
103.
Elbow method is used to:
A.
Reduce bias
B.
Choose clusters
C.
Increase data
D.
Normalize data → B
104.
Silhouette score measures:
A.
Accuracy
B.
Cluster quality
C.
Mean
D.
Variance → B
105.
What is the harmonic mean mainly used
for?
A.
Averaging ratios
B.
Averaging sums
C.
Categorical data
D.
Large samples → A
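The harmonic mean suits rates measured over equal amounts of something, such as speeds over equal distances. Below is a minimal sketch with Python's `statistics.harmonic_mean`. The trip is a hypothetical example.

```python
import statistics

# Hypothetical trip: equal distances driven at 40 km/h and then at 60 km/h
speeds = [40, 60]
print(statistics.harmonic_mean(speeds))   # 48.0, the true average speed
print(statistics.mean(speeds))            # 50, which overstates it
```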
107.
If all values in a dataset are equal,
skewness is:
A.
Positive
B.
Negative
C.
Zero
D.
Infinite → C
109.
A platykurtic distribution is:
A.
Highly peaked
B.
Flat
C.
Skewed
D.
Symmetric → B
111.
A leptokurtic distribution is:
A.
Flat
B.
Highly peaked
C.
Uniform
D.
Random → B
113.
In simple linear regression, the slope
represents:
A.
Intercept
B.
Change in Y per unit X
C.
Error
D.
Variance → B
115.
Intercept in regression is value of Y
when:
A.
X = 1
B.
X = 0
C.
X = mean
D.
X = variance → B
117.
Residuals should ideally be:
A.
Patterned
B.
Random
C.
Increasing
D.
Decreasing → B
119.
Durbin-Watson test detects:
A.
Multicollinearity
B.
Autocorrelation
C.
Normality
D.
Heteroscedasticity → B
121.
Variance Inflation Factor (VIF) detects:
A.
Autocorrelation
B.
Multicollinearity
C.
Normality
D.
Skewness → B
123.
If VIF is high, it indicates:
A.
Good model
B.
Multicollinearity
C.
Low variance
D.
Independence → B
125.
A confounding variable:
A.
Has no effect
B.
Distorts relationship
C.
Is dependent variable
D.
Is constant → B
127.
Endogeneity refers to:
A.
External variables
B.
Regressor correlated with the error term
C.
Random error
D.
Constant data → B
129.
Panel data combines:
A.
Cross-sectional only
B.
Time series only
C.
Both cross-sectional & time series
D.
None → C
131.
Fixed effects model controls for:
A.
Time variation
B.
Individual differences
C.
Mean only
D.
Variance only → B
133.
Random effects model assumes:
A.
No variation
B.
Random variation
C.
Fixed variation
D.
No correlation → B
135.
Akaike Information Criterion (AIC)
favors:
A.
Simpler models
B.
Complex models
C.
Balanced fit
D.
Random models → C
137.
Compared with AIC, the Bayesian Information
Criterion (BIC) penalizes complexity:
A.
Less
B.
More
C.
Equal
D.
None → B
139.
A dummy variable takes values:
A.
0 or 1
B.
1 or 2
C.
-1 or 1
D.
Any number → A
141.
Interaction term in regression captures:
A.
Independent effect
B.
Combined effect
C.
Error
D.
Mean → B
143.
Elasticity measures:
A.
Absolute change
B.
Relative change
C.
Mean
D.
Variance → B
145.
In hypothesis testing, β represents:
A.
Type I error
B.
Type II error
C.
Mean
D.
Variance → B
147.
Power increases when:
A.
Sample size increases
B.
Sample size decreases
C.
Variance increases
D.
Bias increases → A
149.
A non-parametric test does not assume:
A.
Mean
B.
Distribution
C.
Variance
D.
Data → B
151.
Mann-Whitney test compares:
A.
Means
B.
Medians
C.
Variances
D.
Probabilities → B
153.
Wilcoxon test is used for:
A.
Paired data
B.
Independent data
C.
Random data
D.
Large samples → A
155.
Kruskal-Wallis test compares:
A.
Two groups
B.
Multiple groups
C.
One group
D.
Variance only → B
157.
Spearman correlation measures:
A.
Linear relation
B.
Rank relation
C.
Variance
D.
Mean → B
159.
Pearson correlation measures:
A.
Rank relation
B.
Linear relation
C.
Nonlinear relation
D.
Variance → B
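Questions 157 and 159 differ in what "relationship" means. Pearson measures linear association, while Spearman measures monotonic (rank) association. The sketch below is written from scratch in plain Python, and the helper names are illustrative. The ranking helper assumes there are no ties; tied values would normally receive averaged ranks.

```python
import math

def pearson(x, y):
    """Pearson correlation: covariance scaled by both standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """Ranks 1..n; assumes no ties to keep the sketch short."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

def spearman(x, y):
    """Spearman correlation: Pearson correlation applied to the ranks."""
    return pearson(ranks(x), ranks(y))

x = [1, 2, 3, 4, 5]
y = [v ** 2 for v in x]           # monotonic but not linear
print(round(pearson(x, y), 3))    # 0.981: not a straight line
print(spearman(x, y))             # 1.0: the ranks agree perfectly
```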
161.
A p-value close to 0 indicates:
A.
Weak evidence
B.
Strong evidence
C.
No evidence
D.
Infinite evidence → B
163.
Bonferroni correction is used for:
A.
Multiple testing
B.
Sampling
C.
Regression
D.
Clustering → A
165.
Missing completely at random (MCAR)
means:
A.
Depends on data
B.
Independent of data
C.
Depends on outcome
D.
Systematic → B
167.
Missing at random (MAR) depends on:
A.
Observed data
B.
Unobserved data
C.
None
D.
Random guess → A
169.
Not missing at random (NMAR) depends on:
A.
Observed only
B.
Unobserved
C.
Mean
D.
Variance → B
171.
Imputation replaces:
A.
Outliers
B.
Missing values
C.
Means
D.
Variance → B
173.
Mean imputation may:
A.
Increase variance
B.
Reduce variance
C.
Increase bias
D.
Reduce bias → B
175.
Weighted mean assigns:
A.
Equal weights
B.
Different weights
C.
No weights
D.
Random weights → B
177.
Survey weights correct:
A.
Bias
B.
Mean
C.
Variance
D.
Range → A
179.
Design effect measures:
A.
Efficiency of design
B.
Mean
C.
Variance
D.
Sample size → A
181.
ROC curve plots:
A.
Precision vs Recall
B.
TPR vs FPR
C.
Mean vs variance
D.
Error vs accuracy → B
183.
AUC measures:
A.
Accuracy
B.
Model performance
C.
Mean
D.
Variance → B
185.
Sensitivity is:
A.
True negative rate
B.
True positive rate
C.
False positive rate
D.
Error rate → B
187.
Specificity is:
A.
True negative rate
B.
True positive rate
C.
Error rate
D.
Mean → A
189.
False positive rate equals:
A.
1 − specificity
B.
Specificity
C.
Sensitivity
D.
Accuracy → A
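Sensitivity, specificity and the false positive rate (questions 185 to 189) all come from the four cells of a confusion matrix. The counts below are hypothetical.

```python
# Hypothetical confusion-matrix counts for a binary classifier
tp, fn = 40, 10   # actual positives: 50
tn, fp = 45, 5    # actual negatives: 50

sensitivity = tp / (tp + fn)   # true positive rate (TPR), also called recall
specificity = tn / (tn + fp)   # true negative rate (TNR)
fpr = fp / (fp + tn)           # false positive rate

print(sensitivity, specificity, fpr)   # 0.8 0.9 0.1
# fpr equals 1 - specificity (up to floating-point rounding)
```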
191.
Data leakage occurs when:
A.
Training uses future info
B.
Testing uses past info
C.
Data is missing
D.
Data is clean → A
193.
Train-test split is used to:
A.
Increase bias
B.
Evaluate model
C.
Reduce data
D.
Normalize data → B
195.
Underfitting occurs when:
A.
Model too simple
B.
Model too complex
C.
Too much data
D.
No data → A
197.
Bias-variance tradeoff balances:
A.
Accuracy & error
B.
Bias & variance
C.
Mean & median
D.
Range & IQR → B
199.
Gradient descent is used to:
A.
Maximize error
B.
Minimize loss
C.
Increase bias
D.
Reduce sample → B
201.
Loss function measures:
A.
Accuracy
B.
Error
C.
Mean
D.
Variance → B
203.
Regularization helps:
A.
Prevent overfitting
B.
Increase error
C.
Reduce data
D.
Ignore variables → A
205.
What is the main purpose of descriptive
statistics?
A.
Make predictions
B.
Summarize data
C.
Test hypotheses
D.
Build models → B
206.
Inferential statistics is used to:
A.
Summarize data
B.
Describe sample
C.
Draw conclusions about population
D.
Organize data → C
207.
A frequency polygon is used to:
A.
Show relationship
B.
Show distribution
C.
Show mean
D.
Show variance → B
208.
Bar charts are best for:
A.
Continuous data
B.
Categorical data
C.
Time series
D.
Correlation → B
209.
Pie charts show:
A.
Trends
B.
Proportions
C.
Variance
D.
Mean → B
210.
Stem-and-leaf plot shows:
A.
Mean
B.
Raw data distribution
C.
Correlation
D.
Regression → B
211.
A bimodal distribution has:
A.
One mode
B.
Two modes
C.
No mode
D.
Many modes → B
212.
A uniform distribution has:
A.
Equal frequencies
B.
Skewness
C.
High variance
D.
Low variance → A
213.
A right-skewed distribution has:
A.
Tail on left
B.
Tail on right
C.
No tail
D.
Symmetric → B
214.
A left-skewed distribution has:
A.
Tail on left
B.
Tail on right
C.
Symmetric
D.
No tail → A
215.
Mean deviation is taken from:
A.
Mean or median
B.
Mode only
C.
Range
D.
Variance → A
216.
Quartile deviation equals:
A.
Q3 − Q1
B.
(Q3 − Q1)/2
C.
Q1 − Q3
D.
Mean − median → B
217.
Standard error decreases when:
A.
Sample size increases
B.
Sample size decreases
C.
Variance increases
D.
Mean increases → A
218.
A large p-value suggests:
A.
Reject H₀
B.
Weak evidence
C.
Strong evidence
D.
Significant result → B
219.
Hypothesis testing begins with:
A.
Data collection
B.
Null hypothesis
C.
Conclusion
D.
Graph → B
220.
A parameter is fixed but:
A.
Known
B.
Unknown
C.
Random
D.
Variable → B
221.
A statistic is:
A.
Fixed
B.
Known
C.
Random
D.
Constant → C
222.
Sampling distribution of mean is:
A.
Always normal
B.
Approximately normal
C.
Uniform
D.
Skewed → B
223.
Standard normal distribution has mean:
A.
1
B.
0
C.
-1
D.
100 → B
224.
Standard normal distribution has
variance:
A.
0
B.
1
C.
2
D.
10 → B
225.
Z-table gives:
A.
Mean
B.
Probability
C.
Variance
D.
Mode → B
226.
t-distribution is used when:
A.
Large sample
B.
Small sample
C.
Infinite sample
D.
No sample → B
227.
t-distribution approaches normal when:
A.
Sample decreases
B.
Sample increases
C.
Variance decreases
D.
Mean increases → B
228.
The chi-square test of independence is used for:
A.
Means
B.
Variances
C.
Categorical data
D.
Correlation → C
229.
F-distribution is used in:
A.
Regression
B.
ANOVA
C.
Probability
D.
Mean → B
230.
ANOVA tests equality of:
A.
Variances
B.
Means
C.
Probabilities
D.
Medians → B
231.
Degrees of freedom in ANOVA depend on:
A.
Mean
B.
Sample size
C.
Groups
D.
Both B and C → D
232.
Residual sum of squares measures:
A.
Explained variation
B.
Unexplained variation
C.
Total variation
D.
Mean → B
233.
Total sum of squares equals:
A.
Explained + residual
B.
Mean + variance
C.
Mode + median
D.
Range + IQR → A
234.
Regression coefficient shows:
A.
Strength
B.
Direction
C.
Change
D.
All of the above → D
235.
Perfect fit means R² equals:
A.
0
B.
0.5
C.
1
D.
-1 → C
236.
If R² = 0, model explains:
A.
All variation
B.
No variation
C.
Half variation
D.
Negative variation → B
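Questions 232 to 236 relate TSS, ESS, RSS and R². The minimal ordinary-least-squares sketch below uses made-up data to show the decomposition TSS = ESS + RSS and the formula R² = 1 − RSS/TSS. The decomposition holds for least-squares fits that include an intercept.

```python
# Hypothetical data for a simple linear regression y = b0 + b1 * x
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
b1 = (sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
      / sum((a - x_bar) ** 2 for a in x))          # slope: change in y per unit x
b0 = y_bar - b1 * x_bar                            # intercept: predicted y at x = 0
y_hat = [b0 + b1 * a for a in x]

tss = sum((b - y_bar) ** 2 for b in y)             # total variation
ess = sum((p - y_bar) ** 2 for p in y_hat)         # explained variation
rss = sum((b - p) ** 2 for b, p in zip(y, y_hat))  # unexplained (residual) variation
r2 = 1 - rss / tss

print(round(b1, 3), round(b0, 3), round(r2, 3))   # 0.6 2.2 0.6
```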
237.
Multicollinearity increases:
A.
Accuracy
B.
Variance of coefficients
C.
Mean
D.
Sample size → B
238.
Ridge regression adds:
A.
L1 penalty
B.
L2 penalty
C.
No penalty
D.
Random penalty → B
239.
Lasso regression adds:
A.
L1 penalty
B.
L2 penalty
C.
No penalty
D.
Random penalty → A
240.
Overfitting leads to:
A.
Poor training
B.
Poor generalization
C.
High bias
D.
Low variance → B
241.
Underfitting leads to:
A.
High bias
B.
High variance
C.
Perfect fit
D.
Low bias → A
242.
Bias is:
A.
Error from assumptions
B.
Random error
C.
Sampling error
D.
Measurement error → A
243.
Variance is:
A.
Error variability
B.
Mean
C.
Mode
D.
Range → A
244.
Bootstrap method uses:
A.
Replacement
B.
No replacement
C.
Fixed data
D.
Random guess → A
245.
Jackknife method uses:
A.
All data
B.
Leave-one-out
C.
Sampling
D.
Mean → B
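Both resampling ideas in questions 244 and 245 can be sketched for the standard error of a sample mean. The function names, the sample and the number of bootstrap replicates are illustrative choices. For the mean specifically, the jackknife SE reproduces the classical s/√n exactly. The bootstrap gives a close approximation that varies slightly from run to run.

```python
import random
import statistics

def bootstrap_se(data, n_boot=2000, seed=0):
    """Bootstrap SE of the mean: resample WITH replacement and recompute the mean."""
    rng = random.Random(seed)
    means = [statistics.fmean(rng.choices(data, k=len(data))) for _ in range(n_boot)]
    return statistics.stdev(means)

def jackknife_se(data):
    """Jackknife SE of the mean: leave one observation out at a time."""
    n = len(data)
    loo_means = [statistics.fmean(data[:i] + data[i + 1:]) for i in range(n)]
    center = statistics.fmean(loo_means)
    return ((n - 1) / n * sum((m - center) ** 2 for m in loo_means)) ** 0.5

data = [4.1, 5.3, 6.0, 4.8, 5.5, 7.2, 3.9, 5.0]   # hypothetical sample
print(bootstrap_se(data), jackknife_se(data))
print(statistics.stdev(data) / len(data) ** 0.5)  # classical SE, s / sqrt(n)
```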
246.
Cross-validation splits data into:
A.
One set
B.
Two or more sets
C.
No sets
D.
Infinite sets → B
247.
K-fold cross-validation uses:
A.
One fold
B.
K partitions
C.
Two partitions
D.
Infinite folds → B
248.
Confusion matrix evaluates:
A.
Regression
B.
Classification
C.
Sampling
D.
Mean → B
249.
Accuracy measures:
A.
Correct predictions
B.
Errors
C.
Variance
D.
Mean → A
250.
Precision focuses on:
A.
True positives
B.
True negatives
C.
Errors
D.
Mean → A
251.
Recall focuses on:
A.
True positives
B.
True negatives
C.
Errors
D.
Variance → A
252.
F1 score balances:
A.
Accuracy
B.
Precision & recall
C.
Mean
D.
Variance → B
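The sketch below computes accuracy, precision, recall and F1 (questions 249 to 252) from hypothetical confusion-matrix counts. F1 is the harmonic mean of precision and recall.

```python
# Hypothetical confusion-matrix counts
tp, fp, fn, tn = 40, 5, 10, 45

accuracy = (tp + tn) / (tp + tn + fp + fn)          # share of all predictions that are correct
precision = tp / (tp + fp)                          # of predicted positives, how many are real
recall = tp / (tp + fn)                             # of real positives, how many were found
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of precision and recall

print(accuracy, round(precision, 3), recall, round(f1, 3))  # 0.85 0.889 0.8 0.842
```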
253.
KNN is a:
A.
Regression model
B.
Classification method
C.
Clustering method
D.
Sampling method → B
254.
Decision tree is used for:
A.
Classification & regression
B.
Mean only
C.
Variance only
D.
Sampling → A
255.
Central Limit Theorem states that the
sampling distribution of the mean is approximately normal if:
A.
Sample size is small
B.
Sample size is large
C.
Population is skewed
D.
Population is uniform → B
256.
Law of Large Numbers states that:
A.
Sample mean approaches population mean
as sample size increases
B.
Sample mean decreases with sample size
C.
Variance increases with sample size
D.
Mean is always equal to median → A
257.
A Type I error occurs when:
A.
True null hypothesis is rejected
B.
False null hypothesis is accepted
C.
True null hypothesis is accepted
D.
False null hypothesis is rejected → A
258.
A Type II error occurs when:
A.
True null hypothesis is rejected
B.
False null hypothesis is accepted
C.
True null hypothesis is accepted
D.
False null hypothesis is rejected → B
259.
Confidence interval gives:
A.
Exact value of parameter
B.
Range of plausible values
C.
Variance only
D.
Mean only → B
260.
95% confidence level means:
A.
95% of intervals built this way would
contain the population mean
B.
5% chance population mean is in interval
C.
Mean = 0.95
D.
Variance = 0.95 → A
261.
Margin of error depends on:
A.
Sample size
B.
Confidence level
C.
Standard deviation
D.
All of the above → D
262.
Standard error is:
A.
Standard deviation of population
B.
Standard deviation of sample mean
C.
Mean of sample
D.
Range → B
263.
Normal approximation to binomial works
if:
A.
n is large and p not too close to 0 or 1
B.
n is small
C.
p = 0
D.
p = 1 → A
264.
Poisson distribution approximates
binomial if:
A.
n is large, p small
B.
n is small, p large
C.
n and p are large
D.
n and p small → A
265.
Chi-square test is used for:
A.
Mean comparison
B.
Categorical data
C.
Regression
D.
Correlation → B
266.
Degrees of freedom for chi-square =
A.
(r + c −1)
B.
(r −1)(c −1)
C.
r × c
D.
r − c → B
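The plain-Python sketch below shows a chi-square test of independence, including where the expected counts and the (r − 1)(c − 1) degrees of freedom come from. The function name and the 2 × 2 table are hypothetical. Turning the statistic into a p-value would also need the chi-square CDF (for example from SciPy), which is omitted here.

```python
def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n   # expected count under independence
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)            # (r - 1)(c - 1)
    return stat, df

# Hypothetical 2 x 2 table of counts
stat, df = chi_square_independence([[10, 20], [20, 10]])
print(round(stat, 3), df)   # 6.667 1
```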
267.
Goodness-of-fit test checks:
A.
Regression fit
B.
Observed vs expected frequencies
C.
Mean comparison
D.
Variance equality → B
268.
Homoscedasticity means:
A.
Equal variances across groups
B.
Unequal variances
C.
Mean = median
D.
Skewness = 0 → A
269.
Heteroscedasticity means:
A.
Equal variances
B.
Unequal variances
C.
Constant error
D.
Normality → B
270.
Bartlett’s test checks:
A.
Normality
B.
Equality of variances
C.
Mean equality
D.
Skewness → B
271.
Levene’s test is used for:
A.
Equality of means
B.
Equality of variances
C.
Regression coefficients
D.
Correlation → B
272.
Kolmogorov-Smirnov test checks:
A.
Variance
B.
Normality
C.
Correlation
D.
Regression → B
273.
Shapiro-Wilk test checks:
A.
Variance
B.
Normality
C.
Mean equality
D.
Skewness → B
274.
Q-Q plot helps assess:
A.
Variance
B.
Normality
C.
Correlation
D.
Regression → B
275.
Boxplot identifies:
A.
Mean
B.
Outliers
C.
Correlation
D.
Regression → B
276.
Interquartile range (IQR) =
A.
Q3 − Q1
B.
Q1 − Q3
C.
Max − Min
D.
Median → A
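Quartiles, the IQR and the 1.5 × IQR boxplot fences can be computed with `statistics.quantiles` (Python 3.8+). Software packages use several slightly different quartile conventions, so exact Q1 and Q3 values can differ between tools. The data here is made up.

```python
import statistics

data = [1, 2, 3, 4, 5, 6, 7, 8, 30]            # hypothetical sample with one extreme value
q1, q2, q3 = statistics.quantiles(data, n=4)   # default 'exclusive' method
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr  # common boxplot (Tukey) fences
outliers = [x for x in data if x < lower or x > upper]

print(q1, q3, iqr, iqr / 2)   # 2.5 7.5 5.0 2.5  (iqr / 2 is the quartile deviation)
print(outliers)               # [30]
```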
277.
Standard score (z) formula:
A.
(x − μ)/σ
B.
(μ − x)/σ
C.
x/σ
D.
x − μ → A
278.
Chebyshev’s inequality applies to:
A.
Any distribution
B.
Normal only
C.
Skewed only
D.
Uniform only → A
279.
Empirical rule applies to:
A.
Any distribution
B.
Normal distribution
C.
Uniform
D.
Skewed → B
280.
In hypothesis testing, power =
A.
1 − α
B.
1 − β
C.
α + β
D.
α × β → B
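For a one-sided z-test with known σ, power = 1 − β has a closed form. This sketch uses `statistics.NormalDist` (Python 3.8+); the function name and the effect size are illustrative assumptions. It also shows two related points. Power rises with sample size (question 147). When there is no true effect, power equals α.

```python
from statistics import NormalDist

def one_sided_z_power(effect_size, n, alpha=0.05):
    """Power (1 - beta) of a one-sided one-sample z-test of H0: mu = mu0 vs H1: mu > mu0.

    effect_size is (mu1 - mu0) / sigma, with sigma assumed known.
    """
    z = NormalDist()                   # standard normal distribution
    z_crit = z.inv_cdf(1 - alpha)      # rejection cutoff under H0
    shift = effect_size * n ** 0.5     # mean of the test statistic under H1
    return 1 - z.cdf(z_crit - shift)   # P(reject H0 | H1 true)

for n in (10, 30, 100):
    print(n, round(one_sided_z_power(0.3, n), 3))   # power grows with n
```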
281.
ANOVA F-statistic =
A.
Variance between groups / variance
within groups
B.
Mean between groups / mean within groups
C.
Sum of squares / mean
D.
Explained / total variance → A
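The F ratio in question 281 can be computed directly from sums of squares. Below is a minimal one-way ANOVA sketch with hypothetical groups. Its degrees of freedom, k − 1 and n − k, also show why ANOVA degrees of freedom depend on both the number of groups and the sample size.

```python
def one_way_anova_f(groups):
    """F = (between-group mean square) / (within-group mean square)."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    k, n = len(groups), len(all_values)
    ssb = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    ssw = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g)
    msb = ssb / (k - 1)   # df between = k - 1
    msw = ssw / (n - k)   # df within = n - k
    return msb / msw

# Hypothetical three groups with clearly different means
print(one_way_anova_f([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))   # 27.0
```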
282.
One-way ANOVA compares:
A.
Two means
B.
Multiple means
C.
Variances only
D.
Regression coefficients → B
283.
Two-way ANOVA includes:
A.
One factor
B.
Two factors
C.
Multiple regression
D.
Correlation → B
284.
Post hoc tests are used after:
A.
Significant ANOVA
B.
Non-significant ANOVA
C.
Regression
D.
Chi-square → A
285.
Tukey’s test controls:
A.
Type I error
B.
Type II error
C.
Variance
D.
Mean → A
286.
Bonferroni correction controls:
A.
Type I error in multiple comparisons
B.
Type II error
C.
Regression error
D.
Correlation error → A
287.
Non-parametric tests are used when:
A.
Normality assumption fails
B.
Sample is large
C.
Population is known
D.
Regression needed → A
288.
Wilcoxon signed-rank test is for:
A.
Paired data
B.
Independent data
C.
Categorical data
D.
Multiple groups → A
289.
Mann-Whitney U test is for:
A.
Paired data
B.
Independent two samples
C.
Categorical data
D.
Multiple groups → B
290.
Kruskal-Wallis test is for:
A.
Two groups
B.
Multiple groups
C.
Paired data
D.
Categorical data → B
291.
Friedman test is for:
A.
One-way repeated measures
B.
Two-way repeated measures
C.
Regression
D.
Correlation → A
292.
Spearman rank correlation uses:
A.
Raw values
B.
Ranks
C.
Variances
D.
Means → B
293.
Kendall’s tau measures:
A.
Linear correlation
B.
Rank correlation
C.
Regression slope
D.
Variance → B
294.
Regression diagnostics detect:
A.
Outliers
B.
Leverage points
C.
Influential points
D.
All of the above → D
295.
Cook’s distance measures:
A.
Variance
B.
Influence of observation
C.
Mean
D.
Skewness → B
296.
Leverage points affect:
A.
Regression line slope
B.
Variance
C.
Mean only
D.
Mode only → A
297.
Influential points combine:
A.
Outlier + leverage
B.
Mean + variance
C.
Skewness + kurtosis
D.
Range + median → A
298.
Multivariate analysis deals with:
A.
One variable
B.
Two or more variables
C.
Means only
D.
Variances only → B
299.
Principal Component Analysis (PCA)
reduces:
A.
Variance
B.
Dimensions
C.
Mean
D.
Skewness → B
300.
Factor analysis identifies:
A.
Observed variables
B.
Latent factors
C.
Mean
D.
Variance → B
301.
Cluster analysis groups:
A.
Variables
B.
Observations
C.
Mean only
D.
Variance only → B
302.
Hierarchical clustering can be:
A.
Agglomerative
B.
Divisive
C.
Both
D.
None → C
303.
K-means clustering minimizes:
A.
Between-cluster distance
B.
Within-cluster distance
C.
Mean
D.
Variance → B
304.
Silhouette score evaluates:
A.
Regression
B.
Classification
C.
Clustering quality
D.
Correlation → C
305.
Bayesian statistics updates:
A.
Prior probability using data → A
B.
Mean only
C.
Variance only
D.
Mode only
306.
Posterior probability combines:
A.
Likelihood × Prior → A
B.
Mean + Variance
C.
Standard deviation only
D.
Median only
307.
Likelihood function depends on:
A.
Parameter values → A
B.
Sample size only
C.
Variance only
D.
Mean only
308.
Maximum a posteriori (MAP) estimation
maximizes:
A.
Likelihood
B.
Posterior → B
C.
Mean
D.
Variance
309.
Prior distribution can be:
A.
Informative
B.
Non-informative
C.
Both → C
D.
Neither
310.
Conjugate prior ensures:
A.
Posterior in same family as prior → A
B.
Posterior is uniform
C.
Posterior is normal
D.
Posterior is variance only
311.
Poisson process models:
A.
Continuous time events → A
B.
Categorical data
C.
Mean only
D.
Variance only
312.
Exponential interarrival times are:
A.
Memoryless → A
B.
Dependent
C.
Correlated
D.
Uniform
313.
Markov process satisfies:
A.
Future depends only on present → A
B.
Future depends on past
C.
Mean = median
D.
Variance = 0
314.
A time-homogeneous Markov chain has:
A.
Constant transition probabilities → A
B.
Varying probabilities
C.
Mean = 0
D.
Variance = 1
315.
Ergodic Markov chain:
A.
Can reach all states eventually → A
B.
Cannot reach all states
C.
Deterministic
D.
Non-stationary
316.
Transition matrix elements are:
A.
Probabilities → A
B.
Means
C.
Variances
D.
Modes
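Each row of a transition matrix is a probability distribution over the next state. For a finite chain that is irreducible and aperiodic, repeatedly multiplying a starting distribution by the matrix converges to the stationary distribution π = πP. Below is a small power-iteration sketch; the 2-state matrix is hypothetical.

```python
def stationary_distribution(P, steps=500):
    """Approximate the stationary distribution pi (pi = pi P) by repeated multiplication."""
    n = len(P)
    pi = [1.0 / n] * n   # start from a uniform distribution
    for _ in range(steps):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi

# Hypothetical 2-state chain; each row sums to 1
P = [[0.9, 0.1],
     [0.5, 0.5]]
pi = stationary_distribution(P)
print([round(p, 4) for p in pi])   # [0.8333, 0.1667]
```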
317.
Poisson regression models:
A.
Count data → A
B.
Continuous data
C.
Binary outcome
D.
Ordinal outcome
318.
Negative binomial regression handles:
A.
Overdispersed count data → A
B.
Binary data
C.
Continuous data
D.
Time series
319.
Ordinal regression models:
A.
Continuous outcome
B.
Ordered categorical outcome → B
C.
Binary outcome
D.
Count data
320.
Multinomial logistic regression handles:
A.
Multiple categories → A
B.
Two categories
C.
Continuous data
D.
Time series
321.
Survival analysis studies:
A.
Time until event → A
B.
Mean only
C.
Variance only
D.
Regression coefficients
322.
Censoring occurs when:
A.
Exact event time unknown → A
B.
Event never occurs
C.
Time series is stationary
D.
Mean = median
323.
Kaplan-Meier estimator estimates:
A.
Survival function → A
B.
Hazard function
C.
Mean
D.
Variance
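This minimal Kaplan-Meier sketch shows how censored subjects leave the risk set without lowering the survival estimate. The function name and the data are hypothetical. It follows the usual convention that a subject censored at the same time as an event still counts as at risk at that time.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate of S(t) at each distinct event time.

    times:  follow-up time for each subject
    events: 1 if the event was observed, 0 if the subject was censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        at_t = [e for tt, e in data if tt == t]   # everyone whose time equals t
        deaths = sum(at_t)                        # events observed at t
        if deaths > 0:
            survival *= 1 - deaths / at_risk      # conditional survival past t
            curve.append((t, survival))
        at_risk -= len(at_t)   # subjects censored at t still counted at risk at t
        i += len(at_t)
    return curve

# Hypothetical follow-up times; event 0 marks a censored subject
times = [2, 3, 3, 5, 6, 8]
events = [1, 1, 0, 1, 0, 1]
for t, s in kaplan_meier(times, events):
    print(t, round(s, 3))   # 2 0.833, 3 0.667, 5 0.444, 8 0.0
```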
324.
Cox proportional hazards model assumes:
A.
Constant hazard ratios → A
B.
Increasing hazard
C.
Mean = median
D.
Variance = 1
325.
Hazard function measures:
A.
Instantaneous risk → A
B.
Mean risk
C.
Cumulative variance
D.
Median
326.
Log-rank test compares:
A.
Two survival curves → A
B.
Means
C.
Variances
D.
Regression coefficients
327.
Time-to-event data is:
A.
Continuous → A
B.
Categorical
C.
Binary
D.
Count
328.
Markov chain Monte Carlo (MCMC) is used
for:
A.
Bayesian estimation → A
B.
Mean estimation only
C.
Variance estimation only
D.
Correlation
329.
Gibbs sampling updates:
A.
One variable at a time → A
B.
All variables simultaneously
C.
Mean only
D.
Variance only
330.
Metropolis-Hastings algorithm:
A.
Accepts or rejects proposed sample → A
B.
Always accepts
C.
Rejects all
D.
Updates mean only
331.
Random effects model accounts for:
A.
Both within- and between-group variability → A
B.
Between-group only
C.
Mean only
D.
Variance only
332.
Fixed effects model assumes:
A.
Effects are constant → A
B.
Effects are random
C.
Effects vary by sample
D.
Effects unknown
333.
Mixed-effects model includes:
A.
Fixed + random effects → A
B.
Only fixed
C.
Only random
D.
None
334.
Hierarchical linear model handles:
A.
Nested data → A
B.
Time series only
C.
Regression only
D.
Categorical data only
335.
Multilevel modeling is used when:
A.
Data clustered → A
B.
Data independent
C.
Time series
D.
Continuous only
336.
Structural equation modeling (SEM)
combines:
A.
Regression + factor analysis → A
B.
Only regression
C.
Only correlation
D.
Only variance
337.
Path analysis is part of:
A.
SEM → A
B.
ANOVA
C.
Regression
D.
Time series
338.
Confirmatory factor analysis (CFA)
tests:
A.
Hypothesized factor structure → A
B.
Mean equality
C.
Regression coefficients
D.
Variance equality
339.
Exploratory factor analysis (EFA)
discovers:
A.
Factor structure → A
B.
Mean
C.
Regression
D.
Variance
340.
Principal axis factoring is:
A.
Common factor method → A
B.
PCA method
C.
Regression
D.
Correlation
341.
Varimax rotation maximizes:
A.
Loading variance → A
B.
Mean
C.
Regression slope
D.
Covariance
342.
Oblique rotation allows:
A.
Factors correlated → A
B.
Factors uncorrelated
C.
Regression only
D.
Mean only
343.
Bartlett’s test in factor analysis
checks:
A.
Sphericity → A
B.
Variance equality
C.
Regression
D.
Mean
344.
Kaiser-Meyer-Olkin (KMO) measure checks:
A.
Sampling adequacy → A
B.
Variance
C.
Regression slope
D.
Mean
345.
Communality indicates:
A.
Variance explained by factors → A
B.
Total variance
C.
Regression slope
D.
Mean
346.
Eigenvalue >1 rule selects:
A.
Number of factors → A
B.
Number of observations
C.
Number of predictors
D.
Regression coefficients
347.
Scree plot visualizes:
A.
Eigenvalues → A
B.
Mean
C.
Variance
D.
Regression
348.
Cluster validity indices include:
A.
Silhouette
B.
Rand index
C.
Both → C
D.
None
349.
Dendrogram helps in:
A.
Hierarchical clustering → A
B.
Regression
C.
ANOVA
D.
Time series
350.
Agglomerative clustering starts with:
A.
Each observation as a cluster → A
B.
Single cluster
C.
Random cluster
D.
Mean only
351.
Divisive clustering starts with:
A.
All data in one cluster → A
B.
Each observation as cluster
C.
Random clusters
D.
Mean only
352.
Outlier in clustering may:
A.
Form its own cluster
B.
Merge with nearest cluster
C.
Affect centroids
D.
All of the above → D
353.
Hierarchical vs K-means:
A.
Deterministic vs iterative → A
B.
Both deterministic
C.
Both iterative
D.
None
354.
DBSCAN clustering identifies:
A.
Density-based clusters → A
B.
Hierarchical clusters
C.
K-means clusters
D.
PCA clusters
355.
The mode is defined as:
A.
Most frequent value → A
B.
Average value
C.
Middle value
D.
Sum of values
356.
The median divides data into:
A.
Two equal halves → A
B.
Four equal parts
C.
Three equal parts
D.
Five equal parts
357.
Skewness measures:
A.
Symmetry of data → A
B.
Spread
C.
Central tendency
D.
Correlation
358.
Positive skew means:
A.
Tail on right → A
B.
Tail on left
C.
Symmetric
D.
Uniform
359.
Negative skew means:
A.
Tail on left → A
B.
Tail on right
C.
Symmetric
D.
Uniform
360.
Kurtosis measures:
A.
Peakedness of distribution → A
B.
Spread
C.
Mean
D.
Median
361.
High kurtosis indicates:
A.
Heavy tails → A
B.
Light tails
C.
Symmetry
D.
Skewness = 0
362.
Low kurtosis indicates:
A.
Light tails → A
B.
Heavy tails
C.
Skewness
D.
Mean
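Questions 357 to 362 describe skewness and kurtosis in words. A minimal Python sketch of the plain moment-based estimators (g1 for skewness, g2 for excess kurtosis, with no small-sample bias correction; that estimator choice and the function names are assumptions, and statistical packages may default to adjusted versions):

```python
def central_moment(xs, k):
    """k-th central moment, dividing by n (not n - 1)."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** k for x in xs) / len(xs)

def skewness(xs):
    """g1 = m3 / m2**1.5; positive means a longer right tail, negative a longer left tail."""
    return central_moment(xs, 3) / central_moment(xs, 2) ** 1.5

def excess_kurtosis(xs):
    """g2 = m4 / m2**2 - 3; about 0 for a normal distribution,
    > 0 for heavier tails, < 0 for lighter tails."""
    return central_moment(xs, 4) / central_moment(xs, 2) ** 2 - 3

symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 1, 2, 10]   # hypothetical data with one large value
print(skewness(symmetric), excess_kurtosis(symmetric))
print(skewness(right_skewed))
```

The symmetric data give zero skewness and negative excess kurtosis (light tails); the single large value in the second list produces positive (right) skew.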
363.
Variance formula (population) is:
A.
Σ(x−μ)² / N → A
B.
Σ(x−μ)² / (N−1)
C.
Σx / N
D.
Σx² / N
364.
Variance formula (sample) is:
A.
Σ(x−x̄)² / (n−1) → A
B.
Σ(x−x̄)² / n
C.
Σx / n
D.
Σx² / n
365.
Standard deviation is:
A.
Square root of variance → A
B.
Variance squared
C.
Mean
D.
Median
366.
Coefficient of variation (CV) =
A.
SD / Mean → A
B.
Mean / SD
C.
Variance / Mean
D.
Median / Mean
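The formulas in Questions 363 to 366 can be checked with a short Python sketch. The helper names and the data are illustrative; using the sample SD inside the CV is an assumption, since some texts use the population SD instead.

```python
import math

def population_variance(xs):
    """sigma^2 = sum((x - mu)^2) / N"""
    mu = sum(xs) / len(xs)
    return sum((x - mu) ** 2 for x in xs) / len(xs)

def sample_variance(xs):
    """s^2 = sum((x - xbar)^2) / (n - 1), i.e. Bessel's correction."""
    n = len(xs)
    xbar = sum(xs) / n
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)

def coefficient_of_variation(xs):
    """CV = SD / mean (sample SD here); undefined when the mean is 0."""
    mean = sum(xs) / len(xs)
    return math.sqrt(sample_variance(xs)) / mean

data = [2, 4, 4, 4, 5, 5, 7, 9]
print(population_variance(data))              # 4.0
print(math.sqrt(population_variance(data)))   # 2.0 (SD = square root of variance)
print(sample_variance(data))                  # 32/7, about 4.571
print(coefficient_of_variation(data))
```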
367.
Probability that either of two mutually exclusive
events occurs, P(A∪B), is the:
A.
Sum of individual probabilities → A
B.
Product
C.
Difference
D.
Ratio
368.
Probability that two independent events both occur, P(A∩B), is the:
A.
Product of probabilities → A
B.
Sum
C.
Difference
D.
Ratio
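The addition rule (Question 367) and the multiplication rule (Question 368) can be verified exactly by enumerating the sample space of fair dice. A minimal sketch using exact fractions; the `prob` helper is illustrative:

```python
from fractions import Fraction
from itertools import product

die = range(1, 7)

def prob(event, space):
    """Classical probability: favorable outcomes / all equally likely outcomes."""
    space = list(space)
    return Fraction(sum(1 for outcome in space if event(outcome)), len(space))

# Mutually exclusive events on one roll: "roll a 1" and "roll a 2" cannot both happen.
p_one = prob(lambda r: r == 1, die)
p_two = prob(lambda r: r == 2, die)
p_one_or_two = prob(lambda r: r in (1, 2), die)
print(p_one_or_two, p_one + p_two)                    # 1/3 1/3

# Independent events on two rolls: the first die tells nothing about the second.
two_dice = list(product(die, repeat=2))
p_both_six = prob(lambda o: o[0] == 6 and o[1] == 6, two_dice)
print(p_both_six, Fraction(1, 6) * Fraction(1, 6))    # 1/36 1/36
```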
369.
Conditional probability formula:
A.
P(A|B) = P(A∩B)/P(B) → A
B.
P(A∩B)/P(A)
C.
P(A)+P(B)
D.
P(A)-P(B)
370.
Bayes’ theorem updates:
A.
Prior probability into a posterior probability → A
B.
Mean
C.
Variance
D.
Standard deviation
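A worked Bayes' theorem update in Python, where the denominator P(data) is expanded with the law of total probability. The screening-test numbers (1% prevalence, 95% sensitivity, 5% false-positive rate) are hypothetical:

```python
def bayes_posterior(prior, p_data_given_h, p_data_given_not_h):
    """P(H | data) = P(data | H) P(H) / P(data), with
    P(data) = P(data | H) P(H) + P(data | not H) P(not H)."""
    evidence = p_data_given_h * prior + p_data_given_not_h * (1 - prior)
    return p_data_given_h * prior / evidence

posterior = bayes_posterior(prior=0.01, p_data_given_h=0.95, p_data_given_not_h=0.05)
print(round(posterior, 3))   # 0.161
```

Even with an accurate test, a low prior keeps the posterior probability of disease near 16% after one positive result.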
371.
Random variable X can be:
A.
Discrete
B.
Continuous
C.
Both → C
D.
Neither
372.
Probability mass function (PMF) applies
to:
A.
Discrete → A
B.
Continuous
C.
Both
D.
Neither
373.
Probability density function (PDF)
applies to:
A.
Continuous → A
B.
Discrete
C.
Both
D.
Neither
374.
Cumulative distribution function (CDF)
gives:
A.
P(X ≤ x) → A
B.
P(X ≥ x)
C.
P(X = x)
D.
Mean
375.
Expected value of a discrete random variable X =
A.
Σx·P(x) → A
B.
Mean only
C.
Variance only
D.
Median
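The PMF, CDF and expected value from Questions 372 to 375 fit together as shown in this small sketch for a fair six-sided die, using exact fractions:

```python
from fractions import Fraction

# PMF of a fair die: P(X = x) = 1/6 for x = 1..6.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

def cdf(x):
    """F(x) = P(X <= x), obtained by summing the PMF."""
    return sum(p for value, p in pmf.items() if value <= x)

expected_value = sum(x * p for x, p in pmf.items())   # E[X] = sum of x * P(x)

print(sum(pmf.values()))   # 1
print(cdf(3))              # 1/2
print(expected_value)      # 7/2
```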
376.
Law of total probability:
A.
P(A) = Σᵢ P(A|Bᵢ)P(Bᵢ), where the Bᵢ partition the sample space → A
B.
P(A) = P(A∩B)
C.
P(A) = P(A)+P(B)
D.
P(A) = P(A)/P(B)
377.
Standard normal distribution:
A.
Mean 0, SD 1 → A
B.
Mean 1, SD 0
C.
Mean 0, SD 0
D.
Mean 1, SD 1
378.
Z-score formula:
A.
(X−μ)/σ → A
B.
(μ−X)/σ
C.
X/σ
D.
X−μ
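A sketch of standardizing a value and reading the standard normal CDF, written with `math.erf` so it needs only the standard library. The score of 130 with μ = 100 and σ = 15 is an illustrative example:

```python
import math

def z_score(x, mu, sigma):
    """Number of SDs that x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma

def standard_normal_cdf(z):
    """Phi(z) for N(0, 1), expressed through the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

z = z_score(130, 100, 15)
print(z)                                  # 2.0
print(round(standard_normal_cdf(z), 4))   # 0.9772
```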
379.
T-distribution is used when:
A.
Population SD unknown → A
B.
Population mean unknown
C.
Sample size large
D.
Sample size infinite
380.
Degrees of freedom in a one-sample t-test:
A.
n−1 → A
B.
n
C.
n+1
D.
n−2
381.
One-sample t-test compares:
A.
Sample mean vs population mean → A
B.
Two sample means
C.
Two variances
D.
Proportions
382.
Two-sample t-test compares:
A.
Means of two independent samples → A
B.
Paired samples
C.
Variances
D.
Proportions
383.
Paired t-test compares:
A.
Means of paired observations → A
B.
Independent samples
C.
Variances
D.
Proportions
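A minimal sketch of the one-sample t statistic and its n − 1 degrees of freedom, plus the paired t-test expressed as a one-sample test on the differences. Turning t into a p-value needs the t-distribution CDF (for example from SciPy), which the standard library lacks, so this stops at the statistic. Data and helper names are hypothetical.

```python
import math

def one_sample_t(xs, mu0):
    """t = (xbar - mu0) / (s / sqrt(n)), with n - 1 degrees of freedom."""
    n = len(xs)
    xbar = sum(xs) / n
    s = math.sqrt(sum((x - xbar) ** 2 for x in xs) / (n - 1))
    return (xbar - mu0) / (s / math.sqrt(n)), n - 1

def paired_t(before, after):
    """Paired t-test = one-sample t-test of the differences against 0."""
    diffs = [a - b for b, a in zip(before, after)]
    return one_sample_t(diffs, 0.0)

t, df = one_sample_t([5.1, 4.9, 5.3, 5.5, 5.0], mu0=5.0)
print(round(t, 3), df)   # 1.486 4
```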
384.
F-test compares:
A.
Variances → A
B.
Means
C.
Medians
D.
Correlations
385.
ANOVA is an extension of:
A.
t-test → A
B.
Z-test
C.
F-test
D.
Chi-square
386.
One-way ANOVA has:
A.
One factor → A
B.
Two factors
C.
Multiple factors
D.
None
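A one-way ANOVA partitions variation into between-group and within-group sums of squares, and F = MS_between / MS_within. With exactly two groups this F equals the square of the pooled two-sample t statistic, which is the sense in which ANOVA extends the t-test. A sketch with hypothetical groups:

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)
    k, n = len(groups), len(all_values)
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within, k - 1, n - k

f_stat, df1, df2 = one_way_anova_f([[4, 5, 6], [6, 7, 8], [8, 9, 10]])
print(f_stat, df1, df2)   # 12.0 2 6
```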
387.
Two-way ANOVA has:
A.
Two factors → A
B.
One factor
C.
Multiple factors
D.
None
388.
Post-hoc tests are used after:
A.
Significant ANOVA → A
B.
Non-significant ANOVA
C.
Regression
D.
Chi-square
389.
Bonferroni correction adjusts for:
A.
Multiple comparisons → A
B.
Single test
C.
Regression
D.
Variance
390.
Chi-square test applies to:
A.
Categorical data → A
B.
Continuous data
C.
Regression
D.
Time series
391.
Chi-square goodness-of-fit compares:
A.
Observed vs expected frequencies → A
B.
Means
C.
Variances
D.
Regression coefficients
392.
Chi-square test for independence
examines:
A.
Association between variables → A
B.
Mean difference
C.
Variance equality
D.
Regression
393.
Contingency table shows:
A.
Cross-tabulated counts → A
B.
Means only
C.
Variances only
D.
Regression
394.
Residual =
A.
Observed − Predicted → A
B.
Predicted − Observed
C.
Mean − Observed
D.
Variance − Predicted
395.
Homoscedasticity =
A.
Equal variance → A
B.
Unequal variance
C.
Normal distribution
D.
Independence
396.
Heteroscedasticity violates:
A.
Constant variance assumption → A
B.
Linearity
C.
Normality
D.
Independence
397.
Cook’s distance detects:
A.
Influential points → A
B.
Outliers only
C.
Leverage only
D.
Residuals
398.
Leverage measures:
A.
Distance in predictor space → A
B.
Residual size
C.
Mean
D.
Variance
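For simple linear regression, residuals (observed − predicted), leverage h_i = 1/n + (x_i − x̄)²/Sxx, and Cook's distance D_i = e_i² h_i / (p · MSE · (1 − h_i)²) can all be computed directly. A sketch assuming one predictor plus an intercept (p = 2); the data are hypothetical, with one point far out in x:

```python
def simple_regression_diagnostics(x, y):
    """Fit y = b0 + b1*x by least squares; return residuals, leverages, Cook's distances."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b0 = ybar - b1 * xbar
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]   # observed - predicted
    leverages = [1 / n + (xi - xbar) ** 2 / sxx for xi in x]    # hat-matrix diagonal
    p = 2                                                        # intercept + slope
    mse = sum(e ** 2 for e in residuals) / (n - p)
    cooks = [e ** 2 / (p * mse) * h / (1 - h) ** 2 for e, h in zip(residuals, leverages)]
    return residuals, leverages, cooks

x = [1, 2, 3, 4, 5, 10]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 30.0]
res, lev, cooks = simple_regression_diagnostics(x, y)
for xi, e, h, d in zip(x, res, lev, cooks):
    print(f"x={xi:>2}  residual={e:6.2f}  leverage={h:.2f}  Cook's D={d:.2f}")
```

Leverage depends only on the predictor values, whereas Cook's distance combines leverage with the residual, which is why it flags influence rather than just outlyingness.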
399.
Multicollinearity affects:
A.
Standard errors → A
B.
Means
C.
Medians
D.
Mode
400.
VIF > 10 indicates:
A.
Severe multicollinearity → A
B.
Low correlation
C.
Independence
D.
Normality
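VIF_j = 1 / (1 − R_j²), where R_j² comes from regressing predictor j on the other predictors. With exactly two predictors R_j² is just their squared correlation, which gives this small standard-library sketch (the two-predictor restriction and the data are assumptions for illustration):

```python
import math

def pearson_r(a, b):
    """Pearson correlation coefficient."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)

def vif_two_predictors(x1, x2):
    """With two predictors, both share VIF = 1 / (1 - r^2)."""
    r = pearson_r(x1, x2)
    return 1 / (1 - r ** 2)

x1 = [1, 2, 3, 4, 5]
x2 = [2, 4, 6, 8, 11]   # nearly a multiple of x1
print(round(vif_two_predictors(x1, x2), 1))   # 122.0, far above the rule-of-thumb 10
```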
401.
Random effects model accounts for:
A.
Group-level variation → A
B.
Fixed effect only
C.
Mean only
D.
Variance only
402.
Fixed effects model assumes:
A.
Constant effects → A
B.
Random effects
C.
Variable effects
D.
Unknown effects
403.
Mixed effects model combines:
A.
Fixed + random → A
B.
Fixed only
C.
Random only
D.
Neither
404.
Hierarchical modeling handles:
A.
Nested data → A
B.
Independent data
C.
Time series only
D.
Categorical only
405.
Poisson regression is suitable for:
A.
Count data → A
B.
Continuous data
C.
Binary outcome
D.
Ordinal outcome
406.
Overdispersion occurs when:
A.
Variance > Mean → A
B.
Variance < Mean
C.
Variance = Mean
D.
Mean = 0
407.
Negative binomial regression handles:
A.
Overdispersed count data → A
B.
Binary data
C.
Continuous data
D.
Ordinal data
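A quick screen for overdispersion compares the sample variance with the mean. Under a Poisson model they should be roughly equal, so a ratio well above 1 hints that negative binomial regression may fit better. This is a rough heuristic sketch, not a formal test, and the counts are hypothetical:

```python
def dispersion_index(counts):
    """Sample variance / mean: near 1 is consistent with Poisson (variance = mean);
    well above 1 suggests overdispersion. Requires a nonzero mean."""
    n = len(counts)
    mean = sum(counts) / n
    var = sum((c - mean) ** 2 for c in counts) / (n - 1)
    return var / mean

print(dispersion_index([0, 0, 0, 10, 10]))   # 7.5
```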
408.
Time series data has:
A.
Temporal order → A
B.
Random order
C.
Categorical only
D.
Constant variance
409.
Stationary time series has:
A.
Constant mean & variance → A
B.
Changing mean
C.
Changing variance
D.
Trend only
410.
Differencing a series removes:
A.
Trend → A
B.
Seasonality
C.
Noise
D.
Skewness
411.
Seasonal differencing removes:
A.
Seasonality → A
B.
Trend
C.
Noise
D.
Mean
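Ordinary differencing (lag 1) and seasonal differencing (lag s) are the same operation with different lags. A sketch on two hypothetical series, one with a linear trend and one with a period-4 seasonal pattern:

```python
def difference(series, lag=1):
    """y_t = x_t - x_{t-lag}. lag=1 removes a linear trend;
    lag=s removes a repeating seasonal pattern of period s."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]

trend = [3, 5, 7, 9, 11]
print(difference(trend))             # [2, 2, 2, 2]

seasonal = [10, 20, 30, 40, 11, 21, 31, 41]   # quarterly, period 4
print(difference(seasonal, lag=4))   # [1, 1, 1, 1]
```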
412.
Autocorrelation measures:
A.
Correlation of a series with its own lagged values → A
B.
Variance only
C.
Mean only
D.
Skewness
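The sample autocorrelation at lag k divides the lag-k cross-products of deviations by the total sum of squared deviations. A minimal sketch on an alternating series, which is strongly negatively correlated at lag 1 and positively at lag 2:

```python
def acf(series, k):
    """Sample autocorrelation at lag k:
    sum_t (x_t - xbar)(x_{t+k} - xbar) / sum_t (x_t - xbar)^2."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((x - mean) ** 2 for x in series)
    num = sum((series[t] - mean) * (series[t + k] - mean) for t in range(n - k))
    return num / denom

alternating = [1, -1, 1, -1, 1, -1]
print(acf(alternating, 1), acf(alternating, 2))   # about -0.833 and 0.667
```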
413.
Partial autocorrelation measures:
A.
Direct correlation between X_t and X_{t-k} after controlling for intermediate lags → A
B.
Total correlation
C.
Variance
D.
Mean
414.
AR(p) model uses:
A.
p lagged values → A
B.
p future values
C.
Moving average
D.
Trend
415.
MA(q) model uses:
A.
q lagged errors → A
B.
q lagged values
C.
Trend only
D.
Mean
416.
ARMA(p,q) combines:
A.
AR + MA → A
B.
AR only
C.
MA only
D.
AR + trend
417.
ARIMA(p,d,q) adds:
A.
Differencing → A
B.
Autocorrelation
C.
Variance
D.
Mean
418.
Exponential smoothing gives:
A.
Higher weight to recent observations → A
B.
Equal weight
C.
Lower weight to recent
D.
Random weight
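Simple exponential smoothing makes the weighting in Question 418 concrete. Unrolling the recursion shows each earlier observation x_{t−j} (j < t) receives weight α(1 − α)^j, which shrinks geometrically with age. A sketch, assuming the common initialization s_0 = x_0:

```python
def simple_exponential_smoothing(series, alpha):
    """s_0 = x_0; s_t = alpha * x_t + (1 - alpha) * s_{t-1}."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    smoothed = [series[0]]
    for x in series[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed

print(simple_exponential_smoothing([10, 20, 30], alpha=0.5))   # [10, 15.0, 22.5]
```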
419.
Holt-Winters method handles:
A.
Trend + seasonality → A
B.
Noise only
C.
Mean only
D.
Variance only
420.
White noise has:
A.
Zero mean, constant variance, no autocorrelation → A
B.
Non-zero mean
C.
Changing variance
D.
Trend
421.
ARCH/GARCH models address:
A.
Conditional heteroscedasticity (time-varying volatility) → A
B.
Mean
C.
Median
D.
Mode
422.
Monte Carlo simulation estimates:
A.
Probabilities & distributions → A
B.
Mean only
C.
Median only
D.
Mode
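Monte Carlo estimation replaces an exact calculation with the fraction of simulated outcomes in which an event occurs. A sketch that estimates P(sum of two dice ≥ 10), whose exact value is 6/36; the helper names and seed are illustrative:

```python
import random

def monte_carlo_probability(event, simulate, trials=100_000, seed=1):
    """Estimate P(event) as the share of simulated outcomes where the event occurs."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(trials) if event(simulate(rng)))
    return hits / trials

def two_dice(rng):
    return rng.randint(1, 6) + rng.randint(1, 6)

estimate = monte_carlo_probability(lambda total: total >= 10, two_dice)
print(estimate)   # close to the exact 6/36, about 0.1667
```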
423.
MCMC (Markov Chain Monte Carlo) is used
for:
A.
Bayesian estimation → A
B.
Frequentist estimation
C.
Mean only
D.
Variance only
424.
Gibbs sampling updates:
A.
One variable at a time → A
B.
All variables simultaneously
C.
Mean only
D.
Variance only
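Gibbs sampling draws each variable from its full conditional given the current values of the others. For a standard bivariate normal with correlation ρ, both conditionals are normal, which makes a compact sketch. The target, burn-in length and seed are illustrative choices:

```python
import math
import random

def gibbs_bivariate_normal(rho, n_samples, burn_in=500, seed=0):
    """Gibbs sampler for a standard bivariate normal with correlation rho.
    Full conditionals: X | Y=y ~ N(rho*y, 1 - rho^2) and Y | X=x ~ N(rho*x, 1 - rho^2)."""
    rng = random.Random(seed)
    sd = math.sqrt(1 - rho ** 2)
    x = y = 0.0
    draws = []
    for i in range(burn_in + n_samples):
        x = rng.gauss(rho * y, sd)   # update X given the current Y
        y = rng.gauss(rho * x, sd)   # then update Y given the new X
        if i >= burn_in:
            draws.append((x, y))
    return draws

def sample_correlation(pairs):
    n = len(pairs)
    mx = sum(p[0] for p in pairs) / n
    my = sum(p[1] for p in pairs) / n
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    syy = sum((b - my) ** 2 for _, b in pairs)
    return sxy / math.sqrt(sxx * syy)

draws = gibbs_bivariate_normal(rho=0.8, n_samples=20_000)
print(round(sample_correlation(draws), 2))   # should be near 0.8
```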
425.
Metropolis-Hastings algorithm:
A.
Accepts or rejects proposed sample → A
B.
Always accepts
C.
Always rejects
D.
Updates mean only
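A random-walk Metropolis-Hastings sketch showing the accept/reject step. Because the Gaussian proposal is symmetric, the Hastings correction cancels and the acceptance probability is min(1, p(x′)/p(x)); rejected proposals repeat the current state. The standard normal target (known only up to a constant), step size and seed are illustrative assumptions.

```python
import math
import random

def metropolis_hastings(log_target, n_samples, x0=0.0, step=1.0, burn_in=1_000, seed=0):
    """Random-walk Metropolis-Hastings; returns (samples, acceptance rate)."""
    rng = random.Random(seed)
    x, samples, accepted = x0, [], 0
    for i in range(burn_in + n_samples):
        proposal = x + rng.gauss(0.0, step)
        log_ratio = log_target(proposal) - log_target(x)
        if rng.random() < math.exp(min(0.0, log_ratio)):   # accept the proposal
            x = proposal
            accepted += 1
        # otherwise reject: keep (and record again) the current state
        if i >= burn_in:
            samples.append(x)
    return samples, accepted / (burn_in + n_samples)

# Log density of N(0, 1) up to an additive constant.
samples, acceptance_rate = metropolis_hastings(lambda x: -0.5 * x * x, n_samples=50_000)
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / (len(samples) - 1)
print(round(mean, 2), round(var, 2), round(acceptance_rate, 2))
```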
426.
Bayesian inference combines:
A.
Prior × Likelihood → A
B.
Mean × Variance
C.
Median × Mode
D.
Regression coefficients
427.
Posterior distribution =
A.
Updated belief after data → A
B.
Prior only
C.
Likelihood only
D.
Mean only
428.
Conjugate prior ensures:
A.
Posterior in same family → A
B.
Posterior uniform
C.
Posterior normal
D.
Posterior variance
429.
Maximum a posteriori (MAP) estimation
maximizes:
A.
Posterior → A
B.
Likelihood
C.
Mean
D.
Variance
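The Beta-binomial pair illustrates Questions 426 to 429 in closed form: a Beta(a, b) prior combined with k successes in n binomial trials gives a Beta(a + k, b + n − k) posterior (same family, hence conjugate), and the MAP estimate is the posterior mode. The prior and data below are hypothetical:

```python
from fractions import Fraction

def beta_binomial_update(a, b, successes, trials):
    """Beta(a, b) prior + binomial data -> Beta(a + k, b + n - k) posterior."""
    return a + successes, b + trials - successes

def beta_mean(a, b):
    return Fraction(a, a + b)

def beta_mode(a, b):
    """MAP estimate (posterior mode); this closed form requires a > 1 and b > 1."""
    if a <= 1 or b <= 1:
        raise ValueError("closed-form mode requires a > 1 and b > 1")
    return Fraction(a - 1, a + b - 2)

a_post, b_post = beta_binomial_update(2, 2, successes=7, trials=10)
print(a_post, b_post)                # 9 5
print(beta_mean(a_post, b_post))     # 9/14
print(beta_mode(a_post, b_post))     # 2/3, versus the maximum-likelihood 7/10
```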
430.
Hierarchical Bayesian model accounts
for:
A.
Group-level variation → A
B.
Individual only
C.
Mean only
D.
Variance only
431.
Random effects model includes:
A.
Group-level variation → A
B.
Fixed effect only
C.
Mean only
D.
Variance only
432.
Fixed effects model assumes:
A.
Constant effects → A
B.
Random effects
C.
Variable effects
D.
Unknown effects
433.
Mixed effects model combines:
A.
Fixed + random → A
B.
Fixed only
C.
Random only
D.
Neither
434.
Multilevel modeling is used for:
A.
Nested data → A
B.
Independent data
C.
Continuous only
D.
Categorical only
435.
Structural equation modeling (SEM)
combines:
A.
Regression + factor analysis → A
B.
Regression only
C.
Correlation only
D.
Variance only
436.
Path analysis is part of:
A.
SEM → A
B.
ANOVA
C.
Regression
D.
Time series
437.
Confirmatory factor analysis (CFA)
tests:
A.
Hypothesized factor structure → A
B.
Mean equality
C.
Regression coefficients
D.
Variance equality
438.
Exploratory factor analysis (EFA)
discovers:
A.
Factor structure → A
B.
Mean only
C.
Regression
D.
Variance
439.
Kaiser-Meyer-Olkin (KMO) measure checks:
A.
Sampling adequacy → A
B.
Variance
C.
Regression slope
D.
Mean
440.
Bartlett’s test checks:
A.
Sphericity → A
B.
Mean
C.
Variance
D.
Regression
441.
Scree plot visualizes:
A.
Eigenvalues → A
B.
Means
C.
Variances
D.
Regression
442.
Principal Component Analysis (PCA)
reduces:
A.
Dimensionality → A
B.
Mean
C.
Variance
D.
Regression
443.
First principal component maximizes:
A.
Variance → A
B.
Mean
C.
Skewness
D.
Kurtosis
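For two variables, PCA reduces to the eigen-decomposition of a 2×2 covariance matrix, which has a closed form. The largest eigenvalue is the variance captured by the first principal component, and its share of the total is the "variance explained". A sketch with hypothetical points:

```python
import math

def covariance_matrix_2d(points):
    """Sample covariance entries (var_x, cov_xy, var_y) of 2-D points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    vx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    vy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    cxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return vx, cxy, vy

def pca_2d(points):
    """Eigenvalues (largest first) of [[a, b], [b, c]] and the unit direction of PC1."""
    a, b, c = covariance_matrix_2d(points)
    mid, radius = (a + c) / 2, math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    lam1, lam2 = mid + radius, mid - radius
    if b != 0:
        vx, vy = b, lam1 - a          # solves (A - lam1 I) v = 0
    else:
        vx, vy = (1.0, 0.0) if a >= c else (0.0, 1.0)
    norm = math.hypot(vx, vy)
    return (lam1, lam2), (vx / norm, vy / norm)

points = [(1, 2), (2, 1), (3, 4), (4, 3)]
(lam1, lam2), pc1 = pca_2d(points)
print(round(lam1 / (lam1 + lam2), 3))   # 0.8 of the variance lies along PC1
print(pc1)                              # direction close to (0.707, 0.707)
```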
444.
Varimax rotation achieves:
A.
Simple structure → A
B.
Maximum variance
C.
Regression
D.
Mean only
445.
K-means clustering minimizes:
A.
Within-cluster sum of squares → A
B.
Between-cluster variance
C.
Mean only
D.
Variance only
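K-means (Lloyd's algorithm) alternates between assigning points to the nearest centroid and moving each centroid to its cluster mean; each step cannot increase the within-cluster sum of squares. A one-dimensional sketch; the starting centroids are a hypothetical choice, and in practice results depend on initialization:

```python
def kmeans_1d(data, centroids, max_iter=100):
    """Lloyd's algorithm in 1-D; returns (centroids, clusters, WCSS)."""
    centroids = list(centroids)
    for _ in range(max_iter):
        clusters = [[] for _ in centroids]
        for x in data:
            nearest = min(range(len(centroids)), key=lambda j: abs(x - centroids[j]))
            clusters[nearest].append(x)
        # Recompute centroids as means; keep the old centroid if a cluster is empty.
        new = [sum(c) / len(c) if c else centroids[j] for j, c in enumerate(clusters)]
        if new == centroids:
            break
        centroids = new
    wcss = sum((x - centroids[j]) ** 2 for j, c in enumerate(clusters) for x in c)
    return centroids, clusters, wcss

centroids, clusters, wcss = kmeans_1d([1, 2, 3, 10, 11, 12], centroids=[1, 2])
print(centroids, clusters, wcss)   # [2.0, 11.0] [[1, 2, 3], [10, 11, 12]] 4.0
```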
446.
Hierarchical clustering produces:
A.
Dendrogram → A
B.
Regression line
C.
Correlation matrix
D.
Factor loadings
447.
Agglomerative clustering starts with:
A.
Each observation as cluster → A
B.
One cluster
C.
Random clusters
D.
Mean only
448.
Divisive clustering starts with:
A.
All data in one cluster → A
B.
Each observation
C.
Random clusters
D.
Mean only
449.
DBSCAN identifies:
A.
Density-based clusters → A
B.
Hierarchical clusters
C.
K-means clusters
D.
Regression clusters
450.
Silhouette score measures:
A.
Cluster cohesion and separation → A
B.
Mean
C.
Variance
D.
Standard deviation
451.
High silhouette score indicates:
A.
Well-separated clusters → A
B.
Overlapping clusters
C.
Poor clustering
D.
Random clustering
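The silhouette of a point is s = (b − a) / max(a, b), where a is its mean distance to its own cluster and b its smallest mean distance to another cluster; values near 1 indicate well-separated clusters. A 1-D sketch comparing a good and a poor partition of the same hypothetical data (it assumes at least two clusters and gives singletons s = 0 by convention):

```python
def silhouette_scores(clusters):
    """Per-point silhouette values for 1-D data split into >= 2 clusters."""
    scores = []
    for i, own in enumerate(clusters):
        for idx, x in enumerate(own):
            rest = own[:idx] + own[idx + 1:]
            if not rest:                 # singleton cluster: s = 0 by convention
                scores.append(0.0)
                continue
            a = sum(abs(x - y) for y in rest) / len(rest)
            b = min(sum(abs(x - y) for y in other) / len(other)
                    for j, other in enumerate(clusters) if j != i)
            scores.append((b - a) / max(a, b))
    return scores

good = silhouette_scores([[1, 2, 3], [10, 11, 12]])
bad = silhouette_scores([[1, 2, 10], [3, 11, 12]])
print(sum(good) / len(good), sum(bad) / len(bad))   # high (about 0.85) vs near 0
```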
452.
Outlier detection uses:
A.
Z-score, IQR → A
B.
Mean only
C.
Variance only
D.
Median only
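Both outlier rules from Question 452 in a short sketch: Tukey's fences (Q1 − 1.5·IQR, Q3 + 1.5·IQR, the same rule boxplots use) and a z-score cutoff. Quartile conventions differ between packages; this uses Python's `statistics.quantiles` with the "exclusive" method, and the threshold of 3 and the data are illustrative.

```python
import statistics

def iqr_outliers(data, k=1.5):
    """Flag points outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(data, n=4, method="exclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [x for x in data if x < low or x > high]

def z_score_outliers(data, threshold=3.0):
    """Flag points more than `threshold` sample SDs from the mean."""
    mean, sd = statistics.mean(data), statistics.stdev(data)
    return [x for x in data if abs(x - mean) / sd > threshold]

data = [10, 12, 12, 13, 12, 11, 14, 13, 15, 102, 12, 14, 13]
print(iqr_outliers(data))       # [102]
print(z_score_outliers(data))   # [102]
```

The z-score rule is less robust because the outlier itself inflates the mean and SD; in small samples it can mask the very point it should flag, while quartiles are barely affected.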
453.
Boxplot identifies:
A.
Outliers → A
B.
Mean
C.
Variance
D.
Standard deviation
454.
Leverage points affect:
A.
Regression line → A
B.
Median only
C.
Variance only
D.
Mean only
455.
Hierarchical clustering produces:
A.
Dendrogram → A
B.
Regression line
C.
Correlation matrix
D.
Factor loadings
456.
Agglomerative clustering starts with:
A.
Each observation as a cluster → A
B.
One cluster
C.
Random clusters
D.
Mean only
457.
Divisive clustering starts with:
A.
All data in one cluster → A
B.
Each observation
C.
Random clusters
D.
Mean only
458.
K-means clustering minimizes:
A.
Within-cluster sum of squares → A
B.
Between-cluster variance
C.
Mean only
D.
Variance only
459.
DBSCAN clustering identifies:
A.
Density-based clusters → A
B.
Hierarchical clusters
C.
K-means clusters
D.
Regression clusters
460.
Silhouette score measures:
A.
Cluster cohesion and separation → A
B.
Mean
C.
Variance
D.
Standard deviation
461.
High silhouette score indicates:
A.
Well-separated clusters → A
B.
Overlapping clusters
C.
Poor clustering
D.
Random clustering
462.
Outlier detection methods include:
A.
Z-score, IQR → A
B.
Mean only
C.
Variance only
D.
Median only
463.
Boxplot shows:
A.
Outliers → A
B.
Mean only
C.
Variance only
D.
Standard deviation
464.
Leverage points affect:
A.
Regression line → A
B.
Median only
C.
Variance only
D.
Mean only
465.
Cook’s distance identifies:
A.
Influential points → A
B.
Outliers only
C.
Median points
D.
Regular points
466.
Multicollinearity inflates:
A.
Standard errors → A
B.
Means
C.
Medians
D.
Modes
467.
Variance Inflation Factor (VIF) > 10
indicates:
A.
Severe multicollinearity → A
B.
No correlation
C.
Independence
D.
Normality
468.
Heteroscedasticity violates:
A.
Constant variance assumption → A
B.
Linearity
C.
Normality
D.
Independence
469.
Autocorrelated residuals violate:
A.
Independence assumption → A
B.
Linearity
C.
Normality
D.
Variance
470.
Time series decomposition separates:
A.
Trend, seasonality, residual → A
B.
Mean only
C.
Variance only
D.
Median only
471.
ARIMA model includes:
A.
AR + I + MA → A
B.
AR only
C.
MA only
D.
Differencing only
472.
Stationarity of the (differenced) series is required for:
A.
ARIMA → A
B.
Regression
C.
ANOVA
D.
Chi-square
473.
Exponential smoothing is used for:
A.
Forecasting → A
B.
Regression
C.
Correlation
D.
Variance estimation
474.
Holt-Winters method models:
A.
Trend + seasonality → A
B.
Noise only
C.
Mean only
D.
Variance only
475.
Bootstrapping resamples:
A.
With replacement → A
B.
Without replacement
C.
Randomly once
D.
Deterministically
476.
Jackknife resampling removes:
A.
One observation at a time → A
B.
Half the sample
C.
Entire sample
D.
Random subset
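Bootstrapping and the jackknife both estimate a statistic's standard error by recomputing it on modified samples: the bootstrap resamples n points with replacement, while the jackknife leaves out one observation at a time. For the sample mean the jackknife SE equals s/√n exactly, which makes a useful check. A sketch with hypothetical data and illustrative settings:

```python
import math
import random

def bootstrap_se(data, stat, n_resamples=5_000, seed=0):
    """SD of the statistic over resamples drawn WITH replacement."""
    rng = random.Random(seed)
    reps = [stat(rng.choices(data, k=len(data))) for _ in range(n_resamples)]
    m = sum(reps) / len(reps)
    return math.sqrt(sum((r - m) ** 2 for r in reps) / (len(reps) - 1))

def jackknife_se(data, stat):
    """Leave-one-out replicates; SE = sqrt((n - 1)/n * sum((theta_i - theta_bar)^2))."""
    n = len(data)
    reps = [stat(data[:i] + data[i + 1:]) for i in range(n)]
    m = sum(reps) / n
    return math.sqrt((n - 1) / n * sum((r - m) ** 2 for r in reps))

def mean(xs):
    return sum(xs) / len(xs)

data = [4.1, 5.3, 3.8, 6.0, 5.5, 4.9, 5.1, 4.4, 6.2, 5.0]
print(bootstrap_se(data, mean), jackknife_se(data, mean))
```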
477.
Principal Component Analysis (PCA)
reduces:
A.
Dimensionality → A
B.
Mean
C.
Variance
D.
Regression
478.
First principal component maximizes:
A.
Variance → A
B.
Mean
C.
Skewness
D.
Kurtosis
479.
Eigenvalues in PCA indicate:
A.
Variance explained → A
B.
Mean only
C.
Regression coefficient
D.
Skewness
480.
Factor analysis identifies:
A.
Latent variables → A
B.
Observed variables only
C.
Means
D.
Variances
481.
Kaiser-Meyer-Olkin (KMO) measure checks:
A.
Sampling adequacy → A
B.
Variance
C.
Regression slope
D.
Mean
482.
Bartlett’s test checks:
A.
Sphericity → A
B.
Mean
C.
Variance
D.
Regression
483.
Scree plot visualizes:
A.
Eigenvalues → A
B.
Means
C.
Variances
D.
Regression
484.
Confirmatory Factor Analysis (CFA)
tests:
A.
Hypothesized factor structure → A
B.
Mean equality
C.
Regression coefficients
D.
Variance equality
485.
Exploratory Factor Analysis (EFA)
discovers:
A.
Factor structure → A
B.
Mean only
C.
Regression
D.
Variance
486.
Structural Equation Modeling (SEM)
combines:
A.
Regression + factor analysis → A
B.
Regression only
C.
Correlation only
D.
Variance only
487.
Path analysis is part of:
A.
SEM → A
B.
ANOVA
C.
Regression
D.
Time series
488.
Bayesian statistics updates:
A.
Prior beliefs with data → A
B.
Only mean
C.
Only variance
D.
Only median
489.
Posterior distribution =
A.
Updated probability → A
B.
Prior only
C.
Likelihood only
D.
Mean only
490.
Maximum a posteriori (MAP) estimation
maximizes:
A.
Posterior → A
B.
Likelihood
C.
Mean
D.
Variance
491.
Markov Chain Monte Carlo (MCMC) is used
for:
A.
Bayesian estimation → A
B.
Frequentist estimation
C.
Mean only
D.
Variance only
492.
Gibbs sampling updates:
A.
One variable at a time → A
B.
All variables simultaneously
C.
Mean only
D.
Variance only
493.
Metropolis-Hastings algorithm:
A.
Accepts/rejects proposed sample → A
B.
Always accepts
C.
Always rejects
D.
Updates mean only
494.
Monte Carlo simulation estimates:
A.
Probabilities & distributions → A
B.
Mean only
C.
Median only
D.
Mode
495.
Random effects model accounts for:
A.
Group-level variation → A
B.
Fixed effect only
C.
Mean only
D.
Variance only
496.
Fixed effects model assumes:
A.
Constant effects → A
B.
Random effects
C.
Variable effects
D.
Unknown effects
497.
Mixed effects model combines:
A.
Fixed + random → A
B.
Fixed only
C.
Random only
D.
Neither
498.
Multilevel modeling is used for:
A.
Nested data → A
B.
Independent data
C.
Continuous only
D.
Categorical only
499.
Overdispersion occurs when:
A.
Variance > mean → A
B.
Variance < mean
C.
Variance = mean
D.
Mean = 0
500.
Negative binomial regression handles:
A.
Overdispersed count data → A
B.
Binary data
C.
Continuous data
D.
Ordinal data


