Cognizant Interview Questions: Machine Learning

Here’s a list of important interview questions in statistics that are commonly asked in the context of Machine Learning:

1. Descriptive Statistics

What is the difference between mean, median, and mode?
How do you handle outliers in a dataset?
What are the different measures of central tendency?
What is the standard deviation, and how is it different from variance?
Explain skewness and kurtosis.
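
A minimal Python sketch of several of these ideas, using only the standard library `statistics` module. The data are hypothetical. The outlier rule used here (Tukey's fences at 1.5 * IQR) is one common convention, not the only way to handle outliers. The skewness and kurtosis formulas are the simple moment-based versions, and other estimators exist. Because the sample contains one extreme value, the mean (9) is pulled well above the median (4). This shows why the median is the more robust measure of central tendency.

```python
import statistics

def describe(data):
    """Return common descriptive statistics for a list of numbers."""
    mean = statistics.mean(data)
    sd = statistics.pstdev(data)  # population standard deviation = sqrt(population variance)
    n = len(data)
    # Moment-based (population) skewness and excess kurtosis; other estimators exist
    skewness = sum((x - mean) ** 3 for x in data) / (n * sd ** 3)
    excess_kurtosis = sum((x - mean) ** 4 for x in data) / (n * sd ** 4) - 3
    return {
        "mean": mean,
        "median": statistics.median(data),
        "mode": statistics.mode(data),
        "variance": statistics.pvariance(data),
        "std_dev": sd,
        "skewness": skewness,
        "excess_kurtosis": excess_kurtosis,
    }

def iqr_outliers(data, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [x for x in data if x < low or x > high]

# Hypothetical sample with one extreme value
data = [2, 3, 3, 4, 5, 6, 40]
summary = describe(data)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
print("outliers:", iqr_outliers(data))
```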

2. Probability

What is the difference between probability and likelihood?
Explain Bayes’ Theorem with an example.
What are conditional probability and joint probability?
What is the Law of Large Numbers?
What are prior, likelihood, and posterior probabilities?
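
Bayes' theorem can be illustrated with a short Python calculation. The screening-test numbers (1% prevalence, 99% sensitivity, 5% false-positive rate) are hypothetical. They were chosen only to show how the prior, likelihood, joint, and posterior probabilities fit together. The function name `bayes_posterior` is illustrative.

```python
def bayes_posterior(prior, likelihood, false_positive_rate):
    """Return P(H | E) for a binary hypothesis H given evidence E.

    prior               = P(H)
    likelihood          = P(E | H)
    false_positive_rate = P(E | not H)
    """
    joint_h = likelihood * prior                     # joint probability P(E and H)
    joint_not_h = false_positive_rate * (1 - prior)  # joint probability P(E and not H)
    evidence = joint_h + joint_not_h                 # P(E), by the law of total probability
    return joint_h / evidence                        # conditional probability P(H | E)

# Hypothetical screening test: 1% prevalence, 99% sensitivity, 5% false-positive rate
posterior = bayes_posterior(prior=0.01, likelihood=0.99, false_positive_rate=0.05)
print(f"P(disease | positive test) = {posterior:.4f}")
```

With these inputs, the posterior is exactly 1/6, or about 16.7%. Most positive results come from the much larger healthy group, even though the test itself is accurate.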

3. Probability Distributions

Explain the difference between a discrete and a continuous probability distribution.
What is a normal distribution, and why is it important in statistics?
Explain the Central Limit Theorem and its significance.
What is the difference between a uniform distribution and a normal distribution?
Describe the binomial distribution and provide a practical example.
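
A small Python sketch of the binomial distribution and the Central Limit Theorem, using only the standard library. The coin-flip example, the sample size of 30, and the 5,000 repetitions are illustrative assumptions. The simulation draws from a uniform distribution, which is not normal. Even so, the sample means cluster around 0.5, with a spread close to the value the CLT predicts.

```python
import math
import random
import statistics

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p): exactly k successes in n independent trials."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

def sample_means(sample_size, repetitions, seed=0):
    """Means of repeated samples drawn from Uniform(0, 1), a clearly non-normal distribution."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.random() for _ in range(sample_size))
        for _ in range(repetitions)
    ]

# Binomial example: probability of exactly 5 heads in 10 fair coin flips
print(f"P(5 heads in 10 flips) = {binomial_pmf(5, 10, 0.5):.4f}")

# CLT: means of size-30 samples are approximately normal with
# mean 0.5 and standard deviation sqrt(1/12) / sqrt(30)
means = sample_means(sample_size=30, repetitions=5000)
print(f"mean of sample means: {statistics.fmean(means):.4f} (theory 0.5)")
print(f"std of sample means:  {statistics.stdev(means):.4f} (theory {math.sqrt(1 / 12 / 30):.4f})")
```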

4. Hypothesis Testing

What is a null hypothesis and an alternative hypothesis?
Explain p-value and its significance in hypothesis testing.
What are Type I and Type II errors?
What is a confidence interval, and how is it constructed?
Explain the concept of statistical power in hypothesis testing.
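
A Python sketch of a two-sided one-sample z-test, the matching confidence interval, and statistical power, built on `statistics.NormalDist`. It assumes the population standard deviation is known, which makes this a z-test rather than the more common t-test. The numbers are hypothetical. alpha is the Type I error rate, and 1 - power is the Type II error rate.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist(mu=0, sigma=1)

def z_test(sample_mean, mu0, sigma, n):
    """Two-sided one-sample z-test of H0: mu = mu0, assuming a known population sigma."""
    standard_error = sigma / n ** 0.5
    z = (sample_mean - mu0) / standard_error
    p_value = 2 * (1 - STANDARD_NORMAL.cdf(abs(z)))
    return z, p_value

def confidence_interval(sample_mean, sigma, n, level=0.95):
    """Interval sample_mean +/- z_crit * sigma / sqrt(n)."""
    z_crit = STANDARD_NORMAL.inv_cdf(1 - (1 - level) / 2)
    margin = z_crit * sigma / n ** 0.5
    return sample_mean - margin, sample_mean + margin

def power(true_mean, mu0, sigma, n, alpha=0.05):
    """P(reject H0) when the true mean is true_mean; the Type II error rate is 1 - power."""
    z_crit = STANDARD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = (true_mean - mu0) / (sigma / n ** 0.5)
    return (1 - STANDARD_NORMAL.cdf(z_crit - shift)) + STANDARD_NORMAL.cdf(-z_crit - shift)

# Hypothetical data: n = 100, sample mean 52, known sigma 10, H0: mu = 50
z, p = z_test(sample_mean=52, mu0=50, sigma=10, n=100)
low, high = confidence_interval(sample_mean=52, sigma=10, n=100)
alpha = 0.05  # Type I error rate: probability of rejecting a true H0
print(f"z = {z:.2f}, p = {p:.4f}, reject H0 at alpha={alpha}: {p < alpha}")
print(f"95% CI: ({low:.2f}, {high:.2f})")
print(f"power if the true mean is 52: {power(52, 50, 10, 100):.3f}")
```

Here z = 2 gives a p-value of about 0.046, and the 95% interval excludes 50. The test and the interval therefore reach the same conclusion. A power of roughly 0.52 means that, if the true mean really were 52, a study of this size would miss the effect almost half the time.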

5. Correlation and Regression

What is the difference between correlation and causation?
How is the correlation coefficient interpreted?
Explain the concept of multicollinearity in regression.
What is the difference between linear and logistic regression?
How do you interpret the coefficients in a linear regression model?
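
A minimal Python sketch of Pearson correlation and simple ordinary least squares. The formulas are written out by hand so that each step is visible. The paired data are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: covariance divided by the product of the standard deviations."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    intercept = my - slope * mx
    return intercept, slope

# Hypothetical paired measurements
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
intercept, slope = fit_line(xs, ys)
print(f"r = {pearson_r(xs, ys):.3f}")
print(f"y = {intercept:.2f} + {slope:.2f} * x")
```

Here r is about 0.77, and the fitted line is y = 2.2 + 0.6x. The slope means that each one-unit increase in x is associated with a 0.6 increase in predicted y. Neither number says whether x causes y.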

6. Sampling and Estimation

What is the difference between a sample and a population?
Explain different sampling techniques like random sampling, stratified sampling, and cluster sampling.
What is the difference between point estimation and interval estimation?
What is bias in sampling, and how can it affect your results?
How do you ensure that a sample is representative of the population?
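
A Python sketch that contrasts simple random, proportional stratified, and cluster sampling on a hypothetical population of 800 urban and 200 rural units. The helper names and the grouping into households are illustrative. A simple random sample of 50 will only approximately match the 80/20 split. Proportional stratification reproduces it by construction, with 40 urban and 10 rural units.

```python
import random
from collections import Counter, defaultdict

def simple_random_sample(population, k, rng):
    """Every unit has the same chance of selection."""
    return rng.sample(population, k)

def stratified_sample(population, stratum_of, k, rng):
    """Proportional allocation: sample within each stratum in proportion to its size.
    Rounding can make the total differ slightly from k for awkward stratum sizes."""
    strata = defaultdict(list)
    for unit in population:
        strata[stratum_of(unit)].append(unit)
    sample = []
    for members in strata.values():
        share = round(k * len(members) / len(population))
        sample.extend(rng.sample(members, share))
    return sample

def cluster_sample(clusters, n_clusters, rng):
    """Select whole clusters at random and keep every unit inside them."""
    chosen = rng.sample(clusters, n_clusters)
    return [unit for cluster in chosen for unit in cluster]

def region(unit):
    return unit[1]

rng = random.Random(42)
# Hypothetical population: 800 urban and 200 rural units, stored as (id, region)
population = [(i, "urban" if i < 800 else "rural") for i in range(1000)]

srs = simple_random_sample(population, 50, rng)
strat = stratified_sample(population, region, 50, rng)
print("simple random:", Counter(map(region, srs)))
print("stratified:   ", Counter(map(region, strat)))

# Hypothetical clusters, e.g. 100 households of 10 units each
households = [population[i:i + 10] for i in range(0, 1000, 10)]
print("cluster sample size:", len(cluster_sample(households, 5, rng)))
```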

7. ANOVA (Analysis of Variance)

What is ANOVA, and when is it used?
What are the assumptions of ANOVA?
Explain the difference between one-way and two-way ANOVA.
What is the F-statistic in ANOVA, and how is it interpreted?
When would you use an ANOVA test instead of a t-test?
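
A hand-written Python sketch of the one-way ANOVA F-statistic, which is the ratio of the between-group mean square to the within-group mean square. The three groups are hypothetical. Converting F into a p-value requires the F distribution, which the standard library does not provide. If SciPy is available, `scipy.stats.f_oneway` returns both the statistic and the p-value.

```python
def one_way_anova_f(*groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)
    k, n = len(groups), len(all_values)
    # Between-group sum of squares: how far each group mean lies from the grand mean
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of observations around their own group mean
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical measurements from three treatment groups
f_stat, df_b, df_w = one_way_anova_f([1, 2, 3], [4, 5, 6], [7, 8, 9])
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

For these groups, F(2, 6) = 27 because the group means differ far more than the observations vary within each group. With exactly two groups, the one-way ANOVA F equals the square of the pooled-variance two-sample t statistic. ANOVA becomes the natural choice once there are three or more groups, since running many pairwise t-tests inflates the Type I error rate.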

8. Chi-Square Test

What is the Chi-Square test, and when is it used?
How do you interpret the results of a Chi-Square test?
What are the assumptions of the Chi-Square test?
Explain the difference between a Chi-Square test for independence and a Chi-Square goodness-of-fit test.
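
A Python sketch of both Chi-Square tests, which share one function for the (O - E)^2 / E statistic. The 2x2 table and the die rolls are hypothetical. The p-value shortcut works only for one degree of freedom, where a chi-square variable is the square of a standard normal. No Yates continuity correction is applied, so results can differ from `scipy.stats.chi2_contingency`, which applies that correction by default when there is one degree of freedom.

```python
from statistics import NormalDist

def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all cells."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def chi_square_independence(table):
    """Chi-square test of independence for a contingency table given as a list of rows.
    Expected count for cell (i, j) = row_total_i * col_total_j / grand_total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    observed = [x for row in table for x in row]
    expected = [r * c / total for r in row_totals for c in col_totals]
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi_square_statistic(observed, expected), dof

# Hypothetical 2x2 table: rows = group A / group B, columns = clicked / did not click
stat, dof = chi_square_independence([[10, 20], [30, 40]])
# With 1 degree of freedom, a chi-square variable is a squared standard normal,
# so the p-value can be computed from the normal distribution
p_value = 2 * (1 - NormalDist().cdf(stat ** 0.5))
print(f"independence: chi2 = {stat:.3f}, dof = {dof}, p = {p_value:.3f}")

# Goodness of fit: 60 hypothetical die rolls against a fair-die expectation of 10 per face
rolls = [8, 12, 10, 9, 11, 10]
gof = chi_square_statistic(rolls, [10] * 6)
print(f"goodness of fit: chi2 = {gof:.2f}, dof = {len(rolls) - 1}")
```

The two tests differ only in where the expected counts come from. For independence, they come from the row and column totals; for goodness of fit, they come from a hypothesized distribution.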

9. Time Series Analysis

What is stationarity in a time series, and why is it important?
Explain autocorrelation and partial autocorrelation in time series.
What is ARIMA, and how is it used in time series forecasting?
What are trend, seasonality, and noise in time series data?
How do you handle missing values in time series data?
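
A Python sketch of three of these ideas: differencing, the sample autocorrelation function, and filling missing values. Differencing is a standard way to remove a trend before fitting ARIMA, whose "I" stands for "integrated". Linear interpolation is only one option for gaps; forward fill or model-based imputation may suit a given series better. The series are hypothetical.

```python
def difference(series, lag=1):
    """Differencing: y_t - y_(t-lag); first-order differencing removes a linear trend."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]

def autocorrelation(series, lag):
    """Sample autocorrelation at the given lag (the usual biased ACF estimator)."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((x - mean) ** 2 for x in series)
    numer = sum((series[t] - mean) * (series[t - lag] - mean) for t in range(lag, n))
    return numer / denom

def interpolate_missing(series):
    """Fill None values by linear interpolation between the nearest known neighbours;
    gaps at either end take the nearest known value."""
    known = [i for i, v in enumerate(series) if v is not None]
    if not known:
        raise ValueError("series has no observed values")
    filled = list(series)
    for i, v in enumerate(series):
        if v is not None:
            continue
        left = max((j for j in known if j < i), default=None)
        right = min((j for j in known if j > i), default=None)
        if left is None:
            filled[i] = series[right]
        elif right is None:
            filled[i] = series[left]
        else:
            weight = (i - left) / (right - left)
            filled[i] = series[left] + weight * (series[right] - series[left])
    return filled

trend = [2 * t + 1 for t in range(10)]  # hypothetical series with a linear trend
print("differenced:", difference(trend))  # constant after differencing
alternating = [1, -1] * 10
print("ACF lag 1:", autocorrelation(alternating, 1))
print("ACF lag 2:", autocorrelation(alternating, 2))
print("interpolated:", interpolate_missing([1.0, None, 3.0, None, None, 6.0]))
```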

10. Decision Trees and Random Forest

How does a decision tree make decisions?
Explain Gini impurity and entropy in the context of decision trees.
What is overfitting, and how can it be prevented in decision trees?
How does a random forest model reduce the risk of overfitting?
What are the advantages and disadvantages of using decision trees?
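
A Python sketch of the impurity measures a decision tree uses to choose splits. It also computes information gain, defined as the parent's impurity minus the size-weighted impurity of its children. At each node, a tree typically chooses the split with the largest gain. The label lists are hypothetical.

```python
import math
from collections import Counter

def gini(labels):
    """Gini impurity: 1 - sum(p_k^2); 0 for a pure node."""
    n = len(labels)
    return 1 - sum((c / n) ** 2 for c in Counter(labels).values())

def entropy(labels):
    """Shannon entropy in bits: -sum(p_k * log2(p_k)); 0 for a pure node."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(parent, children, impurity=entropy):
    """Parent impurity minus the size-weighted impurity of the child nodes."""
    n = len(parent)
    weighted = sum(len(child) / n * impurity(child) for child in children)
    return impurity(parent) - weighted

parent = ["yes", "yes", "no", "no"]
print("gini:", gini(parent), "entropy:", entropy(parent))
# A split that separates the classes perfectly removes all impurity
print("gain (perfect split):", information_gain(parent, [["yes", "yes"], ["no", "no"]]))
# A split that leaves each child as mixed as the parent gains nothing
print("gain (useless split):", information_gain(parent, [["yes", "no"], ["yes", "no"]]))
```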

11. Bias-Variance Tradeoff

What is the bias-variance tradeoff?
Explain how bias and variance affect model performance.
How do you reduce high variance or high bias in a model?
What is cross-validation, and how does it help in managing the bias-variance tradeoff?
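
A Python sketch of k-fold cross-validation used to choose the number of neighbours in a one-dimensional k-nearest-neighbour regressor. The model, the sine-plus-noise data, and the helper names are illustrative assumptions. Very few neighbours give a flexible, high-variance fit, and very many give a rigid, high-bias fit. Cross-validated error on held-out folds is one way to pick a setting between the two.

```python
import math
import random

def k_fold_splits(n, k, seed=0):
    """Yield (train_indices, validation_indices); each index is validated exactly once."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, validation in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, validation

def knn_predict(train_x, train_y, x, n_neighbors):
    """1-D k-nearest-neighbour regression: mean target of the closest training points."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:n_neighbors]
    return sum(train_y[i] for i in nearest) / len(nearest)

def cross_val_mse(xs, ys, n_neighbors, k=5):
    """Average squared error on the held-out folds."""
    errors = []
    for train, validation in k_fold_splits(len(xs), k):
        train_x = [xs[i] for i in train]
        train_y = [ys[i] for i in train]
        for i in validation:
            prediction = knn_predict(train_x, train_y, xs[i], n_neighbors)
            errors.append((prediction - ys[i]) ** 2)
    return sum(errors) / len(errors)

# Hypothetical noisy data around a sine curve
rng = random.Random(1)
xs = [i / 10 for i in range(100)]
ys = [math.sin(x) + rng.gauss(0, 0.3) for x in xs]

# Few neighbours -> flexible model (high variance); many -> rigid model (high bias)
for n_neighbors in (1, 10, 70):
    print(f"n_neighbors={n_neighbors:2d}  CV MSE={cross_val_mse(xs, ys, n_neighbors):.3f}")
```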

12. Model Evaluation Metrics

What is the difference between accuracy, precision, and recall?
Explain the ROC curve and AUC.
What is the F1 score, and when should it be used?
How do you evaluate a classification model?
What is the confusion matrix, and how is it interpreted?
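
A Python sketch of the confusion matrix, the metrics derived from it, and ROC AUC. The AUC is computed through its rank interpretation: the probability that a randomly chosen positive receives a higher score than a randomly chosen negative. The labels and scores are hypothetical, and the hard predictions correspond to thresholding the scores at 0.5.

```python
def confusion_matrix(y_true, y_pred):
    """Return (TP, FP, FN, TN) for binary labels where 1 is the positive class."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    return tp, fp, fn, tn

def classification_metrics(y_true, y_pred):
    tp, fp, fn, tn = confusion_matrix(y_true, y_pred)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0  # of predicted positives, how many are right
    recall = tp / (tp + fn) if tp + fn else 0.0     # of actual positives, how many are found
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, recall, f1

def roc_auc(y_true, scores):
    """AUC as P(random positive outranks random negative); ties count one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical labels, hard predictions (scores > 0.5), and predicted scores
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.2, 0.1, 0.4, 0.05, 0.35]

print("TP, FP, FN, TN:", confusion_matrix(y_true, y_pred))
print("accuracy, precision, recall, F1:", classification_metrics(y_true, y_pred))
print("ROC AUC:", roc_auc(y_true, scores))
```

Here accuracy is 0.8; precision, recall, and F1 are all 0.75; and AUC is 0.875. F1 is most useful when the classes are imbalanced, because accuracy alone can then look good while a model misses most positives.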

13. Clustering

What is clustering, and when is it used?
Explain the difference between k-means and hierarchical clustering.
How do you determine the optimal number of clusters in k-means clustering?
What is the silhouette score in clustering?
Explain the difference between supervised and unsupervised learning with examples.
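
A Python sketch of k-means (Lloyd's algorithm), the inertia values used in the elbow method, and the silhouette score, run on two hypothetical, well-separated groups of points. The farthest-first initialisation is a simplification chosen for determinism; libraries such as scikit-learn default to k-means++ initialisation instead. No labels are used at any point, which is what makes this unsupervised learning.

```python
import math

def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j])) for p in points]

def kmeans(points, k, max_iterations=100):
    """Lloyd's algorithm with farthest-first initialisation (a deterministic choice for this sketch)."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    for _ in range(max_iterations):
        labels = assign(points, centroids)
        updated = []
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            # Move each centroid to the mean of its members; leave an empty cluster's centroid in place
            updated.append(tuple(sum(dim) / len(members) for dim in zip(*members)) if members else centroids[j])
        if updated == centroids:
            break
        centroids = updated
    labels = assign(points, centroids)
    inertia = sum(math.dist(p, centroids[label]) ** 2 for p, label in zip(points, labels))
    return labels, centroids, inertia

def silhouette_score(points, labels):
    """Mean of (b - a) / max(a, b); requires at least two clusters."""
    scores = []
    for i, p in enumerate(points):
        own = [math.dist(p, q) for j, q in enumerate(points) if labels[j] == labels[i] and j != i]
        if not own:
            scores.append(0.0)  # convention: a singleton cluster scores 0
            continue
        a = sum(own) / len(own)  # mean distance within the point's own cluster
        b = min(  # mean distance to the nearest other cluster
            sum(math.dist(p, q) for j, q in enumerate(points) if labels[j] == c) / labels.count(c)
            for c in set(labels) if c != labels[i]
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
for k in (1, 2, 3):
    _, _, inertia = kmeans(points, k)
    print(f"k={k}: inertia={inertia:.2f}")  # elbow method: look for the bend in this curve
labels, centroids, _ = kmeans(points, 2)
print("silhouette (k=2):", round(silhouette_score(points, labels), 3))
```

Inertia drops from 404 at k = 1 to 4 at k = 2, then falls only slightly at k = 3, so the elbow is at k = 2. A silhouette score close to 1 points to the same choice.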

These questions cover a wide range of topics in statistics that are fundamental to Machine Learning and Data Science. Preparing well for these questions will give you a solid foundation for technical interviews in this field.