Question 1/26
A data scientist has developed a linear regression model using Spark ML and computed the predictions in a Spark DataFrame preds_df with the following schema:
prediction DOUBLE
actual DOUBLE
Which of the following code blocks can be used to compute the root mean-squared-error of the model according to the data in preds_df and assign it to the rmse variable?
Correct Answer: C
To compute the root mean-squared-error (RMSE) of a linear regression model using Spark ML, the RegressionEvaluator class is used. The RegressionEvaluator is specifically designed for regression tasks and can calculate various metrics, including RMSE, based on the columns containing predictions and actual values.
The correct code block to compute RMSE from the preds_df DataFrame is:
regression_evaluator = RegressionEvaluator(
    predictionCol="prediction",
    labelCol="actual",
    metricName="rmse"
)
rmse = regression_evaluator.evaluate(preds_df)

This code creates an instance of RegressionEvaluator, specifying the prediction and label columns, as well as the metric to be computed ("rmse"). It then evaluates the predictions in preds_df and assigns the resulting RMSE value to the rmse variable.
Options A, B, and D all use BinaryClassificationEvaluator, which computes binary classification metrics (areaUnderROC and areaUnderPR) and is therefore not suitable for regression tasks.
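As a sanity check on what the "rmse" metric means, the calculation can be sketched in plain Python: square each difference between prediction and actual, average the squares, and take the square root. This is only an illustration of the arithmetic, not Spark's implementation; the rmse helper and the sample rows standing in for preds_df are assumptions made for this example.

```python
import math


def rmse(predictions, actuals):
    """Root mean squared error of paired predictions and actual values."""
    if len(predictions) != len(actuals) or not predictions:
        raise ValueError("inputs must be non-empty and of equal length")
    # Squared difference for each (prediction, actual) pair
    squared_errors = [(p - a) ** 2 for p, a in zip(predictions, actuals)]
    # Mean of the squared errors, then the square root
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical (prediction, actual) rows standing in for preds_df
rows = [(3.0, 1.0), (5.0, 5.0), (4.0, 6.0), (2.0, 2.0)]
predictions = [p for p, _ in rows]
actuals = [a for _, a in rows]

value = rmse(predictions, actuals)
print(value)
```

For these sample rows the squared errors are 4, 0, 4 and 0, so the mean is 2 and the result is the square root of 2. On the same rows, a RegressionEvaluator configured as in option C would be expected to report the same quantity.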
Reference:
PySpark ML Documentation
- Other Question (26q)
- Q1. A data scientist has developed a linear regression model using Spark ML and computed the p...
- Q2. A data scientist wants to use Spark ML to one-hot encode the categorical features in their...
- Q3. Which of the following hyperparameter optimization methods automatically makes informed se...
- Q4. A machine learning engineer has identified the best run from an MLflow Experiment. They ha...
- Q5. Which of the following machine learning algorithms typically uses bagging?...
- Q6. A machine learning engineer is trying to scale a machine learning pipeline by distributing...
- Q7. Which of the following statements describes a Spark ML estimator?...
- Q8. A data scientist is developing a machine learning pipeline using AutoML on Databricks Mach...
- Q9. A machine learning engineer is trying to perform batch model inference. They want to get p...
- Q10. A data scientist has created a linear regression model that uses log(price) as a label var...
- Q11. Which of the Spark operations can be used to randomly split a Spark DataFrame into a train...
- Q12. A data scientist has created two linear regression models. The first model uses price as a...
- Q13. Which of the following tools can be used to distribute large-scale feature engineering wit...
- Q14. A health organization is developing a classification model to determine whether or not a p...
- Q15. A machine learning engineer has been notified that a new Staging version of a model regist...
- Q16. A data scientist has defined a Pandas UDF function predict to parallelize the inference pr...
- Q17. A data scientist is performing hyperparameter tuning using an iterative optimization algor...
- Q18. In which of the following situations is it preferable to impute missing feature values wit...
- Q19. A new data scientist has started working on an existing machine learning project. The proj...
- Q20. A data scientist has developed a machine learning pipeline with a static input data set us...
- Q21. A data scientist has a Spark DataFrame spark_df. They want to create a new Spark DataFrame...
- Q22. A data scientist learned during their training to always use 5-fold cross-validation in th...
- Q23. Which of the following evaluation metrics is not suitable to evaluate runs in AutoML exper...
- Q24. The implementation of linear regression in Spark ML first attempts to solve the linear reg...
- Q25. A machine learning engineer wants to parallelize the inference of group-specific models us...
- Q26. A machine learning engineering team has a Job with three successive tasks. Each task runs ...
