Question 1/28
A data scientist has been given an incomplete notebook from the data engineering team. The notebook uses a Spark DataFrame spark_df on which the data scientist needs to perform further feature engineering. Unfortunately, the data scientist has not yet learned the PySpark DataFrame API.
Which of the following blocks of code can the data scientist run to be able to use the pandas API on Spark?
Correct Answer: A
To use the pandas API on Spark, the data scientist can run the following code block:
    import pyspark.pandas as ps
    df = ps.DataFrame(spark_df)
This code imports the pandas API on Spark and converts the Spark DataFrame spark_df into a pandas-on-Spark DataFrame, allowing the data scientist to use familiar pandas functions for further feature engineering.
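As a rough illustration of the answer above, the conversion can be wrapped in a small helper. This is only a sketch: the helper name `to_pandas_on_spark`, the `demo` function, and the sample columns `id` and `price` are hypothetical and not part of the original question. It assumes a recent PySpark release, where a Spark DataFrame also offers a `pandas_api()` method; otherwise it falls back to the `ps.DataFrame(spark_df)` constructor from the answer.

```python
def to_pandas_on_spark(spark_df):
    """Wrap a Spark DataFrame so it can be used with pandas-style syntax."""
    # Recent PySpark releases expose DataFrame.pandas_api() for this conversion.
    if hasattr(spark_df, "pandas_api"):
        return spark_df.pandas_api()
    # Otherwise fall back to the constructor used in the answer above.
    # Imported here so the helper can be defined without PySpark installed.
    import pyspark.pandas as ps
    return ps.DataFrame(spark_df)


def demo():
    """Hypothetical end-to-end usage; requires a working Spark installation."""
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    # Made-up sample data standing in for the notebook's spark_df.
    spark_df = spark.createDataFrame([(1, 2.0), (2, 3.0)], ["id", "price"])
    psdf = to_pandas_on_spark(spark_df)
    # pandas-style column assignment, executed by Spark under the hood.
    psdf["price_doubled"] = psdf["price"] * 2
    return psdf.head()
```

Calling `demo()` on a cluster or a local Spark installation should return the first rows with the extra column. In the notebook from the question, only the conversion step on the existing `spark_df` would be needed.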
Reference: Databricks documentation, "pandas API on Spark"
