Real Databricks Databricks-Certified-Data-Engineer-Associate Exam Questions [Updated 2024]
Databricks-Certified-Data-Engineer-Associate Exam Dumps Pass with Updated 2024 Databricks Certified Data Engineer Associate Exam
NEW QUESTION # 38
A data analyst has developed a query that runs against Delta table. They want help from the data engineering team to implement a series of tests to ensure the data returned by the query is clean. However, the data engineering team uses Python for its tests rather than SQL.
Which of the following operations could the data engineering team use to run the query and operate with the results in PySpark?
- A. spark.sql
- B. spark.delta.table
- C. There is no way to share data between PySpark and SQL.
- D. spark.table
- E. SELECT * FROM sales
Answer: A
Explanation:
Explanation
from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()
df = spark.sql("SELECT * FROM sales")
print(df.count())
NEW QUESTION # 39
Which of the following Git operations must be performed outside of Databricks Repos?
- A. Merge
- B. Clone
- C. Pull
- D. Push
- E. Commit
Answer: A
Explanation:
Explanation
For following tasks, work in your Git provider:
Create a pull request.
Resolve merge conflicts.
Merge or delete branches.
Rebase a branch.
https://docs.databricks.com/repos/index.html
NEW QUESTION # 40
In which of the following scenarios should a data engineer use the MERGE INTO command instead of the INSERT INTO command?
- A. When the location of the data needs to be changed
- B. When the source table can be deleted
- C. When the target table is an external table
- D. When the source is not a Delta table
- E. When the target table cannot contain duplicate records
Answer: E
Explanation:
Explanation
With merge , you can avoid inserting the duplicate records. The dataset containing the new logs needs to be deduplicated within itself. By the SQL semantics of merge, it matches and deduplicates the new data with the existing data in the table, but if there is duplicate data within the new dataset, it is inserted.https://docs.databricks.com/en/delta/merge.html#:~:text=With%20merge%20%2C%20you%20can%20a
NEW QUESTION # 41
A Delta Live Table pipeline includes two datasets defined using STREAMING LIVE TABLE. Three datasets are defined against Delta Lake table sources using LIVE TABLE.
The table is configured to run in Production mode using the Continuous Pipeline Mode.
Assuming previously unprocessed data exists and all definitions are valid, what is the expected outcome after clicking Start to update the pipeline?
- A. All datasets will be updated once and the pipeline will shut down. The compute resources will be terminated.
- B. All datasets will be updated once and the pipeline will persist without any processing. The compute resources will persist but go unused.
- C. All datasets will be updated once and the pipeline will shut down. The compute resources will persist to allow for additional testing.
- D. All datasets will be updated at set intervals until the pipeline is shut down. The compute resources will persist to allow for additional testing.
- E. All datasets will be updated at set intervals until the pipeline is shut down. The compute resources will be deployed for the update and terminated when the pipeline is stopped.
Answer: C
NEW QUESTION # 42
A dataset has been defined using Delta Live Tables and includes an expectations clause:
CONSTRAINT valid_timestamp EXPECT (timestamp > '2020-01-01') ON VIOLATION FAIL UPDATE What is the expected behavior when a batch of data containing data that violates these constraints is processed?
- A. Records that violate the expectation are added to the target dataset and flagged as invalid in a field added to the target dataset.
- B. Records that violate the expectation cause the job to fail.
- C. Records that violate the expectation are dropped from the target dataset and loaded into a quarantine table.
- D. Records that violate the expectation are added to the target dataset and recorded as invalid in the event log.
- E. Records that violate the expectation are dropped from the target dataset and recorded as invalid in the event log.
Answer: B
Explanation:
Explanation
https://docs.databricks.com/en/delta-live-tables/expectations.html
Action
Result
warn (default)
Invalid records are written to the target; failure is reported as a metric for the dataset.
drop
Invalid records are dropped before data is written to the target; failure is reported as a metrics for the dataset.
fail
Invalid records prevent the update from succeeding. Manual intervention is required before re-processing.
NEW QUESTION # 43
A data engineer has been given a new record of data:
id STRING = 'a1'
rank INTEGER = 6
rating FLOAT = 9.4
Which of the following SQL commands can be used to append the new record to an existing Delta table my_table?
- A. my_table UNION VALUES ('a1', 6, 9.4)
- B. UPDATE VALUES ('a1', 6, 9.4) my_table
- C. UPDATE my_table VALUES ('a1', 6, 9.4)
- D. INSERT VALUES ( 'a1' , 6, 9.4) INTO my_table
- E. INSERT INTO my_table VALUES ('a1', 6, 9.4)
Answer: E
NEW QUESTION # 44
Which of the following describes the relationship between Bronze tables and raw data?
- A. Bronze tables contain less data than raw data files.
- B. Bronze tables contain more truthful data than raw data.
- C. Bronze tables contain a less refined view of data than raw data.
- D. Bronze tables contain raw data with a schema applied.
- E. Bronze tables contain aggregates while raw data is unaggregated.
Answer: E
NEW QUESTION # 45
Which of the following commands will return the number of null values in the member_id column?
- A. SELECT count(member_id) - count_null(member_id) FROM my_table;
- B. SELECT count(member_id) FROM my_table;
- C. SELECT null(member_id) FROM my_table;
- D. SELECT count_if(member_id IS NULL) FROM my_table;
- E. SELECT count_null(member_id) FROM my_table;
Answer: D
Explanation:
Explanation
https://docs.databricks.com/en/sql/language-manual/functions/count.html Returns A BIGINT.
If * is specified also counts row containing NULL values.
If expr are specified counts only rows for which all expr are not NULL.
If DISTINCT duplicate rows are not counted.
NEW QUESTION # 46
A data engineer needs to apply custom logic to string column city in table stores for a specific use case. In order to apply this custom logic at scale, the data engineer wants to create a SQL user-defined function (UDF).
Which of the following code blocks creates this SQL UDF?
- A.

- B.

- C.

- D.

- E.

Answer: A
NEW QUESTION # 47
A data engineer needs to determine whether to use the built-in Databricks Notebooks versioning or version their project using Databricks Repos.
Which of the following is an advantage of using Databricks Repos over the Databricks Notebooks versioning?
- A. Databricks Repos provides the ability to comment on specific changes
- B. Databricks Repos supports the use of multiple branches
- C. Databricks Repos automatically saves development progress
- D. Databricks Repos allows users to revert to previous versions of a notebook
- E. Databricks Repos is wholly housed within the Databricks Lakehouse Platform
Answer: B
Explanation:
Explanation
An advantage of using Databricks Repos over the built-in Databricks Notebooks versioning is the ability to work with multiple branches. Branching is a fundamental feature ofversion control systems like Git, which Databricks Repos is built upon. It allows you to create separate branches for different tasks, features, or experiments within your project. This separation helps in parallel development and experimentation without affecting the main branch or the work of other team members. Branching provides a more organized and collaborative development environment, making it easier to merge changes and manage different development efforts. While Databricks Notebooks versioning also allows you to track versions of notebooks, it may not provide the same level of flexibility and collaboration as branching in Databricks Repos.
NEW QUESTION # 48
Which of the following code blocks will remove the rows where the value in column age is greater than 25 from the existing Delta table my_table and save the updated table?
- A. UPDATE my_table WHERE age > 25;
- B. UPDATE my_table WHERE age <= 25;
- C. DELETE FROM my_table WHERE age > 25;
- D. DELETE FROM my_table WHERE age <= 25;
- E. SELECT * FROM my_table WHERE age > 25;
Answer: C
NEW QUESTION # 49
A data analysis team has noticed that their Databricks SQL queries are running too slowly when connected to their always-on SQL endpoint. They claim that this issue is present when many members of the team are running small queries simultaneously. They ask the data engineering team for help. The data engineering team notices that each of the team's queries uses the same SQL endpoint.
Which of the following approaches can the data engineering team use to improve the latency of the team's queries?
- A. They can turn on the Serverless feature for the SQL endpoint and change the Spot Instance Policy to
"Reliability Optimized." - B. They can turn on the Serverless feature for the SQL endpoint.
- C. They can turn on the Auto Stop feature for the SQL endpoint.
- D. They can increase the cluster size of the SQL endpoint.
- E. They can increase the maximum bound of the SQL endpoint's scaling range.
Answer: E
NEW QUESTION # 50
A new data engineering team team. has been assigned to an ELT project. The new data engineering team will need full privileges on the database customers to fully manage the project.
Which of the following commands can be used to grant full permissions on the database to the new data engineering team?
- A. GRANT USAGE ON DATABASE customers TO team;
- B. GRANT ALL PRIVILEGES ON DATABASE team TO customers;
- C. GRANT SELECT PRIVILEGES ON DATABASE customers TO teams;
- D. GRANT SELECT CREATE MODIFY USAGE PRIVILEGES ON DATABASE customers TO team;
- E. GRANT ALL PRIVILEGES ON DATABASE customers TO team;
Answer: E
NEW QUESTION # 51
An engineering manager uses a Databricks SQL query to monitor ingestion latency for each data source. The manager checks the results of the query every day, but they are manually rerunning the query each day and waiting for the results.
Which of the following approaches can the manager use to ensure the results of the query are updated each day?
- A. They can schedule the query to refresh every 1 day from the SQL endpoint's page in Databricks SQL.
- B. They can schedule the query to run every 1 day from the Jobs UI.
- C. They can schedule the query to run every 12 hours from the Jobs UI.
- D. They can schedule the query to refresh every 1 day from the query's page in Databricks SQL.
- E. They can schedule the query to refresh every 12 hours from the SQL endpoint's page in Databricks SQL.
Answer: D
Explanation:
Explanation
https://docs.databricks.com/en/sql/user/queries/schedule-query.html
NEW QUESTION # 52
A data engineer has realized that they made a mistake when making a daily update to a table. They need to use Delta time travel to restore the table to a version that is 3 days old. However, when the data engineer attempts to time travel to the older version, they are unable to restore the data because the data files have been deleted.
Which of the following explains why the data files are no longer present?
- A. The VACUUM command was run on the table
- B. The TIME TRAVEL command was run on the table
- C. The HISTORY command was run on the table
- D. The DELETE HISTORY command was run on the table
- E. The OPTIMIZE command was nun on the table
Answer: A
NEW QUESTION # 53
A data engineer has configured a Structured Streaming job to read from a table, manipulate the data, and then perform a streaming write into a new table.
The cade block used by the data engineer is below:
If the data engineer only wants the query to execute a micro-batch to process data every 5 seconds, which of the following lines of code should the data engineer use to fill in the blank?
- A. trigger(continuous="5 seconds")
- B. trigger(once="5 seconds")
- C. trigger(processingTime="5 seconds")
- D. trigger()
- E. trigger("5 seconds")
Answer: C
NEW QUESTION # 54
A data engineer has realized that the data files associated with a Delta table are incredibly small. They want to compact the small files to form larger files to improve performance.
Which of the following keywords can be used to compact the small files?
- A. OPTIMIZE
- B. COMPACTION
- C. REPARTITION
- D. REDUCE
- E. VACUUM
Answer: A
Explanation:
Explanation
OPTIMIZE can be used to club small files into 1 and improve performance.
NEW QUESTION # 55
Which of the following describes the type of workloads that are always compatible with Auto Loader?
- A. Machine learning workloads
- B. Streaming workloads
- C. Serverless workloads
- D. Dashboard workloads
- E. Batch workloads
Answer: B
Explanation:
Explanation
Auto Loader is a feature of Databricks that simplifies and automates the process of loading streaming data into Delta Lake tables. Auto Loader can detect new and updated files in cloud storage and efficiently load them as micro-batches or as a continuous stream. Auto Loader is always compatible with streaming workloads, as it is designed to handle streaming sources such as Amazon S3, Azure Data Lake Storage Gen2, and Azure Blob Storage. The other types of workloads may or may not be compatible with Auto Loader, depending on the data source and the use case. References: The information can be referenced from Databricks documentation on Auto Loader: Auto Loader.
https://community.databricks.com/t5/data-engineering/practice-exams-for-databricks-certified-data-engineer/td-p
NEW QUESTION # 56
A data engineer needs to determine whether to use the built-in Databricks Notebooks versioning or version their project using Databricks Repos.
Which of the following is an advantage of using Databricks Repos over the Databricks Notebooks versioning?
- A. Databricks Repos provides the ability to comment on specific changes
- B. Databricks Repos supports the use of multiple branches
- C. Databricks Repos automatically saves development progress
- D. Databricks Repos allows users to revert to previous versions of a notebook
- E. Databricks Repos is wholly housed within the Databricks Lakehouse Platform
Answer: B
NEW QUESTION # 57
A data engineer runs a statement every day to copy the previous day's sales into the table transactions. Each day's sales are in their own file in the location "/transactions/raw".
Today, the data engineer runs the following command to complete this task:
After running the command today, the data engineer notices that the number of records in table transactions has not changed.
Which of the following describes why the statement might not have copied any new records into the table?
- A. The COPY INTO statement requires the table to be refreshed to view the copied rows.
- B. The format of the files to be copied were not included with the FORMAT_OPTIONS keyword.
- C. The PARQUET file format does not support COPY INTO.
- D. The names of the files to be copied were not included with the FILES keyword.
- E. The previous day's file has already been copied into the table.
Answer: E
Explanation:
Explanation
https://docs.databricks.com/en/ingestion/copy-into/index.html The COPY INTO SQL command lets you load data from a file location into a Delta table. This is a re-triable and idempotent operation; files in the source location that have already been loaded are skipped. if there are no new records, the only consistent choice is C no new files were loaded because already loaded files were skipped.
NEW QUESTION # 58
A data engineer needs to create a table in Databricks using data from their organization's existing SQLite database.
They run the following command:
Which of the following lines of code fills in the above blank to successfully complete the task?
- A. org.apache.spark.sql.sqlite
- B. sqlite
- C. org.apache.spark.sql.jdbc
- D. DELTA
- E. autoloader
Answer: C
Explanation:
Explanation
CREATE TABLE new_employees_table
USING JDBC
OPTIONS (
url "<jdbc_url>",
dbtable "<table_name>",
user '<username>',
password '<password>'
) AS
SELECT * FROM employees_table_vw
https://docs.databricks.com/external-data/jdbc.html#language-sql
NEW QUESTION # 59
......
The GAQM Databricks-Certified-Data-Engineer-Associate (Databricks Certified Data Engineer Associate) Exam is a certification exam designed for professionals seeking to validate their proficiency in data engineering using Databricks. Databricks-Certified-Data-Engineer-Associate exam covers various aspects of Databricks such as data transformation, ETL processes, data modeling, and data warehousing. Databricks Certified Data Engineer Associate Exam certification is globally recognized and demonstrates the candidate's expertise in Databricks data engineering.
Databricks Certified Data Engineer Associate certification is ideal for professionals working in data engineering, data warehousing, and data modeling roles. Databricks Certified Data Engineer Associate Exam certification demonstrates the candidate's knowledge of Databricks and their ability to design and implement data engineering solutions using Databricks. It also validates their understanding of data transformation, ETL processes, data warehousing, and data modeling concepts. Databricks Certified Data Engineer Associate Exam certification can enhance the candidate's career opportunities and increase their earning potential.
Databricks-Certified-Data-Engineer-Associate Exam Dumps, Databricks-Certified-Data-Engineer-Associate Practice Test Questions: https://www.validbraindumps.com/Databricks-Certified-Data-Engineer-Associate-exam-prep.html