Working With Data Quiz
Quiz
Why use csv.DictReader over csv.reader?
csv.DictReader treats the first row as column headers and returns each row as a dictionary, allowing you to access values by column name (e.g., row['product']) rather than by numeric index (e.g., row[0]). This makes code more readable and maintainable.

What does this code print?
import json
data = {"status": "ok", "count": 42}
json_str = json.dumps(data)
print(type(json_str))
It prints <class 'str'>. json.dumps() (with an 's' for string) converts a Python dictionary to a JSON-formatted string. The 's' suffix is the key reminder: dumps = dump to string, dump = dump to file.

What are safe ways to access nested JSON data?
Use .get() with defaults, helper functions that chain .get() calls, and isinstance() checks before accessing nested data. Direct key access (data['key']) raises KeyError if the key is missing. While try/except works, it's verbose and less elegant than .get().

True or false: in the json module, json.load() reads from a file while json.loads() reads from a string.
True. load() reads from a file object, loads() (load string) reads from a string. Similarly, dump() writes to a file, dumps() returns a string.

What parameter should you pass to open() when writing CSV files to prevent blank rows on Windows?
The newline='' parameter prevents the csv module from writing blank lines between rows on Windows systems. This is a cross-platform best practice when using Python's csv module.

When using the requests library, call ___ to raise an HTTPError exception if the status code indicates an error (4xx or 5xx).
response.raise_for_status() checks if the HTTP status code indicates an error (4xx client errors or 5xx server errors) and raises an HTTPError exception if so. This allows you to use try/except for error handling instead of manually checking status_code.

Fill in the placeholders in this parameterized query:
cursor.execute(
    "SELECT * FROM users WHERE username = ___ AND age > ___",
    (username, min_age)
)
SQLite uses ? as a placeholder for parameterized queries. This prevents SQL injection by automatically escaping values. Never use f-strings or string concatenation for SQL queries as they create security vulnerabilities.

Why use requests.Session() instead of individual requests.get() calls?
A Session reuses the TCP connection and shares headers/cookies/auth across requests. It doesn't change response handling: you still parse the body (e.g., with .json()), and error handling is still necessary.

What is the correct flow for a simple GET request (with requests.get())?
The correct flow is: 1) Define the URL and any query parameters, 2) Open a try/except block to handle RequestException errors, 3) Make the GET request with a timeout to avoid hanging, 4) Call raise_for_status() to raise an HTTPError on 4xx/5xx responses, 5) Parse the JSON response body.
import requests
url = "https://api.example.com"
try:
    response = requests.get(url, timeout=3)
    response.raise_for_status()
    data = response.json()
except requests.exceptions.RequestException as e:
    print("Request failed:", e)

How many lines are in out.csv after this code runs?
import csv
data = [{'name': 'Alice', 'age': 30}]
with open('out.csv', 'w', newline='') as f:
    writer = csv.DictWriter(f, fieldnames=['name', 'age'])
    writer.writeheader()
    writer.writerows(data)
# How many lines are in out.csv?
Two: writeheader() writes the column names as the first line, then writerows() writes each dictionary as a data row. This produces 2 lines total. The newline='' parameter prevents blank lines between rows.

Why must you call conn.commit() after modifying a sqlite3 database?
Changes are not saved until you call conn.commit(). If you close the connection without committing, all changes are rolled back and lost.

In the requests library, what parameter prevents a request from hanging forever if the server doesn't respond?
The timeout parameter (e.g., requests.get(url, timeout=5)) specifies how many seconds to wait before giving up. Without it, requests can hang indefinitely, especially on slow or unresponsive servers.

Why use ? placeholders instead of f-strings when building SQL queries?
? placeholders automatically escape special characters, preventing SQL injection vulnerabilities. Using f-strings or string concatenation allows malicious input like "'; DROP TABLE users; --" to execute arbitrary SQL commands.

Which search methods does BeautifulSoup provide?
BeautifulSoup has .find() (first match), .find_all() (all matches), and .select() (CSS selectors). It does not have .xpath() (that's lxml) or .search() methods. CSS selectors via .select() are often the most flexible approach.

What does this code print?
import sqlite3
conn = sqlite3.connect(':memory:')
cursor = conn.cursor()
cursor.execute("CREATE TABLE test (id INTEGER PRIMARY KEY, name TEXT)")
cursor.execute("INSERT INTO test (name) VALUES ('Alice')")
print(cursor.lastrowid)
It prints 1. cursor.lastrowid returns the auto-generated ID of the last inserted row. Since this is the first insert with an auto-incrementing PRIMARY KEY, it returns 1. This is useful for immediately getting the ID of newly created records.

Decode → Work → Encode
JSON text
↓
Decode (json.load / json.loads)
↓
Python objects
↓
Work with dicts/lists
↓
Encode (json.dump / json.dumps)
↓
JSON text again
Key insight: JSON is always a string format. You decode it to Python, work with native objects, then encode back to JSON when needed.
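The cycle above can be sketched end to end; a minimal example (the payload is made up for illustration):

```python
import json

# Decode: JSON text -> Python objects
raw = '{"items": [1, 2, 3], "source": "api"}'
data = json.loads(raw)

# Work: manipulate native dicts/lists
data["items"].append(4)

# Encode: Python objects -> JSON text again
out = json.dumps(data)
print(out)
```

The same cycle works with files via json.load() / json.dump() instead of the string variants.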
True or false: when using requests.Session() with retry logic via HTTPAdapter, you should retry POST and DELETE requests to ensure reliability.
False. POST and DELETE are not idempotent, so retrying them can repeat side effects; the allowed_methods parameter in Retry() should exclude non-idempotent methods.

What does setting conn.row_factory = sqlite3.Row enable?
conn.row_factory = sqlite3.Row enables accessing columns by name (e.g., row['username']) instead of by index (e.g., row[0]). This makes code much more readable and maintainable.

What is the difference between the json parameter and the data parameter in POST requests?
The json parameter automatically: 1) serializes your Python dict to a JSON string, and 2) sets Content-Type: application/json. The data parameter sends form-encoded data (application/x-www-form-urlencoded) and requires manual JSON serialization if you want JSON.

Fill in the blanks:
# Safely get city from: data['user']['profile']['location']['city']
city = data.___('user', {}).___('profile', {}).___('location', {}).___('city', 'Unknown')
Chaining .get() methods with empty dict defaults ({}) safely navigates nested JSON without raising KeyError. If any key is missing, it returns the default value instead of crashing. The final .get('city', 'Unknown') provides a fallback if city is missing.

What is the standard sqlite3 workflow?
1) Connect to the database, 2) Optionally set a row factory (row_factory), 3) Create cursor, 4) Execute SQL, 5) Fetch results (for SELECT), 6) Commit changes (for INSERT/UPDATE/DELETE), 7) Close connection.

What is the overall workflow for making HTTP requests?
1) Choose a single requests.get() call or a Session for multiple requests, 2) Build the request with method, URL, headers, params, and timeout, 3) Handle the response by checking status and parsing the body, 4) Catch errors in the RequestException hierarchy, 5) Optionally harden with retry logic and rate limiting.

How does pandas handle data types when reading a CSV?
pandas infers types automatically with pd.read_csv(). Numeric columns are converted to int/float, dates can be auto-parsed, etc. This is one of pandas' key advantages over the built-in csv module.

What does the backoff_factor parameter in the Retry strategy control?
backoff_factor controls the exponential delay between retry attempts. With backoff_factor=1, waits are roughly 1s, 2s, 4s, 8s, etc. This prevents overwhelming the server and gives it time to recover. Formula: {backoff_factor} * (2 ** retry_number).

What does this code print?
import json
data = {'a': 1, 'b': 2}
json_str = json.dumps(data, separators=(',', ':'))
print(len(json_str))
It prints 13. separators=(',', ':') creates compact JSON with no spaces: {"a":1,"b":2}, which is 13 characters. The default separators (', ', ': ') include spaces, producing {"a": 1, "b": 2} (16 characters). Compact format is useful for minimizing file/network size.

What are sqlite3 best practices?
Use parameterized queries, batched inserts (executemany), a Row factory (readable code), and in-memory DBs for tests. You should not commit after every statement; batch commits into transactions for better performance.

Which BeautifulSoup search method is the most flexible?
.select() with CSS selectors is the most flexible because it supports complex queries like 'div.content p' (all <p> inside div.content), pseudo-selectors, attribute matching, etc. It's the same selector syntax used in CSS and jQuery.

Why use CREATE TABLE IF NOT EXISTS?
CREATE TABLE IF NOT EXISTS only creates the table if it doesn't already exist, making the operation idempotent. Without this, running the CREATE TABLE statement twice raises an error.

True or false: response.json() is equivalent to calling json.loads(response.text).
True. response.json() is a convenience method that internally calls json.loads(response.text). It parses the JSON string from the response body into a Python dictionary. If the response isn't valid JSON, both will raise json.JSONDecodeError.

When should you use requests.Session vs. a plain requests.get()?
- Single request: use requests.get()/requests.post() etc. directly
- Multiple requests to the same host: use requests.Session(); it reuses the TCP connection and shares headers/cookies/auth across all requests
import requests
BASE_URL = "https://api.example.com"
TOKEN = "my-secret-token"
# Single request
response = requests.get(f"{BASE_URL}/status", timeout=5)
print(response.status_code)
# Session (multiple requests)
with requests.Session() as session:
    session.headers.update({"Authorization": f"Bearer {TOKEN}"})
    r1 = session.get(f"{BASE_URL}/users", timeout=5)
    r2 = session.post(f"{BASE_URL}/users", json={"name": "Alice"}, timeout=5)
    print(r1.json())
    print(r2.json())
What kinds of errors should you handle when using the requests library?
- Network errors (connection failed, DNS failure, timeout) → requests.ConnectionError, requests.Timeout
- HTTP errors (server returned 4xx / 5xx) → response.raise_for_status() raises requests.HTTPError
- JSON parse errors (response body isn't valid JSON) → json.JSONDecodeError from response.json()
All network/HTTP errors inherit from requests.RequestException.
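Because of that shared base class, a single except clause can serve as a catch-all for network and HTTP failures. A quick sketch verifying the hierarchy (assuming requests is installed):

```python
import requests

# Every network/HTTP error class sits under RequestException,
# so `except requests.RequestException` catches them all.
for exc in (requests.ConnectionError, requests.Timeout, requests.HTTPError):
    print(exc.__name__, issubclass(exc, requests.RequestException))
```

JSON parse errors are the exception: json.JSONDecodeError is a ValueError, not a RequestException, so it needs its own except clause.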
What are best practices for production HTTP requests?
- Always set timeout: prevents hanging indefinitely
- Use a Session: connection reuse + shared config for multiple requests
- Retry only safe methods: GET, HEAD (not POST/DELETE, which aren't idempotent)
- Rate limiting: use the @limits decorator (ratelimit library) to avoid overwhelming APIs
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry
from ratelimit import limits, sleep_and_retry
CALLS_PER_MINUTE = 60
@sleep_and_retry
@limits(calls=CALLS_PER_MINUTE, period=60)
def fetch_data(session, url):
    response = session.get(url, timeout=5)
    response.raise_for_status()
    return response.json()
def build_session():
    session = requests.Session()
    session.headers.update({"Authorization": "Bearer my-token"})
    retry = Retry(
        total=3,
        status_forcelist=[429, 500, 502, 503, 504],
        allowed_methods=["GET", "HEAD"]
    )
    adapter = HTTPAdapter(max_retries=retry)
    session.mount("https://", adapter)
    return session
with build_session() as session:
    data = fetch_data(session, "https://api.example.com/users")
    print(data)
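The sqlite3 items in this quiz (connect, row factory, parameterized execute, commit, fetch, close) can also be tied together in one short sketch; the users table and its columns are made up for illustration:

```python
import sqlite3

# 1) Connect: an in-memory DB is handy for tests
conn = sqlite3.connect(":memory:")
# 2) Row factory: access columns by name instead of index
conn.row_factory = sqlite3.Row
# 3) Create a cursor
cursor = conn.cursor()
# 4) Execute: idempotent CREATE plus a parameterized INSERT
cursor.execute("CREATE TABLE IF NOT EXISTS users (id INTEGER PRIMARY KEY, name TEXT)")
cursor.execute("INSERT INTO users (name) VALUES (?)", ("Alice",))
# 5) Commit so the change is not rolled back on close
conn.commit()
# 6) Fetch: sqlite3.Row allows row["name"] as well as row[1]
row = cursor.execute("SELECT id, name FROM users").fetchone()
print(row["id"], row["name"])
# 7) Close the connection
conn.close()
```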