Working With Data Quiz
Quiz
Why use csv.DictReader over csv.reader?
csv.DictReader treats the first row as column headers and returns each row as a dictionary, allowing you to access values by column name (e.g., row['product']) rather than by numeric index (e.g., row[0]). This makes code more readable and maintainable.

What does this code print?
import json
data = {"status": "ok", "count": 42}
json_str = json.dumps(data)
print(type(json_str))
It prints <class 'str'>. json.dumps() (with an 's' for string) converts a Python dictionary to a JSON-formatted string. The 's' suffix is the key reminder: dumps = dump to string, dump = dump to file.

What are safe ways to access nested JSON data?
Use .get() with defaults, helper functions that chain .get() calls, and isinstance() checks before accessing nested data. Direct key access (data['key']) raises KeyError if the key is missing. While try/except works, it's verbose and less elegant than .get().

True or false: in the json module, json.load() reads from a file while json.loads() reads from a string.
True. load() reads from a file object, loads() (load string) reads from a string. Similarly, dump() writes to a file, dumps() returns a string.

What parameter should you pass to open() when writing CSV files to prevent blank rows on Windows?
The newline='' parameter prevents the csv module from writing blank lines between rows on Windows systems. This is a cross-platform best practice when using Python's csv module.

When using the requests library, call ___ to raise an HTTPError exception if the status code indicates an error (4xx or 5xx).
response.raise_for_status() checks if the HTTP status code indicates an error (4xx client errors or 5xx server errors) and raises an HTTPError exception if so. This allows you to use try/except for error handling instead of manually checking status_code.

Fill in the placeholders in this parameterized query:
cursor.execute(
    "SELECT * FROM users WHERE username = ___ AND age > ___",
    (username, min_age)
)
SQLite uses ? as a placeholder for parameterized queries. This prevents SQL injection by automatically escaping values. Never use f-strings or string concatenation for SQL queries as they create security vulnerabilities.

Why use requests.Session() instead of individual requests.get() calls?
A Session reuses the TCP connection and shares headers/cookies/auth across requests. It doesn't change response handling: you still parse the body (e.g., with .json()), and error handling is still necessary.

What is the correct flow for a simple GET request (with requests.get())?
The correct flow is: 1) Define the URL and any query parameters, 2) Open a try/except block to handle RequestException errors, 3) Make the GET request with a timeout to avoid hanging, 4) Call raise_for_status() to raise an HTTPError on 4xx/5xx responses, 5) Parse the JSON response body.
import requests
url = "https://api.example.com"
try:
    response = requests.get(url, timeout=3)
    response.raise_for_status()
    data = response.json()
except requests.exceptions.RequestException as e:
    print("Request failed:", e)

How many lines are in out.csv after this code runs?
import csv
data = [{'name': 'Alice', 'age': 30}]
with open('out.csv', 'w', newline='') as f:
    writer = csv.DictWriter(f, fieldnames=['name', 'age'])
    writer.writeheader()
    writer.writerows(data)
# How many lines are in out.csv?
Two: writeheader() writes the column names as the first line, then writerows() writes each dictionary as a data row. This produces 2 lines total. The newline='' parameter prevents blank lines between rows.

Why must you call conn.commit() after modifying a sqlite3 database?
Changes are not saved until you call conn.commit(). If you close the connection without committing, all changes are rolled back and lost.

In the requests library, what parameter prevents a request from hanging forever if the server doesn't respond?
The timeout parameter (e.g., requests.get(url, timeout=5)) specifies how many seconds to wait before giving up. Without it, requests can hang indefinitely, especially on slow or unresponsive servers.

Why use ? placeholders instead of f-strings when building SQL queries?
? placeholders automatically escape special characters, preventing SQL injection vulnerabilities. Using f-strings or string concatenation allows malicious input like "'; DROP TABLE users; --" to execute arbitrary SQL commands.

Which search methods does BeautifulSoup provide?
BeautifulSoup has .find() (first match), .find_all() (all matches), and .select() (CSS selectors). It does not have .xpath() (that's lxml) or .search() methods. CSS selectors via .select() are often the most flexible approach.

What does this code print?
import sqlite3
conn = sqlite3.connect(':memory:')
cursor = conn.cursor()
cursor.execute("CREATE TABLE test (id INTEGER PRIMARY KEY, name TEXT)")
cursor.execute("INSERT INTO test (name) VALUES ('Alice')")
print(cursor.lastrowid)
It prints 1. cursor.lastrowid returns the auto-generated ID of the last inserted row. Since this is the first insert with an auto-incrementing PRIMARY KEY, it returns 1. This is useful for immediately getting the ID of newly created records.

Decode → Work → Encode
JSON text
↓
Decode (json.load / json.loads)
↓
Python objects
↓
Work with dicts/lists
↓
Encode (json.dump / json.dumps)
↓
JSON text again
Key insight: JSON is always a string format. You decode it to Python, work with native objects, then encode back to JSON when needed.
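The cycle above can be sketched end to end; a minimal example (the payload is made up for illustration):

```python
import json

# Decode: JSON text -> Python objects
raw = '{"items": [1, 2, 3], "source": "api"}'
data = json.loads(raw)

# Work: manipulate native dicts/lists
data["items"].append(4)

# Encode: Python objects -> JSON text again
out = json.dumps(data)
print(out)
```

The same cycle works with files via json.load() / json.dump() instead of the string variants.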
True or false: when using requests.Session() with retry logic via HTTPAdapter, you should retry POST and DELETE requests to ensure reliability.
False. POST and DELETE are not idempotent, so retrying them can repeat side effects; the allowed_methods parameter in Retry() should exclude non-idempotent methods.

What does setting conn.row_factory = sqlite3.Row enable?
conn.row_factory = sqlite3.Row enables accessing columns by name (e.g., row['username']) instead of by index (e.g., row[0]). This makes code much more readable and maintainable.

What is the difference between the json parameter and the data parameter in POST requests?
The json parameter automatically: 1) serializes your Python dict to a JSON string, and 2) sets Content-Type: application/json. The data parameter sends form-encoded data (application/x-www-form-urlencoded) and requires manual JSON serialization if you want JSON.

Fill in the blanks:
# Safely get city from: data['user']['profile']['location']['city']
city = data.___('user', {}).___('profile', {}).___('location', {}).___('city', 'Unknown')
Chaining .get() methods with empty dict defaults ({}) safely navigates nested JSON without raising KeyError. If any key is missing, it returns the default value instead of crashing. The final .get('city', 'Unknown') provides a fallback if city is missing.

What is the standard sqlite3 workflow?
1) Connect to the database, 2) Optionally set a row factory (row_factory), 3) Create cursor, 4) Execute SQL, 5) Fetch results (for SELECT), 6) Commit changes (for INSERT/UPDATE/DELETE), 7) Close connection.

What is the overall workflow for making HTTP requests?
1) Choose a single requests.get() call or a Session for multiple requests, 2) Build the request with method, URL, headers, params, and timeout, 3) Handle the response by checking status and parsing the body, 4) Catch errors in the RequestException hierarchy, 5) Optionally harden with retry logic and rate limiting.

How does pandas handle data types when reading a CSV?
pandas infers types automatically with pd.read_csv(). Numeric columns are converted to int/float, dates can be auto-parsed, etc. This is one of pandas' key advantages over the built-in csv module.

What does the backoff_factor parameter in the Retry strategy control?
backoff_factor controls the exponential delay between retry attempts. With backoff_factor=1, waits are roughly 1s, 2s, 4s, 8s, etc. This prevents overwhelming the server and gives it time to recover. Formula: {backoff_factor} * (2 ** retry_number).

What does this code print?
import json
data = {'a': 1, 'b': 2}
json_str = json.dumps(data, separators=(',', ':'))
print(len(json_str))
It prints 13. separators=(',', ':') creates compact JSON with no spaces: {"a":1,"b":2}, which is 13 characters. The default separators (', ', ': ') include spaces, producing {"a": 1, "b": 2} (16 characters). Compact format is useful for minimizing file/network size.

What are sqlite3 best practices?
Use parameterized queries, batched inserts (executemany), a Row factory (readable code), and in-memory DBs for tests. You should not commit after every statement; batch commits into transactions for better performance.

Which BeautifulSoup search method is the most flexible?
.select() with CSS selectors is the most flexible because it supports complex queries like 'div.content p' (all <p> inside div.content), pseudo-selectors, attribute matching, etc. It's the same selector syntax used in CSS and jQuery.

Why use CREATE TABLE IF NOT EXISTS?
CREATE TABLE IF NOT EXISTS only creates the table if it doesn't already exist, making the operation idempotent. Without this, running the CREATE TABLE statement twice raises an error.

True or false: response.json() is equivalent to calling json.loads(response.text).
True. response.json() is a convenience method that internally calls json.loads(response.text). It parses the JSON string from the response body into a Python dictionary. If the response isn't valid JSON, both will raise json.JSONDecodeError.

When should you use requests.Session vs. a plain requests.get()?
- Single request: use requests.get()/requests.post() etc. directly
- Multiple requests to the same host: use requests.Session(); it reuses the TCP connection and shares headers/cookies/auth across all requests
import requests
BASE_URL = "https://api.example.com"
TOKEN = "my-secret-token"
# Single request
response = requests.get(f"{BASE_URL}/status", timeout=5)
print(response.status_code)
# Session (multiple requests)
with requests.Session() as session:
    session.headers.update({"Authorization": f"Bearer {TOKEN}"})
    r1 = session.get(f"{BASE_URL}/users", timeout=5)
    r2 = session.post(f"{BASE_URL}/users", json={"name": "Alice"}, timeout=5)
    print(r1.json())
    print(r2.json())
What kinds of errors should you handle when using the requests library?
- Network errors (connection failed, DNS failure, timeout) → requests.ConnectionError, requests.Timeout
- HTTP errors (server returned 4xx / 5xx) → response.raise_for_status() raises requests.HTTPError
- JSON parse errors (response body isn't valid JSON) → json.JSONDecodeError from response.json()
All network/HTTP errors inherit from requests.RequestException.
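Because of that shared base class, a single except clause can serve as a catch-all for network and HTTP failures. A quick sketch verifying the hierarchy (assuming requests is installed):

```python
import requests

# Every network/HTTP error class sits under RequestException,
# so `except requests.RequestException` catches them all.
for exc in (requests.ConnectionError, requests.Timeout, requests.HTTPError):
    print(exc.__name__, issubclass(exc, requests.RequestException))
```

JSON parse errors are the exception: json.JSONDecodeError is a ValueError, not a RequestException, so it needs its own except clause.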
What are best practices for production HTTP requests?
- Always set timeout: prevents hanging indefinitely
- Use a Session: connection reuse + shared config for multiple requests
- Retry only safe methods: GET, HEAD (not POST/DELETE, which aren't idempotent)
- Rate limiting: use the @limits decorator (ratelimit library) to avoid overwhelming APIs
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry
from ratelimit import limits, sleep_and_retry
CALLS_PER_MINUTE = 60
@sleep_and_retry
@limits(calls=CALLS_PER_MINUTE, period=60)
def fetch_data(session, url):
    response = session.get(url, timeout=5)
    response.raise_for_status()
    return response.json()
def build_session():
    session = requests.Session()
    session.headers.update({"Authorization": "Bearer my-token"})
    retry = Retry(
        total=3,
        status_forcelist=[429, 500, 502, 503, 504],
        allowed_methods=["GET", "HEAD"]
    )
    adapter = HTTPAdapter(max_retries=retry)
    session.mount("https://", adapter)
    return session
with build_session() as session:
    data = fetch_data(session, "https://api.example.com/users")
    print(data)
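The sqlite3 items in this quiz (connect, row factory, parameterized execute, commit, fetch, close) can also be tied together in one short sketch; the users table and its columns are made up for illustration:

```python
import sqlite3

# 1) Connect: an in-memory DB is handy for tests
conn = sqlite3.connect(":memory:")
# 2) Row factory: access columns by name instead of index
conn.row_factory = sqlite3.Row
# 3) Create a cursor
cursor = conn.cursor()
# 4) Execute: idempotent CREATE plus a parameterized INSERT
cursor.execute("CREATE TABLE IF NOT EXISTS users (id INTEGER PRIMARY KEY, name TEXT)")
cursor.execute("INSERT INTO users (name) VALUES (?)", ("Alice",))
# 5) Commit so the change is not rolled back on close
conn.commit()
# 6) Fetch: sqlite3.Row allows row["name"] as well as row[1]
row = cursor.execute("SELECT id, name FROM users").fetchone()
print(row["id"], row["name"])
# 7) Close the connection
conn.close()
```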