Log Analysis Quiz
Quiz
import json
event = json.loads('{"type": "Warning", "obj": {"name": "pod1"}}')
result = event.get("obj", {}).get("namespace", "default")
print(result)

total_latency = sum(latencies)
count = len(latencies)
_____:
    avg_latency = total_latency / count
else:
    avg_latency = 0

from datetime import timedelta
td1 = timedelta(hours=2)
td2 = timedelta(hours=3)
avg = (td1 + td2) / 2
print(type(avg).__name__)

Two-Pass Processing Pattern:
A two-pass processing pattern separates analysis from mutation by processing the dataset twice.
Pass 1: Analyze (read-only)
- Scan the data to detect patterns or conflicts and build the required state (e.g., find duplicate UIDs and create a reassignment map).
Pass 2: Apply (write/action)
- Apply corrections using only the results from Pass 1, without making new decisions.
When to use: Use this pattern when fixes depend on global knowledge of the data, and modifying records safely requires seeing the full dataset first. Example: Finding duplicate UIDs and reassigning them to available unique values.
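A minimal sketch of the pattern (the record shape and UID scheme here are illustrative assumptions, not from the quiz):

```python
# Sample records; two share the duplicate UID "u1" (illustrative data)
records = [
    {"uid": "u1", "name": "pod-a"},
    {"uid": "u2", "name": "pod-b"},
    {"uid": "u1", "name": "pod-c"},
]

# Pass 1: analyze (read-only) - find duplicates, build a reassignment map
seen = set()
reassign = {}  # maps record index -> new uid
next_id = 3
for i, rec in enumerate(records):
    if rec["uid"] in seen:
        reassign[i] = f"u{next_id}"
        next_id += 1
    else:
        seen.add(rec["uid"])

# Pass 2: apply (write) - use only the Pass 1 results, no new decisions
for i, new_uid in reassign.items():
    records[i]["uid"] = new_uid

print([r["uid"] for r in records])  # ['u1', 'u2', 'u3']
```

Because Pass 1 never mutates the data, the reassignment map is built against a stable view of the full dataset before any record changes.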
Did you get it right?
count = {}
for item in ['a', 'b', 'a', 'c', 'a']:
    count[item] = count.get(item, 0) + 1
print(count['a'])

active_sessions = {}
for event in events:
    user_id = event['user_id']
    if event['action'] == 'login' and _____:
        active_sessions[user_id] = event['timestamp']

find()
- Returns -1 if substring not found
- Silent failure - can cause subtle bugs with slicing
- Use when you want to check and handle missing substrings with conditionals
index()
- Raises ValueError if substring not found
- Explicit error handling with try/except
- Clearer for error cases
Best practice: Use index() with try/except for clearer error handling in log parsing.
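A short illustration of the difference (the log line is made up for the example):

```python
line = "level=INFO msg=started"

# find() returns -1 when the substring is missing - harmless-looking,
# but -1 is a valid slice index, so the bug stays silent
pos = line.find("ERROR")
print(pos)         # -1
print(line[pos:])  # 'd' - slices from the last character, not an error

# index() raises ValueError, forcing explicit handling
try:
    pos = line.index("ERROR")
except ValueError:
    pos = None
print(pos)         # None
```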
Did you get it right?
import re
pattern = r'\[(?P<LEVEL>\w+)\]'
line = '[ERROR] Connection failed'
match = re.search(pattern, line)
print(match.groupdict()['LEVEL'])

import re
pattern = r'_____'
line = '"GET /api/users HTTP/1.1"'
match = re.search(pattern, line)
request = match.group(1)

Three options with trade-offs:
dict with .get() - Simple counting, minimal overhead
count[item] = count.get(item, 0) + 1
defaultdict(int) - Cleaner code, no .get() needed
count[item] += 1
Counter - When you need .most_common() or count arithmetic
Choose based on needs: Start simple (dict), use defaultdict for cleaner code, use Counter when you need its special features.
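All three approaches side by side on the same data (a sketch; all three produce identical counts):

```python
from collections import defaultdict, Counter

items = ['a', 'b', 'a', 'c', 'a']

# 1) plain dict with .get() - no imports, minimal overhead
counts1 = {}
for item in items:
    counts1[item] = counts1.get(item, 0) + 1

# 2) defaultdict(int) - missing keys start at 0, so no .get() needed
counts2 = defaultdict(int)
for item in items:
    counts2[item] += 1

# 3) Counter - counting plus extras like .most_common()
counts3 = Counter(items)

print(counts1['a'], counts2['a'], counts3.most_common(1))  # 3 3 [('a', 3)]
```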
Did you get it right?
set - Ideal for uniqueness tracking
Benefits:
- O(1) membership testing: if item in seen_set
- Automatic deduplication
- Memory efficient for large unique sets
Example use cases: Track unique IP addresses, pod names, user IDs
Alternative: dict keys work but are overkill unless you need associated values.
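For instance, deduplicating IP addresses from log lines (the line format is an assumption for illustration):

```python
# Hypothetical access-log lines: "<ip> <status>"
lines = [
    "10.0.0.1 200",
    "10.0.0.2 500",
    "10.0.0.1 404",
]

unique_ips = set()
for line in lines:
    ip = line.split()[0]
    unique_ips.add(ip)  # duplicates are ignored automatically

print(len(unique_ips))            # 2
print("10.0.0.1" in unique_ips)   # True - O(1) membership test
```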
Did you get it right?
defaultdict(list) - Purpose-built for grouping
Why it’s ideal:
- Auto-initializes missing keys to empty lists
- No existence checks needed: pod_events[pod_name].append(event)
- Clean, readable code
Manual alternatives (from explicit to Pythonic):
# Most explicit - shows the logic clearly
if key in my_dict:
    my_dict[key].append(value)
else:
    my_dict[key] = [value]
# More concise with setdefault
my_dict.setdefault(key, []).append(value)

Did you get it right?
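The defaultdict(list) grouping described above, sketched end to end (the event shape is illustrative):

```python
from collections import defaultdict

# Illustrative events to group by pod name
events = [
    {"pod": "web-1", "reason": "BackOff"},
    {"pod": "db-1", "reason": "Failed"},
    {"pod": "web-1", "reason": "Unhealthy"},
]

pod_events = defaultdict(list)
for event in events:
    # missing keys auto-initialize to an empty list
    pod_events[event["pod"]].append(event)

print(len(pod_events["web-1"]))  # 2
```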
Use Counter when you need:
- .most_common(N): Find top N most common items
- Count arithmetic: Combine counts from multiple sources (count1 + count2)
- Multiple count operations: Subtract, intersect, union
Use dict/defaultdict when:
- Simple counting without special operations
- Want minimal overhead
- Don’t need Counter’s features
Key insight: Counter is a specialized tool - use it when you need its features, not just for basic counting.
Did you get it right?
from collections import Counter
words = Counter(['apple', 'banana', 'apple', 'cherry', 'banana', 'apple'])
result = words.most_common(2)
print(len(result))

error_requests = [log for log in logs if log['status'] >= 400]
slowest = max(error_requests, key=lambda x: x['latency'], _____)

Delimited Files (CSV-like):
- Use split(':') or split(',')
- Fixed column positions
- Flat structure
- Must handle comments (#) and empty lines manually
- Simple index-based access: data[2]
JSON Logs:
- Use json.loads() per line
- Named fields with .get()
- Nested structures
- Type-safe (booleans, numbers preserved)
- Safer with .get() defaults: event.get('field', {})
Key insight: JSON is self-describing and handles nesting; delimited is simpler but requires knowing column positions.
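A side-by-side sketch of parsing the same record in both formats (the field layout is assumed for illustration):

```python
import json

# The same record in both formats
delimited_line = "2025-08-12,alice,login"
json_line = '{"date": "2025-08-12", "user": "alice", "action": "login"}'

# Delimited: positional access; you must know the column order
date, user, action = delimited_line.split(",")
print(user)  # alice

# JSON: named, type-safe access with safe defaults
event = json.loads(json_line)
print(event.get("user", "unknown"))  # alice
print(event.get("extra", {}))        # {} - missing field, safe default
```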
Did you get it right?
text = 'ERROR: failed'
for punc in '!@#$%^&*(),.:':
    text = text.replace(punc, ' ')
words = text.split()
print(len(words))

log_levels = {'INFO', 'error', 'WARNING', 'info'}
counts = {}
for level in log_levels:
    normalized = _____
    counts[normalized] = counts.get(normalized, 0) + 1

Early Continue Pattern:
Early continue enables fail-fast logic, keeping the main logic flat, readable, and focused on valid cases.
for event in events:
    obj = event.get('obj', {})
    if obj.get('kind') != 'Pod':
        continue
    if event.get('type') != 'Warning':
        continue
    # Process filtered event

Benefits:
- Improves readability with many conditions
- Reduces nesting levels
- Makes filtering logic explicit
- Each condition is independent and clear
Alternative: Chain with 'and' for compact code with few conditions
Best practice: Use early continue when you have 3+ filtering conditions.
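For comparison, the chained-and alternative mentioned above (the event shape follows the earlier examples and is otherwise assumed):

```python
events = [
    {"type": "Warning", "obj": {"kind": "Pod", "name": "web-1"}},
    {"type": "Normal", "obj": {"kind": "Pod", "name": "web-2"}},
    {"type": "Warning", "obj": {"kind": "Node", "name": "node-1"}},
]

# Compact form - fine while the condition count stays small
matched = []
for event in events:
    obj = event.get("obj", {})
    if event.get("type") == "Warning" and obj.get("kind") == "Pod":
        matched.append(obj["name"])

print(matched)  # ['web-1']
```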
Did you get it right?
line = '192.168.1.1 - [ERROR] message'
start = line.find('[')
end = line.find(']')
result = line[start+1:end]
print(result)

.items() returns (key, value) tuples, so x[1] correctly accesses the value for sorting and the result contains full (key, value) pairs you can unpack. Sorting d directly iterates over keys only — x[1] then silently accesses the second character of each key string, giving wrong order and a list of bare strings instead of tuples.

Fill in the blank so line prints as hello world (punctuation replaced with spaces):

line = 'hello, world.'
punctuations = ['.', ',']
for punctuation in punctuations:
    _____
print(line)  # hello world

line = line.replace(punctuation, ' '). Without reassignment, line stays unchanged.

reason = "Scheduled"
if reason != "Scheduled" or reason != "Killing":
    print("skip")
else:
    print("keep")

The or condition is always True. To keep only Scheduled and Killing, use and: if reason != 'Scheduled' and reason != 'Killing': continue. Even cleaner: if reason not in {'Scheduled', 'Killing'}: continue.

events = {}
pod_name = "web-pod"
events[pod_name]["s_event"] = 10

Hint: Python must look up events[pod_name] before it can assign to ['s_event']. What happens if the key doesn't exist?

Python must look up events[pod_name] FIRST before attempting to assign s_event. Since 'web-pod' doesn't exist yet, this raises a KeyError immediately. Dictionaries do NOT auto-create parent keys for nested assignment. Always initialize the parent first: events[pod_name] = {} then events[pod_name]['s_event'] = 10. Or use events.setdefault(pod_name, {})['s_event'] = 10.

stats = {}
for log in access_logs:
    ep = log['path']
    _____
    stats[ep]['count'] += 1
    stats[ep]['total_latency'] += log['latency']

Hint: Before stats[ep]['count'] += 1, what must be true about stats[ep]?

Two ways to initialize: (1) if key not in d: d[key] = {...} — explicit and readable. (2) d.setdefault(key, {...}) — concise one-liner.

String sort is sufficient when:
- You only need chronological ordering
- All timestamps share the same consistent ISO 8601-like format (e.g., 2025-08-12T09:05:00)
- Lexicographic order == chronological order ✓
Convert to datetime when:
- You need duration math: logout_dt - login_dt
- You need to add/subtract time: ts + timedelta(hours=1)
- You need to compare across different formats
Common pattern:
# Step 1 — sort cheaply as strings
sorted_events = sorted(events, key=lambda e: e['timestamp'])
# Step 2 — convert only for duration calculation
from datetime import datetime
login_dt = datetime.fromisoformat(login['timestamp'])
logout_dt = datetime.fromisoformat(logout['timestamp'])
session_duration = logout_dt - login_dt

Key insight: Don’t pay the cost of datetime conversion unless you need arithmetic.
Did you get it right?
You want to sort this list of events by timestamp. Which expression is correct?
events = [
    {'timestamp': '2025-08-12T09:10:00', 'user_id': 'B', 'action': 'view_page'},
    {'timestamp': '2025-08-12T09:05:00', 'user_id': 'B', 'action': 'login'},
]

x is a dict, so you access fields by key name: x['timestamp']. Index-based access like x[0] or x[1] would raise a TypeError — dicts aren’t indexed by position. .items() doesn’t apply to a list.

You want to sort this dict by count, highest first. Which expression is correct?
warn_reasons = {'BackOff': 2, 'Failed': 2, 'FailedMount': 1, 'Unhealthy': 2}

Hint: What does .items() return, and how do you access a value from a tuple?

.items() produces (key, value) tuples like ('BackOff', 2). Each x in the lambda is a tuple, so x[1] is the count. Using x['count'] would fail — tuples use index access, not key names. Sorting warn_reasons directly (without .items()) iterates over keys only, so x[1] would access the second character of the key string — wrong result.

Which line correctly resets login_time in the dictionary?
session_tracker = {'alice': {'login_time': '09:00'}}
login_time = session_tracker['alice']['login_time']

login_time = session_tracker['alice']['login_time'] copies the string value into a local variable. Reassigning login_time = None only changes that local variable — it never writes back to the dictionary. To reset the stored value you must update the dictionary directly: session_tracker['alice']['login_time'] = None.

What will events contain after this loop finishes?

events = []
for item in ['a', 'b', 'c']:
    events = events.append(item)
print(events)

list.append() mutates the list in-place and returns None. So events = events.append(item) first appends item to the list, then overwrites events with None. On the next iteration, events.append(item) raises AttributeError — but even if it didn’t, the result is None. The fix is simply events.append(item) with no reassignment.