Log Analysis Quiz
Quiz
import json
event = json.loads('{"type": "Warning", "obj": {"name": "pod1"}}')
result = event.get("obj", {}).get("namespace", "default")
print(result)

total_latency = sum(latencies)
count = len(latencies)
_____:
    avg_latency = total_latency / count
else:
    avg_latency = 0

from datetime import timedelta
td1 = timedelta(hours=2)
td2 = timedelta(hours=3)
avg = (td1 + td2) / 2
print(type(avg).__name__)

Two-Pass Processing Pattern:
A two-pass processing pattern separates analysis from mutation by processing the dataset twice.
Pass 1: Analyze (read-only)
- Scan the data to detect patterns or conflicts and build the required state (e.g., find duplicate UIDs and create a reassignment map).
Pass 2: Apply (write/action)
- Apply corrections using only the results from Pass 1, without making new decisions.
When to use: Use this pattern when fixes depend on global knowledge of the data, and modifying records safely requires seeing the full dataset first. Example: Finding duplicate UIDs and reassigning them to available unique values.
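A minimal sketch of the pattern (the record shape and UID scheme here are illustrative assumptions, not from the quiz):

```python
# Sample records; two share the duplicate UID "u1" (illustrative data)
records = [
    {"uid": "u1", "name": "pod-a"},
    {"uid": "u2", "name": "pod-b"},
    {"uid": "u1", "name": "pod-c"},
]

# Pass 1: analyze (read-only) - find duplicates, build a reassignment map
seen = set()
reassign = {}  # maps record index -> new uid
next_id = 3
for i, rec in enumerate(records):
    if rec["uid"] in seen:
        reassign[i] = f"u{next_id}"
        next_id += 1
    else:
        seen.add(rec["uid"])

# Pass 2: apply (write) - use only the Pass 1 results, no new decisions
for i, new_uid in reassign.items():
    records[i]["uid"] = new_uid

print([r["uid"] for r in records])  # ['u1', 'u2', 'u3']
```

Because Pass 1 never mutates the data, the reassignment map is built against a stable view of the full dataset before any record changes.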
Did you get it right?
count = {}
for item in ['a', 'b', 'a', 'c', 'a']:
    count[item] = count.get(item, 0) + 1
print(count['a'])

active_sessions = {}
for event in events:
    user_id = event['user_id']
    if event['action'] == 'login' and _____:
        active_sessions[user_id] = event['timestamp']

find()
- Returns -1 if substring not found
- Silent failure - can cause subtle bugs with slicing
- Use when you want to check and handle missing substrings with conditionals
index()
- Raises ValueError if substring not found
- Explicit error handling with try/except
- Clearer for error cases
Best practice: Use index() with try/except for clearer error handling in log parsing.
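A short illustration of the difference (the log line is made up for the example):

```python
line = "level=INFO msg=started"

# find() returns -1 when the substring is missing - harmless-looking,
# but -1 is a valid slice index, so the bug stays silent
pos = line.find("ERROR")
print(pos)         # -1
print(line[pos:])  # 'd' - slices from the last character, not an error

# index() raises ValueError, forcing explicit handling
try:
    pos = line.index("ERROR")
except ValueError:
    pos = None
print(pos)         # None
```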
Did you get it right?
import re
pattern = r'\[(?P<LEVEL>\w+)\]'
line = '[ERROR] Connection failed'
match = re.search(pattern, line)
print(match.groupdict()['LEVEL'])

import re
pattern = r'_____'
line = '"GET /api/users HTTP/1.1"'
match = re.search(pattern, line)
request = match.group(1)

Three options with trade-offs:
dict with .get() - Simple counting, minimal overhead
count[item] = count.get(item, 0) + 1
defaultdict(int) - Cleaner code, no .get() needed
count[item] += 1
Counter - When you need .most_common() or count arithmetic
Choose based on needs: Start simple (dict), use defaultdict for cleaner code, use Counter when you need its special features.
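All three approaches side by side on the same data (a sketch; all three produce identical counts):

```python
from collections import defaultdict, Counter

items = ['a', 'b', 'a', 'c', 'a']

# 1) plain dict with .get() - no imports, minimal overhead
counts1 = {}
for item in items:
    counts1[item] = counts1.get(item, 0) + 1

# 2) defaultdict(int) - missing keys start at 0, so no .get() needed
counts2 = defaultdict(int)
for item in items:
    counts2[item] += 1

# 3) Counter - counting plus extras like .most_common()
counts3 = Counter(items)

print(counts1['a'], counts2['a'], counts3.most_common(1))  # 3 3 [('a', 3)]
```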
Did you get it right?
set - Ideal for uniqueness tracking
Benefits:
- O(1) membership testing: if item in seen_set
- Automatic deduplication
- Memory efficient for large unique sets
Example use cases: Track unique IP addresses, pod names, user IDs
Alternative: dict keys work but are overkill unless you need associated values.
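For instance, deduplicating IP addresses from log lines (the line format is an assumption for illustration):

```python
# Hypothetical access-log lines: "<ip> <status>"
lines = [
    "10.0.0.1 200",
    "10.0.0.2 500",
    "10.0.0.1 404",
]

unique_ips = set()
for line in lines:
    ip = line.split()[0]
    unique_ips.add(ip)  # duplicates are ignored automatically

print(len(unique_ips))            # 2
print("10.0.0.1" in unique_ips)   # True - O(1) membership test
```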
Did you get it right?
defaultdict(list) - Purpose-built for grouping
Why it’s ideal:
- Auto-initializes missing keys to empty lists
- No existence checks needed: pod_events[pod_name].append(event)
- Clean, readable code
Manual alternatives (from explicit to Pythonic):
# Most explicit - shows the logic clearly
if key in my_dict:
    my_dict[key].append(value)
else:
    my_dict[key] = [value]
# More concise with setdefault
my_dict.setdefault(key, []).append(value)

Did you get it right?
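The defaultdict(list) grouping described above, sketched end to end (the event shape is illustrative):

```python
from collections import defaultdict

# Illustrative events to group by pod name
events = [
    {"pod": "web-1", "reason": "BackOff"},
    {"pod": "db-1", "reason": "Failed"},
    {"pod": "web-1", "reason": "Unhealthy"},
]

pod_events = defaultdict(list)
for event in events:
    # missing keys auto-initialize to an empty list
    pod_events[event["pod"]].append(event)

print(len(pod_events["web-1"]))  # 2
```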
Use Counter when you need:
- .most_common(N): Find top N most common items
- Count arithmetic: Combine counts from multiple sources (count1 + count2)
- Multiple count operations: Subtract, intersect, union
Use dict/defaultdict when:
- Simple counting without special operations
- Want minimal overhead
- Don’t need Counter’s features
Key insight: Counter is a specialized tool - use it when you need its features, not just for basic counting.
Did you get it right?
from collections import Counter
words = Counter(['apple', 'banana', 'apple', 'cherry', 'banana', 'apple'])
result = words.most_common(2)
print(len(result))

error_requests = [log for log in logs if log['status'] >= 400]
slowest = max(error_requests, key=lambda x: x['latency'], _____)

Delimited Files (CSV-like):
- Use split(':') or split(',')
- Fixed column positions
- Flat structure
- Must handle comments (#) and empty lines manually
- Simple index-based access: data[2]
JSON Logs:
- Use json.loads() per line
- Named fields with .get()
- Nested structures
- Type-safe (booleans, numbers preserved)
- Safer with .get() defaults: event.get('field', {})
Key insight: JSON is self-describing and handles nesting; delimited is simpler but requires knowing column positions.
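A side-by-side sketch of parsing the same record in both formats (the field layout is assumed for illustration):

```python
import json

# The same record in both formats
delimited_line = "2025-08-12,alice,login"
json_line = '{"date": "2025-08-12", "user": "alice", "action": "login"}'

# Delimited: positional access; you must know the column order
date, user, action = delimited_line.split(",")
print(user)  # alice

# JSON: named, type-safe access with safe defaults
event = json.loads(json_line)
print(event.get("user", "unknown"))  # alice
print(event.get("extra", {}))        # {} - missing field, safe default
```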
Did you get it right?
text = 'ERROR: failed'
for punc in '!@#$%^&*(),.:':
    text = text.replace(punc, ' ')
words = text.split()
print(len(words))

log_levels = {'INFO', 'error', 'WARNING', 'info'}
counts = {}
for level in log_levels:
    normalized = _____
    counts[normalized] = counts.get(normalized, 0) + 1

Early Continue Pattern:
Early continue enables fail-fast logic, keeping the main logic flat, readable, and focused on valid cases.
for event in events:
    obj = event.get('obj', {})
    if obj.get('kind') != 'Pod':
        continue
    if event.get('type') != 'Warning':
        continue
    # Process filtered event

Benefits:
- Improves readability with many conditions
- Reduces nesting levels
- Makes filtering logic explicit
- Each condition is independent and clear
Alternative: Chain with 'and' for compact code with few conditions
Best practice: Use early continue when you have 3+ filtering conditions.
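For comparison, the chained-and alternative mentioned above (the event shape follows the earlier examples and is otherwise assumed):

```python
events = [
    {"type": "Warning", "obj": {"kind": "Pod", "name": "web-1"}},
    {"type": "Normal", "obj": {"kind": "Pod", "name": "web-2"}},
    {"type": "Warning", "obj": {"kind": "Node", "name": "node-1"}},
]

# Compact form - fine while the condition count stays small
matched = []
for event in events:
    obj = event.get("obj", {})
    if event.get("type") == "Warning" and obj.get("kind") == "Pod":
        matched.append(obj["name"])

print(matched)  # ['web-1']
```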
Did you get it right?
line = '192.168.1.1 - [ERROR] message'
start = line.find('[')
end = line.find(']')
result = line[start+1:end]
print(result)

.items() returns (key, value) tuples, so x[1] correctly accesses the value for sorting and the result contains full (key, value) pairs you can unpack. Sorting d directly iterates over keys only — x[1] then silently accesses the second character of each key string, giving wrong order and a list of bare strings instead of tuples.

Fill in the blank so line prints as hello world (punctuation replaced with spaces):

line = 'hello, world.'
punctuations = ['.', ',']
for punctuation in punctuations:
    _____
print(line)  # hello world

line = line.replace(punctuation, ' '). Without reassignment, line stays unchanged.

reason = "Scheduled"
if reason != "Scheduled" or reason != "Killing":
    print("skip")
else:
    print("keep")

The or condition is always True. To keep only Scheduled and Killing, use and: if reason != 'Scheduled' and reason != 'Killing': continue. Even cleaner: if reason not in {'Scheduled', 'Killing'}: continue.

events = {}
pod_name = "web-pod"
events[pod_name]["s_event"] = 10

Hint: Python must look up events[pod_name] before it can assign to ['s_event']. What happens if the key doesn't exist?

Python must look up events[pod_name] FIRST before attempting to assign s_event. Since 'web-pod' doesn't exist yet, this raises a KeyError immediately. Dictionaries do NOT auto-create parent keys for nested assignment. Always initialize the parent first: events[pod_name] = {} then events[pod_name]['s_event'] = 10. Or use events.setdefault(pod_name, {})['s_event'] = 10.

stats = {}
for log in access_logs:
    ep = log['path']
    _____
    stats[ep]['count'] += 1
    stats[ep]['total_latency'] += log['latency']

Hint: Before stats[ep]['count'] += 1, what must be true about stats[ep]?

Two ways to initialize: (1) if key not in d: d[key] = {...} — explicit and readable. (2) d.setdefault(key, {...}) — concise one-liner.

String sort is sufficient when:
- You only need chronological ordering
- All timestamps share the same consistent ISO 8601-like format (e.g., 2025-08-12T09:05:00)
- Lexicographic order == chronological order ✓
Convert to datetime when:
- You need duration math: logout_dt - login_dt
- You need to add/subtract time: ts + timedelta(hours=1)
- You need to compare across different formats
Common pattern:
# Step 1 — sort cheaply as strings
sorted_events = sorted(events, key=lambda e: e['timestamp'])
# Step 2 — convert only for duration calculation
from datetime import datetime
login_dt = datetime.fromisoformat(login['timestamp'])
logout_dt = datetime.fromisoformat(logout['timestamp'])
session_duration = logout_dt - login_dt

Key insight: Don’t pay the cost of datetime conversion unless you need arithmetic.
Did you get it right?
You want to sort this list of events by timestamp. Which expression is correct?
events = [
    {'timestamp': '2025-08-12T09:10:00', 'user_id': 'B', 'action': 'view_page'},
    {'timestamp': '2025-08-12T09:05:00', 'user_id': 'B', 'action': 'login'},
]

x is a dict, so you access fields by key name: x['timestamp']. Index-based access like x[0] or x[1] would raise a TypeError — dicts aren’t indexed by position. .items() doesn’t apply to a list.

You want to sort this dict by count, highest first. Which expression is correct?
warn_reasons = {'BackOff': 2, 'Failed': 2, 'FailedMount': 1, 'Unhealthy': 2}

Hint: What does .items() return, and how do you access a value from a tuple?

.items() produces (key, value) tuples like ('BackOff', 2). Each x in the lambda is a tuple, so x[1] is the count. Using x['count'] would fail — tuples use index access, not key names. Sorting warn_reasons directly (without .items()) iterates over keys only, so x[1] would access the second character of the key string — wrong result.

Which line correctly resets login_time in the dictionary?
session_tracker = {'alice': {'login_time': '09:00'}}
login_time = session_tracker['alice']['login_time']

login_time = session_tracker['alice']['login_time'] copies the string value into a local variable. Reassigning login_time = None only changes that local variable — it never writes back to the dictionary. To reset the stored value you must update the dictionary directly: session_tracker['alice']['login_time'] = None.

What will events contain after this loop finishes?

events = []
for item in ['a', 'b', 'c']:
    events = events.append(item)
print(events)

list.append() mutates the list in-place and returns None. So events = events.append(item) first appends item to the list, then overwrites events with None. On the next iteration, events.append(item) raises AttributeError — but even if it didn’t, the result is None. The fix is simply events.append(item) with no reassignment.