Extras
Helpers for ingestion pipelines. Import them from rmp_client:
from rmp_client import (
    analyze_sentiment,
    normalize_comment,
    is_valid_comment,
    build_course_mapping,
    clean_course_label,
)
Sentiment
Compute a sentiment score and label from comment text (uses TextBlob internally).
result = analyze_sentiment("Great prof, explains concepts clearly.")
print(result.score, result.label)  # e.g. 0.65 positive
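To illustrate how a numeric score can become a label, here is a minimal sketch. It is not rmp_client's implementation: the `label_from_score` name, the 0.1 threshold, and the "negative" and "neutral" labels are assumptions made for illustration (the example above only shows "positive"). TextBlob's polarity ranges from -1.0 to 1.0, which is what the thresholds are applied to.

```python
def label_from_score(score: float, threshold: float = 0.1) -> str:
    """Map a polarity score in [-1.0, 1.0] to a coarse label.

    Hypothetical helper: the threshold and label names are assumptions,
    not values taken from rmp_client.
    """
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    # Scores close to zero are treated as neither positive nor negative.
    return "neutral"


for score in (0.65, 0.0, -0.4):
    print(score, label_from_score(score))
```

A dead band around zero keeps weakly worded comments from flipping between labels on small score differences.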
Helpers
normalize_comment
Normalizes a comment for comparison or deduplication: trims leading and trailing whitespace, strips HTML tags (on by default; pass strip_html=False to keep them), lowercases the text, and collapses runs of whitespace. Pass strip_punctuation=True to also remove punctuation for looser matching.
| Parameter | Type | Default | Description |
|---|---|---|---|
| text | str | (required) | Comment text |
| strip_html | bool | True | Remove HTML tags |
| strip_punctuation | bool | False | Remove all punctuation |
a = normalize_comment(" Great Professor! ")
b = normalize_comment("great professor!")
assert a == b  # both normalize to "great professor!"
normalize_comment("<b>Loved</b> this class") # "loved this class"
normalize_comment("Hello, world!", strip_punctuation=True) # "hello world"
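The steps described above can be approximated with the standard library. This is a sketch under assumptions, not the library's code: `normalize_comment_sketch` is a hypothetical name, HTML tags are matched with a simple regex rather than a real parser, and "punctuation" is taken to mean ASCII `string.punctuation`.

```python
import re
import string

_TAG_RE = re.compile(r"<[^>]+>")        # naive tag matcher (assumption)
_WHITESPACE_RE = re.compile(r"\s+")
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def normalize_comment_sketch(
    text: str,
    strip_html: bool = True,
    strip_punctuation: bool = False,
) -> str:
    """Hypothetical re-creation of the normalization steps described above."""
    if strip_html:
        # Replace tags with a space so "a<br>b" does not fuse into "ab".
        text = _TAG_RE.sub(" ", text)
    text = text.lower()
    if strip_punctuation:
        text = text.translate(_PUNCT_TABLE)
    # Collapse whitespace runs, then trim both ends.
    return _WHITESPACE_RE.sub(" ", text).strip()


# Deduplicate comments by their normalized form, keeping the first original.
comments = [" Great Professor! ", "great professor!", "<b>Loved</b> this class"]
unique = {}
for comment in comments:
    unique.setdefault(normalize_comment_sketch(comment), comment)
print(list(unique))
```

Keying a dict on the normalized text while storing the original keeps the first-seen wording available for display.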
is_valid_comment
Validates a comment and returns detailed diagnostics. Checks for empty text, insufficient length, all-caps, excessive repeated characters, and absence of alphabetic characters.
| Parameter | Type | Default | Description |
|---|---|---|---|
| text | str | (required) | Comment text |
| min_len | int | 10 | Minimum character length |
Returns: ValidationResult with valid (bool) and issues (list of CommentIssue).
Each issue has a code ("empty", "too_short", "all_caps", "excessive_repeats", "no_alpha") and a human-readable message.
result = is_valid_comment("Good")
# ValidationResult(valid=False, issues=[CommentIssue(code="too_short", ...)])
result = is_valid_comment("Great class, learned a lot")
# ValidationResult(valid=True, issues=[])
result = is_valid_comment("WORST PROF EVER!!!")
# ValidationResult(valid=False, issues=[CommentIssue(code="all_caps", ...)])
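To make the five checks concrete, here is a self-contained sketch. It does not reproduce rmp_client's rules: the class and function names are hypothetical, and the exact heuristics (4 or more identical characters in a row counts as "excessive repeats", all-caps is judged on letters only, an empty comment short-circuits the other checks) are assumptions.

```python
import re
from dataclasses import dataclass, field


@dataclass(frozen=True)
class IssueSketch:
    code: str
    message: str


@dataclass
class ResultSketch:
    valid: bool
    issues: list = field(default_factory=list)


# Assumption: four or more identical characters in a row is "excessive".
_REPEAT_RE = re.compile(r"(.)\1{3,}")


def validate_comment_sketch(text: str, min_len: int = 10) -> ResultSketch:
    """Hypothetical validator mirroring the issue codes listed above."""
    stripped = text.strip()
    if not stripped:
        issue = IssueSketch("empty", "Comment is empty.")
        return ResultSketch(valid=False, issues=[issue])

    issues = []
    if len(stripped) < min_len:
        issues.append(IssueSketch("too_short", f"Shorter than {min_len} characters."))
    letters = [ch for ch in stripped if ch.isalpha()]
    if not letters:
        issues.append(IssueSketch("no_alpha", "No alphabetic characters."))
    elif all(ch.isupper() for ch in letters):
        issues.append(IssueSketch("all_caps", "Every letter is uppercase."))
    if _REPEAT_RE.search(stripped):
        issues.append(IssueSketch("excessive_repeats", "Long run of one character."))
    return ResultSketch(valid=not issues, issues=issues)


for sample in ("Good", "Great class, learned a lot", "WORST PROF EVER!!!"):
    result = validate_comment_sketch(sample)
    print(sample, result.valid, [issue.code for issue in result.issues])
```

Returning every issue instead of stopping at the first lets a pipeline log why a comment was rejected, or accept comments whose only issue is one it chooses to tolerate.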
Course Code Helpers
Map scraped RMP course labels to your course catalog.
scraped = ["ANAT 215 (12)", "phys115"]
valid = ["ANAT 215", "PHYS 115"]
mapping = build_course_mapping(scraped, valid)
# {"ANAT 215 (12)": "ANAT 215", "phys115": "PHYS 115"}
cleaned = clean_course_label("MATH 101 (5)")
# "MATH 101"
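One way such a mapping could work is sketched below. The matching strategy is an assumption, not the library's documented behavior: labels are cleaned by dropping a trailing parenthesized count, then compared to catalog codes after removing whitespace and uppercasing. The `_sketch` function names are hypothetical.

```python
import re

# Trailing "(12)"-style suffix, as seen in the scraped labels above.
_COUNT_SUFFIX_RE = re.compile(r"\s*\(\d+\)\s*$")


def clean_course_label_sketch(label: str) -> str:
    """Drop a trailing parenthesized number and surrounding whitespace."""
    return _COUNT_SUFFIX_RE.sub("", label).strip()


def _match_key(label: str) -> str:
    # Assumption: codes match if equal once spaces and case are ignored.
    return re.sub(r"\s+", "", label).upper()


def build_course_mapping_sketch(scraped, valid):
    """Map each scraped label to a catalog code; unmatched labels are skipped."""
    catalog = {_match_key(code): code for code in valid}
    mapping = {}
    for label in scraped:
        match = catalog.get(_match_key(clean_course_label_sketch(label)))
        if match is not None:
            mapping[label] = match
    return mapping


mapping = build_course_mapping_sketch(
    ["ANAT 215 (12)", "phys115", "Unknown 999"],
    ["ANAT 215", "PHYS 115"],
)
print(mapping)
```

In this sketch, labels with no catalog match are left out of the result, so a caller can compare the mapping's keys against the scraped list to find entries that need manual review.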