Extras

Helpers for ingestion pipelines. Import them from rmp_client:

from rmp_client import (
    analyze_sentiment, normalize_comment,
    is_valid_comment, build_course_mapping,
    clean_course_label,
)

Sentiment

Compute a sentiment score and label from comment text (uses TextBlob internally).

result = analyze_sentiment("Great prof, explains concepts clearly.")
print(result.score, result.label)  # e.g. 0.65 "positive"
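
The example above shows a numeric score alongside a label. As a rough sketch of how a score could map to a label, assume the score is a polarity in [-1, 1] (the range TextBlob reports). The `label_from_score` helper, the 0.1 threshold, and the "negative" and "neutral" label names are hypothetical; only "positive" appears in this page, and the library's actual cutoffs may differ.

```python
def label_from_score(score: float, *, threshold: float = 0.1) -> str:
    """Hypothetical mapping from a polarity score to a coarse label.

    Assumption: scores within +/- threshold of zero are treated as neutral.
    This is not the rmp_client implementation.
    """
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"


for score in (0.65, 0.0, -0.4):
    print(score, label_from_score(score))
```

A dead band around zero keeps weakly worded comments from flipping between labels on small score changes.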

Helpers

normalize_comment

normalize_comment(text: str, *, strip_html: bool = True, strip_punctuation: bool = False) -> str

Normalizes a comment for comparison or deduplication. Trims whitespace, strips HTML tags (opt-out), lowercases, and collapses runs of whitespace. Optionally strips punctuation for looser matching.

| Parameter | Type | Default | Description |
|---|---|---|---|
| text | str | (required) | Comment text |
| strip_html | bool | True | Remove HTML tags |
| strip_punctuation | bool | False | Remove all punctuation |

a = normalize_comment("  Great  Professor!  ")
b = normalize_comment("great professor!")
assert a == b  # both normalize to "great professor!"

normalize_comment("<b>Loved</b> this class")  # "loved this class"
normalize_comment("Hello, world!", strip_punctuation=True)  # "hello world"
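
To make the described steps concrete, here is a minimal stand-alone sketch of the same kind of normalization, followed by a deduplication pass. The name `normalize_comment_sketch`, the regular expressions, and the choice to replace tags with a space are assumptions for illustration; the library's own implementation may handle edge cases differently.

```python
import re
import string

_TAG_RE = re.compile(r"<[^>]+>")   # naive HTML tag matcher (assumption)
_WS_RE = re.compile(r"\s+")        # any run of whitespace
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def normalize_comment_sketch(text: str, *, strip_html: bool = True,
                             strip_punctuation: bool = False) -> str:
    """Illustrative stand-in, not the rmp_client implementation."""
    if strip_html:
        # Replace tags with a space so "a<br>b" does not fuse into "ab".
        text = _TAG_RE.sub(" ", text)
    text = text.lower()
    if strip_punctuation:
        text = text.translate(_PUNCT_TABLE)  # ASCII punctuation only
    # Collapse whitespace runs, then trim both ends.
    return _WS_RE.sub(" ", text).strip()


def dedupe(comments):
    """Keep the first comment for each normalized form."""
    seen = set()
    unique = []
    for comment in comments:
        key = normalize_comment_sketch(comment, strip_punctuation=True)
        if key not in seen:
            seen.add(key)
            unique.append(comment)
    return unique


print(dedupe(["  Great  Professor!  ", "great professor", "<b>Loved</b> this class"]))
```

Using the normalized text only as a lookup key keeps the original wording available for display.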

is_valid_comment

is_valid_comment(text: str, *, min_len: int = 10) -> ValidationResult

Validates a comment and returns detailed diagnostics. Checks for empty text, insufficient length, all-caps, excessive repeated characters, and absence of alphabetic characters.

| Parameter | Type | Default | Description |
|---|---|---|---|
| text | str | (required) | Comment text |
| min_len | int | 10 | Minimum character length |

Returns: ValidationResult with valid (bool) and issues (list of CommentIssue).

Each issue has a code ("empty", "too_short", "all_caps", "excessive_repeats", "no_alpha") and a human-readable message.

result = is_valid_comment("Good")
# ValidationResult(valid=False, issues=[CommentIssue(code="too_short", ...)])

result = is_valid_comment("Great class, learned a lot")
# ValidationResult(valid=True, issues=[])

result = is_valid_comment("WORST PROF EVER!!!")
# ValidationResult(valid=False, issues=[CommentIssue(code="all_caps", ...)])
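
The checks listed above can be sketched as a small stand-alone validator. The `Issue` and `Result` dataclasses are hypothetical stand-ins for `CommentIssue` and `ValidationResult`, the messages are invented, and the rule that four or more identical characters in a row count as "excessive" is an assumption; the library may use different thresholds.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Issue:  # hypothetical stand-in for CommentIssue
    code: str
    message: str


@dataclass
class Result:  # hypothetical stand-in for ValidationResult
    valid: bool
    issues: list = field(default_factory=list)


# Assumption: a run of 4+ identical characters is "excessive".
_REPEAT_RE = re.compile(r"(.)\1{3,}")


def validate_comment_sketch(text: str, *, min_len: int = 10) -> Result:
    """Illustrative validator, not the rmp_client implementation."""
    stripped = text.strip()
    issues = []
    if not stripped:
        issues.append(Issue("empty", "Comment is empty"))
    else:
        if len(stripped) < min_len:
            issues.append(Issue("too_short", f"Shorter than {min_len} characters"))
        if not any(ch.isalpha() for ch in stripped):
            issues.append(Issue("no_alpha", "No alphabetic characters"))
        elif stripped.isupper():
            issues.append(Issue("all_caps", "Comment is all uppercase"))
        if _REPEAT_RE.search(stripped):
            issues.append(Issue("excessive_repeats", "Long run of one character"))
    return Result(valid=not issues, issues=issues)


for sample in ("Good", "Great class, learned a lot", "WORST PROF EVER!!!"):
    result = validate_comment_sketch(sample)
    print(sample, result.valid, [issue.code for issue in result.issues])
```

Collecting every issue instead of stopping at the first lets a pipeline log all reasons a comment was rejected.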

Course Code Helpers

Map scraped RMP course labels to your course catalog.

scraped = ["ANAT 215 (12)", "phys115"]
valid = ["ANAT 215", "PHYS 115"]

mapping = build_course_mapping(scraped, valid)
# {"ANAT 215 (12)": "ANAT 215", "phys115": "PHYS 115"}

cleaned = clean_course_label("MATH 101 (5)")
# "MATH 101"
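
One way such a mapping could work is to strip the trailing parenthesized count and compare labels on a whitespace-free, uppercased key. The sketch below is an assumption for illustration only: `clean_label_sketch`, `canonical_key`, and `build_mapping_sketch` are hypothetical names, and the library's matching may be more tolerant (for example, fuzzy matching) or may treat unmatched labels differently.

```python
import re

# Trailing "(12)"-style suffix, assumed here to be a rating count.
_COUNT_SUFFIX_RE = re.compile(r"\s*\(\d+\)\s*$")


def clean_label_sketch(label: str) -> str:
    """Drop a trailing parenthesized number and surrounding whitespace."""
    return _COUNT_SUFFIX_RE.sub("", label).strip()


def canonical_key(label: str) -> str:
    """Uppercase and remove whitespace so 'phys115' matches 'PHYS 115'."""
    return re.sub(r"\s+", "", clean_label_sketch(label)).upper()


def build_mapping_sketch(scraped, valid):
    """Map each scraped label to a catalog code with the same canonical key.

    Assumption: labels without an exact canonical match are left out.
    """
    by_key = {canonical_key(code): code for code in valid}
    mapping = {}
    for label in scraped:
        match = by_key.get(canonical_key(label))
        if match is not None:
            mapping[label] = match
    return mapping


print(build_mapping_sketch(["ANAT 215 (12)", "phys115", "Intro Bio"],
                           ["ANAT 215", "PHYS 115"]))
```

Keying on a canonical form keeps the lookup exact and predictable; labels that do not match are simply absent from the result, so a caller can review them separately.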