ted's dev blog

📍 Most Common Word

Print the word that appears most often, excluding the banned words. The comparison is case-insensitive, and punctuation (periods, commas, and so on) is ignored.

⚡️ My solution

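I spent a lot of time on how to strip punctuation while preprocessing the input, but in the end string.punctuation makes it easy. Note that it is a string constant in the string module, not a function: it holds the ASCII punctuation characters, so each character of the input can be checked with not in string.punctuation.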
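
As a rough sketch of this idea, string.punctuation can also be paired with str.translate to delete every ASCII punctuation character in one pass. The helper name strip_punctuation is hypothetical and not part of my original solution.

```python
import string

def strip_punctuation(text):
    # string.punctuation is a constant string of ASCII punctuation characters.
    # str.maketrans('', '', chars) builds a table that deletes every char in chars.
    table = str.maketrans('', '', string.punctuation)
    return text.translate(table)

cleaned = strip_punctuation("Bob hit a ball, the hit BALL flew far after it was hit.")
print(cleaned.lower().split())
```

Because the characters are deleted rather than replaced with spaces, words separated only by punctuation (for example "ball,the") would be merged, so this assumes the input also has whitespace between words.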

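A regular expression works too, and it actually looked simpler than string.punctuation. I should memorize it so I can use it often. In a regex, \w is a word character (a letter, digit, or underscore), and ^ at the start of a character class means not. So re.sub(r'[^\w]', ' ', paragraph) replaces every non-word character with a space.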
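
A minimal sketch of the regex preprocessing on the sample sentence from this problem. The variable name words is only for illustration.

```python
import re

paragraph = "Bob hit a ball, the hit BALL flew far after it was hit."

# [^\w] matches any single character that is NOT a word character
# (letter, digit, or underscore), so punctuation and spaces all become spaces.
words = re.sub(r'[^\w]', ' ', paragraph).lower().split()
print(words)
```

Replacing with a space instead of an empty string keeps neighboring words apart even when only a comma separates them.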

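The banned words also have to be removed. I did that with replace, while the book filters them out with not in. Since replace removes substrings rather than whole words, the book's filter is the safer choice.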
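
To see why the two approaches differ, here is a small made-up example (the input "white hit" is my own, not from the problem):

```python
text = "white hit"
banned = ['hit']

# str.replace works on substrings, so it also damages 'white'.
by_replace = text.replace('hit', '').split()

# Filtering whole words with `not in` leaves 'white' intact.
by_filter = [w for w in text.split() if w not in banned]

print(by_replace, by_filter)
```

With replace, the word 'white' loses its middle and turns into 'we', while the not in filter only drops tokens that match a banned word exactly.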

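The last solution uses the same preprocessing but swaps Counter for defaultdict. As covered earlier, defaultdict(int) is handy for counting how many times each key appears. To get the key with the largest value out of a dictionary, pass key=dict.get to max() (the method itself, without parentheses). I only learned this now, so I want to remember it.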
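
A short sketch of the counting pattern described above, using a made-up word list:

```python
from collections import defaultdict

counts = defaultdict(int)  # missing keys start at 0
for word in ['ball', 'bob', 'ball', 'far']:
    counts[word] += 1

# max() iterates over the keys; key=counts.get ranks each key by its value.
most_common = max(counts, key=counts.get)
print(most_common)
```

Writing key=counts.get() with parentheses would call get immediately with no arguments and raise a TypeError, which is why the method is passed uncalled.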

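import re
import string
from collections import Counter, defaultdict

paragraph = "Bob hit a ball, the hit BALL flew far after it was hit."
banned = ['hit']

# My code
def solution1(paragraph, banned):
    # Lowercase first so banned words match regardless of case.
    paragraph = paragraph.lower()
    # Caveat: replace() removes substrings, so banning 'hit' would also damage 'white'.
    for word in banned:
        paragraph = paragraph.replace(word, '')
    # Drop punctuation characters, then split on whitespace.
    result = ''.join([i for i in paragraph if i not in string.punctuation]).split()
    return Counter(result).most_common(1)[0][0]

# Reference book
def solution2(paragraph, banned):
    # Strip everything except word characters and whitespace, then filter banned words.
    result = [i for i in re.sub(r'[^\w\s]', '', paragraph).lower().split() if i not in banned]
    return Counter(result).most_common(1)[0][0]

# Reference book 2
def solution3(paragraph, banned):
    arr = [i for i in re.sub(r'[^\w\s]', '', paragraph).lower().split() if i not in banned]
    # Count occurrences manually; missing keys default to 0.
    result = defaultdict(int)
    for i in arr:
        result[i] += 1
    # Return the key whose count is largest.
    return max(result, key=result.get)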

