
[ TIL ] Final Project_Day 15

by carrot0911 2025. 2. 24.

What I worked on today 🧠

match vs wildcard

  • match
    • A full-text search query that relies on the inverted index
    • On analyzed fields, the query string goes through tokenization and text normalization before it is matched
    • How it works
      • The query string is tokenized, and a document matches when it contains any of the resulting tokens; variant word forms match only as far as the analyzer normalizes them (lowercasing, stemming, synonyms) or when fuzziness is set
      • Usually used on text fields, which are analyzed with the standard analyzer or with custom analyzers built on ngram or edge_ngram
      • Because matching is done per token, it is a poor fit for matching characters at a specific position or for exact whole-string matching
    • Pros
      • Fast (lookups go straight through the inverted index)
      • Tolerates differences in wording, since matching is done on analyzed tokens rather than on the raw string
      • Works well for natural-language search (NLP)
    • Cons
      • Not suited to substring search (contains-style search)
      • Exact string matching needs something extra, such as match_phrase
  • wildcard
    • wildcard is a query that does string pattern matching
    • Behaves much like SQL's LIKE '%keyword%'
    • Mostly used on keyword fields rather than text fields
    • How it works
      • Searches with the wildcard symbols * (any run of characters) and ? (any single character)
      • Slow, because the pattern cannot be resolved with a single term lookup; it has to be tested against the indexed terms one by one, which is close to a brute-force scan
    • Pros
      • Supports substring search (similar to contains)
      • A good fit when the pattern has to be matched against the whole, un-analyzed string
    • Cons
      • Very poor search performance (every candidate term has to be checked against the pattern)
      • Causes serious performance problems on large data sets
      • Especially slow when the pattern starts with *
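
To make the difference concrete, here is a minimal sketch in plain Python. It is a toy model, not how Elasticsearch is implemented: `analyze`, `match`, `wildcard`, and the sample documents are all made up for illustration. The match-style lookup goes token by token through an inverted index, while the wildcard-style lookup has to test the pattern against every stored whole value.

```python
import re
from collections import defaultdict
from fnmatch import fnmatchcase

def analyze(text):
    """Toy stand-in for an analyzer: lowercase, then split on non-alphanumerics."""
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]

# Hypothetical sample documents (id -> title)
docs = {
    1: "Spring Boot search service",
    2: "Elasticsearch wildcard query",
    3: "Searching with Elasticsearch",
}

# text-like field: token -> ids of the documents that contain it
inverted = defaultdict(set)
# keyword-like field: whole value stored as a single term -> document ids
keyword_index = defaultdict(set)
for doc_id, title in docs.items():
    keyword_index[title].add(doc_id)
    for token in analyze(title):
        inverted[token].add(doc_id)

def match(query):
    """match-style search: analyze the query, then look up each token (OR)."""
    hits = set()
    for token in analyze(query):
        hits |= inverted.get(token, set())
    return hits

def wildcard(pattern):
    """wildcard-style search: test the pattern against every stored term."""
    hits = set()
    for term, ids in keyword_index.items():
        if fnmatchcase(term, pattern):  # supports * and ?, case-sensitive
            hits |= ids
    return hits

print(sorted(match("elasticsearch search")))  # [1, 2, 3]
print(sorted(match("Search")))                # [1]  ("searching" is a different token)
print(sorted(match("earch")))                 # []   (no such token in the index)
print(sorted(wildcard("*earch*")))            # [1, 2, 3]
print(sorted(wildcard("Elastic*")))           # [2]
```

The cost difference shows up in the loops: `match` does one dictionary lookup per query token, while `wildcard` walks every term no matter how many there are.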

Text vs Keyword

  • Text
    • The input string is split into tokens before it is stored
    • An inverted index is built from those tokens, which makes the type a good fit for full-text search
    • How the text is split depends on the configured analyzer
    • Supports looser matching (related words can be found, depending on the analyzer)
    • Not suited to sorting or aggregation
  • Keyword
    • The input string is stored as a single token (not analyzed)
    • A good fit for exact-value matching (searched with a term query)
    • A good fit for aggregation, sorting, and filtering
    • No analyzer is applied (the value is stored exactly as entered)
    • Mostly used for short, identifier-like strings (tags, status values, IDs, and so on)
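
As a rough sketch of how the two types can sit side by side, the request bodies below are written as plain Python dicts. The field names `title` and `status` and the helper functions are assumptions for illustration only, not the project's actual mapping. `title` is declared as text with a keyword sub-field, so the same value can be searched full-text and also matched, sorted, or aggregated as a whole string.

```python
import json

# Hypothetical index mapping: `title` is analyzed text with an un-analyzed
# `keyword` sub-field (a multi-field); `status` is a plain keyword field.
mapping = {
    "mappings": {
        "properties": {
            "title": {
                "type": "text",
                "analyzer": "standard",
                "fields": {"keyword": {"type": "keyword"}},
            },
            "status": {"type": "keyword"},
        }
    }
}

def full_text_search(words):
    """match query on the analyzed text field."""
    return {"query": {"match": {"title": {"query": words}}}}

def exact_filter(status):
    """term query in filter context on the keyword field (value is not analyzed)."""
    return {"query": {"bool": {"filter": [{"term": {"status": {"value": status}}}]}}}

def contains(fragment):
    """wildcard query against the whole value stored in the keyword sub-field.
    Note: * or ? inside `fragment` are not escaped in this sketch."""
    return {"query": {"wildcard": {"title.keyword": {"value": f"*{fragment}*"}}}}

def sorted_with_counts():
    """Sorting and aggregation go through keyword fields, not the text field."""
    return {
        "sort": [{"title.keyword": "asc"}],
        "aggs": {"by_status": {"terms": {"field": "status"}}},
    }

print(json.dumps(full_text_search("elasticsearch query"), indent=2))
print(json.dumps(contains("earch"), indent=2))
```

Each dict is meant to be sent as the JSON body of an index-creation or search request; the sketch only builds and prints the bodies and does not talk to a cluster.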

 

Plan for tomorrow ⏰

  • Finish testing the Elasticsearch dynamic queries
  • Document everything done so far

+ More plans may come up along the way ~_~