
[Python] seaborn

by 퍼포마첼라 2025. 3. 13.

 

Drawing graphs with seaborn

seaborn: quickly produce good-looking graphs

matplotlib: build graphs customized exactly the way you want

 

How to use

 

Given daily data, seaborn automatically computes the mean and plots it by month.

Design

 

 


Customizing graphs with the set_theme() function
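A minimal sketch of set_theme(), assuming a few commonly used arguments. The specific style, palette, and font_scale values here are arbitrary choices for illustration, not the ones from the lecture.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so no window is needed
import matplotlib.pyplot as plt
import seaborn as sns

# set_theme changes the global defaults used by every later plot.
sns.set_theme(
    style="whitegrid",                # background and grid preset
    palette="pastel",                 # default color cycle
    font_scale=1.2,                   # scale all font sizes
    rc={"figure.figsize": (10, 5)},   # any matplotlib rcParams override
)

# A new figure picks up the size set through rc.
fig, ax = plt.subplots()
figure_size = tuple(float(v) for v in fig.get_size_inches())
print(figure_size)
plt.close("all")
```

Because the settings are global, calling set_theme() once at the top of a notebook is usually enough.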

 

 

Exercise

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

bike_df = pd.read_csv('data/bike.csv')
sns.set_theme(rc={'figure.figsize': (10, 5)}, style='white')

# Write your code here.
sns.barplot(data=bike_df, x='quarter', y='registered', errorbar=None, hue='workingday')
plt.show()

 


Visualizing data distributions 1

 

 

stripplot

swarmplot

Where many points are concentrated, swarmplot places them side by side so they do not overlap.

stripplot, swarmplot: used when analyzing relatively small datasets, with roughly tens to hundreds of rows
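The two plots can be compared side by side with a sketch like the one below. The data is randomly generated (90 rows, which fits the "small dataset" case), and the column names only mimic the bike data used in the exercise.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so no window is needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical data: 30 temperature readings for each of 3 months.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "month": np.repeat([1, 2, 3], 30),
    "temperature": rng.normal(loc=np.repeat([0, 10, 20], 30), scale=3),
})

fig, axes = plt.subplots(1, 2, figsize=(10, 5), sharey=True)

# stripplot: points are jittered randomly, so dense areas can overlap.
sns.stripplot(data=df, x="month", y="temperature", ax=axes[0])
axes[0].set_title("stripplot")

# swarmplot: points are pushed sideways so none of them overlap.
sns.swarmplot(data=df, x="month", y="temperature", ax=axes[1])
axes[1].set_title("swarmplot")

# Every row becomes one point in the strip plot.
strip_points = sum(len(c.get_offsets()) for c in axes[0].collections)
print(strip_points)
plt.close("all")
```

With thousands of rows the swarm would run out of horizontal room, which is why these plots are better kept for small datasets.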

 

Exercise

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

bike_df = pd.read_csv('data/bike.csv')
sns.set_theme(rc={'figure.figsize': (10, 5)}, style='white')

# Write your code here.
sns.stripplot(data=bike_df, x='month', y='temperature')
plt.show()

 


Visualizing data distributions 2

 

boxplot

 

The order parameter can be used to put the days of the week in the desired order.
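A short sketch of order with boxplot. The data is invented, and the rows are deliberately stored from Sunday back to Monday so the effect of order is visible.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so no window is needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

day_order = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]

# Hypothetical data: rows appear in reverse weekday order (Sun first).
rng = np.random.default_rng(2)
df = pd.DataFrame({
    "day": day_order[::-1] * 10,
    "total": rng.normal(100, 15, 70),
})

# Without order, categories follow their order of appearance in the data.
# With order, the boxes are drawn in exactly the sequence given.
ax = sns.boxplot(data=df, x="day", y="total", order=day_order)

ax.figure.canvas.draw()  # make sure tick labels are populated before reading them
tick_labels = [t.get_text() for t in ax.get_xticklabels()]
print(tick_labels)
plt.close("all")
```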

 

violinplot

It is read in almost the same way as a box plot.

The white dot is the median, and the thick bar is the IQR (interquartile range).

 

histplot

A histogram shows the distribution of a single variable, so only x is passed.

Passing the column as y instead draws the histogram horizontally.

 

With the multiple parameter (for example multiple='stack'), the bars for each hue group are stacked on top of the original bars.

 

 

kdeplot

 

Exercise 1

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

flight_df = pd.read_csv('data/flight.csv')
sns.set_theme(rc={'figure.figsize': (6, 6)}, style='white')

# Write your code here.
sns.violinplot(data=flight_df, x='class', y='price')
plt.show()

 

Exercise 2

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

insurance_df = pd.read_csv('data/insurance_charge.csv')
sns.set_theme(rc={'figure.figsize': (10, 5)}, style='white')

# Write your code here.
sns.histplot(data=insurance_df, x='charge', hue='smoking', multiple='stack')
plt.show()


Visualizing correlation

Scatter plot

Used to check whether two variables in the data are correlated

 

Correlation coefficient

A number that expresses the correlation

 

Pearson correlation coefficient (-1 to 1)

Coefficient = 0: there is no linear correlation between the two values.

Coefficient > 0: when one value increases, the other increases as well. (positive correlation)

Coefficient < 0: when one value increases, the other decreases. (negative correlation)

 

Strength of correlation

Check how large or small the correlation coefficient is.

The closer its absolute value is to 1, the stronger the relationship; the closer it is to 0, the weaker.
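The sign and strength rules above can be tried out on a tiny made-up table. The column names (x, up, down, mixed) are hypothetical; Series.corr uses the Pearson method by default.

```python
import pandas as pd

# Hypothetical values chosen so that the three cases are easy to see.
df = pd.DataFrame({
    "x":     [1, 2, 3, 4, 5],
    "up":    [2, 4, 6, 8, 10],   # grows exactly with x
    "down":  [10, 8, 6, 4, 2],   # shrinks exactly as x grows
    "mixed": [3, 1, 4, 1, 5],    # only loosely related to x
})

r_up = df["x"].corr(df["up"])        # perfect positive correlation
r_down = df["x"].corr(df["down"])    # perfect negative correlation
r_mixed = df["x"].corr(df["mixed"])  # weak positive correlation

print(round(r_up, 3), round(r_down, 3), round(r_mixed, 3))
```

The first two coefficients sit at the ends of the range, while the third is positive but much closer to 0, meaning a weaker relationship.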

 

scatterplot

regplot

Regression line: the line drawn through the middle of the points

It summarizes the correlation between the two values.
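A minimal regplot sketch on invented data. The column names price and points are borrowed from the wine exercise, but the values are generated here, and ci=None is an assumption made to skip the confidence band and keep the sketch fast.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so no window is needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical data: points rise with price, plus a little noise.
rng = np.random.default_rng(3)
price = np.arange(1, 21)
df = pd.DataFrame({
    "price": price,
    "points": 80 + 0.5 * price + rng.normal(0, 1, 20),
})

# regplot draws the scatter plot and a fitted regression line on top of it.
ax = sns.regplot(data=df, x="price", y="points", ci=None)

# The regression line is the first Line2D on the axes; its direction shows the sign.
line_y = ax.lines[0].get_ydata()
is_positive = line_y[-1] > line_y[0]
print(is_positive)
plt.close("all")
```

An upward line corresponds to a positive correlation, and a downward line to a negative one.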

Compared with the previous graph, the correlation looks weaker, and it appears to be a negative correlation.

How to check the correlation coefficients

Correlation coefficients with respect to total

Sorting the correlation coefficients

The darker the color, the smaller the value; the lighter the color, the larger the value.
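The steps above (correlation matrix, the total column, sorting, heatmap) can be sketched as follows. The table is invented and only loosely resembles the bike data; the season column is added to show what numeric_only=True leaves out.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so no window is needed
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical data, values invented for illustration.
df = pd.DataFrame({
    "temperature": [5, 10, 15, 20, 25, 30],
    "humidity":    [80, 60, 75, 55, 50, 45],
    "windspeed":   [12, 7, 9, 14, 8, 10],
    "total":       [120, 200, 260, 390, 450, 520],
    "season":      ["winter", "winter", "spring", "spring", "summer", "summer"],
})

# Correlation matrix of the numeric columns only (the text column is skipped).
corr = df.corr(numeric_only=True)

# Correlation of every column with total, sorted from highest to lowest.
total_corr = corr["total"].sort_values(ascending=False)
print(total_corr)

# Heatmap of the full matrix; annot=True writes each coefficient in its cell.
ax = sns.heatmap(corr, annot=True)
plt.close("all")
```

total always comes first in the sorted result because a column's correlation with itself is 1.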

Exercise 1

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

wine_df = pd.read_csv('data/wine.csv')
sns.set_theme(rc={'figure.figsize': (6, 6)}, style='white')

# Write your code here.
sns.regplot(data=wine_df, x='price', y='points')
plt.show()

 

Exercise 2

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

insurance_df = pd.read_csv('data/insurance_premium.csv')
sns.set_theme(rc={'figure.figsize': (8, 6)}, style='white')

# Write your code here.
sns.heatmap(insurance_df.corr(numeric_only=True), annot=True)
plt.show()

 


Codeit 12 seaborn