
감자의 Data Lab 📊

[LIKELION Data Analysis Bootcamp, Cohort 5] EDA: Analyzing Pro Baseball Data ⚾️

LIKELION ໒(⊙ᴗ⊙)७✎


๊ฐ์ž์Šˆ๋‹ˆ 2025. 5. 21. 21:31

0. Learning Goals

✅ Summarize what we learned in class
✅ Try the data cleaning on my own
✅ Dig further into the Samsung Lions


1. Skimming the Data & Preprocessing

Today's dataset is Korean professional baseball data 🧢⚾️


Before the analysis we'll do some preprocessing (honestly, this part was a bit hard).

Say Lotte beat Samsung in a Lotte vs Samsung game.
For the analysis, I want to see both records of that game: a win from Lotte's side and a loss from Samsung's side.

In other words, I want every game represented from the point of view of both teams that played it!!
But in the original data each row is written from Team 1's point of view, so we can only see Team 1's side.

So we copy the DataFrame into df2, swap Team 1 ↔️ Team 2 in df2, and concat df2 onto df1.


import pandas as pd

# Load the original data into df1
df1 = pd.read_csv('data/baseball2.csv')

# Duplicate the DataFrame
df2 = df1.copy()
# Rename the columns so Team 1 and Team 2 are swapped
# (assumes df1's columns are ['경기일자', '팀1이름', '팀2이름', '팀1점수', '팀2점수'] in that order)
df2.columns = ['경기일자', '팀2이름', '팀1이름', '팀2점수', '팀1점수']

# Stack the two DataFrames; concat aligns columns by name
df3 = pd.concat([df1, df2])
df3.reset_index(inplace=True, drop=True)
df3

↳ The original data had 8,749 rows; after concat it has 17,498 rows, exactly twice as many.
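The copy, swap, and concat idea can be sketched on a tiny, made-up DataFrame (the rows below are hypothetical, not taken from baseball2.csv). This variant swaps the labels with `rename` instead of overwriting `df2.columns`, so it does not depend on the CSV's column order; that is a design preference, not what the class code does.

```python
import pandas as pd

# Hypothetical mini-version of the data: one row per game, from Team 1's side
games = pd.DataFrame({
    '경기일자': ['2018-04-01', '2018-04-02'],
    '팀1이름': ['롯데', '삼성'],
    '팀2이름': ['삼성', '두산'],
    '팀1점수': [5, 2],
    '팀2점수': [3, 7],
})

# Swap the Team 1 / Team 2 labels; each label is mapped independently
swap = {'팀1이름': '팀2이름', '팀2이름': '팀1이름',
        '팀1점수': '팀2점수', '팀2점수': '팀1점수'}
mirrored = games.rename(columns=swap)

# concat aligns by column name, so the swapped copy stacks as
# "the same game seen from the other team's side"
both_sides = pd.concat([games, mirrored], ignore_index=True)
print(len(both_sides))  # 4
```

Row 2 of `both_sides` is the Lotte vs Samsung game again, now with Samsung as 팀1 and the scores flipped.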

 


2. Data Preprocessing

First, let's check the team names in the Team 1 and Team 2 columns.

# Check the team names
a1 = df3['팀1이름'].value_counts()
a2 = df3['팀2이름'].value_counts()

# display() is available in Jupyter/IPython notebooks
display(a1)
display(a2)

(I only captured the output for Team 1, hehe)

 

🚨 Issues 🚨

- The data covers older seasons, so 넥센 (Nexen) still appears   ➡️ merge into 키움 (Kiwoom)
- The data covers older seasons, so SK still appears   ➡️ merge into SSG
- Both kt and KT exist   ➡️ merge into KT
- 나눔, 드림, 웨스턴, and 이스턴 are All-Star Game teams, so they are left out of this analysis   ➡️ drop them

# Rename 넥센 (Nexen) to 키움 (Kiwoom)
idx1 = df3.query('팀1이름 == "넥센"').index
df3.loc[idx1, '팀1이름'] = '키움'

idx2 = df3.query('팀2이름 == "넥센"').index
df3.loc[idx2, '팀2이름'] = '키움'

# Rename lowercase kt to KT
idx3 = df3.query('팀1이름 == "kt"').index
df3.loc[idx3, '팀1이름'] = 'KT'

idx4 = df3.query('팀2이름 == "kt"').index
df3.loc[idx4, '팀2이름'] = 'KT'

# Rename SK to SSG
idx5 = df3.query('팀1이름 == "SK"').index
df3.loc[idx5, '팀1이름'] = 'SSG'

idx6 = df3.query('팀2이름 == "SK"').index
df3.loc[idx6, '팀2이름'] = 'SSG'


# Drop the All-Star games
drop_list = ['드림', '나눔', '이스턴', '웨스턴']

# Games where either side is an All-Star team
idx7 = df3.query('팀1이름 in @drop_list or 팀2이름 in @drop_list').index

df3.drop(idx7, inplace=True)
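The four fixes above can also be written as one `replace` mapping plus an `isin` mask. A minimal sketch on hypothetical rows (not the real data), assuming the same column names and team spellings as in the post:

```python
import pandas as pd

# Hypothetical rows covering each issue from the list above
df = pd.DataFrame({
    '팀1이름': ['넥센', 'kt', 'SK', '드림', '삼성'],
    '팀2이름': ['삼성', 'SK', '넥센', '나눔', 'kt'],
})
team_cols = ['팀1이름', '팀2이름']

# Old name -> current name, applied to both team columns in one call
rename_map = {'넥센': '키움', 'SK': 'SSG', 'kt': 'KT'}
df[team_cols] = df[team_cols].replace(rename_map)

# All-Star "teams": drop any game where either side is one of them
drop_list = ['드림', '나눔', '이스턴', '웨스턴']
is_allstar = df[team_cols].isin(drop_list).any(axis=1)
df = df[~is_allstar].reset_index(drop=True)
print(df)
```

Keeping the mapping in one dict makes it easy to add another rename later without another pair of query/loc lines.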

 


🏆 Let's add a game-result column

#DerivedVariable

First, fill a new '경기결과' (game result) column with '비김' (draw),
then use conditions to overwrite it with '승리' (win) or '패배' (loss), from Team 1's point of view.

# Add the game-result column, defaulting to a draw
df3['경기결과'] = '비김'

# Games Team 1 won
idx1 = df3.query('팀1점수 > 팀2점수').index
df3.loc[idx1, '경기결과'] = '승리'

# Games Team 1 lost
idx2 = df3.query('팀1점수 < 팀2점수').index
df3.loc[idx2, '경기결과'] = '패배'

df3['경기결과'].value_counts()

↳ It makes sense that the win and loss counts come out equal.
df3 combines df1 with df2 (df1 with Team 1 and Team 2 swapped), so every game appears once from each side:
a Team 1 win in one copy is a Team 1 loss in the other.
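The "default to '비김', then overwrite" pattern is the same logic as `numpy.select`, which checks conditions in order and falls back to a default. A sketch with made-up scores:

```python
import numpy as np
import pandas as pd

# Hypothetical scores, already in the both-sides layout
df = pd.DataFrame({
    '팀1점수': [5, 3, 4, 4],
    '팀2점수': [3, 5, 4, 4],
})

# Conditions are checked in order; rows matching none get the default (draw)
conditions = [df['팀1점수'] > df['팀2점수'], df['팀1점수'] < df['팀2점수']]
choices = ['승리', '패배']
df['경기결과'] = np.select(conditions, choices, default='비김')
print(df['경기결과'].value_counts())
```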


3. EDA

1) Let's count how many games each team played.

# Keep only the columns we need
df4 = df3[['팀1이름', '경기결과']]
# count() gives the number of games per team (stored under the 경기결과 column)
play_count = df4.groupby('팀1이름').count()

# Sort in descending order
play_count.reset_index().sort_values('경기결과', ascending=False)
 


2) Let's count each team's wins.

# Keep only the games Team 1 won
df4 = df3.query('경기결과 == "승리"')
df5 = df4[['팀1이름', '경기결과']]
win_count = df5.groupby('팀1이름').count()
win_count


3) Using the results above, let's compute each team's win rate.

Win rate = (wins / total games) × 100 (draws count toward total games here)

# Element-wise division; both frames share the 팀1이름 index and the 경기결과 column
team_win_rate = win_count / play_count * 100
# Convert to int (this truncates the decimals)
team_win_rate = team_win_rate.astype('int')
team_win_rate = team_win_rate.sort_values('경기결과', ascending=False)
team_win_rate

I don't know much about baseball, but I do know Hanwha.... in more ways than one lol
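Steps 1) to 3) can be collapsed into one table with `pd.crosstab`, which counts wins, losses, and draws per team. This is a sketch on hypothetical rows. It keeps the post's definition (draws in the denominator) and, for comparison only, adds one alternative definition that leaves draws out, wins / (wins + losses); which one fits depends on the question you are asking.

```python
import pandas as pd

# Hypothetical rows in the both-sides layout, after 경기결과 has been added
df = pd.DataFrame({
    '팀1이름': ['삼성', '삼성', '삼성', '한화', '한화', '한화'],
    '경기결과': ['승리', '패배', '비김', '패배', '패배', '승리'],
})

# One row per team, one column per result; reindex guarantees all three columns exist
table = pd.crosstab(df['팀1이름'], df['경기결과'])
table = table.reindex(columns=['승리', '패배', '비김'], fill_value=0)
table['경기수'] = table[['승리', '패배', '비김']].sum(axis=1)

# The post's definition: draws stay in the denominator
table['승률'] = table['승리'] / table['경기수'] * 100
# Alternative definition: draws left out, wins / (wins + losses)
table['승률_무승부제외'] = table['승리'] / (table['승리'] + table['패배']) * 100

print(table.sort_values('승률', ascending=False).round(1))
```

Keeping the float rates (instead of `astype('int')`) also preserves the ordering between teams whose rates differ by less than one point.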

 

Visualizing the win rates

Let's draw them with seaborn's barplot.

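import matplotlib.pyplot as plt
import seaborn as sns

# Turn the team index into a column and give the columns readable names
a1 = team_win_rate.reset_index()
a1.columns = ['팀1이름', '승률']

# Korean labels need a Korean font configured in matplotlib to render correctly
sns.barplot(data=a1, x='팀1이름', y='승률', hue='팀1이름', palette='summer')
plt.show()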

 


4) For the year Samsung's win rate was lowest, let's compute each opponent's win rate against Samsung (my own analysis)

I'm not that into baseball, but I should at least analyze my hometown team, right... hehe

Analysis steps

  • Compute the Samsung Lions' win rate by year.
  • Sort the win rates in ascending order.
  • Take the year from row 0 (the row with the lowest win rate).
  • For that year, compute each opposing team's win rate against Samsung and visualize it.

 

# Compute Samsung's win rate by year
# 경기일자 is a yyyy-mm-dd string, so its first four characters are the year
df3['경기년도'] = df3['경기일자'].str[:4]

# Keep only the columns we need
df4 = df3[['팀1이름', '경기결과', '경기년도']]

# Samsung's games (df3 is mirrored, so 팀1이름 == "삼성" covers every Samsung game)
samsung = df4.query('팀1이름 == "삼성"')
# Only the games Samsung won
win_game = samsung.query('경기결과 == "승리"')

# Games per year
total_by_year = samsung.groupby('경기년도')['경기결과'].count()
# Wins per year
win_by_year = win_game.groupby('경기년도')['경기결과'].count()

# Win rate, truncated to an integer
win_rate = win_by_year / total_by_year * 100
win_rate = win_rate.astype('int')
win_rate = win_rate.sort_index()

# Sort ascending so the worst year comes first
win_rate_sort = win_rate.sort_values()
print("Samsung's win rate by year")
print(win_rate_sort)


Wow, it dropped quite a lot in recent years....

 

Now store the year with Samsung's lowest win rate in bad_year.

# Year with Samsung's lowest win rate (first row after the ascending sort)
bad_year = win_rate_sort.index[0]
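The first three analysis steps can be condensed with `idxmin`, which returns the label of the smallest value directly instead of sorting and taking row 0. A sketch on hypothetical Samsung rows. Note that `.dt.year` yields integer years whereas the post's `.str[:4]` yields strings, so pick one convention before querying with `@bad_year`. Working on unrounded rates also avoids ties introduced by `astype('int')` truncation.

```python
import pandas as pd

# Hypothetical Samsung rows (팀1이름 == '삼성'), dates as yyyy-mm-dd strings
samsung = pd.DataFrame({
    '경기일자': ['2016-04-01', '2016-04-02', '2017-04-01', '2017-04-02', '2017-04-03'],
    '경기결과': ['승리', '승리', '승리', '패배', '패배'],
})

# Parse the date instead of slicing the string; .dt.year gives an int year
samsung['경기년도'] = pd.to_datetime(samsung['경기일자']).dt.year

# The mean of a boolean Series is the share of True values, i.e. wins / games
is_win = samsung['경기결과'] == '승리'
yearly_rate = is_win.groupby(samsung['경기년도']).mean() * 100

# idxmin returns the index label (the year) of the smallest rate
bad_year = yearly_rate.idxmin()
print(yearly_rate.round(1))
print(bad_year)  # 2017
```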


Next, let's count how many times each team beat Samsung that year.

# Year with Samsung's lowest win rate
bad_year = win_rate_sort.index[0]
# Keep only that year's games
df5 = df3.query('경기년도 == @bad_year')

# Games where Samsung is Team 1 and lost
samsung_loss = df5.query('팀1이름 == "삼성" and 경기결과 == "패배"')

# Count wins per opponent (the Team 2 name)
opponent_wins = samsung_loss['팀2이름'].value_counts()

opponent_wins

Kia and Doosan beat Samsung a lot that year....
But raw win counts are not really a fair metric,
because each team played Samsung a different number of times!
So we should look at a relative metric: the win rate.
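A quick made-up example of why counts mislead: a team that played Samsung twice as often can collect more wins and still have the lower win rate. The team names and numbers below are hypothetical.

```python
import pandas as pd

# Hypothetical head-to-head numbers, only to illustrate the point
games_vs_samsung = pd.Series({'Team A': 16, 'Team B': 8})
wins_vs_samsung = pd.Series({'Team A': 8, 'Team B': 6})

# By raw count, Team A looks stronger against Samsung
print(wins_vs_samsung.idxmax())  # Team A

# As a rate, Team B is stronger: 6 / 8 = 75% vs 8 / 16 = 50%
rate = wins_vs_samsung / games_vs_samsung * 100
print(rate.idxmax())  # Team B
```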

Now let's compute each opponent's win rate against Samsung so we can visualize it!

# Year with Samsung's lowest win rate
bad_year = win_rate_sort.index[0]
# Keep only that year's games
samsung_games = df3.query('경기년도 == @bad_year')

### Note: 경기결과 is a win or loss from Team 1's point of view.
# When Samsung is Team 1, a loss means Team 2 won
team1_loss = samsung_games.query('팀1이름 == "삼성" and 경기결과 == "패배"')
team1_total = samsung_games.query('팀1이름 == "삼성"').groupby('팀2이름').size().to_frame('총경기수')
team1_wins = team1_loss.groupby('팀2이름').size().to_frame('삼성 상대승')


# When Samsung is Team 2, a win means Team 1 won
team2_loss = samsung_games.query('팀2이름 == "삼성" and 경기결과 == "승리"')
team2_total = samsung_games.query('팀2이름 == "삼성"').groupby('팀1이름').size().to_frame('총경기수')
team2_wins = team2_loss.groupby('팀1이름').size().to_frame('삼성 상대승')

# Combine both cases. Because df3 is mirrored, each game is counted twice here
# (once per orientation), but numerator and denominator double alike, so the rate is unaffected.
total_games = pd.concat([team1_total, team2_total]).groupby(level=0).sum()
win_games = pd.concat([team1_wins, team2_wins]).groupby(level=0).sum()

# Each opponent's win rate against Samsung, rounded to a whole percent
win_rate = (win_games['삼성 상대승'] / total_games['총경기수'] * 100).round(0)

print(win_rate)

 

Phew, this code is really hard...
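Since df3 already holds every game from both sides, the rows where 삼성 is 팀1 cover each Samsung game exactly once, so the two-case concat is not strictly needed. A shorter sketch on hypothetical rows: averaging a boolean "opponent won" mask per opponent gives the rate directly. As a side effect, an opponent that never beat Samsung shows up as 0 rather than NaN (with the two-case version such a team would be missing from win_games, so the division yields NaN).

```python
import pandas as pd

# Hypothetical rows for one season in the mirrored layout:
# every game appears once with 삼성 as 팀1 and once with 삼성 as 팀2
games = pd.DataFrame({
    '팀1이름': ['삼성', '삼성', '삼성', '기아', '기아', '한화'],
    '팀2이름': ['기아', '기아', '한화', '삼성', '삼성', '삼성'],
    '경기결과': ['패배', '승리', '승리', '승리', '패배', '패배'],
})

# Rows with 삼성 as 팀1 already cover every Samsung game exactly once
samsung_side = games[games['팀1이름'] == '삼성']

# A Samsung loss is the opponent's win; the mean per opponent is the win rate
opp_win = samsung_side['경기결과'] == '패배'
h2h_rate = opp_win.groupby(samsung_side['팀2이름']).mean() * 100
print(h2h_rate.sort_values(ascending=False))
```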

Now let's visualize win_rate with seaborn's barplot.

sns.barplot(x=win_rate.index, y=win_rate.values, hue=win_rate.index, palette='summer')
plt.show()

 

🧠 Interpreting the graph

Kia and Doosan had the highest win rates against Samsung, and Hanwha the lowest.
In the overall win rates (all seasons), NC ranked high, but its win rate against Samsung that year is near the bottom.
Why is that..? As someone who knows nothing about baseball, I have no idea what happened... hehe;;


💭 Takeaways & Next Steps

The code was a bit harder than last time, which is maybe why it felt more overwhelming.
Still, with ChatGPT's help I slogged through and got the analysis I wanted, which feels rewarding.

I've been learning Tableau recently, and I'd like to try this analysis in Tableau too.
Maybe I'll build a Samsung Lions themed dashboard....? Thinking about it!!

I'll definitely make it on a day off from class or over the weekend.

 

Source: LIKELION (멋쟁이사자처럼)