Conditional creation of a Series/DataFrame column in pandas

맨날치킨 2023. 2. 11. 19:05

I am collecting problems that are frequently searched for and posted on Stack Overflow, along with problems I looked up during development that I expect to look up again.

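For each question, I summarize two answers: the highest-voted solution, which is usually the first one readers check on Stack Overflow, and the most recently updated solution that has reached a minimum score, which may be more useful today.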

Pandas conditional creation of a series/dataframe column

Question

How do I add a color column to the following dataframe so that color is 'green' where Set == 'Z' and 'red' otherwise?

    Type       Set
1    A          Z
2    B          Z
3    B          X
4    C          Y

Highest-voted solution

If you have only two choices to select from, use np.where:

df['color'] = np.where(df['Set']=='Z', 'green', 'red')

For example,

import pandas as pd
import numpy as np

df = pd.DataFrame({'Type':list('ABBC'), 'Set':list('ZZXY')})
df['color'] = np.where(df['Set']=='Z', 'green', 'red')
print(df)

yields

  Type Set  color
0    A   Z  green
1    B   Z  green
2    B   X    red
3    C   Y    red
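
To see why this works, it can help to look at the intermediate boolean mask. The following is a minimal sketch on the same df; the variable names mask and colors are introduced here for illustration only and do not appear in the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Type': list('ABBC'), 'Set': list('ZZXY')})

# Element-wise comparison gives a boolean Series: True where Set is 'Z'.
mask = df['Set'] == 'Z'

# np.where picks 'green' where the mask is True and 'red' elsewhere.
colors = np.where(mask, 'green', 'red')

df['color'] = colors
print(mask.tolist())
print(df['color'].tolist())
```

Because the comparison and the selection both run over the whole column at once, no explicit Python loop over the rows is needed.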

If you have more than two conditions, use np.select. For example, if you want color to be

yellow when (df['Set'] == 'Z') & (df['Type'] == 'A')
otherwise blue when (df['Set'] == 'Z') & (df['Type'] == 'B')
otherwise purple when (df['Type'] == 'B')
otherwise black,

then use

df = pd.DataFrame({'Type':list('ABBC'), 'Set':list('ZZXY')})
conditions = [
    (df['Set'] == 'Z') & (df['Type'] == 'A'),
    (df['Set'] == 'Z') & (df['Type'] == 'B'),
    (df['Type'] == 'B')]
choices = ['yellow', 'blue', 'purple']
df['color'] = np.select(conditions, choices, default='black')
print(df)

which yields

  Type Set   color
0    A   Z  yellow
1    B   Z    blue
2    B   X  purple
3    C   Y   black
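
The "otherwise" wording in the conditions matters: when several conditions are True for a row, np.select uses the first matching one in the list. The sketch below illustrates this with two of the conditions from the example; the names is_z_and_b, is_b, specific_first, and general_first are made up for this illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Type': list('ABBC'), 'Set': list('ZZXY')})

is_z_and_b = (df['Set'] == 'Z') & (df['Type'] == 'B')
is_b = df['Type'] == 'B'

# Specific condition first: row 1 (Set 'Z', Type 'B') gets 'blue'.
specific_first = np.select([is_z_and_b, is_b], ['blue', 'purple'], default='black')

# General condition first: it also matches row 1, so 'blue' is never chosen.
general_first = np.select([is_b, is_z_and_b], ['purple', 'blue'], default='black')

print(specific_first.tolist())
print(general_first.tolist())
```

Listing the most specific conditions first, as the answer does, keeps the result in line with an if/elif chain.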

Most recent solution

If you have only two choices, use np.where():

df = pd.DataFrame({'A':range(3)})
df['B'] = np.where(df.A>2, 'yes', 'no')

If you have more than two choices, apply() may work. Take this input:

arr = pd.DataFrame({'A':list('abc'), 'B':range(3), 'C':range(3,6), 'D':range(6, 9)})

where arr is

    A   B   C   D
0   a   0   3   6
1   b   1   4   7
2   c   2   5   8

If you want column E to be arr.B when arr.A == 'a', arr.C when arr.A == 'b', arr.D when arr.A == 'c', and something_else otherwise (the code uses 1234 as a stand-in for something_else):

arr['E'] = arr.apply(lambda x: x['B'] if x['A']=='a' else(x['C'] if x['A']=='b' else(x['D'] if x['A']=='c' else 1234)), axis=1)

and finally arr is

    A   B   C   D   E
0   a   0   3   6   0
1   b   1   4   7   4
2   c   2   5   8   8
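
apply() with axis=1 calls the lambda once per row, which can be slow on large frames. As a hedged alternative that is not part of the original answer, the same column E can be sketched with np.select from the highest-voted solution, passing whole columns as the choices. The names conditions and choices are reused here for illustration, and 1234 again stands in for something_else.

```python
import numpy as np
import pandas as pd

arr = pd.DataFrame({'A': list('abc'), 'B': range(3), 'C': range(3, 6), 'D': range(6, 9)})

# One boolean condition per branch of the if/elif chain.
conditions = [arr['A'] == 'a', arr['A'] == 'b', arr['A'] == 'c']
# Each choice is a whole column; np.select takes the value from the matching row.
choices = [arr['B'], arr['C'], arr['D']]

arr['E'] = np.select(conditions, choices, default=1234)
print(arr['E'].tolist())
```

On this sample data the result matches the apply() version, since each row matches exactly one condition.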

Source: https://stackoverflow.com/questions/19913659/pandas-conditional-creation-of-a-series-dataframe-column
