How do I select rows from a DataFrame based on column values?


맨날치킨 2022. 11. 28. 15:05

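I am collecting questions that are frequently searched for and posted on Stack Overflow, along with questions I looked up during my own development work and expect to look up again.

For each question I have summarized two answers: the highest-scored Solution, which is usually the first one people check on Stack Overflow, and the most recently updated Solution (with a minimum score), which may be more useful today.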


How do I select rows from a DataFrame based on column values?

Question

How can I select rows from a DataFrame based on the values in some column in pandas?

In SQL, I would use:

SELECT *
FROM table
WHERE column_name = some_value

Highest-scored Solution

To select rows whose column value equals a scalar, some_value, use ==:

df.loc[df['column_name'] == some_value]

To select rows whose column value is in an iterable, some_values, use isin:

df.loc[df['column_name'].isin(some_values)]

Combine multiple conditions with &:

df.loc[(df['column_name'] >= A) & (df['column_name'] <= B)]

Note the parentheses. Due to Python's operator precedence rules, & binds more tightly than <= and >=, so the parentheses in the last example are necessary. Without the parentheses,

df['column_name'] >= A & df['column_name'] <= B

is parsed as

df['column_name'] >= (A & df['column_name']) <= B

which raises "ValueError: The truth value of a Series is ambiguous", because the chained comparison tries to reduce a whole Series to a single True or False.
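The precedence issue described above can be reproduced with a small sketch. The sample values and the bounds A and B are hypothetical, chosen only for illustration. Series.between is shown as one possible alternative that avoids the parentheses; it is not part of the original answer.

```python
import pandas as pd

# Hypothetical sample data; the values and bounds are illustrative only.
df = pd.DataFrame({"column_name": [1, 3, 5, 7, 9]})
A, B = 3, 7

# Parenthesized: each comparison yields a boolean Series before & combines them.
in_range = df.loc[(df["column_name"] >= A) & (df["column_name"] <= B)]

# Unparenthesized: A & df["column_name"] is evaluated first, and the chained
# comparison then calls bool() on a Series, which raises ValueError.
try:
    df["column_name"] >= A & df["column_name"] <= B
    raised = False
except ValueError:
    raised = True

# Series.between is inclusive on both ends by default, so it selects the same rows.
same_rows = df.loc[df["column_name"].between(A, B)]
```

Here in_range and same_rows both contain the rows with values 3, 5, and 7, and raised ends up True.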

To select rows whose column value does not equal some_value, use !=:

df.loc[df['column_name'] != some_value]

isin returns a boolean Series, so to select rows whose value is not in some_values, negate the boolean Series using ~:

df.loc[~df['column_name'].isin(some_values)]
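As a minimal sketch of the two negated forms, the snippet below runs != and ~isin on a small frame. The data, some_value, and some_values are hypothetical placeholders that stand in for the names used above.

```python
import pandas as pd

# Hypothetical data and filter values, for illustration only.
df = pd.DataFrame({"column_name": ["a", "b", "c", "a", "d"]})
some_value = "a"
some_values = ["a", "b"]

# Rows whose value differs from the scalar.
not_equal = df.loc[df["column_name"] != some_value]

# Rows whose value is absent from the list: isin builds the mask, ~ inverts it.
not_in = df.loc[~df["column_name"].isin(some_values)]
```

Both results keep the original row labels, so not_equal holds rows 1, 2, and 4, and not_in holds rows 2 and 4.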

For example,

import pandas as pd
import numpy as np
df = pd.DataFrame({'A': 'foo bar foo bar foo bar foo foo'.split(),
                   'B': 'one one two three two two one three'.split(),
                   'C': np.arange(8), 'D': np.arange(8) * 2})
print(df)
#      A      B  C   D
# 0  foo    one  0   0
# 1  bar    one  1   2
# 2  foo    two  2   4
# 3  bar  three  3   6
# 4  foo    two  4   8
# 5  bar    two  5  10
# 6  foo    one  6  12
# 7  foo  three  7  14

print(df.loc[df['A'] == 'foo'])

yields

     A      B  C   D
0  foo    one  0   0
2  foo    two  2   4
4  foo    two  4   8
6  foo    one  6  12
7  foo  three  7  14

If you have multiple values you want to include, put them in a list (or more generally, any iterable) and use isin:

print(df.loc[df['B'].isin(['one','three'])])

yields

     A      B  C   D
0  foo    one  0   0
1  bar    one  1   2
3  bar  three  3   6
6  foo    one  6  12
7  foo  three  7  14

Note, however, that if you wish to do this many times, it is more efficient to make an index first, and then use df.loc:

df = df.set_index(['B'])
print(df.loc['one'])

yields

       A  C   D
B              
one  foo  0   0
one  bar  1   2
one  foo  6  12

or, to include multiple values from the index, use df.index.isin:

df.loc[df.index.isin(['one','two'])]

yields

       A  C   D
B              
one  foo  0   0
one  bar  1   2
two  foo  2   4
two  foo  4   8
two  bar  5  10
one  foo  6  12
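To check that the index-based form selects the same rows as the column-based form, here is a small sketch that runs both on the example frame and compares them. The variable names by_column, indexed, and by_index are introduced here for illustration, and no timing is measured, so it says nothing about the size of any speedup.

```python
import numpy as np
import pandas as pd

# Same example frame as above.
df = pd.DataFrame({'A': 'foo bar foo bar foo bar foo foo'.split(),
                   'B': 'one one two three two two one three'.split(),
                   'C': np.arange(8), 'D': np.arange(8) * 2})

# Column-based selection: boolean mask built from column B.
by_column = df.loc[df['B'].isin(['one', 'two'])]

# Index-based selection: B becomes the index, and the mask comes from the index.
indexed = df.set_index('B')
by_index = indexed.loc[indexed.index.isin(['one', 'two'])]

# Both forms keep the original row order, so column C matches element by element.
print(by_column['C'].tolist() == by_index['C'].tolist())
```

Note that set_index moves B out of the columns; call reset_index() on the result if B is needed as a regular column again.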

Most Recent Solution

SQL statements on DataFrames to select rows using DuckDB

With DuckDB we can query pandas DataFrames with SQL statements, in a highly performant way.

Since the question is "How do I select rows from a DataFrame based on column values?" and the example in the question is a SQL query, an SQL-based answer fits this topic.

Example:

In [1]: import duckdb

In [2]: import pandas as pd

In [3]: con = duckdb.connect()

In [4]: df = pd.DataFrame({"A": range(11), "B": range(11, 22)})

In [5]: df
Out[5]:
     A   B
0    0  11
1    1  12
2    2  13
3    3  14
4    4  15
5    5  16
6    6  17
7    7  18
8    8  19
9    9  20
10  10  21

In [6]: results = con.execute("SELECT * FROM df where A > 2").df()

In [7]: results
Out[7]:
    A   B
0   3  14
1   4  15
2   5  16
3   6  17
4   7  18
5   8  19
6   9  20
7  10  21
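For comparison, the same filter can be written without DuckDB using pandas' own DataFrame.query, which accepts a string expression similar to a SQL WHERE clause. This is a sketch added for illustration and is not part of the original answer; the names filtered and with_reset are hypothetical.

```python
import pandas as pd

# Same frame as in the DuckDB example.
df = pd.DataFrame({"A": range(11), "B": range(11, 22)})

# Equivalent of: SELECT * FROM df WHERE A > 2
filtered = df.query("A > 2")

# query keeps the original row labels (3..10); the DuckDB result above is
# renumbered from 0, which reset_index(drop=True) reproduces.
with_reset = filtered.reset_index(drop=True)
```

The one visible difference is the index: filtered keeps labels 3 through 10, while with_reset starts again at 0 like the DuckDB output.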

Source: https://stackoverflow.com/questions/17071871/how-do-i-select-rows-from-a-dataframe-based-on-column-values
