Converting a Pandas DataFrame to a Dictionary


맨날치킨 2023. 2. 14. 09:05

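I collect problems that are frequently searched for and posted on Stack Overflow, along with problems I looked up during development that I expect to need again.

For each question, I summarize two answers: the highest-scored solution, which is usually the first one readers check on Stack Overflow, and the most recently updated solution that has earned a minimum score, which may be more useful today.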


Convert a Pandas DataFrame to a dictionary


Problem

I have a DataFrame with four columns and want to convert it to a Python dictionary. The elements of the first column should be the keys, and the elements of the other columns in the same row should be the values.

DataFrame:

    ID   A   B   C
0   p    1   3   2
1   q    4   3   2
2   r    4   0   9

The output should look like this:

Dictionary:

{'p': [1,3,2], 'q': [4,3,2], 'r': [4,0,9]}

Highest-Scored Solution

The to_dict() method uses the column names as dictionary keys, so you need to reshape the DataFrame slightly. One way to do this is to set the 'ID' column as the index and then transpose the DataFrame.

to_dict() also accepts an 'orient' argument, which you need in order to output a list of values for each column. Otherwise, a dictionary of the form {index: value} is returned for each column.

These steps can be done in a single line:

>>> df.set_index('ID').T.to_dict('list')
{'p': [1, 3, 2], 'q': [4, 3, 2], 'r': [4, 0, 9]}
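
The one-liner above assumes that df already exists. As a minimal, self-contained sketch, the script below rebuilds the DataFrame from the question and compares the default orient with 'list'. The variable names reshaped, nested, and as_lists are my own and do not come from the original answer.

```python
import pandas as pd

# The DataFrame from the question, rebuilt by hand.
df = pd.DataFrame({
    "ID": ["p", "q", "r"],
    "A": [1, 4, 4],
    "B": [3, 3, 0],
    "C": [2, 2, 9],
})

# Make 'ID' the index, then transpose so each ID becomes a column.
reshaped = df.set_index("ID").T

# Default orient ('dict'): every ID maps to a {row label: value} dictionary.
nested = reshaped.to_dict()

# orient='list': every ID maps to a plain list of its values.
as_lists = reshaped.to_dict(orient="list")

print(nested)
print(as_lists)
```

The second print shows the dictionary asked for in the question, while the first shows the {index: value} form you get when 'orient' is left out.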

If you need a different dictionary format, here are examples of the possible 'orient' arguments. Consider the following simple DataFrame:

>>> df = pd.DataFrame({'a': ['red', 'yellow', 'blue'], 'b': [0.5, 0.25, 0.125]})
>>> df
        a      b
0     red  0.500
1  yellow  0.250
2    blue  0.125

The options are as follows.

dict - the default: column names are keys, and values are dictionaries of index:data pairs

>>> df.to_dict('dict')
{'a': {0: 'red', 1: 'yellow', 2: 'blue'}, 
 'b': {0: 0.5, 1: 0.25, 2: 0.125}}

list - keys are column names, and values are lists of column data

>>> df.to_dict('list')
{'a': ['red', 'yellow', 'blue'], 
 'b': [0.5, 0.25, 0.125]}

series - like 'list', but the values are Series

>>> df.to_dict('series')
{'a': 0       red
      1    yellow
      2      blue
      Name: a, dtype: object, 

 'b': 0    0.500
      1    0.250
      2    0.125
      Name: b, dtype: float64}

split - the keys are 'columns', 'data', and 'index'; the values are the column names, the data values by row, and the index labels, respectively

>>> df.to_dict('split')
{'columns': ['a', 'b'],
 'data': [['red', 0.5], ['yellow', 0.25], ['blue', 0.125]],
 'index': [0, 1, 2]}

records - each row becomes a dictionary where the keys are column names and the values are the data in the cells

>>> df.to_dict('records')
[{'a': 'red', 'b': 0.5}, 
 {'a': 'yellow', 'b': 0.25}, 
 {'a': 'blue', 'b': 0.125}]

index - like 'records', but returns a dictionary of dictionaries keyed by index label rather than a list

>>> df.to_dict('index')
{0: {'a': 'red', 'b': 0.5},
 1: {'a': 'yellow', 'b': 0.25},
 2: {'a': 'blue', 'b': 0.125}}
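
To try several of these options outside the REPL, here is a small sketch that runs them on the same sample DataFrame in one script. The round trip through 'split' at the end is my own addition for illustration, and the names by_column, by_row, by_index, parts, and rebuilt are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"a": ["red", "yellow", "blue"], "b": [0.5, 0.25, 0.125]})

# One call per orient; the keyword form makes the intent explicit.
by_column = df.to_dict(orient="list")     # {column: [values]}
by_row = df.to_dict(orient="records")     # [{column: value}, ...]
by_index = df.to_dict(orient="index")     # {index label: {column: value}}
parts = df.to_dict(orient="split")        # {'index', 'columns', 'data'}

# 'split' keeps the three pieces separate, so the DataFrame can be rebuilt.
rebuilt = pd.DataFrame(parts["data"], index=parts["index"], columns=parts["columns"])

print(by_row[0])
print(by_index[2])
print(rebuilt)
```

Which orient fits best depends on what consumes the result: 'records' is convenient for row-by-row processing, while 'split' preserves the index and column labels separately from the data.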

Most Recent Solution

Most of the answers do not handle the case where an ID appears more than once in the DataFrame. If ID can be duplicated in the DataFrame df, store the values for each ID in a list of lists, grouped by ID:

{k: [g['A'].tolist(), g['B'].tolist(), g['C'].tolist()] for k,g in df.groupby('ID')}
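
To see what this comprehension produces, the sketch below runs it on a small DataFrame in which 'p' appears twice. The sample data is made up for illustration, and the per_row variant is my own variation rather than part of the original answer.

```python
import pandas as pd

# Hypothetical sample data: the ID 'p' appears in two rows.
df = pd.DataFrame({
    "ID": ["p", "q", "p"],
    "A": [1, 4, 4],
    "B": [3, 3, 0],
    "C": [2, 2, 9],
})

# The comprehension from the answer: one inner list per column, per ID.
grouped = {
    key: [group["A"].tolist(), group["B"].tolist(), group["C"].tolist()]
    for key, group in df.groupby("ID")
}
print(grouped)
# {'p': [[1, 4], [3, 0], [2, 9]], 'q': [[4], [3], [2]]}

# Variation (not from the answer): one inner list per row, per ID.
per_row = {
    key: group[["A", "B", "C"]].to_numpy().tolist()
    for key, group in df.groupby("ID")
}
print(per_row)
# {'p': [[1, 3, 2], [4, 0, 9]], 'q': [[4, 3, 2]]}
```

A plain dictionary can hold only one value per key, which is why the values for a repeated ID have to be collected into nested lists.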

Source: https://stackoverflow.com/questions/26716616/convert-a-pandas-dataframe-to-a-dictionary
