Adding a new column to an existing Pandas DataFrame

맨날치킨 2022. 12. 7. 14:05

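I am collecting problems that are frequently searched for and posted on Stack Overflow, along with problems I have looked up during development and expect to look up again.

For each question, I summarize two answers: the highest-scoring solution, which is the first one most readers check on Stack Overflow, and the most recently updated solution (with a minimum score), which may be more useful today.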

How to add a new column to an existing DataFrame?

Question

I have the following indexed DataFrame with named columns and row labels that are not consecutive numbers:

          a         b         c         d
2  0.671399  0.101208 -0.181532  0.241273
3  0.446172 -0.243316  0.051767  1.577318
5  0.614758  0.075793 -0.451460 -0.012493

I would like to add a new column, 'e', to the existing DataFrame without changing anything else in it (the new column always has the same length as the DataFrame).

0   -0.335485
1   -1.166658
2   -0.385571
dtype: float64

How can I add column e to the above example?

Highest-scoring solution

Edit 2017

As indicated in the comments and by @Alexander, the best way to add the values of a Series as a new column of a DataFrame is currently probably assign:

df1 = df1.assign(e=pd.Series(np.random.randn(sLength)).values)
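
To make the assign call above concrete, here is a minimal sketch. The frame contents and the names `s` and `df2` are made up for illustration. It shows two things the one-liner relies on: `.values` drops the Series index so the values are placed by position, and `assign` returns a new DataFrame rather than modifying the original.

```python
import pandas as pd

# Small frame with a non-consecutive index, similar to the one in the question.
df1 = pd.DataFrame(
    {"a": [0.67, 0.45, 0.61], "b": [0.10, -0.24, 0.08]},
    index=[2, 3, 5],
)

# A Series with the default 0..n-1 index (illustrative values).
s = pd.Series([-0.34, -1.17, -0.39])

# .values strips the Series index, so the values are placed by position.
df2 = df1.assign(e=s.values)

print(df2)
print("e" in df1.columns)  # False: assign left the original frame untouched
```

Because `assign` returns a copy, the result has to be bound to a name (the answer rebinds `df1` itself).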

Edit 2015

Some users reported getting the SettingWithCopyWarning with this code. However, the code still runs correctly with pandas 0.16.1, the current version at the time of that edit.

>>> sLength = len(df1['a'])
>>> df1
          a         b         c         d
6 -0.269221 -0.026476  0.997517  1.294385
8  0.917438  0.847941  0.034235 -0.448948

>>> df1['e'] = pd.Series(np.random.randn(sLength), index=df1.index)
>>> df1
          a         b         c         d         e
6 -0.269221 -0.026476  0.997517  1.294385  1.757167
8  0.917438  0.847941  0.034235 -0.448948  2.228131

>>> pd.version.short_version
'0.16.1'

The SettingWithCopyWarning is meant to flag a possibly invalid assignment on a copy of the DataFrame. It doesn't necessarily mean you did something wrong (it can produce false positives), but since 0.13.0 it lets you know there are more appropriate methods for the same purpose. If you get the warning, just follow its advice: Try using .loc[row_index,col_indexer] = value instead

>>> df1.loc[:,'f'] = pd.Series(np.random.randn(sLength), index=df1.index)
>>> df1
          a         b         c         d         e         f
6 -0.269221 -0.026476  0.997517  1.294385  1.757167 -0.050927
8  0.917438  0.847941  0.034235 -0.448948  2.228131  0.006109
>>>

In fact, this is currently the more efficient method, as described in the pandas docs.
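
As a rough sketch of the situation the warning is about, the example below adds a column to a filtered subset. The data and the names `subset` and `new_values` are hypothetical. Taking an explicit `.copy()` first and then assigning through `.loc` is one common way to make the intent unambiguous. Whether the warning appears without the copy depends on the pandas version and on how the frame was created.

```python
import pandas as pd

df = pd.DataFrame(
    {"a": [1.0, -2.0, 3.0], "b": [0.5, 0.25, -0.75]},
    index=[6, 8, 9],
)

# An explicit copy makes it clear that `subset` is its own object,
# not a view into `df`.
subset = df[df["a"] > 0].copy()

# Build the new column on the subset's own index, then assign through .loc.
new_values = pd.Series([10.0, 20.0], index=subset.index)
subset.loc[:, "f"] = new_values

print(subset)
```

The original `df` is left without an 'f' column, which is the expected outcome when working on a copy.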

Original answer:

Use the original df1 indexes to create the series:

df1['e'] = pd.Series(np.random.randn(sLength), index=df1.index)
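
The reason for passing `index=df1.index` is that pandas aligns a Series to the DataFrame by index label, not by position. The sketch below uses made-up values and an illustrative column named 'misaligned' to show what happens with the question's kind of index when the Series keeps its default 0, 1, 2 labels.

```python
import pandas as pd

df1 = pd.DataFrame({"a": [0.67, 0.45, 0.61]}, index=[2, 3, 5])
s = pd.Series([-0.34, -1.17, -0.39])  # default index 0, 1, 2

# Direct assignment aligns on labels: only label 2 exists in both indexes,
# so rows 3 and 5 end up as NaN.
df1["misaligned"] = s

# Rebuilding the Series on df1's index places the values by position.
df1["e"] = pd.Series(s.values, index=df1.index)

print(df1)
```

Row 2 of 'misaligned' receives the Series value labelled 2 (the third value), while column 'e' holds all three values in order.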

Most recent solution

If we want to assign a scalar value, e.g. 10, to all rows of a new column in a df:

df = df.assign(new_col=lambda x: 10)  # x is the whole DataFrame passed to the lambda, not a single row

df will now have a new column 'new_col' with the value 10 in all rows.
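
For a constant column the lambda is not strictly needed. The sketch below uses illustrative column names ('doubled' and 'flag' are not from the original answer). It shows a plain scalar being broadcast, and a callable deriving a column from the whole DataFrame it receives.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3]}, index=[2, 3, 5])

# A plain scalar is broadcast to every row; no lambda is needed.
df = df.assign(new_col=10)

# A callable passed to assign receives the whole DataFrame once,
# so it can derive a column from existing ones.
df = df.assign(doubled=lambda frame: frame["a"] * 2)

# Item assignment broadcasts a scalar the same way, modifying df in place.
df["flag"] = True

print(df)
```

The callable form is mainly useful in method chains, where the intermediate frame has no name yet.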

Source: https://stackoverflow.com/questions/12555323/how-to-add-a-new-column-to-an-existing-dataframe
