Removing from one list all the elements that appear in another list



맨날치킨 2023. 2. 9. 09:05

I am collecting questions that are frequently searched for and asked on Stack Overflow, along with questions I looked up during development and expect to look up again later.

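For each question, I summarize two solutions: the top-voted one, which is usually the first you see on Stack Overflow, and the most recently updated one (with at least a minimum score), which may be more relevant today.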


Remove all the elements that occur in one list from another

Problem

Let's say I have two lists, l1 and l2. I want to perform l1 - l2, which returns all elements of l1 that are not in l2.

I can think of a naive loop approach, but it would be very inefficient. What is a Pythonic and efficient way of doing this?

For example, if l1 = [1,2,6,8] and l2 = [2,3,5,8], then l1 - l2 should return [1,6].
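For reference, the naive loop the question alludes to might look like the sketch below. The function name `subtract_naive` is hypothetical. Its inner scan of `l2` for every element of `l1` is what makes this approach cost roughly O(len(l1) * len(l2)).

```python
def subtract_naive(l1, l2):
    """Hypothetical helper: return the elements of l1 not in l2, keeping l1's order."""
    result = []
    for x in l1:
        found = False
        # Linear scan of l2 for every element of l1.
        for y in l2:
            if x == y:
                found = True
                break
        if not found:
            result.append(x)
    return result


l1 = [1, 2, 6, 8]
l2 = [2, 3, 5, 8]
print(subtract_naive(l1, l2))  # [1, 6]
```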

Top-voted solution

Python has a language feature called list comprehensions that is well suited to making this kind of task very easy. The following statement does exactly what you want and stores the result in l3:

l3 = [x for x in l1 if x not in l2]

l3 will contain [1, 6].
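When l2 is a list, `x not in l2` scans l2 once for every element of l1, so the comprehension above still costs roughly O(len(l1) * len(l2)). A common refinement, sketched here under the assumption that the elements of l2 are hashable, is to build a set from l2 once and test membership against it. The name `l2_set` is just an illustrative variable.

```python
l1 = [1, 2, 6, 8]
l2 = [2, 3, 5, 8]

# Build the set once; each membership test is then an average O(1) hash lookup
# instead of a linear scan of the list. Requires hashable elements in l2.
l2_set = set(l2)
l3 = [x for x in l1 if x not in l2_set]
print(l3)  # [1, 6]

# The order of l1 and its repeated elements are preserved.
l4 = [x for x in [1, 1, 2, 6, 6, 8] if x not in l2_set]
print(l4)  # [1, 1, 6, 6]
```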

Most recent solution

Using `filterfalse` without a lambda expression

When using functions such as filter, or filterfalse and similar functions from itertools, you can usually gain performance by passing an existing function instead of a lambda expression. Instances of list and set define a __contains__ method that is used for membership checks. The in operator calls this method under the hood, so x in l2 can be replaced by l2.__contains__(x). This replacement is usually not prettier, but in this specific case, combined with filterfalse, it performs better than a lambda expression:

>>> from itertools import filterfalse
>>> l1 = [1, 2, 6, 8]
>>> l2 = [2, 3, 5, 8]
>>> list(filterfalse(l2.__contains__, l1))
[1, 6]

filterfalse creates an iterator that yields every element of l1 for which l2.__contains__ returns False.
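To make the equivalence explicit, the sketch below checks that `x in l2` and `l2.__contains__(x)` agree, shows that `filterfalse` is lazy (elements are only tested as the iterator is consumed), and confirms that the bound-method, lambda, and list-comprehension forms all produce the same result. The variable names are only for illustration.

```python
from itertools import filterfalse

l1 = [1, 2, 6, 8]
l2 = [2, 3, 5, 8]

# `x in l2` and `l2.__contains__(x)` give the same answer.
assert all((x in l2) == l2.__contains__(x) for x in l1)

# filterfalse is lazy: elements are tested only as the iterator is consumed.
it = filterfalse(l2.__contains__, l1)
print(next(it))  # 1
print(list(it))  # [6]

# Three equivalent formulations.
with_method = list(filterfalse(l2.__contains__, l1))
with_lambda = list(filterfalse(lambda x: x in l2, l1))
with_comprehension = [x for x in l1 if not l2.__contains__(x)]
print(with_method == with_lambda == with_comprehension)  # True
```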

Sets have a faster __contains__ implementation (an average O(1) hash lookup instead of a linear scan of the list), so converting l2 to a set is even better:

>>> from itertools import filterfalse
>>> l1 = [1, 2, 6, 8]
>>> l2 = set([2, 3, 5, 8])
>>> list(filterfalse(l2.__contains__, l1))
[1, 6]
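When the order of l1 and its duplicates do not matter, `set(l1) - set(l2)` is another common way to express l1 - l2. The small comparison below (inputs chosen only for illustration) shows what the filterfalse version preserves that a plain set difference does not.

```python
from itertools import filterfalse

l1 = [8, 1, 6, 1, 2]
l2 = {2, 3, 5, 8}

# filterfalse keeps l1's order and its repeated elements.
kept = list(filterfalse(l2.__contains__, l1))
print(kept)  # [1, 6, 1]

# Set difference drops duplicates, and sets make no ordering guarantee.
diff = set(l1) - l2
print(diff)  # {1, 6} (display order is not guaranteed in general)
```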

Performance

Using list:

$ python3 -m timeit -s "from itertools import filterfalse; l1 = [1,2,6,8]; l2 = [2,3,5,8];" "list(filterfalse(l2.__contains__, l1))"
500000 loops, best of 5: 522 nsec per loop

Using set:

$ python3 -m timeit -s "from itertools import filterfalse; l1 = [1,2,6,8]; l2 = set([2,3,5,8]);" "list(filterfalse(l2.__contains__, l1))"
1000000 loops, best of 5: 359 nsec per loop
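The figures above are the answer author's measurements, and absolute numbers depend on the machine and Python version. The following sketch uses the standard timeit module to re-run the comparison locally, including the lambda and list-comprehension variants. The candidate labels and the repeat/number settings are arbitrary choices, and no particular output is implied.

```python
import timeit

setup = (
    "from itertools import filterfalse\n"
    "l1 = [1, 2, 6, 8]\n"
    "l2_list = [2, 3, 5, 8]\n"
    "l2_set = set(l2_list)\n"
)

candidates = {
    "comprehension, list": "[x for x in l1 if x not in l2_list]",
    "comprehension, set": "[x for x in l1 if x not in l2_set]",
    "filterfalse + lambda, set": "list(filterfalse(lambda x: x in l2_set, l1))",
    "filterfalse + __contains__, list": "list(filterfalse(l2_list.__contains__, l1))",
    "filterfalse + __contains__, set": "list(filterfalse(l2_set.__contains__, l1))",
}

NUMBER = 20_000  # executions per timing run
results = {}
for name, stmt in candidates.items():
    # Best of 5 runs, mirroring the "best of 5" reported by `python3 -m timeit`.
    best = min(timeit.repeat(stmt, setup=setup, repeat=5, number=NUMBER))
    results[name] = best / NUMBER * 1e9  # nanoseconds per execution
    print(f"{name:34s} {results[name]:8.1f} ns")
```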

Source: https://stackoverflow.com/questions/4211209/remove-all-the-elements-that-occur-in-one-list-from-another