Get the last n lines of a file

맨날치킨 2022. 12. 30. 19:05

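I'm collecting questions that are frequently searched for and posted on Stack Overflow, along with ones I looked up during development, that I expect to come back to later.

For each one, I summarize two solutions: the highest-scoring one, which is usually the first you see on Stack Overflow, and the most recently updated one (with at least a minimum score), which may be more useful today.

Preview the key keywords of this post in the word cloud below.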

Get last n lines of a file, similar to tail

Question

I'm writing a log file viewer for a web application, and for that I want to paginate through the lines of the log file. The items in the file are line-based, with the newest item at the bottom.

So I need a tail() method that can read n lines from the bottom and support an offset. This is what I came up with:

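def tail(f, n, offset=0):
    """Reads n lines from f with an offset of offset lines.

    On Python 3, open f in binary mode ('rb'): in text mode the end-relative
    seek fails and the except branch ends up reading the whole file.
    """
    avg_line_length = 74
    to_read = n + offset
    while True:
        try:
            # seek() needs an int, and avg_line_length becomes a float
            # after the first "*= 1.3" below
            f.seek(-int(avg_line_length * to_read), 2)
        except IOError:  # an alias of OSError in Python 3
            # woops.  apparently file is smaller than what we want
            # to step back, go to the beginning instead
            f.seek(0)
        pos = f.tell()
        lines = f.read().splitlines()
        if len(lines) >= to_read or pos == 0:
            return lines[-to_read:offset and -offset or None]
        avg_line_length *= 1.3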

Is this a reasonable approach? What is the recommended way to tail log files with offsets?

Highest-scoring solution

This may be quicker than yours. It makes no assumptions about line length. It backs through the file one block at a time until it has found the right number of '\n' characters.

# Text-mode version (works on Python 2); for Python 3.2+ see the UPDATE below.
def tail(f, lines=20):
    if lines <= 0:
        return ''
    total_lines_wanted = lines

    BLOCK_SIZE = 1024
    f.seek(0, 2)
    block_end_byte = f.tell()
    lines_to_go = total_lines_wanted
    block_number = -1
    blocks = [] # blocks of size BLOCK_SIZE, in reverse order starting
                # from the end of the file
    # Read until we have seen MORE than `lines` newlines (or reached the start):
    # when the file ends with '\n', exactly `lines` newlines can still leave the
    # first wanted line cut off at a block boundary.
    while lines_to_go >= 0 and block_end_byte > 0:
        if (block_end_byte - BLOCK_SIZE > 0):
            # read the last block we haven't yet read
            f.seek(block_number*BLOCK_SIZE, 2)
            blocks.append(f.read(BLOCK_SIZE))
        else:
            # file too small, start from beginning
            f.seek(0,0)
            # only read what was not read
            blocks.append(f.read(block_end_byte))
        lines_found = blocks[-1].count('\n')
        lines_to_go -= lines_found
        block_end_byte -= BLOCK_SIZE
        block_number -= 1
    all_read_text = ''.join(reversed(blocks))
    return '\n'.join(all_read_text.splitlines()[-total_lines_wanted:])

I don't like tricky assumptions about line length when, as a practical matter, you can never know things like that.

Generally, this will locate the last 20 lines on the first or second pass through the loop. If your 74-character average is actually accurate, make the block size 2048 and you'll get the last 20 lines almost immediately: 20 lines × 74 bytes = 1,480 bytes, which fits in a single 2048-byte block.

Also, I don't burn a lot of brain calories trying to finesse alignment with physical OS blocks. With these high-level I/O packages, I doubt you'll see any performance benefit from aligning reads on OS block boundaries. If you use lower-level I/O, then you might see a speedup.

UPDATE

For Python 3.2 and up, work on bytes instead. In text files (those opened without a "b" in the mode string), only seeks relative to the beginning of the file are allowed; the one exception is seeking to the very end of the file with seek(0, 2). So open the file in binary mode, for example:

f = open('C:/.../../apache_logs.txt', 'rb')
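Here is a quick check of that restriction, written as a small sketch. The temporary file and its contents are made up for the demonstration. A text-mode file rejects a nonzero end-relative seek with io.UnsupportedOperation. The same seek on a file opened with 'rb' lets you read the last bytes directly.

```python
import io
import os
import tempfile

# Throwaway file made up for this demonstration.
with tempfile.NamedTemporaryFile("wb", delete=False) as tmp:
    tmp.write(b"first line\nsecond line\nlast line\n")
    path = tmp.name

try:
    # Text mode: jumping to the very end with seek(0, 2) is allowed...
    with open(path) as f:
        f.seek(0, os.SEEK_END)
        # ...but a nonzero offset relative to the end is rejected.
        try:
            f.seek(-10, os.SEEK_END)
            text_mode_ok = True
        except io.UnsupportedOperation:
            text_mode_ok = False

    # Binary mode: end-relative seeks work, so the last 10 bytes can be read directly.
    with open(path, "rb") as f:
        f.seek(-10, os.SEEK_END)
        last_bytes = f.read()
finally:
    os.remove(path)

print(text_mode_ok)
print(last_bytes)
```

On CPython 3 this should print False, then b'last line\n'.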

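def tail(f, lines=20):
    """Return the last `lines` lines of f as bytes joined by b'\n'.

    f must be opened in binary mode, e.g. open(path, 'rb').
    """
    if lines <= 0:
        return b''
    total_lines_wanted = lines

    BLOCK_SIZE = 1024
    f.seek(0, 2)
    block_end_byte = f.tell()
    lines_to_go = total_lines_wanted
    block_number = -1
    blocks = []  # blocks read backwards from the end of the file
    # Stop only after seeing more than `lines` newlines (or reaching the start),
    # so the first returned line is complete even if the file ends with b'\n'.
    while lines_to_go >= 0 and block_end_byte > 0:
        if block_end_byte - BLOCK_SIZE > 0:
            # read the last block we haven't read yet
            f.seek(block_number * BLOCK_SIZE, 2)
            blocks.append(f.read(BLOCK_SIZE))
        else:
            # fewer than BLOCK_SIZE unread bytes left: read them from the start
            f.seek(0, 0)
            blocks.append(f.read(block_end_byte))
        lines_found = blocks[-1].count(b'\n')
        lines_to_go -= lines_found
        block_end_byte -= BLOCK_SIZE
        block_number -= 1
    all_read_text = b''.join(reversed(blocks))
    return b'\n'.join(all_read_text.splitlines()[-total_lines_wanted:])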
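Neither version above supports the offset parameter the question asks about. One way to add it is sketched here, assuming Python 3 and a file (or any binary stream) opened in binary mode. A generator, reverse_lines(), walks backwards one block at a time like the answer above and yields lines newest-first. Then tail_page() uses itertools.islice to skip offset lines and take the next n. Both names are hypothetical and do not come from the original answer. The code splits only on b"\n".

```python
import io
import os
from itertools import islice
from typing import BinaryIO, Iterator, List


def reverse_lines(f: BinaryIO, block_size: int = 1024) -> Iterator[bytes]:
    """Yield the lines of a binary file from last to first, without newlines."""
    f.seek(0, os.SEEK_END)
    size = pos = f.tell()
    carry = b""  # leftmost piece seen so far; its start may lie in an unread block
    first_block = True
    while pos > 0:
        read_size = min(block_size, pos)
        pos -= read_size
        f.seek(pos)
        parts = (f.read(read_size) + carry).split(b"\n")
        # parts[0] may be cut off at the block boundary, so hold it back
        carry = parts[0]
        complete = parts[1:]
        if first_block and complete and complete[-1] == b"":
            # the file ends with b"\n"; the empty piece after it is not a line
            complete.pop()
        first_block = False
        yield from reversed(complete)
    if size > 0:
        yield carry  # the very first line of the file


def tail_page(f: BinaryIO, n: int, offset: int = 0) -> List[bytes]:
    """Return up to n lines, oldest first, skipping the `offset` newest lines."""
    page = list(islice(reverse_lines(f), offset, offset + n))
    page.reverse()
    return page


if __name__ == "__main__":
    log = io.BytesIO(b"".join(b"line %d\n" % i for i in range(10)))
    print(tail_page(log, 3, offset=2))
```

With the ten lines "line 0" through "line 9" in the example, tail_page(log, 3, offset=2) returns [b'line 5', b'line 6', b'line 7']. The generator is lazy, so it only reads as many blocks as the requested page needs. Decode the bytes (for example with .decode('utf-8')) before display.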

Most recent solution

The simplest way is to use deque:

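from collections import deque

def tail(filename, n=10):
    with open(filename) as f:
        # deque(iterable, maxlen) drops items from the left once it is full, so only
        # the last n lines (each still ending in '\n') are kept; note that this
        # still reads the whole file from the start.
        return deque(f, n)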
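The deque approach can also cover the question's offset parameter. Below is a minimal sketch; tail_with_offset is a hypothetical name. It keeps the last n + offset lines, then drops the newest offset of them, mirroring the slicing in the question's own tail(). Unlike the block-based answer, this reads the whole file from the beginning. It is the simplest option but not the fastest on large logs.

```python
from collections import deque
from typing import List


def tail_with_offset(filename: str, n: int = 10, offset: int = 0) -> List[str]:
    """Return up to n lines, oldest first, ending `offset` lines before the end."""
    with open(filename) as f:
        # maxlen=n + offset keeps only that many lines in memory,
        # but every line of the file is still read.
        window = deque(f, maxlen=n + offset)
    lines = [line.rstrip("\n") for line in window]
    # Drop the `offset` newest lines; max() guards files shorter than offset.
    return lines[: max(len(lines) - offset, 0)]
```

For a log whose last lines are line7, line8 and line9, tail_with_offset(path, 3) gives those three. The call tail_with_offset(path, 3, offset=2) shifts the page two lines up.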

Source: https://stackoverflow.com/questions/136168/get-last-n-lines-of-a-file-similar-to-tail
