파이썬으로 Google 스프레드시트와 엑셀 데이터를 자동 동기화하기

IT·디지털 자동화 블로그 · 데이터 동기화이 글은 파이썬으로 Google 스프레드시트와 엑셀 파일을 자동 동기화하는 완전한 파이프라인을 구축한다. 서비스 계정 인증, 스키마 매핑, 증분 동기화, 충돌 해결 규칙, 속도 제한, 스케줄러 운영, 로깅까지 단계별 코드를 제공한다.

왜 자동 동기화인가

현업에서는 구글 시트로 실시간 협업을 하고, 엑셀로 보고서와 분석을 진행한다. 수동 복사·붙여 넣기는 누락과 포맷 깨짐을 유발한다. 자동 동기화를 적용하면 데이터 일관성, 이력 추적, 반복 업무 절감 효과를 즉시 얻을 수 있다.

환경 준비와 인증

패키지 설치

pip install gspread gspread-dataframe pandas openpyxl google-auth

폴더 구조

📁 sheets_excel_sync/
 ├─ creds/                   # 서비스 계정 JSON
 ├─ sync_config.yaml         # 시트ID, 워크시트명, 컬럼 매핑
 ├─ sync_oneway_gs_to_xlsx.py
 ├─ sync_oneway_xlsx_to_gs.py
 ├─ sync_two_way.py
 └─ outputs/                 # 동기화 산출물

Google Cloud에서 서비스 계정 키(JSON)를 발급하고, 대상 스프레드시트에 서비스 계정 메일을 편집 권한으로 공유한다.

조직 정책에 따라 개인 OAuth 대신 서비스 계정을 권장한다. 키 파일은 팀 드라이브에 노출되지 않도록 권한을 제한한다.

스키마 매핑과 필수 컬럼

동기화의 핵심은 키 칼럼과 수정 시각 컬럼이다. 예시 설정 파일은 다음과 같다.

# file: sync_config.yaml
google_sheet:
  spreadsheet_id: "1AbCdEf...시트ID..."
  worksheet: "DATA"
excel:
  path: "outputs/sheet_mirror.xlsx"
schema:
  key: "id"
  updated_at: "updated_at"   # ISO8601 문자열 권장
  columns:
    - id
    - name
    - category
    - value
    - updated_at

updated_at은 양방향 병합에서 충돌 해결 기준으로 사용된다. 단방향이라도 증분 동기화 시 유용하다.

단방향 동기화: 시트에서 엑셀로

# file: sync_oneway_gs_to_xlsx.py
import gspread, pandas as pd, yaml
from google.oauth2.service_account import Credentials

SCOPES = ["https://www.googleapis.com/auth/spreadsheets.readonly",
          "https://www.googleapis.com/auth/drive.readonly"]

def load_cfg():
    with open("sync_config.yaml","r",encoding="utf-8") as f:
        return yaml.safe_load(f)

def auth():
    creds = Credentials.from_service_account_file("creds/service_account.json", scopes=SCOPES)
    return gspread.authorize(creds)

if __name__ == "__main__":
    cfg = load_cfg()
    gc = auth()
    sh = gc.open_by_key(cfg["google_sheet"]["spreadsheet_id"])
    ws = sh.worksheet(cfg["google_sheet"]["worksheet"])
    data = ws.get_all_records()
    df = pd.DataFrame(data)[cfg["schema"]["columns"]]
    # 타입 보정
    df["updated_at"] = pd.to_datetime(df["updated_at"], errors="coerce")
    # 엑셀 저장
    out = cfg["excel"]["path"]
    with pd.ExcelWriter(out, engine="openpyxl") as w:
        df.to_excel(w, index=False, sheet_name="DATA")
    print("시트→엑셀 동기화 완료:", out)

시트 헤더와 설정 파일의 컬럼 목록을 일치시켜야 한다. 불일치 시 get_all_records의 키 이름을 재매 핑해 적용한다.

단방향 동기화: 엑셀에서 시트로

# file: sync_oneway_xlsx_to_gs.py
import gspread, pandas as pd, yaml
from gspread_dataframe import set_with_dataframe
from google.oauth2.service_account import Credentials

SCOPES = ["https://www.googleapis.com/auth/spreadsheets"]

def load_cfg():
    with open("sync_config.yaml","r",encoding="utf-8") as f:
        return yaml.safe_load(f)

def auth():
    from google.oauth2.service_account import Credentials
    creds = Credentials.from_service_account_file("creds/service_account.json", scopes=SCOPES)
    return gspread.authorize(creds)

if __name__ == "__main__":
    cfg = load_cfg()
    df = pd.read_excel(cfg["excel"]["path"], sheet_name="DATA")
    df["updated_at"] = pd.to_datetime(df["updated_at"], errors="coerce")

    gc = auth()
    sh = gc.open_by_key(cfg["google_sheet"]["spreadsheet_id"])
    ws = sh.worksheet(cfg["google_sheet"]["worksheet"])

    # 기존 데이터 덮어쓰기(단방향)
    ws.clear()
    set_with_dataframe(ws, df, include_index=False, include_column_header=True, resize=True)
    print("엑셀→시트 동기화 완료")

단방향에서는 clear 후 전체 재업로드가 간단하다. 대용량일 경우 범위 업데이트로 최적화하거나 변경분만 패치하도록 확장할 수 있다.

양방향 동기화: 타임스탬프 기반 병합

키와 updated_at 기준으로 최신 레코드를 선택해 병합한다. 충돌 정책은 시트 우선, 엑셀 우선, 최신 타임스탬프 우선 중 하나로 결정한다. 아래 예시는 최신 타임스탬프 우선 정책이다.

# file: sync_two_way.py
import gspread, pandas as pd, yaml
from gspread_dataframe import set_with_dataframe
from google.oauth2.service_account import Credentials

SCOPES = ["https://www.googleapis.com/auth/spreadsheets"]

def load_cfg():
    with open("sync_config.yaml","r",encoding="utf-8") as f:
        return yaml.safe_load(f)

def auth():
    creds = Credentials.from_service_account_file("creds/service_account.json", scopes=SCOPES)
    return gspread.authorize(creds)

def tidy(df, cols):
    out = df.copy()
    out = out[cols]
    out["updated_at"] = pd.to_datetime(out["updated_at"], errors="coerce")
    return out

if __name__ == "__main__":
    cfg = load_cfg()
    cols = cfg["schema"]["columns"]

    # 1) 시트 읽기
    gc = auth()
    sh = gc.open_by_key(cfg["google_sheet"]["spreadsheet_id"])
    ws = sh.worksheet(cfg["google_sheet"]["worksheet"])
    data = ws.get_all_records()
    gs = pd.DataFrame(data) if data else pd.DataFrame(columns=cols)

    # 2) 엑셀 읽기
    try:
        xl = pd.read_excel(cfg["excel"]["path"], sheet_name="DATA")
    except FileNotFoundError:
        xl = pd.DataFrame(columns=cols)

    gs, xl = tidy(gs, cols), tidy(xl, cols)

    # 3) 외부 소스 두 개 병합
    merged = pd.concat([gs, xl], ignore_index=True)
    merged.sort_values(["id","updated_at"], inplace=True)
    # 키별 최신 레코드만 유지
    latest = merged.groupby("id", as_index=False).tail(1)
    latest = latest[cols].sort_values("id").reset_index(drop=True)

    # 4) 결과를 시트와 엑셀에 반영
    ws.clear()
    set_with_dataframe(ws, latest, include_index=False, include_column_header=True, resize=True)

    with pd.ExcelWriter(cfg["excel"]["path"], engine="openpyxl") as w:
        latest.to_excel(w, index=False, sheet_name="DATA")

    print("양방향 동기화 완료: 레코드", len(latest))

필요하면 삭제 동기화(레코드 제거)를 위해 is_deleted 플래그 칼럼을 도입하고, true인 경우 두 쪽에서 숨기기 또는 제거하도록 정책을 확장한다.

속도 제한·에러 처리·로깅

gspread의 API 호출은 초당/분당 제한이 있다. 반복 업데이트 시 time.sleep을 적절히 배치한다.
네트워크 오류, 권한 변경, 헤더 불일치 등 예외를 잡아 로그 파일에 기록한다.
데이터 검증 규칙 예시: 키 중복 금지, 필수 컬럼 누락 방지, updated_at 미래 시각 금지.

# 간단 로깅 스니펫
import logging, os
logging.basicConfig(filename="sync.log", level=logging.INFO,
                    format="%(asctime)s %(levelname)s %(message)s")
logging.info("동기화 시작")

동기화 전후 레코드 수, 추가/변경/삭제 카운트를 로그에 남기면 운영 이슈 추적이 쉬워진다.

스케줄러 등록

Windows 작업 스케줄러

작업 만들기 → 트리거를 매 30분으로 설정
동작에 프로그램 시작 → python 경로와 sync_two_way.py 지정
사용자 로그온 여부와 무관하게 실행 옵션 활성화

macOS·Linux cron

*/30 * * * * /usr/bin/python3 /path/sheets_excel_sync/sync_two_way.py >> /path/sync_cron.log 2>&1

중요 변경이 잦은 문서라면 5분 주기를 고려하되, 호출 제한과 충돌 위험을 감안해 주기를 조정한다.

운영 체크리스트

서비스 계정이 대상 스프레드시트에 편집 권한으로 공유되어 있는가
키 컬럼과 updated_at 컬럼이 두 소스에서 동일한 스키마로 유지되는가
충돌 해결 정책(시트 우선, 엑셀 우선, 최신 우선)이 명시되어 있는가
로그 파일과 실패 재시도 전략이 마련되어 있는가
대용량 업데이트에서 범위 업데이트로 최적화했는가

다음 단계

필드 수준 권한(민감 데이터 마스킹), 변경 이력 테이블, 슬랙·메일 알림을 결합하면 엔터프라이즈급 데이터 허브로 확장할 수 있다. 또한 PDF 리포트 자동 생성과 배포까지 이어 붙여 완전 자동 운영을 구현해 보자.

다음 글 보러 가기: 변경 이력·감사 로그 대시보드 만들기

jbparkbill 님의 블로그