Convert GitHub API JSON to CSV
GitHub API responses are clean JSON arrays, but labels, assignees, and milestone fields are nested objects and arrays that don't map directly to CSV columns. Here's how to get a flat, usable spreadsheet from any GitHub endpoint.
GitHub's REST API returns issues, pull requests, commits, releases, and repository data as JSON arrays. Common use cases include project management exports, sprint reporting, open source contribution analysis, and release tracking. The main challenge is the nested structure: each issue has a user object, an array of label objects, an optional milestone object, and an optional assignee object. Flattening these correctly is the difference between a useful export and a mess.
Want to skip the code? Paste your JSON directly into the converter — it handles nested objects, arrays, and large files automatically.
Open JSON to CSV Converter

What the API returns
GitHub GET /repos/{owner}/{repo}/issues — single issue
```json
[
  {
    "number": 142,
    "title": "Fix pagination on mobile dashboard",
    "state": "open",
    "user": {
      "login": "alicechen",
      "id": 12345,
      "type": "User"
    },
    "labels": [
      {"id": 1, "name": "bug", "color": "d73a4a"},
      {"id": 2, "name": "mobile", "color": "0075ca"}
    ],
    "assignee": {
      "login": "bobkumar"
    },
    "milestone": {
      "number": 5,
      "title": "v2.4",
      "state": "open"
    },
    "comments": 3,
    "created_at": "2025-02-10T11:42:00Z",
    "updated_at": "2025-03-01T09:15:00Z",
    "closed_at": null,
    "body": "When navigating to page 2 on mobile..."
  }
]
```

Field mapping: JSON path → CSV column
| JSON path | CSV column | Notes |
|---|---|---|
| number | issue_number | The #142 displayed in GitHub UI |
| title | title | |
| state | state | open or closed |
| user.login | author | GitHub username of creator |
| labels[].name | labels | Join multiple labels with comma |
| assignee.login | assignee | null if unassigned |
| milestone.title | milestone | null if no milestone |
| comments | comment_count | Integer |
| created_at | created_at | ISO 8601 |
| updated_at | updated_at | ISO 8601 |
| closed_at | closed_at | null if still open |
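The mapping above can be sketched as a small pure function. This is a minimal sketch: column names follow the table, and the sample dict is a trimmed version of the issue shown earlier.

```python
def flatten_issue(issue: dict) -> dict:
    """Map one GitHub issue object to a flat CSV row, per the mapping table."""
    return {
        "issue_number": issue["number"],
        "title": issue["title"],
        "state": issue["state"],
        # Nested objects can be null, so fall back to {} before .get()
        "author": (issue.get("user") or {}).get("login", ""),
        "labels": ", ".join(l["name"] for l in issue.get("labels", [])),
        "assignee": (issue.get("assignee") or {}).get("login", ""),
        "milestone": (issue.get("milestone") or {}).get("title", ""),
        "comment_count": issue.get("comments", 0),
        "created_at": issue.get("created_at", ""),
        "updated_at": issue.get("updated_at", ""),
        "closed_at": issue.get("closed_at") or "",
    }

sample = {
    "number": 142,
    "title": "Fix pagination on mobile dashboard",
    "state": "open",
    "user": {"login": "alicechen"},
    "labels": [{"name": "bug"}, {"name": "mobile"}],
    "assignee": {"login": "bobkumar"},
    "milestone": {"title": "v2.4"},
    "comments": 3,
    "created_at": "2025-02-10T11:42:00Z",
    "updated_at": "2025-03-01T09:15:00Z",
    "closed_at": None,
}
row = flatten_issue(sample)
print(row["labels"])  # bug, mobile
```

The `or {}` fallbacks matter: `assignee` and `milestone` are `null` (not absent) for many issues, so a plain `issue.get("assignee", {})` would still hand you `None`.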
The body field (issue description) can contain newlines, commas, and quote characters. Python's csv module (and pandas to_csv) wraps such fields in double quotes and escapes embedded quotes automatically, so the data survives a round trip. Consider omitting body only if the file will be consumed by a naive parser that splits rows on raw newlines or commas.
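That quoting behavior is easy to verify with the stdlib csv module and an in-memory buffer:

```python
import csv
import io

buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["number", "title", "body"])
# A body containing a comma, embedded quotes, and a newline, all in one field
writer.writerow([142, "Fix pagination", 'Repro: go to page 2, then "refresh".\nExpected: list loads.'])
print(buf.getvalue())
```

The body cell comes out wrapped in double quotes with the inner quotes doubled (`""refresh""`), which any compliant CSV reader, including Excel, parses back into the original text.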
Python conversion
Export all issues from a repository (handles pagination and rate limits)
```python
import requests
import pandas as pd
import time

TOKEN = "ghp_..."  # GitHub personal access token
OWNER = "torvalds"
REPO = "linux"

headers = {
    "Authorization": f"Bearer {TOKEN}",
    "Accept": "application/vnd.github+json",
    "X-GitHub-Api-Version": "2022-11-28",
}

issues = []
page = 1
while True:
    r = requests.get(
        f"https://api.github.com/repos/{OWNER}/{REPO}/issues",
        headers=headers,
        params={"state": "all", "per_page": 100, "page": page},
    )
    r.raise_for_status()

    # Back off before the rate limit reaches zero (the API returns 403 when exhausted)
    remaining = int(r.headers.get("X-RateLimit-Remaining", 1))
    if remaining < 5:
        reset_at = int(r.headers.get("X-RateLimit-Reset", time.time() + 60))
        wait = max(0, reset_at - time.time()) + 2
        print(f"Rate limit low, waiting {wait:.0f}s")
        time.sleep(wait)

    batch = r.json()
    if not batch:
        break
    issues.extend(batch)
    page += 1
    time.sleep(0.1)  # be polite

print(f"Fetched {len(issues)} issues/PRs")

# Flatten nested objects into the columns from the mapping table
rows = []
for issue in issues:
    rows.append({
        "issue_number": issue["number"],
        "title": issue["title"],
        "state": issue["state"],
        "author": issue["user"]["login"] if issue.get("user") else "",
        "labels": ", ".join(l["name"] for l in issue.get("labels", [])),
        "assignee": issue["assignee"]["login"] if issue.get("assignee") else "",
        "milestone": issue["milestone"]["title"] if issue.get("milestone") else "",
        "comment_count": issue.get("comments", 0),
        "is_pr": "pull_request" in issue,  # PRs appear in the issues endpoint
        "created_at": (issue.get("created_at") or "")[:10],
        "updated_at": (issue.get("updated_at") or "")[:10],
        "closed_at": (issue.get("closed_at") or "")[:10],
    })

df = pd.DataFrame(rows)
df.to_csv(f"{OWNER}_{REPO}_issues.csv", index=False, encoding="utf-8-sig")
print(f"Exported {len(df)} rows")
```

Export pull requests with review status and merge info
```python
import requests
import pandas as pd

TOKEN = "ghp_..."
OWNER = "your-org"
REPO = "your-repo"

headers = {
    "Authorization": f"Bearer {TOKEN}",
    "Accept": "application/vnd.github+json",
}

prs = []
page = 1
while True:
    r = requests.get(
        f"https://api.github.com/repos/{OWNER}/{REPO}/pulls",
        headers=headers,
        params={"state": "all", "per_page": 100, "page": page},
    )
    r.raise_for_status()
    batch = r.json()
    if not batch:
        break
    prs.extend(batch)
    page += 1

rows = []
for pr in prs:
    rows.append({
        "pr_number": pr["number"],
        "title": pr["title"],
        "state": pr["state"],
        "author": pr["user"]["login"] if pr.get("user") else "",
        "base_branch": pr.get("base", {}).get("ref"),
        "head_branch": pr.get("head", {}).get("ref"),
        "draft": pr.get("draft", False),
        "merged": pr.get("merged_at") is not None,
        "labels": ", ".join(l["name"] for l in pr.get("labels", [])),
        "reviewers": ", ".join(rev["login"] for rev in pr.get("requested_reviewers", [])),
        "created_at": (pr.get("created_at") or "")[:10],
        "merged_at": (pr.get("merged_at") or "")[:10],
        "closed_at": (pr.get("closed_at") or "")[:10],
        # additions/deletions/changed_files are only returned by the single-PR
        # endpoint (GET /pulls/{number}); the list endpoint omits them, so
        # these default to 0 unless you fetch each PR individually
        "additions": pr.get("additions", 0),
        "deletions": pr.get("deletions", 0),
        "changed_files": pr.get("changed_files", 0),
    })

pd.DataFrame(rows).to_csv(f"{OWNER}_{REPO}_prs.csv", index=False, encoding="utf-8-sig")
print(f"Exported {len(rows)} pull requests")
```

Common issues with this API
Pull requests appear in the issues endpoint
GitHub's /issues endpoint returns both issues and pull requests. PRs have a 'pull_request' key in the response; plain issues don't. Filter with 'pull_request' in issue to separate them.
```python
issues_only = [i for i in all_issues if "pull_request" not in i]
prs_only = [i for i in all_issues if "pull_request" in i]
```

Rate limit: 5,000 requests per hour with a token
Unauthenticated requests are limited to 60/hour. With a personal access token or GitHub App token, you get 5,000/hour. Check the X-RateLimit-Remaining header and back off before you hit zero — the API returns 403 when exhausted.
```python
import time

remaining = int(r.headers.get("X-RateLimit-Remaining", 1))
reset_time = int(r.headers.get("X-RateLimit-Reset", 0))
if remaining < 10:
    wait = max(0, reset_time - time.time()) + 5
    time.sleep(wait)
```

assignees is an array, assignee is the first one
GitHub issues can have multiple assignees. The 'assignee' field is the primary assignee (or null). The 'assignees' field is the full array. If you need all assignees, join the assignees array — not just the single assignee field.
```python
"assignees": ", ".join(a["login"] for a in issue.get("assignees", []))
```

Frequently asked questions
How do I export issues from a private repository?
Use a personal access token (classic) with the 'repo' scope, or a fine-grained token with Issues read permission for that repository. Pass it in the Authorization header: 'Bearer ghp_...'.
Can I export GitHub Projects data to CSV?
GitHub Projects (v2) uses the GraphQL API, not the REST API. The REST /issues endpoint gives you issue metadata but not project-specific fields like status, priority, or sprint. For project fields, use the GraphQL API to query ProjectV2Items.
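A sketch of the GraphQL request shape for Projects v2. The org name and project number are placeholders, and the field-value fragments shown are only two of the possible types; actually sending the request requires POSTing the payload to https://api.github.com/graphql with a token that can read the project.

```python
import json

# Placeholder variables; replace with your organization and project number
query = """
query($org: String!, $number: Int!) {
  organization(login: $org) {
    projectV2(number: $number) {
      items(first: 100) {
        nodes {
          content { ... on Issue { number title } }
          fieldValues(first: 20) {
            nodes {
              ... on ProjectV2ItemFieldSingleSelectValue { name }
              ... on ProjectV2ItemFieldTextValue { text }
            }
          }
        }
      }
    }
  }
}
"""
payload = json.dumps({"query": query, "variables": {"org": "your-org", "number": 1}})
# requests.post("https://api.github.com/graphql", data=payload,
#               headers={"Authorization": f"Bearer {TOKEN}"})
```

Single-select fields (like a Status column) come back through the `ProjectV2ItemFieldSingleSelectValue` fragment, which is where board status lives.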
Why do I get a 422 error when fetching more than 1000 issues?
GitHub caps simple page-number pagination at 1,000 results (100 per page × 10 pages) on the Search API and some list endpoints. To page deeper on the issues endpoint, page by time instead: request sort=updated with direction=asc, and pass the latest updated_at timestamp you have seen as the since parameter on the next request.
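The since-based loop can be sketched with a stand-in fetch function (the real call would be GET /issues with sort=updated, direction=asc, and since; dedupe by issue number because since is inclusive, so the last item of one batch reappears at the start of the next):

```python
def fetch_page(since):
    """Stand-in for the API call: returns issues updated at or after `since`."""
    data = [
        {"number": 1, "updated_at": "2025-01-01T00:00:00Z"},
        {"number": 2, "updated_at": "2025-01-02T00:00:00Z"},
        {"number": 3, "updated_at": "2025-01-03T00:00:00Z"},
    ]
    return [d for d in data if d["updated_at"] >= since][:2]  # per_page=2

seen, since = {}, "1970-01-01T00:00:00Z"
while True:
    batch = fetch_page(since)
    new = [i for i in batch if i["number"] not in seen]
    if not new:
        break  # nothing but repeats left, so we are done
    for i in new:
        seen[i["number"]] = i
    since = new[-1]["updated_at"]  # advance the time cursor
print(len(seen))  # 3
```

Because the cursor is a timestamp rather than a page number, this walks arbitrarily far back without hitting the 10-page cap.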
How do I export commit history to CSV?
Use GET /repos/{owner}/{repo}/commits. Each commit has a commit.author.name, commit.author.date, commit.message, and sha. The structure is different from issues — use json_normalize(commits, sep='.') to flatten it, then rename the dot-notation columns.
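A minimal sketch of the commit shape and a flatten using those nested paths (the sample data here is illustrative, not from a real repository):

```python
sample_commits = [
    {
        "sha": "a1b2c3d",
        "commit": {
            "author": {"name": "Alice Chen", "date": "2025-03-01T09:15:00Z"},
            "message": "Fix pagination on mobile dashboard\n\nLonger description...",
        },
    },
]

rows = [
    {
        "sha": c["sha"],
        "author": c["commit"]["author"]["name"],
        "date": c["commit"]["author"]["date"],
        # Keep only the subject line; full messages often span many lines
        "message": c["commit"]["message"].splitlines()[0],
    }
    for c in sample_commits
]
print(rows[0]["author"])  # Alice Chen
```

Note that commit.author (the git author name embedded in the commit) is distinct from the top-level author object, which is the linked GitHub account and can be null.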