Like many popular sites, Reddit has a freely available API. Launched in 2005, the discussion site holds valuable data across more than 1 million subreddits. If you want to mine this data, here is a step-by-step guide to using the API.
1) Go to your app preferences page (https://www.reddit.com/prefs/apps) and create a new app. If you just want to pull some data, choose the 'script' app type.
2) Make sure you have PRAW installed. If not, you can install it with pip:
pip install praw
PRAW stands for 'Python Reddit API Wrapper' and is a handy package for accessing Reddit's API using Python.
3) In a Jupyter Notebook, input the following:
import praw

reddit = praw.Reddit(client_id='your_client_id',
                     client_secret='your_client_secret',
                     password='your_reddit_password',
                     user_agent='testscript by /u/your_username',
                     username='your_username')
You will find the client id and client secret immediately after creating your script app.
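Hard-coding your password and client secret in a notebook makes them easy to share by accident. The sketch below is one alternative, assuming you export the credentials as environment variables; the variable names REDDIT_CLIENT_ID, REDDIT_CLIENT_SECRET, REDDIT_USERNAME and REDDIT_PASSWORD, the function build_reddit_kwargs, and the default platform, app ID and version are all hypothetical placeholders. It also formats the user agent the way Reddit's API guidelines recommend: <platform>:<app ID>:<version string> (by /u/<reddit username>).

```python
import os
from typing import Mapping


def build_reddit_kwargs(env: Mapping[str, str] = os.environ,
                        platform: str = "python",
                        app_id: str = "my-reddit-script",
                        version: str = "0.1") -> dict:
    """Collect praw.Reddit() keyword arguments from environment variables.

    The REDDIT_* variable names and the default platform, app_id and
    version are hypothetical; rename them to suit your setup.
    """
    username = env["REDDIT_USERNAME"]
    return {
        "client_id": env["REDDIT_CLIENT_ID"],
        "client_secret": env["REDDIT_CLIENT_SECRET"],
        "username": username,
        "password": env["REDDIT_PASSWORD"],
        # Recommended format: <platform>:<app ID>:<version string> (by /u/<username>)
        "user_agent": f"{platform}:{app_id}:{version} (by /u/{username})",
    }
```

You would then create the client with reddit = praw.Reddit(**build_reddit_kwargs()). Taking the environment as a parameter keeps the function easy to test with a plain dict.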
4) You're now ready to request data! The following code prints the title of each of the 25 current 'hot' posts in r/python, followed by that post's top-level comments and the user who wrote each one.
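submissions = []
for submission in reddit.subreddit('python').hot(limit=25):
    submissions.append(submission)

for s in submissions:
    print(s.title)
    # Remove the "load more comments" placeholders (MoreComments objects),
    # which have no author or body and would otherwise raise an AttributeError
    s.comments.replace_more(limit=0)
    for c in s.comments:  # iterating the forest yields top-level comments
        print(c.author, c.body)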
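Your output should look something like this (this sample was produced under Python 2, which prints the author and body as a tuple; under Python 3 you will see the username followed by the comment text):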
/r/Python official Job Board! (Redditor(name='TOASTEngineer'), u"I've successfully delivered products to people a couple of times and I'm pretty comfortable with whatever technology you want me to use.\n\nI *generally* prefer doing native application\\backend stuff, and most of my experience is in tools, glue-code, and scripting. I'll do whatever you need me to do though, I'm happy as long as I can code.")
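Printing c.author and c.body directly dumps entire multi-line comments, and c.author is None when an account has been deleted. A minimal sketch of a tidier formatter follows; format_comment, the "[deleted]" label and the 80-character default width are hypothetical choices, and the function works on plain values so you can try it without network access.

```python
from typing import Optional


def format_comment(author: Optional[object], body: str, width: int = 80) -> str:
    """Return 'author: first line of body', truncated to `width` characters.

    `author` is anything with a sensible str() (a PRAW Redditor converts to
    its username) or None for a deleted account.
    """
    name = str(author) if author is not None else "[deleted]"
    # Keep only the first line of the comment body
    first_line = body.split("\n", 1)[0]
    if len(first_line) > width:
        # Leave room for the ellipsis so the line is exactly `width` long
        first_line = first_line[: width - 3] + "..."
    return f"{name}: {first_line}"
```

Inside the loop you could then call print(format_comment(c.author, c.body)) instead of printing the raw objects.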
If you wanted to find the most active commenters in a subreddit's hot posts and then look at their recent comments across other subreddits, you could do the following:
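from collections import defaultdict

n = 5  # comment threshold: example value, adjust to taste

submissions = []
for submission in reddit.subreddit('python').hot(limit=10):
    submissions.append(submission)

# Count top-level comments per author across the hot posts
user_comment_counts = defaultdict(int)
for s in submissions:
    s.comments.replace_more(limit=0)  # drop MoreComments placeholders
    for c in s.comments:
        if c.author is not None:  # deleted accounts have no author
            user_comment_counts[c.author] += 1

# Keep users with more than n comments
top_users = []
for k, v in user_comment_counts.items():
    if v > n:
        top_users.append(k)
        print(k)

# Print the first line (up to 250 characters) of each top user's recent comments
for u in top_users:
    for comment in reddit.redditor(u.name).comments.new(limit=None):
        print(comment.body.split('\n', 1)[0][:250])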
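The counting and threshold step can also be written with collections.Counter in place of defaultdict(int). This minimal sketch works on plain author names rather than PRAW objects, skips None entries (deleted accounts), and returns the users above the threshold ordered from most to least active; top_commenters is a hypothetical name.

```python
from collections import Counter
from typing import Iterable, List, Optional


def top_commenters(authors: Iterable[Optional[str]], threshold: int) -> List[str]:
    """Return authors with more than `threshold` comments, most active first.

    `authors` holds one entry per comment; None entries (deleted accounts)
    are skipped.
    """
    counts = Counter(a for a in authors if a is not None)
    # most_common() with no argument returns every (name, count) pair,
    # sorted by count in descending order
    return [name for name, count in counts.most_common() if count > threshold]
```

To feed it from PRAW, you could build the list with [c.author.name if c.author else None for s in submissions for c in s.comments] after calling replace_more(limit=0) on each submission's comments.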
Tools: Python, PRAW, Jupyter Notebook
Reddit Documentation: https://www.reddit.com/dev/api/