Deanonymizing Anonymous Users on Piazza Q&A

Unravel - Deanonymize anonymous Users on Piazza Q&A.

My TAship Experience

In Fall 2018, my grad school life began. I was accepted to Penn State with a teaching assistantship offer for my first two semesters. I had experience as a TA from Sabanci University (bachelor’s) and I loved helping students. The TAship would cover my tuition and pay a stipend, so it was a wonderful opportunity. I became a TA for CMPSC461 Programming Language Concepts in Fall 2018.

Those two semesters have passed, and I will admit that it was not always a breeze. I frequently gave lectures to a class of ~150 students and tried to be available outside of my office hours for whatever help students needed. I answered students’ questions in my office hours, prepared homework, held additional review recitations before midterms, and graded students’ papers.

Piazza

One thing I had never heard of before my time at Penn State was Piazza. You probably know what Piazza is, but simply put:

Piazza is a Q&A website where students and instructors share questions, answers, and general notes for classes.

It definitely helped me as a TA. Almost half of the questions we get from students are similar in one way or another, and being able to redirect students to previous questions saves us from copy-pasting the same answer to several students in isolated one-on-one conversations. Having questions and answers visible to everyone also encourages students to participate in the discussion.

Anonymity on Piazza

Another advantage of Piazza is anonymity. You can allow two levels of anonymity in a Piazza class: Anonymous to Classmates and Anonymous to everyone. Anonymity increases student participation. Students ask questions without hesitation and try to write answers on their own without being scared of getting them wrong. And contrary to a popular rumor around school, Anonymous to everyone really does make posts anonymous, even to instructors.

But with great anonymity come uncivil and/or troll posts. Whenever a class I was an instructor for got an inappropriate anonymous post, I wondered who wrote it.

User Statistics on Piazza

Another feature of Piazza is that instructors can see the participation statistics of students. These statistics are extremely detailed, real-time information about students. They are probably very useful for instructors who grade participation. For example, the statistics page for CMPSC461 Spring 2019 looks like the following:

User Statistics

This table made me wonder: is it possible to find the anonymous owner of a post using the statistics? The answer is YES.

So what keeps an instructor from tracking user statistics and post data to deanonymize anonymous users? Technically, if we have the user statistics from before and after every new post on Piazza (new question, followup discussion, answer), there is only a single difference between the before and after statistics: the contribution count of the user who made the anonymous post.
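The before/after idea can be sketched in a few lines. This is only an illustration, not Unravel's code: the changed_users helper, the flat {email: contribution count} layout, and the sample addresses are all hypothetical.

```python
def changed_users(before, after):
    """Return {user: delta} for users whose contribution count differs."""
    deltas = {}
    for user, count in after.items():
        delta = count - before.get(user, 0)
        if delta != 0:
            deltas[user] = delta
    return deltas


# Hypothetical snapshots taken just before and just after one anonymous post.
before = {"alice@example.com": 12, "bob@example.com": 7, "carol@example.com": 3}
after = {"alice@example.com": 12, "bob@example.com": 8, "carol@example.com": 3}

suspects = changed_users(before, after)
print(suspects)  # {'bob@example.com': 1}
```

If exactly one user's count moved between the two snapshots, that user is the only candidate author of the new anonymous post.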

Piazza API

The user statistics data that Piazza has is actually more detailed than the table above shows. Piazza does not have an official API, but while searching on GitHub I came across hfaran’s Piazza API.

I created a Piazza instance by following the README in the repository. Calling get_statistics() on the instance returns a list of user dicts [1] with the following structure:

{
"user_id": "jw14z1uv4ik2jo",
"name": "User 1",
"email": "eralp@example.com",
"lti_ids": [],
"days": 2,
"posts": 52,
"asks": 11,
"answers": 16,
"views": 12
}

views, days, lti_ids, and user_id are not required to deanonymize posts, so we can get rid of those fields. Everything we need is in the remaining dict.
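As a rough sketch of that trimming step, the snippet below keeps only the remaining fields and reports which counters moved between two trimmed records. The helper names (trim, counter_deltas) and the later snapshot are made up for illustration. Which counters Piazza actually bumps for each kind of post is not assumed here.

```python
KEEP = ("name", "email", "posts", "asks", "answers")
COUNTERS = ("posts", "asks", "answers")


def trim(record):
    """Keep only the fields needed for deanonymization."""
    return {key: record[key] for key in KEEP}


def counter_deltas(old, new):
    """Return {counter: delta} for counters that differ between two records."""
    return {c: new[c] - old[c] for c in COUNTERS if new[c] != old[c]}


# The example record shown above.
raw = {
    "user_id": "jw14z1uv4ik2jo",
    "name": "User 1",
    "email": "eralp@example.com",
    "lti_ids": [],
    "days": 2,
    "posts": 52,
    "asks": 11,
    "answers": 16,
    "views": 12,
}

old = trim(raw)
# Hypothetical later snapshot in which only the "asks" counter moved.
new = dict(old, asks=old["asks"] + 1)

print(old)
print(counter_deltas(old, new))  # {'asks': 1}
```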

Unravel

I created a CLI application called Unravel as a proof of concept. Check out the README to see how to run it, or browse the source code.

It can successfully deanonymize users for all kinds of posts a student can make on Piazza.

How does it work?

Unravel uses TinyDB [2] to store each statistics record retrieved from Piazza.
It compares user statistics every ~15 seconds. If a user creates an anonymous post while Unravel is tracking the class, it finds a difference between two consecutive statistics records. It then retrieves the latest post data and searches for the added or modified post by comparing it with the post data retrieved before the user posted. Once it finds the added or modified post, it displays the poster and the changes to whoever is running Unravel.

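Sequence diagram (participants: User, Unravel, Piazza, TinyDB):

  1. User → Unravel: user, pass, class.
  2. Unravel → Piazza: user_login(user, pass). Piazza returns a Piazza instance.
  3. Unravel → Piazza: network(class). Piazza returns a Piazza class instance.
  4. Unravel → User: login and class connection were successful.
  5. Unravel → TinyDB: TinyDB("users", "posts"). This creates the <class>.json database and returns the postdb and userdb instances.
  6. Unravel → Piazza: get_user_statistics(). Piazza returns the statistics JSON, which Unravel stores with userdb.insert(statistics).
  7. Unravel → Piazza: get_all_posts(). Piazza returns the posts JSON, which Unravel stores with postdb.insert(posts). The initial statistics are now in the DB.
  8. Loop, diffing statistics every ~15 seconds:
     a. get_user_statistics(), then userdb.insert(statistics). With two statistics records in the DB, Unravel can diff them: diff(userdb.get(0), userdb.get(1)).
     b. Only if a difference is found in the user statistics: get_all_posts(), then postdb.insert(posts). Find the modified post by diffing the two posts JSON records with diff(postdb.get(0), postdb.get(1)), report the deanonymized user and post to the User, then postdb.remove(0).
     c. userdb.remove(0): remove the previous statistics and set the current ones as previous for the next diff.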
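A minimal, self-contained sketch of one pass through that loop follows. It is not Unravel's implementation: plain dicts stand in for TinyDB, canned iterators stand in for the Piazza calls, and changed_keys, poll_once, and all the sample data are hypothetical.

```python
def changed_keys(old, new):
    """Keys that are new or whose value differs between two snapshots."""
    return [key for key, value in new.items() if old.get(key) != value]


def poll_once(prev_stats, prev_posts, fetch_stats, fetch_posts):
    """Run one iteration; return (users, post_ids, stats, posts)."""
    stats = fetch_stats()
    users = changed_keys(prev_stats, stats)
    if not users:
        # Nothing changed, so skip the expensive post fetch.
        return [], [], stats, prev_posts
    posts = fetch_posts()
    return users, changed_keys(prev_posts, posts), stats, posts


# Hypothetical canned responses standing in for Piazza.
stats_feed = iter([
    {"a@example.com": {"posts": 1}, "b@example.com": {"posts": 4}},  # no change
    {"a@example.com": {"posts": 1}, "b@example.com": {"posts": 5}},  # b posted
])
posts_feed = iter([
    {"p1": "HW1 question", "p2": "Anonymous followup"},
])

# Initial snapshots, as stored before the loop starts.
stats = {"a@example.com": {"posts": 1}, "b@example.com": {"posts": 4}}
posts = {"p1": "HW1 question"}

findings = []
for _ in range(2):  # a real loop would run indefinitely and sleep ~15 s per tick
    users, post_ids, stats, posts = poll_once(
        stats, posts, lambda: next(stats_feed), lambda: next(posts_feed)
    )
    if users:
        findings.append((users, post_ids))

print(findings)  # [(['b@example.com'], ['p2'])]
```

Posts are fetched only when the statistics differ, which mirrors the optional branch in the diagram and keeps the costly post iteration off the common path.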

Demo Video

Unravel

The demo video shows a class with 3 students. The accounts in the top-left, bottom-left, and bottom-right windows post anonymous questions, answers, followups, and followup answers. You can see Unravel’s findings in the terminal.

Limitations

  • Unravel only works with an instructor account, since students cannot see the detailed user statistics in the table above.
  • Since Piazza does not have an endpoint for batch fetching all posts, iterating over all posts to find the new post in the class does not scale well.
  • If multiple posts get updated in the interval between two records (approximately 15 seconds for a class with ~250 posts), Unravel cannot deduce the user. While this is definitely possible, in my experience it rarely happens in classes with fewer than ~200 students.
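The last limitation can be illustrated by counting how many user-to-post pairings remain consistent with one interval. This is a simplified sketch under a stated assumption: each changed user authored exactly one of the changed posts. The candidate_assignments helper and the sample identifiers are hypothetical.

```python
from itertools import permutations


def candidate_assignments(users, posts):
    """All ways to pair each changed user with a distinct changed post."""
    return [list(zip(users, perm)) for perm in permutations(posts, len(users))]


# One user and one post changed in the interval: a single consistent pairing.
single = candidate_assignments(["bob@example.com"], ["p7"])

# Two users and two posts changed in the same interval: two consistent pairings,
# so the statistics alone cannot tell who wrote which post.
double = candidate_assignments(
    ["bob@example.com", "carol@example.com"], ["p7", "p8"]
)

print(len(single))  # 1
print(len(double))  # 2
```

Shortening the polling interval reduces the chance of such collisions, but the second limitation (the cost of fetching all posts) bounds how short the interval can be.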

Footnotes


  1. Python Dictionary

  2. TinyDB