SCANZ 2023: Data democratisation panel notes

conferences stats-ed

Resources and useful links.

Liza Bolton
11-20-2023

Event description

Notes and links to support the Data Democratisation panel, 2023 Science Communicators Association of New Zealand Conference: Techtopia: Navigating the power, potential and perils of technology in science communication.

November 16 & 17, 2023 in Te Whanganui-a-Tara Wellington, Aotearoa New Zealand.

A panel discussion following Karaitiana Taiuru’s keynote.

Speakers

Moved by data

“It is the mark of a truly intelligent person to be moved by statistics”
George Bernard Shaw, Bertrand Russell (kinda)

Oft mis-attributed to George Bernard Shaw (what an excellent quipper to be in company with!), this quote has been versioned and recycled across time. It was really about education and developing people’s sympathies to the point that abstract information about the world could be as moving, and thus motivating, as a really personal story from a specific person.

Intelligence is notoriously problematic to measure (probably because it is notoriously poorly defined…), so I’d propose a variation… “good science communication can help all people be moved by statistics, or with the help of them”.

…Not as catchy, guess I’ll stick to my day job.

The statistical-thinking bogs

Post-panel note: I was thrilled to hear how many people found this a really useful framework!

The following is about students in intro stats classes, but I think it is also really helpful for checking in on your own thinking and anticipating how others may engage with your communication about data and models.

…students tend to enter and leave most introductory statistics courses thinking of statistics in one of at least two incorrect ways:

1. Students believe that statistics and mathematics are similar in that statistical problems have a single correct answer; an answer that tells us indisputable facts about the world we live in (Bog #1: overconfidence) (Nicholson & Darnton, 2003; Pfannkuch & Brown, 1996),

or,

2. Students believe that statistics can be ‘made to say anything,’ like ‘magic,’ and so cannot be trusted. Thus, statistics is viewed as disconnected and useless for scientific research and society (Bog #2: disbelief) (Martin, 2003; Pfannkuch & Brown, 1996).

Tintle, N., Chance, B., Cobb, G., Roy, S., Swanson, T., & VanderStoep, J. (2015). Combating Anti-Statistical Thinking Using Simulation-Based Methods Throughout the Undergraduate Curriculum. The American Statistician, 69(4), 362–370. http://www.jstor.org/stable/24592138

📄 PDF: https://arxiv.org/pdf/1508.00543.pdf

Two AI images generated by DALLE 2 from the prompt “dramatic oil painting of a swamp”. The left is labelled ‘bog of overconfidence’ and the right is labelled ‘bog of disbelief’.

Three archetypes of data stories

This didn’t come up in the talk itself, but for those communicating with and about data, I think you could claim that there are three data story archetypes. Most situations include parts of all three of these, but I think it can help to consider their interplay in the data you’re collecting, using or disseminating.

  1. Means to an end
  2. Means (to an end?)
  3. Means to an end

Means to an end

Data was needed and used to tell a story, but the data itself isn’t a big part of the story; it just enables answering the research question. If you could wave a magic wand that guaranteed the data was appropriately collected and measured for your research question, you’d not need to talk about it at all. 🪄 But of course, the worthiness of the research question itself is important, too! And I’m honestly not sure this truly exists in the real world. I suppose neither of the first two archetypes really exists in full purity… but folks certainly might present things like this.

👎 5D approaches to indigenous data

Statistics have long been used as a tool for shaping the narrative about Indigenous People, communities, and nations through the use of 5 D data: "a set of items related almost exclusively to measure Indigenous difference, disparity, disadvantage, dysfunction and deprivation". This pathologizing approach to data creation and analysis has led to dysfunctional policies that are then used to justify the need for more data focused on the 5 Ds and that shape dominant society's understanding of Indigenous People.

~ From Indigenous Data Sovereignty and Policy edited by Maggie Walter, Tahu Kukutai, Stephanie Russo Carroll, and Desi Rodriguez-Lonebear
Taylor & Francis, 2021

👍 Journos adding data notes

I’m seeing more and more articles with a little (or not so little) data notes portion at the end. This is a great way to balance keeping the article flowing, by leaving out what isn’t the core story, with allowing folks who want to sink their teeth in further to do so. To be fair, these examples are very much data journalism, so these are the folks rocking it!

Measurement is hard

There is so much to this, but there is an incredible anecdote in the book The Victory Lab: The Secret Science of Winning Campaigns by Sasha Issenberg that also highlights why our STEM folks need exposure to ethics and humanities. It goes something like so: In a meeting on collecting data to try to assess the possible impacts of racist sentiments among voters on Obama’s campaign, one of the suggestions from a hot shot data wonk from MIT (or something like that) was that they just ask people in surveys if they were racist… then you’d know if they were racist. Yep, that is definitely a suggestion.

Means (to an end?)

This is when the data is the story itself. Whether related to privacy, consent, data sovereignty, etc., even without any particular ‘finding’, there are concerns/conversations about the data. Often this is because of the risks of misusing the data, but these don’t have to have eventuated.

Open data & government data

👍 Datasheets for explaining data

While the article is specifically about machine learning, the questions worked through when creating a datasheet are valuable for sharing any kind of dataset in an accessible and consistent way.

Gebru, Timnit, Jamie Morgenstern, Briana Vecchione, Jennifer Wortman Vaughan, Hanna Wallach, Hal Daumé III, and Kate Crawford. 2021. “Datasheets for Datasets.” Communications of the ACM 64 (12): 86–92. https://doi.org/10.1145/3458723.

Datasheets for datasets are intended to address the needs of two key stakeholder groups: dataset creators and dataset consumers. For dataset creators, the primary objective is to encourage careful reflection on the process of creating, distributing, and maintaining a dataset, including any underlying assumptions, potential risks or harms, and implications of use. For dataset consumers, the primary objective is to ensure they have the information they need to make informed decisions about using a dataset. Transparency on the part of dataset creators is necessary for dataset consumers to be sufficiently well informed that they can select appropriate datasets for their chosen tasks and avoid unintentional misuse.

📄 Article PDF: https://arxiv.org/pdf/1803.09010.pdf

++ A worked example for the awesome textbook by Rohan Alexander, Telling Stories with Data https://tellingstorieswithdata.com/25-datasheet.html

Means to an end

Uh oh, missing, 404, data not found.

This might sound strange coming from a statistician, but sometimes the most important data story is the story about where the data isn’t. What’s MISSING?

🖼️ Check out the incredible mixed-media installation The Library of Missing Datasets (2016) by MIMI ỌNỤỌHA. It is “a physical repository of those things that have been excluded in a society where so much is collected.

“Missing data sets” are the blank spots that exist in spaces that are otherwise data-saturated.”

A feminine hand with darker skin reaches into a filing cabinet with many files inside, each with tabbed labels. The hand is reaching for “Publicly available gun trace data”. The image is from MIMI ỌNỤỌHA’s website and is of ‘The Library of Missing Datasets’ (2016).

👮‍♀️ This recent article by the marvellous Sapeer Mayron draws attention to a pattern of increasingly missing ethnicity data in police warrants. There are a lot of possible reasons we might see this from the plausibly positive (officers not assuming ethnicity where they might once have?), to the clerical (overly burdensome reporting mechanisms? quality data entry not rewarded/taught/checked?), to the problematic (pressure to minimise known disparities in warrants by ethnicity). In the end, we don’t know why the missing have increased, but I’m glad to see this question being posed.

Other

📺 Watch

📚 Read

We like patterns and certainty, and that can be a problem

We’re either really good at seeing patterns where they may or may not exist, or at ignoring a more complex picture in favour of the false solidity of a single number.

Image description: A GIF with a scatter plot on the left and summary statistics on the right: X Mean, Y Mean, X SD, Y SD, Corr. These are all the same to two decimal places. The plot on the left alternates between a range of shapes, including a star, 9 dot clusters, parallel lines (vertical, horizontal, diagonals), and most importantly, a T-Rex.
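
If you want to poke at this idea yourself, here is a minimal sketch in Python. It does not use the data behind the GIF. The two shapes (a circle and a plus sign) are made up for illustration, and I've assumed the population standard deviation. The point is just that two very different pictures can share the same five summary statistics to two decimal places.

```python
import math
from statistics import fmean, pstdev


def summary(points):
    """Return (x mean, y mean, x SD, y SD, correlation), rounded to 2 d.p."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    mx, my = fmean(xs), fmean(ys)
    sx, sy = pstdev(xs), pstdev(ys)  # population SD (an assumption)
    cov = fmean([(x - mx) * (y - my) for x, y in points])
    corr = cov / (sx * sy)
    # Adding 0.0 turns a rounded -0.0 into 0.0 so the printout is tidy.
    return tuple(round(v, 2) + 0.0 for v in (mx, my, sx, sy, corr))


# Hypothetical shape 1: eight points evenly spaced around a circle of radius 5.
circle = [
    (5 * math.cos(k * math.pi / 4), 5 * math.sin(k * math.pi / 4))
    for k in range(8)
]

# Hypothetical shape 2: eight points laid out as a plus sign on the axes.
plus = [(1, 0), (-1, 0), (7, 0), (-7, 0), (0, 1), (0, -1), (0, 7), (0, -7)]

print("circle:", summary(circle))
print("plus:  ", summary(plus))
```

Both lines print the same tuple of summary statistics, even though a scatter plot of each would look nothing alike. That is exactly why it pays to plot the data before trusting a single number.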


Need a statistics buddy for something you’re working on? You can see more of my talks, including some of my general statistics communication on my Talks page.

Citation

For attribution, please cite this work as

Bolton (2023, Nov. 20). Liza Bolton: SCANZ 2023: Data democratisation panel notes. Retrieved from blog.lizabolton.com/posts/2023-11-15-scanz/

BibTeX citation

@misc{bolton2023scanz,
  author = {Bolton, Liza},
  title = {Liza Bolton: SCANZ 2023: Data democratisation panel notes},
  url = {blog.lizabolton.com/posts/2023-11-15-scanz/},
  year = {2023}
}