Microsoft's new research about how AI impacts critical thinking is deceptive
Microsoft recently released a new piece of field research on how AI impacts critical thinking, and I personally think it's deceptive.
For anyone curious, the original post can be found here, but my goal is to boil everything down and provide my personal opinion.
The landing page
Before we actually get into the content of the whitepaper (also called the study), let's break down the landing page.
The Impact of Generative AI on Critical Thinking: Self-Reported Reductions in Cognitive Effort and Confidence Effects From a Survey of Knowledge Workers
Already, this title is incredibly revealing, whilst also being an uninformative attention grabber. On the first point, we can see that the study is based on a survey, where the results are self-reported. Now, what I'm about to say shouldn't come as a massive shock to anyone, but humans are in fact not great at self-evaluation. It's why we have third-party evaluations in everything from therapy to business decisions: we take a third party and ask them to evaluate something based on their own perceptions. This avoids the personal bias and myopia that can very frequently cloud decision making. I personally think that the form of collection Microsoft has opted to use is incredibly ineffective.
Secondly, as mentioned earlier, it's an attention grabber. It immediately jumps to how generative AI is affecting your ability to think critically. This is at the very least reductive, and at the other extreme, incredibly harmful.
Let me explain: if someone you trust tells you something, you're significantly less likely to fact-check them. Let's say your parents tell you a lie; you'd probably believe them without a second thought. With AI, you're doing the same thing. You're presuming it wouldn't outright lie to you. Microsoft acts like this is a new revelation only prevalent with the use of AI, when in fact it's present everywhere, from social media and search engines to your local newspaper and your family.
You presume the information is accurate, because it's all you've been given to go off of. Hell, I would argue that's a pretty normal thing to do. With this statement, Microsoft is trying to paint GenAI as this boogeyman that you should hide your kids from. Maybe you should, but ultimately, telling your kid they cannot trust AI whatsoever is comparable to telling them not to trust their friends, and that's a pretty unhealthy attitude.
The landing page's description
Okay, all of that evaluation from just the title; this is gonna be a long blog post. Let's move on to the initial description in the posting.
Microsoft says that they interviewed 319 knowledge workers, who collectively shared 936 first-hand examples of using GenAI in work tasks. You might have a lot of questions reading that, such as "isn't that a small sample size?" and "what's a knowledge worker?"
These are perfectly reasonable questions that I also had when reading this for the first time. To answer the first: you're absolutely right, it's an incredibly small sample size. What's more, each of these knowledge workers shared on average about 3 examples. A number like 936 sounds like a lot, but really isn't. Three examples isn't anywhere near enough to form a complete evaluation of a single person, and this problem is magnified across each of the 319 knowledge workers.
Okay, so on to question 2: what exactly is a knowledge worker? The answer is: they don't say. Wow, really informative whitepaper, guys. Luckily, Wikipedia has an answer for us:
Knowledge workers are workers whose main capital is knowledge. Examples include ICT professionals, physicians, pharmacists, architects, engineers, scientists, design thinkers, public accountants, lawyers, editors, and academics, whose job is to "think for a living".
That clears things up, at least. A knowledge worker is someone whose value as labour is derived from what they know. Simple. Could've been useful to say that in your whitepaper, Microsoft.
Now, Microsoft goes on to say:
Specifically, higher confidence in GenAI is associated with less critical thinking, while higher self-confidence is associated with more critical thinking.
Now, to someone who's new to the space, that could sound scary, but it really isn't. Again, all they're saying is "people who trust AI more are less likely to fact-check it", and vice versa.
This should be painfully obvious to anyone who actually thinks about what Microsoft is saying. They're trying to paint this picture of AI being this tool that's killing your ability to think critically, but it really isn't.
Okay, we're nearly done with the posting's description, and then we'll move on to the whitepaper itself. For this last point, we can look towards the quote: "Qualitatively, GenAI shifts the nature of critical thinking toward information verification, response integration, and task stewardship."
This is an attempt to describe the AI workflow. You prompt the AI, you get information, you verify it, you integrate it. What some readers might not understand is that this workflow is present everywhere.
You wanna code an algorithm implementation? Well, you think about it, draft an implementation, query the internet to see if it's any good, and then you write it.
You wanna perform a surgery? You think about it, plan out the surgery, make sure it's backed by logic, and then you perform it.
You wanna pursue a legal case? You think about it, make a list of points, make sure it's backed by evidence, and then you go to court.
Microsoft is attempting to sell you the idea that GenAI wildly changes the way that humans are productive, when it just doesn't. For the moment at least, if you wanna get any serious work done with GenAI, a lot of the work has to be done yourself. It's not the magic box Microsoft and other AI companies so desperately want their investors to believe it is.
Discussing the content of the whitepaper
Okay, description over. Now onto the whitepaper. This blog post is already getting a bit long, so I'll be picking out specific points of interest and explaining them, along with my opinion. You're more than welcome to read the whitepaper yourself, but that's not what most people came here to do.
We're gonna skip the entire first section of the paper. This section effectively states what generative AI is, and what Microsoft is intending to prove with this whitepaper.
Onto the second section, where we can see the following quote:
Generative AI tools like Copilot and Chat-GPT can boost writing productivity by assisting with tasks such as content generation, idea creation, and stylistic editing, helping both expert and novice writers.
Okay, so we've just been warned that GenAI is actively impairing your ability to think critically, and now Microsoft is... praising it? Like, this is literally the first line of 2.3, titled "Effects of automation on thinking and knowledge workflows: writing and memory".
So, whereas the initial title and description would lead the reader to believe that GenAI is bad, apparently it isn't? Also, you might notice that the AIs Microsoft references are specifically Copilot and Chat-GPT. I'll save you reading the paper yourself and tell you that these are the only AI systems Microsoft actively references. So, you're telling me that a company that either outright owns these AIs or holds major shares in them is saying that these systems aren't so bad after all? Doesn't sound very unbiased and professional to me.
Onto the next paragraph, where Microsoft talks about effects on memory. Here's a reduced quote from the paper:
Research shows that [...] real-world summary writing is often passive and ineffective [...]. GenAI tools like ChatGPT and Copilot can mitigate these drawbacks, [...] by providing high-quality summaries upon which collaborative, self-monitored writing tasks can be conducted.
Okay, so I know that's a lot of shortening, but it's enough to effectively convey what Microsoft is saying. Again, they're saying that tools like ChatGPT and Copilot (yes, they use two different spellings, ChatGPT and Chat-GPT, in this paper) are actually really good for these roles where knowledge is the main capital. It'd be nice to hear about models like Claude 3.5 Sonnet, which is often rated as the best model available for programming.
How Microsoft conducted the research
Next up, we get to hear about how Microsoft actually conducted this research, in section 3:
we conducted an online survey on the Prolific platform to study knowledge workers’ experiences with critical thinking when using GenAI tools for their work.
Now, if you're like me, you've never heard of this platform before right now. Here's a link to Prolific's website. Off the bat, there's not much that's suspicious, other than the data being proprietary, because of course it is. However, if we scroll down to the very bottom of this very short page, we can see them talking about AI development. Okay, so it's a company that gives out proprietary data in the form of interviews... and also lets you train AI? Again, this seems like a conflict of interest. This paper is supposed to show the effect of GenAI on critical thinking capabilities, but they get their data from a proprietary pool that's heavily invested in AI. I'm really struggling to see where the impartial judgement comes in here.
The rest of section 3 is mostly dedicated to describing in detail how Microsoft conducted their study. I've already given you the gist of it, and the actual specific questions aren't that relevant to my argument. You're more than welcome to read it at your own leisure, but again, that's not what most people are here for.
Microsoft's results and findings
Sections 4 and 5 are dedicated to their specific results, so I'm only going to very briefly summarise what they say. In section 4, they analyse the stages people go through when interacting with GenAI. I've already made my point about this line of reasoning; you don't need to hear the same ramble again. They then repeat the same talking point about how "higher confidence in GenAI is associated with less critical thinking, while higher self-confidence is associated with more critical thinking".
Section 5 goes over more specifics regarding cognitive effort in critical thinking, according to Bloom's taxonomy. This is how Microsoft is able to nitpick the specific qualities of critical thinking that GenAI impacts. It defines critical thinking as the sum of knowledge, comprehension, application, analysis, synthesis, and evaluation. While I don't completely disagree that these elements contribute to critical thinking, this definition doesn't accurately reflect the public's perception of critical thinking.
When someone says "I need to engage in critical thinking", what does your brain jump to? I guarantee that a lot of readers thought "I need to think hard", or something to that effect. One of the major problems I have with the official listing for this whitepaper is that it insinuates to the average person that GenAI is making you dumber. They do this by using an assertive tone and providing "scientific evidence" to back it up. In reality, that isn't what the data shows, and it isn't what Microsoft is telling investors and other academics.
Doing this allows them to control the narrative. It lets them tell your average person one thing, but academics and investors something quite different. This can be a small seed that sows discord between people, when realistically, they've been fed a lie.
I know this is now a really long article, but stick with me here, I'm not too far off finishing.
Self-admitted limitations
Subsections 6.1 and 6.2 don't contain anything we haven't covered already, but subsection 6.3 does. This subsection is titled "Limitations", where Microsoft goes over some specific pain points of their research. According to Microsoft, participants occasionally "conflated reduced effort in using GenAI with reduced effort in critical thinking with GenAI".
What this means is that just because someone is "being lazier when using GenAI" doesn't correlate with the amount of critical thinking they're doing. I can't disagree with them here.
Secondly, they have this to say:
Secondly, we assess users’ subjective task confidence following prior work on AI-assisted decision-making. Still, one’s subjective self-confidence may not always be well-calibrated with respect to objective expertise on tasks.
Now, don't be fooled; this isn't Microsoft agreeing with me about people's self-evaluations during AI usage. It's about the subjects' confidence in their own abilities prior to the task. Again, I do agree with them, but this still doesn't address one of my major concerns with this posting and whitepaper.
Thirdly, our survey was conducted exclusively in English, with participants required to be fluent English speakers. This approach ensured consistency in data collection and feasibility of analysis by our English-speaking research team, but has no representation of non-English speaking populations or multilingual contexts.
Surprise surprise, I don't disagree with them stating their limitations.
Fourthly, our sample was biased towards younger, more technologically skilled participants who regularly use GenAI tools at work at least once per week. This demographic skew may not fully represent the broader population of knowledge workers, potentially overlooking the experiences and perceptions of older or less tech-oriented professionals.
This part hurts. This entire paper uses averages and bias to persuade you into thinking that what Microsoft is saying is absolute. They are once again correct that this is a limitation, but it completely ignores the larger problem: people are inherently different. Just because Microsoft said that 60% of 319 people experienced lower levels of critical thinking doesn't mean that's the case as a whole. Of course, they do insinuate that in their original posting.
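If you want to sanity-check for yourself what a sample of 319 actually buys you, here's the textbook margin-of-error calculation for a reported proportion. Note that the "60% of 319" figure is my own hypothetical from the paragraph above, not a number taken from Microsoft's paper, and this interval only covers sampling error, not the self-report and selection biases I've been complaining about:

```python
import math

def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """95% margin of error for a sample proportion (normal approximation)."""
    return z * math.sqrt(p * (1 - p) / n)

# Hypothetical figure from the paragraph above: 60% of 319 respondents.
moe = margin_of_error(0.60, 319)
print(f"60% +/- {moe * 100:.1f} percentage points")  # roughly +/- 5.4 points
```

So even under ideal random sampling, the headline number comes with a built-in fuzz of several percentage points, and none of that accounts for whether the pool itself was representative in the first place.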
Lastly, GenAI tools are constantly evolving, and the ways in which knowledge workers interact with these technologies are likely to change over time. We adopted the task taxonomy [...] to capture [...] characteristics [...] without overcomplicating our explanatory models.
Yeah, hard agree here. Even if their posting and study weren't riddled with issues and deceit, they'd be outdated in less than a year.
My conclusion
I think this paper is incredibly flawed. I also think that the original posting is deceptive. Everything from Microsoft's data collection, to how they present their "evidence", to the evaluations is either flawed, deceitful, or a ploy to make investors happy. You shouldn't take any whitepaper at face value; always engage in critical thinking to determine your own opinion.
My suggestions
Bet you weren't expecting this to be a section. While I do think this whitepaper is flawed, I can see the potential value in research like this. The long-term effects of GenAI are relatively unknown and should be progressively documented. Here are my suggestions to Microsoft, and to any other entity wishing to do this sort of evaluation.
- Don't reference technology you're highly invested in.
  - Can't really believe I have to say this, but saying that your own GenAI tools are "less likely to decrease critical thinking" is a huge giveaway that you're biased.
- Don't use proprietary pools of specially selected individuals.
  - If you haven't randomly selected these individuals first-hand, you can't guarantee that they're not biased.
- Use a more quantitative collection method.
  - Something like a "needle in a haystack" technique could be useful here, where you purposefully pollute the GenAI's responses with a trackable output and see if people notice.
- Use more neutral language.
  - People already have a perception of the term "critical thinking", and it doesn't align with what the study actually wants to measure.
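To make the needle-in-a-haystack suggestion concrete, here's a minimal sketch of how such a trial could be scored. Everything here (the function names, the needle phrase, the data shape) is my own hypothetical, not anything from Microsoft's paper; the point is just that counting how often participants catch planted errors gives you a behavioural measure instead of a self-report:

```python
# Hypothetical sketch: plant a trackable "needle" (a deliberate factual
# error) into some GenAI responses, then score how often participants
# flag it. The detection rate measures critical engagement behaviourally.

NEEDLE = "the Treaty of Westphalia was signed in 1748"  # wrong on purpose; it was 1648

def prepare_trial(response: str, plant: bool) -> str:
    """Optionally splice the needle into a response shown to a participant."""
    if plant:
        return response + " Note that " + NEEDLE + "."
    return response

def detection_rate(trials: list[tuple[bool, bool]]) -> float:
    """trials: (needle_was_planted, participant_flagged_it) pairs.
    Returns the fraction of planted needles that participants caught."""
    caught = [flagged for planted, flagged in trials if planted]
    return sum(caught) / len(caught) if caught else 0.0

# Toy run: three planted trials, two caught.
rate = detection_rate([(True, True), (True, False), (False, True), (True, True)])
print(f"detection rate on planted trials: {rate:.2f}")  # 0.67
```

A design like this sidesteps the self-evaluation problem entirely: you're measuring what people actually do with a suspect output, not how hard they believe they were thinking.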