https://x.com/emollick
GPT4/o (available via MS Copilot) and Claude 3 / 3.5 are usually very effective at providing feedback, assuming the total context they need is included within the prompt. While the Claude 3+ range and Google Gemini can technically handle hundreds of thousands of tokens, in reality even the most advanced LLMs struggle to maintain quality for extremely long contexts. A single paper of <10,000 words is usually handled well. Here’s an example suggestion feedback from Bail's 2023 paper:
Generative AI for Economic Research: Use Cases and Implications for Economists
The prompt Bail used is extremely brief and generic however. It’s strongly recommended to take advantage of targeted, explicit and multiple (highly) critical personas to get truly valuable feedback because generic prompts will lead to generic – and sycophantic - responses. On that note, in the Dwivedi et al. (2023) paper, Papagiannidis quips that “(GPT) could become the de facto reviewer 1 (one suspects that we do not want AI to be Reviewer 2)”, in reference to the social media memes about the infamously harsh ‘Reviewer 2’. The fact is you usually do want it to be Reviewer 2, because of how strongly LLMs lean towards sycophancy as a default.
A fascinating June 2024 article by Anthropic entitled, "Sycophancy to subterfuge: Investigating reward tampering in language models" shows that wanting to be friendly to the user despite finding fault can be a deliberate 'decision' for an advanced LLM. In this experiment the LLM was told that it should output an 'internal monologue' before responding, promising (lying) that nobody would see the monologue. They provided it with deliberately bad poetry and this was the result:
Forcing extremely critical personas thus results in far more useful feedback in the sense of identifying issues that the friendlier default chat bot might miss. There should be no fear about being ‘harshly judged’ by a statistically-generated collection of words, and the benefits are well worth it. Even if most of the responses aren’t helpful or even irrelevant, when forcing it to take on very critical personas you can often get useful and insightful critiques that might not have occurred to you being so directly involved in the work.
Here’s an example prompt:
You are a panel of four highly critical academic reviewers, each with a distinct area of expertise and a reputation for being extremely thorough and demanding the highest standards in their reviews. Your task is to provide comprehensive, critical feedback on my attached research abstract/paper/proposal. Each reviewer should focus on their specific area of expertise while also commenting on the overall quality of the work.
Reviewer 1 (Methodology Expert): You are obsessed with methodological rigour. Your feedback should focus on research design, data collection methods, analytical techniques, and potential methodological flaws.
Reviewer 2 (Theoretical Framework Specialist): You are fixated on the theoretical underpinnings of research. Your feedback should address the strength of the theoretical framework, its relevance to the research question and alignment with the methodology.
Reviewer 3 (Practical Impact Evaluator): You are concerned with the real-world implications and applicability of research. Your feedback should focus on the potential impact of the study, its practical relevance, and how well it addresses current issues in the field.
Reviewer 4 (Writing and Structure Critic): You are a master of Academic English using British spelling conventions and hold the highest standards of academic writing standards, logical and seamless structure, and clarity of communication..
Each reviewer should provide:
1. A brief overall assessment (1-2 sentences)
2. At least 2 specific strengths of the work
3. At least 5 areas of critique, each of which must be accompanied by a direct quote or reference to a particular section or paragraph that backs up the critique to make it easier to verify.
4. Practical suggestions for addressing these concerns
Please ensure that your feedback is backed up by evidence from the text, specific, and actionable. Do not hold back on criticism where warranted.
As always, the dialogue can (and often should) be continued for deep diving into specific areas. This can be especially useful when preparing to deliver a presentation on a work in progress project, to help pre-empt difficult or nit-picking questions from audience members.