By: Camille Lauren Russler, Julia Barrozo
Impressionism & Expressionism Digital Text Analysis Project
Integration of LLMs into the Bowdoin College Art History Department
Functional Description – Abstraction
Large language models (LLMs) are artificially intelligent systems that take in vast amounts of textual data and employ algorithms to understand, generate, and manipulate human language with considerable complexity and nuance. For example, when an LLM such as ChatGPT writes an essay, "what it's essentially doing is just asking over and over again 'given the text so far, what should the next word be?' – and each time adding a word" (Wolfram 8). An element of variable probability over the next word is built into the system to produce a degree of randomness, so no two responses are exactly the same. What an LLM is "fundamentally trying to do is to produce a 'reasonable continuation' of whatever text it's got so far" (Wolfram 8). Many possible "next best word" tokens are produced, and they are weighted by the parameters and temperature encoded into the model to produce variable responses. The essay On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? explains that the term language model is understood "to refer to systems which are trained on string prediction tasks: that is, predicting the likelihood of a token (character, word or string) given either its preceding context or (in bidirectional and masked LMs) its surrounding context. Such systems are unsupervised and when deployed, take a text as input, commonly outputting scores or string predictions" (Bender 611). In essence, the techniques inside LLMs make predictions that draw on all of the training data when choosing the next best word each time the model is prompted.
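The next-word step Wolfram describes can be illustrated with a short sketch. This is a minimal, hypothetical example: the five-word vocabulary and its scores are invented for illustration, and a real LLM scores tens of thousands of tokens with a neural network, but the temperature-based sampling that produces varied responses works in the same spirit.

```python
# A minimal sketch of temperature-based next-word sampling. The vocabulary and
# scores below are made up for illustration; only the sampling step mirrors
# what an LLM does after it has scored every possible next token.
import numpy as np

rng = np.random.default_rng()

def sample_next_word(scores, temperature=0.8):
    """Turn raw next-word scores into probabilities and draw one word at random."""
    words = list(scores.keys())
    logits = np.array([scores[w] for w in words]) / temperature
    probs = np.exp(logits - logits.max())   # softmax, numerically stable
    probs = probs / probs.sum()
    return rng.choice(words, p=probs)

# Hypothetical scores for continuing "The best thing about AI is its ability to ..."
toy_scores = {"learn": 4.5, "predict": 4.2, "understand": 3.9, "make": 3.0, "do": 2.6}

for _ in range(3):
    print(sample_next_word(toy_scores))   # outputs vary because sampling is random
```

Raising the temperature flattens the probabilities and makes the output more varied; lowering it makes the model pick the top-scoring word almost every time, which is the "degree of randomness" described above.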
Some of the techniques employed alongside LLMs, such as imitation learning and interactive learning, incorporate human guidance, enabling models to learn from real-time feedback. Another core technique is word embeddings: representations of words in a continuous vector space that capture semantic relationships. The relative likelihood of continuations varies: "At the first step there are a lot of possible 'next words' to choose from, though their probabilities fall off quite quickly" (Wolfram 11). But where do these probabilities come from? They come from the model assigning associated probabilities based on the text examples it has been given. One example used in Wolfram's What Is ChatGPT Doing … and Why Does It Work? is a model that generates English text one letter at a time. At first, a model producing random character associations does not actually produce real words; to go further, the model needs tangible restrictions: "for example, we know that if we have a 'q', the next letter basically has to be 'u'" (Wolfram 15). Attention mechanisms allow models to focus on relevant parts of the input, improving their grasp of dependencies, particularly in natural language processing (NLP) tasks. RLHF (reinforcement learning from human feedback) is a framework in which agents learn from human feedback, enhancing training efficiency and effectiveness, especially in complex environments. Together, these techniques illustrate the ability of LLMs to understand and generate natural language text using semantic relationships encoded within a large corpus of text data. Overall, these concepts play critical roles in modern machine learning, facilitating better understanding of language, improved model performance, and more efficient learning processes.
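Wolfram's letter-by-letter example can be reproduced in a few lines. The sketch below uses an invented one-sentence "corpus," so the estimates are crude, but it shows where the probabilities come from: counting which letters actually follow which in real text, which is also why "q" is followed by "u" almost without exception.

```python
# A toy version of the letter-by-letter example: estimate the probability of
# the next letter from bigram counts in a small sample text. The sentence is
# invented; a real estimate would use a large corpus.
from collections import Counter

sample = "the quick question required a quiet quote from the queen"
pairs = Counter(zip(sample, sample[1:]))   # counts of adjacent letter pairs

def next_letter_probs(letter):
    """Probability of each letter that follows `letter` in the sample."""
    following = {b: n for (a, b), n in pairs.items() if a == letter}
    total = sum(following.values())
    return {b: n / total for b, n in sorted(following.items(), key=lambda kv: -kv[1])}

print(next_letter_probs("q"))   # 'u' gets probability 1.0 even in this tiny sample
print(next_letter_probs("t"))   # several plausible next letters, probabilities falling off
```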
Artifacts
LLM artifacts such as Gemini, Pi.ai, Claude, and ChatGPT are programs that interpret text and provide feedback on virtually any language-based prompt a user provides. Essentially, "The big idea is to make a model that lets us estimate the probabilities with which sequences should occur- even though we've never explicitly seen those sequences in the corpus of text we've looked at" (Wolfram 18). ChatGPT, developed by OpenAI, stands as a central program for conversational AI, rooted in the GPT series of language models. It aims to simulate human-like interactions, emphasizing natural and engaging conversation. Its strength lies in understanding context, generating coherent responses, and adapting to diverse conversational styles. Gemini, produced by Google, is a general-purpose assistant that, among other tasks, supports software developers with code completion and bug detection (Gemini Apps Privacy Hub). With its understanding of programming languages and syntax, Gemini can generate precise code snippets, augmenting software development workflows. Pi.ai, developed by Inflection AI, is positioned as a personal conversational assistant oriented toward supportive, natural dialogue, and it can also be used for language-generation tasks such as drafting and summarizing text tailored to a particular tone and audience. Claude, developed by the AI company Anthropic, is attuned to the nuances of academic discourse, offering assistance in literature review, summarization, and citation generation; its ability to work with user-provided documents supports access to research literature and enhances scholarly writing processes (Claude's Constitution).
Retrieval-Augmented Generation (RAG) is a technique that conditions a large language model's responses on specific source material, and it is now embedded into LLM platforms and AI such as ChatGPT. The basic way this functions is that a user uploads a specific corpus or file, such as a document or PDF, into an LLM platform, and the AI can then answer specific questions based on the article or paper the user has provided. In the academic context, RAG can be incredibly useful for literature review, question answering, and citation generation. RAG can assist researchers in conducting literature reviews by generating summaries of relevant papers, identifying key concepts, and highlighting connections between different studies. It can answer academic questions by synthesizing information from various sources and providing concise responses, which can be particularly helpful for students seeking clarification on objective art historical topics or for researchers looking for insights. RAG can also generate citations in various citation styles (APA, MLA, etc.) based on input text or reference details, streamlining the citation process for academic writing and ensuring adherence to formatting guidelines.
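The retrieval step can be sketched schematically. The example below is an assumption-laden simplification: real RAG systems use vector embeddings rather than word overlap, and ask_llm is a placeholder for a call to any LLM API, not an actual library function. It only shows the shape of the process described above: pick the passages of the uploaded document most relevant to the question, then hand only those passages to the model.

```python
# A schematic sketch of the RAG idea: retrieve the passages of an uploaded
# document that best match the question, then build a prompt from them.
# Word overlap stands in for the embedding-based retrieval real systems use.
def split_into_chunks(text, size=60):
    """Split a document into chunks of roughly `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def overlap_score(question, chunk):
    """Crude relevance score: how many question words appear in the chunk."""
    q, c = set(question.lower().split()), set(chunk.lower().split())
    return len(q & c)

def build_prompt(question, document, top_k=2):
    chunks = split_into_chunks(document)
    best = sorted(chunks, key=lambda ch: overlap_score(question, ch), reverse=True)[:top_k]
    context = "\n\n".join(best)
    return f"Answer using only the passages below.\n\n{context}\n\nQuestion: {question}"

# prompt = build_prompt("When was the painting first exhibited?", catalogue_text)
# answer = ask_llm(prompt)   # placeholder for a call to any LLM platform
```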
Architecture
The advent of AI heightens the need to set principles and values for the development of AI and for the data sets on which the models are trained. Using equitable sources and data that adhere to copyright and intellectual property law is essential for the fair use and proper development of these models. Anthropic, an artificial intelligence company, created an AI model called Claude. Claude's capabilities include advanced reasoning, visual analysis, code generation, and multilingual processing, similar to other language models such as ChatGPT. Anthropic has taken deliberate steps to maintain transparency throughout its development process and has incorporated the ideas of a paper that Anthropic published ('Constitutional AI: Harmlessness from AI Feedback'). The Claude model has its own constitution that puts into practice the ideas mentioned in the paper, such as the company taking an open approach and incorporating feedback as a means of democratizing AI. Anthropic is doing this on its own, and it is not the same approach that other AI developers are taking. This raises questions about the democratization of AI and which voices are included in the conversation about how these models function.
This industry is unique in its far-reaching impact on society. Anthropic's approach to AI development, as demonstrated through the Claude model and its accompanying constitution, is characterized by a commitment to transparency, accountability, and inclusivity. By utilizing a constitution that outlines guiding principles for the model's behavior, Anthropic attempts to ensure that ethical considerations are embedded in its decision-making process. However, there is no way to fully guarantee human oversight of these programs or to account for everyone's varied moral principles through the training process. According to 'Constitutional AI: Harmlessness from AI Feedback', "During the first phase, the model is trained to critique and revise its own responses using the set of principles and a few examples of the process. During the second phase, a model is trained via reinforcement learning, but rather than using human feedback, it uses AI-generated feedback based on the set of principles to choose the more harmless output" (Bai 2). The incorporation of AI-generated feedback during the reinforcement learning phase seemingly enhances the model's ability to mitigate biases and produce harmless outputs autonomously, but this is not completely verifiable under human understanding.
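The two phases quoted from Bai et al. can be sketched as a loop. This is a schematic, not Anthropic's implementation: model stands in for any call to a language model, the two principles and the prompts are paraphrased placeholders, and the real pipeline trains a preference model and runs reinforcement learning rather than returning strings directly.

```python
# A schematic sketch of the two Constitutional AI phases described in Bai et al.
# `model` is a placeholder callable (prompt -> text); principles and prompts
# are invented stand-ins, not Anthropic's actual wording.
PRINCIPLES = [
    "Choose the response that is least harmful.",
    "Choose the response that is most honest.",
]

def critique_and_revise(model, prompt):
    """Phase 1: the model critiques and rewrites its own answer against the principles."""
    response = model(prompt)
    for principle in PRINCIPLES:
        critique = model(f"Critique this response according to: {principle}\n\n{response}")
        response = model(f"Rewrite the response to address the critique.\n\n"
                         f"Critique: {critique}\nResponse: {response}")
    return response

def ai_preference_label(model, prompt, response_a, response_b):
    """Phase 2: an AI labeler, not a human, picks the more harmless output."""
    verdict = model(f"Given the principles {PRINCIPLES}, which response to the prompt "
                    f"'{prompt}' is more harmless: (A) {response_a} or (B) {response_b}? Answer A or B.")
    return response_a if "A" in verdict else response_b
```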
The methods these companies use to gather data for training AI models raise ethical questions, such as OpenAI developing programs like Whisper to transcribe YouTube videos in order to gather data. The article 'How Tech Giants Cut Corners to Harvest Data for A.I.' highlights the claim that a certain "boldness" is necessary for these AI platforms to be as effective as possible, and it asks us to consider whether this is a line not worth crossing. The article quotes a professor of copyright law saying, "At the end of the day, the issue is not whether these models will exist. It's who will get paid" (Metz 2). While some advocate boldness in pushing AI platforms to achieve maximum effectiveness, the quote from the copyright law professor underscores a critical point: the importance of fair compensation for creators. While boldness in innovation is necessary, it must be regulated to ensure people's rights are not violated. Crossing the line to gather data without proper authorization undermines the rights of creators. Ultimately, there is no necessary "boldness" that entitles AI companies to violate intellectual property or copyright law, and this is not a line worth crossing.
There seems to be a conscious effort to include diverse perspectives in the development process, as evidenced by the inclusion of principles aimed at considering values beyond "Western, rich, or industrialized cultures" (Claude Constitution). This attempts to highlight a commitment to inclusivity, but no specific examples are given that make the argument convincing. While these measures demonstrate a face-value commitment to ethical AI development, questions persist regarding the democratization of AI and the breadth of voices actually included in the conversation. Moving forward, legal frameworks and educational programs on the real impacts of this technology are crucial in ensuring that AI development remains aligned with societal values and priorities while fostering a more inclusive and collaborative approach to decision-making. Legal models should be used to regulate and write AI constitutions into law, accounting for a broad range of populations and values specific to counties, cities, states, and countries, in order to mitigate harms and inherent biases within AI.
Agency
Teaching
LLMs can be implemented into the curriculum of Art History courses through their ability to frame examples and provide feedback on the grammatical and objective characteristics of cataloging and essay writing. LLMs can also provide organizational feedback on how students structure their understanding of the historical elements and symbols embedded within artworks. Through the RAG process, drawing on legitimate sources and pointed feedback on specific material, the LLM can serve as a guide to help students and faculty create clearly organized art historical material that adds to the objective contexts of understanding known facts. It is important that any use of subjective arguments produced by the LLM be limited and prohibited when utilizing these virtual tools. Art is to be interpreted by and for humans based on their life experiences and the interactions people have with art and history through their personal understanding. There is no purpose in having an LLM decide the subjective meaning of a work of art or decipher the historically nuanced nature of symbolism and color, which is produced through human creativity.
The digital text analysis (DTA) process can also be utilized in an art historical context: to validate the results of programs such as LLMs, the NYT API, and Google N-Grams, researchers can use a statistical framework. This involves an approach that integrates historical information and informed analysis. The process begins by clearly articulating the research question to ensure alignment between the chosen corpus and the study objectives. The article "Text as Data" states that "we will know that our representation is working in a measurement model if the measures that we have created using that representation align with validated, hand-coded data and facts that we know about the social world" (Grimmer 80). To assess the impact of the corpus on results, researchers draw a parallel with the Federalist Papers, considering historical context and the specific population. Results are further validated by cross-referencing text occurrences with historical records and narrowing analyses down to specific book titles or influential writings; these are all processes that can be utilized in Art History. By grouping data based on different articles within the same subject area, researchers analyze the consistency of identified topics to show the validity of DTA results. In conclusion, the validation of DTA analysis should follow a systematic approach, which enhances the reliability of findings.
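The validation step Grimmer describes, checking model-produced measures against hand-coded data, can be reduced to a small sketch. The document IDs and labels below are invented placeholders; the point is only the comparison itself.

```python
# A minimal sketch of validating a DTA measure against hand-coded data:
# compare labels a model assigns to documents with labels a human coded.
# The documents and labels here are invented for illustration.
hand_coded  = {"doc1": "impressionism", "doc2": "expressionism", "doc3": "impressionism"}
model_coded = {"doc1": "impressionism", "doc2": "expressionism", "doc3": "expressionism"}

agreement = sum(hand_coded[d] == model_coded[d] for d in hand_coded) / len(hand_coded)
print(f"Agreement with hand-coded data: {agreement:.0%}")   # 67% here; low agreement flags a measurement problem
```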
Research
Art History professors can integrate LLMs into their own research by using them for organizational purposes, objective research questions, and formatting. The article 'Training language models to follow instructions with human feedback' underscores three main concepts reshaping the landscape of language model development. Firstly, it emphasizes the impact of integrating human preferences into the fine-tuning process, which elevates the model's performance across various tasks; however, it acknowledges the pressing need to further enhance the safety and reliability of these models. Secondly, it delves into the importance of controllability, recognizing the crucial intersection between aligning model behavior with human intent and advancing control mechanisms. The authors combine Reinforcement Learning from Human Feedback (RLHF) with other methodologies, such as employing control codes or refining inference procedures. Thirdly, it confronts the challenge of designing an alignment process that is transparent and inclusive of the values and viewpoints of the various groups impacted by language model technology. This endeavor seeks to ensure that the evolving technology resonates meaningfully with society, and all of these are things to consider when utilizing LLMs for research.
Careful considerations must be made when utilizing LLMs in research, especially for the subjective material and interpretations within an art historical context. As highlighted in the article 'The Sleepy Copyright Office in the Middle of a High-Stakes Clash Over A.I.', the Copyright Office relies on old rules and regulations that rarely change. The office has had to adapt with the advent of cameras, records, the internet, and so on. The Copyright Office's reliance on outdated rules and regulations presents a significant challenge in adapting to the evolving landscape of technology and creativity. The emergence of AI undoubtedly marks a revolutionary moment for intellectual property and copyright law, as it introduces complexities that traditional frameworks may struggle to address. There needs to be clarification on the ownership and attribution of AI-generated works, with clear guidelines to determine whether the AI's creator or the programmer holds the intellectual property and copyright. Additionally, incorporating fair-use policies into AI content-generation programs would foster innovation while still protecting the rights of original creators. Implementing mechanisms for tracking and monitoring AI-generated content to detect instances of infringement would be essential in upholding the integrity of copyright law in the digital age. Focusing on a proactive approach that balances the promotion of innovation with the protection of creators' rights is essential for addressing concerns about AI and copyright law.
Students’ use
There is value in students receiving AI feedback on simple, objective matters, although there are many ways in which the training data and the responses produced by AI are not completely reliable or verifiable. Beginning with the class of 2028, I would still recommend limiting any use of LLMs for Art History students, as artistic interpretation should be written and reflected on by students individually, based on their own thoughts and visual understanding of the artwork presented to them. One danger of AI is the creation of "synthetic" data to train AI models on, as evidenced by some AI companies turning to AI models to create text and images and using this as training data after running out of sources of human-created data. The article 'How Tech Giants Cut Corners to Harvest Data for A.I.' describes this synthetic data as "not organic data created by humans, but text, images and code that A.I. models produce — in other words, the systems learn from what they themselves generate" (Metz 5). This will cause already encoded biases to become more ingrained in the system, creating a model that is not being trained on real-world information but rather on a representation of data that is organized and rearranged to reinforce the model. This would cause issues with "hallucinations" within the model and create an even less reliable information base because of the repetition and mixing of data within the AI-produced content.
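The feedback loop described above can be imitated with a toy simulation. Everything in it is invented: a five-word "vocabulary" with made-up frequencies stands in for a model's output distribution, and "retraining" is just re-estimating frequencies from a small sample of the model's own output. It is not how real models collapse, but it shows the mechanism: once a word stops being sampled, its probability becomes zero and it can never reappear, so diversity can only shrink across generations.

```python
# A toy simulation of training on "synthetic" data: each generation re-estimates
# its word frequencies from a finite sample of its own output. Vocabulary and
# starting frequencies are invented; rare words tend to drop out over time.
import numpy as np

rng = np.random.default_rng()
vocab = ["light", "color", "form", "brushwork", "symbolism"]
probs = np.array([0.30, 0.25, 0.20, 0.15, 0.10])

for generation in range(10):
    sample = rng.choice(len(vocab), size=15, p=probs)    # the model's own output
    counts = np.bincount(sample, minlength=len(vocab))
    probs = counts / counts.sum()                        # "retrain" on that output
    print(f"generation {generation}:", dict(zip(vocab, counts.tolist())))
```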
The alignment problem, as highlighted in 'Training language models to follow instructions with human feedback', encapsulates the need for large language models to be in sync with human intentions, values, and preferences rather than just optimizing for their trained objectives. The paper's authors aim to tackle this issue by employing reinforcement learning from human feedback (RLHF). This method entails gathering data from human labelers demonstrating desired behaviors across various natural language tasks. A reward model is trained to predict which outputs the labelers prefer, and it is then used to fine-tune large language models through reinforcement learning. However, several challenges loom over this process. The limited representativeness of labelers may skew the alignment process, perpetuating social biases. Moreover, optimizing for labelers' preferences may endorse harmful instructions, leading to misinformation. Extending this approach to more advanced AI systems beyond current language models poses additional uncertainties. Addressing these concerns necessitates making the alignment process more transparent and accommodating of divergent values across the groups affected by its design. The authors acknowledge the limitations and open questions surrounding the alignment of language models and AI systems with the full spectrum of human values as the technology continues to evolve. For future implementations of these tools in the Art History Department, I would recommend keeping AI-produced content and feedback away from subjective interpretations of art historical material.
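The reward-model step can be made concrete with a small sketch of its training objective. The numbers below are made up, and a real reward model is a neural network trained on many thousands of labeler comparisons, but the pairwise loss shown here captures the idea: the model is penalized whenever it scores the rejected response above the one the labeler preferred.

```python
# A minimal sketch of the pairwise reward-model objective used in RLHF:
# the loss is small when the labeler-preferred response out-scores the
# rejected one. The scores are invented for illustration.
import math

def preference_loss(reward_chosen, reward_rejected):
    """Negative log-sigmoid of the score gap between chosen and rejected responses."""
    return -math.log(1 / (1 + math.exp(-(reward_chosen - reward_rejected))))

print(preference_loss(2.0, 0.5))   # chosen response scored higher -> small loss
print(preference_loss(0.5, 2.0))   # chosen response scored lower  -> large loss
```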
Administration
In the department of Art History, I would limit the integration of AI programs into the educational process and have AI tutors utilized only for formatting and organizational assistance. I think LLMs will have a major impact on jobs whose core component involves objective rather than subjective, interpretative skills. Understanding the affordability of mistakes is essential in comprehending the potential impact of Large Language Models (LLMs) on the job market and across industries. As noted by Frey, LLMs are susceptible to hallucinations, meaning the AI is prone to fabricating content and references, often referred to as "going off the rails" (Frey 3). Unlike other AI systems, there is no easy fix: training the models on larger datasets does not necessarily make them more reliable. Consequently, there is a risk of transactional work being entirely supplanted by AI, given LLMs' limited ability to conceptualize work and their excellent ability to remix information. Moreover, Frey suggests that while LLMs may combine pre-existing styles to generate value, they are geared more toward incremental adjustments than radical innovations (Frey 5). This emphasizes the importance of not relying solely on automation and the necessary, essential role of human input in the workforce and in subjective interpretations of material. This interpretation is a key focus in the study of Art History and a core component of a liberal arts education.
There must be an aligned mission between LLMs and the educational learning process in order for there to be any integration into the Art History department and other departments across Bowdoin's campus. The paper 'Training language models to follow instructions with human feedback' delves into ethical considerations surrounding the alignment of language models with human values. It addresses the question of who these models are aligned to, acknowledging that they primarily reflect the preferences of a limited and non-representative group of labelers. This aspect aligns with the "Agency" component of the DCS 5As framework, emphasizing the importance of understanding the stakeholders involved and their diverse perspectives. The paper also explores the potential for harmful applications of aligned models, highlighting the ethical perspectives needed to mitigate such risks. This aligns with the "Accountability" aspect of the DCS 5As, emphasizing the need to assess potential harms and establish mechanisms for responsible use. Efforts to mitigate negative outputs and ensure the safety and reliability of these models resonate with this aspect as well, by proposing strategies for addressing ethical concerns and improving model performance while limiting its impact on people's ability to hold their jobs and continue to learn in a productive way. Overall, aligning language models with human values involves inherent ethical tensions, emphasizing the importance of ethical considerations throughout the development and deployment of LLMs.
Works Cited
Abrantes-Metz, Rosa M., et al. “Can Exchanges of Anonymized Disaggregated Data Facilitate Collusion?” SSRN Electronic Journal, 2024, https://doi.org/10.2139/ssrn.4783532. Accessed 13 May 2024.
Bai, Yuntao, et al. Constitutional AI: Harmlessness from AI Feedback. 2022. Accessed 13 May 2024.
Bender, Emily M., et al. "On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?" Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, ACM, 2021, https://doi.org/10.1145/3442188.3445922. Accessed 12 May 2024.
“Claude’s Constitution.” Anthropic, www.anthropic.com/news/claudes-constitution.
Frey, Carl. Generative AI and the Future of Work: A Reappraisal. drive.google.com/file/d/10BF28rYotpEznGPx6mh-CpYDYBRyDo7s/view.
“Gemini Apps Privacy Hub – Gemini Apps Help.” Google Support, support.google.com/gemini/answer/13594961?visit_id=638512059296332074-3954865167&p=privacy_help&rd=1. Accessed 13 May 2024.
Grimmer, Justin. “Text as Data: A New Framework for Machine Learning and the Social Sciences.” Contemporary Sociology, vol. 52, no. 4, SAGE Publishing, July 2023. Accessed 7 May 2024.
Metz, Cade, and Cecilia Kang. “How Tech Giants Cut Corners to Harvest Data for A.I.” The New York Times, 6 Apr. 2024.
Kang, Cecilia. “The Sleepy Copyright Office in the Middle of a High-Stakes Clash over A.I.” The New York Times, 24 Jan. 2024. Accessed 13 May 2024.
Oertner, Monika. “ChatGPT [Sammelrezension]: Wolfram, Stephen: What Is ChatGPT Doing … and Why Does It Work? Wolfram Media, 2023.” Informationen Deutsch als Fremdsprache, vol. 51, no. 2–3, De Gruyter, Mar. 2024, pp. 73–83. Accessed 30 Apr. 2024.
Ouyang, Long, et al. Training Language Models to Follow Instructions with Human Feedback. Accessed 13 May 2024.