Navigating the Vocabulary of Generative AI Collection (3 of 3)

This is the third and final post in my series ‘Navigating the Vocabulary of Gen AI’. If you would like to view parts 1 and 2, you will find information on the following AI terminology:

Part 1:

  • Artificial Intelligence
  • Machine Learning
  • Artificial Neural Networks (ANN)
  • Deep Learning
  • Generative AI (GAI)
  • Foundation Models
  • Large Language Models
  • Natural Language Processing (NLP)
  • Transformer Model
  • Generative Pretrained Transformer (GPT)

Part 2:

  • Responsible AI
  • Labelled data
  • Supervised learning
  • Unsupervised learning
  • Semi-supervised learning
  • Prompt engineering
  • Prompt chaining
  • Retrieval augmented generation (RAG)
  • Parameters
  • Fine Tuning


Bias

In machine learning, bias is an issue in which elements of the data set used to train the model carry a weighted distortion of the statistical data.  This can unfairly and inaccurately sway the measurement and analysis of the training data, and therefore produce biased and prejudiced results.  This makes high-quality training data essential, as data that is incomplete or of low quality can produce unexpected and unreliable results due to inaccurate assumptions.


Hallucinations

AI hallucinations occur when an AI program generates false responses that are presented as factual and true.  Although hallucinations can be a rare occurrence, they are one good reason why you shouldn’t take every response at face value.  Hallucinations can be caused by biased training data, or by the model misinterpreting data during training and generating responses that the data does not justify.  The term hallucination is used because it’s similar to the way humans can hallucinate by experiencing something that isn’t real.


Temperature

In AI, temperature is a parameter that lets you adjust how random the output from your model will be.  The temperature setting determines how focused or how varied the generated output is.  The range is typically between 0 and 1, although some providers accept higher values, and the default depends on the model and provider (0.7 is a common choice).  The closer the value is to 0, the more focused and predictable the response; the higher the value, the more diverse it will be.
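
To make the effect of temperature more concrete, here is a minimal sketch of one common way it is applied: the model's raw scores for each candidate token are divided by the temperature before being turned into probabilities. The function name and the example scores are my own assumptions for illustration, not taken from any particular model or API.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities, scaled by temperature."""
    if temperature <= 0:
        # Dividing by zero is undefined, so this sketch rejects it.
        raise ValueError("temperature must be greater than 0")
    scaled = [score / temperature for score in logits]
    highest = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - highest) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores for three candidate next tokens.
logits = [2.0, 1.0, 0.5]

focused = softmax_with_temperature(logits, 0.2)  # low temperature
diverse = softmax_with_temperature(logits, 1.0)  # higher temperature

print([round(p, 3) for p in focused])
print([round(p, 3) for p in diverse])
```

With the low temperature, almost all of the probability lands on the top-scoring token, while the higher temperature spreads it more evenly across the alternatives, which is where the extra variety comes from.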


Anthropomorphism

Anthropomorphism is the attribution of human traits, such as emotions, behaviours and characteristics, to non-human ‘things’, including machines, animals, inanimate objects, the environment and more.  As AI develops further and becomes more complex and powerful, people can begin to anthropomorphise computer programmes, even after very short exposure to them, which can influence how they behave when interacting with them.


Completion

The term completion is used specifically in the context of NLP models to describe the output that is generated in response to a prompt.  For example, if you were using ChatGPT and asked it a question, the response generated and returned to you as the user would be considered the ‘completion’ of that interaction.


Tokens

A token is a unit of the text supplied as input in a prompt.  It can be a whole word, just the beginning of a word, the end of a word, a space, a single character or anything in between, depending on the tokenization method being used.  Tokens are the small, basic units that LLMs use to process and analyse input requests, allowing them to generate a response based on the tokens and the patterns detected.  Different LLMs have different token capacities for the input and output of data, and this capacity is known as the context window.
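
As a rough illustration of how text can break into pieces that are not whole words, here is a small sketch of a greedy longest-match tokenizer. Both the vocabulary and the matching rule are assumptions made up for this example; real LLM tokenizers learn their vocabularies from data and use more sophisticated methods.

```python
def tokenize(text, vocabulary):
    """Split text into the longest vocabulary pieces, left to right."""
    tokens = []
    position = 0
    while position < len(text):
        # Try the longest possible piece first, then shrink.
        for end in range(len(text), position, -1):
            piece = text[position:end]
            if piece in vocabulary:
                tokens.append(piece)
                position = end
                break
        else:
            # No piece matched: fall back to a single-character token.
            tokens.append(text[position])
            position += 1
    return tokens


# Hypothetical vocabulary; real models use far larger ones.
vocabulary = {"token", "ization", "izer", " is", " fun", "un"}

text = "tokenization is fun!"
tokens = tokenize(text, vocabulary)
print(tokens)
print(len(tokens), "tokens")
```

Notice that one word can become two tokens, that a leading space can be part of a token, and that a single character can be a token on its own.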

Emergence in AI

Emergence in AI typically happens when a model is scaled to such a size, with an increasing number of parameters, that it exhibits unexpected behaviours that could not be identified in a smaller model.  It develops an ability to learn and adjust without being specifically trained to do so.  Emergent behaviour can bring risks and complications: for example, the system could develop its own response to a specific event, one it has not been explicitly trained to give, which could lead to damaging and harmful consequences.


Embeddings

AI embeddings are numerical representations of objects, words, or entities in a multi-dimensional space. Generated through machine learning algorithms, embeddings capture semantic relationships and similarities. In natural language processing, word embeddings convert words into vectors, enabling algorithms to understand context and meaning. Similarly, in image processing, embeddings represent images as vectors for analysis. These compact representations enhance computational efficiency, enabling AI systems to perform tasks such as language understanding, image recognition, and recommendation more effectively.
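
To show what ‘capturing similarity’ can look like in practice, here is a minimal sketch that compares vectors using cosine similarity, a common way of measuring how close two embeddings are. The three-dimensional vectors below are hand-made for illustration only; real embeddings are produced by a trained model and usually have hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical, hand-made embeddings (not from a real model).
embeddings = {
    "cat": [0.9, 0.8, 0.1],
    "kitten": [0.85, 0.9, 0.15],
    "car": [0.1, 0.2, 0.95],
}

cat_kitten = cosine_similarity(embeddings["cat"], embeddings["kitten"])
cat_car = cosine_similarity(embeddings["cat"], embeddings["car"])

print("cat vs kitten:", round(cat_kitten, 3))
print("cat vs car:", round(cat_car, 3))
```

The closer the score is to 1, the more similar the two vectors are, so in this made-up example ‘cat’ sits much nearer to ‘kitten’ than to ‘car’.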

Text Classification

Text classification involves training a model to categorise input text by assigning predefined labels based on its content. Using natural language processing techniques, the system learns patterns and context so that it can analyse the structure of the input text and predict its sentiment, topic or intent. AI text classifiers generally possess a wide understanding of different languages and contexts, which enables them to handle various tasks across different domains with adaptability and efficiency.
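
The idea of learning from labelled text and then assigning a label to new text can be sketched with a very small word-count classifier. This example uses naive Bayes, a classic technique that I have chosen for its simplicity; it is not how modern LLM-based classifiers work internally, and the training examples are invented for illustration.

```python
import math
from collections import Counter, defaultdict


def train(examples):
    """Count how often each word appears under each label."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(text.lower().split())
    return word_counts, label_counts


def classify(text, word_counts, label_counts):
    """Return the label with the highest naive Bayes log score."""
    vocabulary = {w for counts in word_counts.values() for w in counts}
    total_examples = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, counts in word_counts.items():
        score = math.log(label_counts[label] / total_examples)
        total_words = sum(counts.values())
        for word in text.lower().split():
            # Add-one smoothing so unseen words do not zero out a label.
            score += math.log((counts[word] + 1) / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical labelled examples for a sentiment task.
examples = [
    ("i love this product", "positive"),
    ("great service and great price", "positive"),
    ("i hate the delay", "negative"),
    ("terrible service and rude staff", "negative"),
]

word_counts, label_counts = train(examples)
prediction = classify("great product", word_counts, label_counts)
print(prediction)
```

With only four training examples this is a toy, but it shows the basic shape of the task: labelled text goes in during training, and a predicted label comes out for text the model has not seen before.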

Context Window

The context window refers to how much text or information an AI model can process and respond with in a single exchange.  It is measured in tokens, and its size varies depending on which model you are using.  Prompt engineering plays an important role when working within the confines of a specific context window.
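
One practical consequence of a limited context window is that older parts of a conversation may need to be dropped so that the most recent messages still fit. Here is a minimal sketch of that idea. The word-based token count is a deliberately crude stand-in for a real tokenizer, and the function names and the limit of 15 tokens are assumptions for illustration.

```python
def count_tokens(text):
    """Crude stand-in for a real tokenizer: one token per word."""
    return len(text.split())


def fit_to_context_window(messages, max_tokens):
    """Keep the most recent messages that fit within max_tokens."""
    kept = []
    used = 0
    for message in reversed(messages):
        cost = count_tokens(message)
        if used + cost > max_tokens:
            break
        kept.append(message)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


# Hypothetical conversation history, oldest first.
history = [
    "What is a token?",
    "A token is a small unit of text.",
    "And what is a context window?",
]

trimmed = fit_to_context_window(history, max_tokens=15)
print(trimmed)
```

In this made-up example the oldest message is dropped because including it would exceed the limit, which is one reason a model can appear to ‘forget’ the start of a long conversation.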

That brings me to the end of this blog series. I hope you now have a greater understanding of some of the common vocabulary used when discussing generative AI and artificial intelligence.
