The hype around AI language models has companies rushing to hire prompt engineers to improve their AI queries and develop new products. But new research suggests that AI may be better at prompt engineering than humans, and that many of these jobs could prove short-lived as the technology evolves and automates the role. IEEE Spectrum reports: Battle and Gollapudi decided to systematically test [PDF] how different prompt-engineering strategies affect an LLM's ability to solve grade-school math problems. They tested three different open-source language models with 60 different prompt combinations each. What they found was a surprising lack of consistency: chain-of-thought prompting sometimes helped performance and sometimes hurt it. “The only real trend may be the absence of a trend,” they write. “What works best for a particular model, dataset, and prompting strategy may be specific to the particular combination at hand.”
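The experiment described above amounts to scoring every (model, prompting strategy) pair on the same benchmark. The sketch below illustrates that grid-evaluation idea in minimal form; the model names, strategy labels, and the `solve` function are all hypothetical stand-ins, not the paper's actual setup or any real LLM API.

```python
from itertools import product

# Hypothetical illustration of grid evaluation: every (model, strategy)
# combination is scored on the same benchmark of question/answer pairs.
MODELS = ["model-a", "model-b", "model-c"]               # placeholder names
STRATEGIES = ["plain", "chain-of-thought", "role-play"]  # placeholder labels

def solve(model: str, strategy: str, question: str) -> str:
    """Stand-in for querying an LLM; returns a canned answer for the demo."""
    return "42"

def accuracy(model: str, strategy: str,
             benchmark: list[tuple[str, str]]) -> float:
    """Fraction of benchmark questions answered correctly."""
    correct = sum(solve(model, strategy, q) == a for q, a in benchmark)
    return correct / len(benchmark)

benchmark = [("What is 6 * 7?", "42"), ("What is 40 + 2?", "42")]
results = {(m, s): accuracy(m, s, benchmark)
           for m, s in product(MODELS, STRATEGIES)}
```

A real harness would swap `solve` for actual model calls; the paper's point is that inspecting a table like `results` reveals no strategy that wins across all models.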
There is an alternative to the trial-and-error style of prompt engineering that yielded such inconsistent results: ask the language model to devise its own optimal prompt. Recently, new tools have been developed to automate this process. Given a few examples and a quantitative success metric, these tools iteratively find the best phrases to feed into the LLM. Battle and his collaborators found that in almost every case, these automatically generated prompts produced better results than the best prompts found through trial and error. And the process was much faster, taking a couple of hours rather than days of searching.
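The loop those tools run can be sketched as a greedy search: generate candidate prompts, score each against labeled examples with the success metric, and keep the best. The sketch below is a toy illustration of that idea only, not any specific tool's API; `llm_answer` and `propose_variants` are stand-ins for real model calls.

```python
import random

# Labeled examples and a quantitative metric drive the search, as in the
# automated prompt-optimization tools described above (toy version).
EXAMPLES = [("2 + 3", "5"), ("7 - 4", "3")]  # (question, expected answer)

def llm_answer(prompt: str, question: str) -> str:
    """Stand-in for an LLM call: just evaluates the arithmetic directly."""
    return str(eval(question))

def score(prompt: str) -> float:
    """Success metric: accuracy of the prompt on the labeled examples."""
    hits = sum(llm_answer(prompt, q) == a for q, a in EXAMPLES)
    return hits / len(EXAMPLES)

def propose_variants(prompt: str, n: int = 4) -> list[str]:
    """Stand-in for asking the model to rewrite its own prompt."""
    prefixes = ["Think step by step. ", "Answer precisely. ", "", "Be brief. "]
    return [random.choice(prefixes) + prompt for _ in range(n)]

def optimize(seed_prompt: str, rounds: int = 3) -> tuple[str, float]:
    """Greedy search: keep the highest-scoring prompt seen so far."""
    best, best_score = seed_prompt, score(seed_prompt)
    for _ in range(rounds):
        for cand in propose_variants(best):
            s = score(cand)
            if s > best_score:
                best, best_score = cand, s
    return best, best_score
```

In a real system, both the answering and the prompt-rewriting steps would be LLM calls, and the metric would be run on a held-out set; the structure of the loop is the same.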