DuckAI is an open and scalable academic lab and open-source community working on various Machine Learning projects. Our team consists of researchers from the Georgia Institute of Technology and beyond, driven by our passion for investigating large language models and multimodal systems.
Our current work focuses on developing and analyzing a variety of dataset projects, with the aim of understanding the depth and performance of these models across diverse domains.
Our objective is to welcome people with a variety of backgrounds to cutting-edge ML projects and rapidly scale up our community to make an impact on the ML landscape.
We are particularly devoted to open-sourcing datasets that can become important infrastructure for the community, and to exploring ways to improve the design of foundation models.
Summary:
This paper proposes a new neural network architecture called the Transformer that is based solely on attention mechanisms, without using sequence-aligned RNNs or convolutions. The Transformer achieves state-of-the-art results in machine translation while being more parallelizable and requiring significantly less time to train. Key contributions:
Proposes multi-head self-attention as a replacement for recurrence and convolutions in encoder-decoder architectures. Self-attention connects all positions with a constant number of sequentially executed operations, whereas a recurrent layer requires O(n) sequential operations.
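To make the mechanism concrete, here is a minimal NumPy sketch of scaled dot-product attention with a multi-head split. It is a simplified illustration (no masking, dropout, or learned layers beyond the projection matrices) and not the paper's reference implementation; the function names are our own.

```python
# Minimal sketch of scaled dot-product attention and a multi-head split.
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    # Q, K, V: (..., seq_len, d_k); every position attends to every other.
    d_k = Q.shape[-1]
    scores = Q @ K.swapaxes(-1, -2) / np.sqrt(d_k)        # (..., seq, seq)
    weights = np.exp(scores - scores.max(-1, keepdims=True))
    weights /= weights.sum(-1, keepdims=True)             # softmax over keys
    return weights @ V                                     # (..., seq, d_k)

def multi_head_self_attention(X, W_q, W_k, W_v, W_o, num_heads):
    # X: (seq_len, d_model); projections are (d_model, d_model);
    # d_model must be divisible by num_heads.
    seq_len, d_model = X.shape
    d_k = d_model // num_heads
    def split(M):  # (seq, d_model) -> (heads, seq, d_k)
        return M.reshape(seq_len, num_heads, d_k).transpose(1, 0, 2)
    Q, K, V = split(X @ W_q), split(X @ W_k), split(X @ W_v)
    heads = scaled_dot_product_attention(Q, K, V)          # (heads, seq, d_k)
    concat = heads.transpose(1, 0, 2).reshape(seq_len, d_model)
    return concat @ W_o
```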
This paper proposes a framework called LLMs As Tool Makers (LATM) that enables large language models (LLMs) to create and utilize their own tools for solving complex reasoning tasks. The key idea is to separate the process into two stages - tool making and tool using. In the tool making stage, a powerful yet expensive LLM acts as the "tool maker" to generate reusable Python functions for solving demonstrations of a task. In the tool using stage, a lightweight and cost-effective LLM acts as the "tool user" to call these tools to solve new instances of the task.
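To illustrate the two-stage split, below is a rough sketch of a tool-maker/tool-user pipeline using an OpenAI-style chat client. The model names, prompts, and helper functions are illustrative assumptions, not the paper's released code.

```python
# Rough sketch of the LATM two-stage idea (illustrative; prompts and model
# choices are placeholders, not the paper's code).
from openai import OpenAI

client = OpenAI()

def make_tool(task_demonstrations: str) -> str:
    """Tool making: a strong (expensive) model writes a reusable Python function."""
    resp = client.chat.completions.create(
        model="gpt-4",  # the "tool maker"
        messages=[{
            "role": "user",
            "content": "Write a reusable Python function that solves tasks like these:\n"
                       + task_demonstrations,
        }],
    )
    return resp.choices[0].message.content  # source code of the tool

def use_tool(tool_code: str, new_instance: str) -> str:
    """Tool using: a cheaper model is only asked to call the existing tool."""
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",  # the "tool user"
        messages=[{
            "role": "user",
            "content": f"Given this tool:\n{tool_code}\n"
                       f"Write the call that solves: {new_instance}",
        }],
    )
    return resp.choices[0].message.content
```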
U.S. District Judge William Orrick said during a hearing in San Francisco on Wednesday that he was inclined to dismiss most of a lawsuit brought by a group of artists against generative artificial intelligence companies, though he would allow them to file a new complaint.
An intriguing video discussing Falcon 40B, another LLM that seems to perform quite well, especially given that it is much smaller than models like GPT-4.
BabyCommandAGI, which is based on @yoheinakajima's BabyAGI, can now automatically create apps just by providing feedback.
The following example is for creating a Reversi game Flutter app.
Set the following OBJECTIVE and INITIAL_TASK, then wait for about 30 minutes.
OBJECTIVE:
"Please install the Flutter environment via git, implement a Flutter app to play Reversi with black and white stones, and make the Flutter app you created accessible from outside the container by running 'flutter run -d web-server --web-port 8080 --web-hostname 0.0.0.0'."
opensouls/LMYield: Lightweight language for controlling OpenAI Chat API generations
LMYield enables you to guide OpenAI's Chat API generations into arbitrary output patterns, and is specifically designed to enhance chain of thought prompting for agents.
The motivating concept behind LMYield is that, for a given context, an agentic entity will spawn some number of ordered, related chains of thought, and these should be yielded as a subscribable stream.
Features:
Simple, intuitive syntax, based on Handlebars templating.
Rich output structure, with speculative caching and multiple generations to ensure the desired structure is produced.
Designed specifically for agentic chain of thought.
TypeScript, not Python.
Summary:
This paper presents a method to reconstruct 3D shapes from a single image in an end-to-end manner, without time-consuming optimization. Their approach consists of three main parts:
Multi-view synthesis: They leverage a view-conditioned 2D diffusion model, Zero123, to generate multi-view images of the input object.
Pose estimation: They estimate the elevation angle of the input image to determine the camera poses of the multi-view images (see the sketch below).
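As a sketch of the pose-estimation step, the snippet below turns an estimated elevation angle into look-at camera poses for views placed at fixed azimuth offsets around the object. The conventions here (radius, axis order, number of views) are assumptions for illustration and may not match the paper's actual pipeline.

```python
# Hedged sketch: estimated elevation angle -> camera poses on a sphere,
# all looking at the origin. Conventions are illustrative only.
import numpy as np

def look_at_pose(elevation_deg: float, azimuth_deg: float, radius: float = 1.5):
    """Camera-to-world pose for a camera on a sphere, looking at the origin."""
    el, az = np.deg2rad(elevation_deg), np.deg2rad(azimuth_deg)
    eye = radius * np.array([np.cos(el) * np.cos(az),
                             np.cos(el) * np.sin(az),
                             np.sin(el)])
    forward = -eye / np.linalg.norm(eye)                  # look toward origin
    right = np.cross(forward, np.array([0.0, 0.0, 1.0]))  # world z is "up"
    right /= np.linalg.norm(right)
    up = np.cross(right, forward)
    pose = np.eye(4)
    pose[:3, :3] = np.stack([right, up, -forward], axis=1)  # camera axes
    pose[:3, 3] = eye
    return pose

# e.g. poses for 8 views around the object at the estimated elevation:
poses = [look_at_pose(elevation_deg=20.0, azimuth_deg=a) for a in range(0, 360, 45)]
```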
I've seen the posts about SuperHOT and, just recently, the paper from Meta that uses RoPE interpolation, and I've noticed an immediate improvement that can be brought to this method. Basically, if you apply Neural Tangent Kernel (NTK) theory to this problem, it becomes clear that simply interpolating RoPE's Fourier space "linearly" is very sub-optimal, as it prevents the network from distinguishing the order and positions of tokens that are very close to each other.
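For concreteness, here is a minimal sketch contrasting linear position interpolation with an NTK-style rescaling of the rotary base, which is the rough idea the post argues for. The exact constants and scaling formula are assumptions and may differ from the post's implementation.

```python
# Sketch: linear RoPE interpolation vs. an NTK-style base rescale.
# Linear interpolation squeezes all positions into the original window;
# the NTK-style variant instead enlarges the rotary base so high-frequency
# (short-range) components are barely touched.
import numpy as np

def rope_frequencies(dim: int, base: float = 10000.0) -> np.ndarray:
    # Standard RoPE inverse frequencies for a head dimension `dim`.
    return 1.0 / (base ** (np.arange(0, dim, 2) / dim))

def linear_interpolation_angles(positions, dim, scale):
    # Linear interpolation: divide positions by the extension factor.
    inv_freq = rope_frequencies(dim)
    return np.outer(positions / scale, inv_freq)

def ntk_scaled_angles(positions, dim, scale):
    # NTK-style: rescale the base so low-frequency components stretch to cover
    # the longer context while high-frequency ones stay nearly intact.
    new_base = 10000.0 * scale ** (dim / (dim - 2))
    inv_freq = rope_frequencies(dim, base=new_base)
    return np.outer(positions, inv_freq)

# e.g. compare rotation angles for a 2x context extension with 128-dim heads:
pos = np.arange(8192)
linear = linear_interpolation_angles(pos, 128, 2.0)
ntk = ntk_scaled_angles(pos, 128, 2.0)
```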