tgy
July 30, 2024 at 14:04
20 mar 2024 Machine Learning

It’s hard to start this.

The first meaningful recollection of machine learning I have is probably something I saw on George Hotz's Twitch channel. It was his based opinion on end-to-end ML: something like "we should understand how people solve the task," and then, instead of implementing heuristics and algorithms that suit one particular task, let the neural network do everything. The applied example was Tesla staying dead center in its lane. Imagine an autopilot that is programmed only to find the car's position in space via an image recognition neural network and then move towards the center of the lane using hardcoded algorithms. Once you start thinking "this is not how a human would do the task," it falls apart: a human would try to stay centered, but if there is an obstacle they would deviate, and if they see something outside the lane they might stop (example: braking or slowing down for animals on the road). This particular autopilot ignores a bunch of problems by design. What the autopilot should do instead is take all the information a human cares about: a low-quality image with some form of depth perception (people gain depth perception from having two eyes 👀), and maybe some extra inputs like sound and light level; and get control over the raw outputs: steering wheel, brakes and acceleration. This way the autopilot will not just be more accurate and drive more safely, it will also work with different cars and adapt to different conditions. As a result we get this perplexing black-box neural network. That level of complexity kept me away from machine learning for a while - I was scared of wasting my time because of my small expertise.

Second, I got into MIPT. Let me explain why this was a crucial step. At MIPT, with its unique system, there are only two chairs where you can do fundamental and applied informatics research, but a shitload of AI labs - 50+ of them. Considering the university's public image of fundamental research, this is kind of strange. As I studied, it became obvious that the university is biased: the choice of subjects is tailored to the needs of industry. AI research was inevitable for me.

Third, 2022 and early 2023 changed everything I knew about AI and ML. Stable Diffusion, image generation, transformers, ChatGPT, LLMs, LLaMA, Alpaca. Stanford Alpaca proved that you no longer need a ton of data and compute to build your own model powerful enough to compete with proprietary, closed-source OpenAI. AI-related startups started appearing daily. At this point, I was thinking "I can do it. Why bother?".

Fourth, in summer 2023 Ronen Eldan presented "The TinyStories dataset: How small can language models be and still be able to speak coherent English?" https://www.youtube.com/watch?v=iNhrW0Nt7zs. This presentation is highly informative while still engaging, entertaining and easy to understand. In fall 2023 Sebastien Bubeck continued work on this concept: https://www.youtube.com/watch?v=24O1KcIO3FM. The worst thing ever - both of them work at Microsoft Research. Now I understand there is room for some innovation, there is room for me. I am motivated.

So, my first attempt at doing something - converting TinyStories-1M to ONNX and running inference in the browser - failed. I have a great excuse: Microsoft being silly in their documentation. Good luck running a generative model when the docs don't even explain how the Tensor object is stored in memory 👍
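If you want to retrace the conversion half of that step, here is roughly how it looks in Python. This is a hedged sketch, not what I actually ran: it assumes the Hugging Face repo id roneneldan/TinyStories-1M and the optimum package, and it only gets you the .onnx file - the browser side (onnxruntime-web) is where the fun starts.

```python
# A sketch of exporting TinyStories-1M to ONNX, assuming the Hugging Face
# repo id "roneneldan/TinyStories-1M" and the optimum + transformers packages.
from optimum.onnxruntime import ORTModelForCausalLM
from transformers import AutoTokenizer

repo = "roneneldan/TinyStories-1M"  # assumed repo id for the 1M-parameter model
model = ORTModelForCausalLM.from_pretrained(repo, export=True)  # convert to ONNX on the fly
tokenizer = AutoTokenizer.from_pretrained(repo)
model.save_pretrained("tinystories-1m-onnx")      # writes model.onnx next to the config
tokenizer.save_pretrained("tinystories-1m-onnx")

# The part the docs gloss over: the "input_ids" input is a flat int64 buffer
# plus a dims array like [batch, seq_len], which is also what the browser
# runtime expects you to construct by hand.
ids = tokenizer("Once upon a time", return_tensors="np")["input_ids"]
print(ids.dtype, ids.shape)  # int64, (1, N)
```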

14 mar 2024
I implemented a basic decoder-only transformer neural network. After a lot of experimenting, reading papers and attending a conference, I have a decent understanding of what is going on. I am content with my first experiment, "taytay", trained on Taylor Swift songs.
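If "decoder-only transformer" sounds abstract, here is a minimal PyTorch sketch of the idea - not the actual taytay code, just the skeleton: token and position embeddings, causally masked self-attention blocks, and a linear head that predicts the next token. All the sizes here are made up.

```python
import torch
import torch.nn as nn

class Block(nn.Module):
    def __init__(self, dim, heads):
        super().__init__()
        self.ln1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.ln2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, x):
        T = x.size(1)
        # Causal mask: True above the diagonal means "not allowed to attend", i.e. no peeking ahead.
        mask = torch.triu(torch.ones(T, T, dtype=torch.bool, device=x.device), diagonal=1)
        h = self.ln1(x)
        attn_out, _ = self.attn(h, h, h, attn_mask=mask)
        x = x + attn_out
        return x + self.mlp(self.ln2(x))

class TinyDecoder(nn.Module):
    def __init__(self, vocab, dim=128, heads=4, layers=2, ctx=256):
        super().__init__()
        self.tok = nn.Embedding(vocab, dim)
        self.pos = nn.Embedding(ctx, dim)
        self.blocks = nn.Sequential(*[Block(dim, heads) for _ in range(layers)])
        self.head = nn.Linear(dim, vocab)

    def forward(self, ids):  # ids: [batch, seq] of token indices
        pos = torch.arange(ids.size(1), device=ids.device)
        x = self.tok(ids) + self.pos(pos)
        return self.head(self.blocks(x))  # logits over the next token at each position
```

Training something like this is just next-token cross-entropy over the lyrics, shifted by one position.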

In the end I am not satisfied with the results. I never wanted to do ML, and by doing it myself, taking the time to learn it, I proved the point.

Here is the repo if you wanna look at it: https://github.com/notTGY/taytay