
Upload Your Mind to AI

3 min read · Apr 15, 2023

Sam Altman, the CEO of OpenAI, was asked what is inside the ChatGPT neural network, and he answered, “let’s say, human knowledge”. But what if we trained the model using only answers from a single person? What would be inside? Would it be you?

Today, AI models are trained on huge datasets compiled from data contributed by thousands of individuals, such as writers, scientists, and internet users. Interestingly, the resulting neural network sometimes exhibits biases on certain subjects. Why does this happen? Because the dataset itself may be biased.

That means the neural network picks up opinions, points of view, and even a sense of humor from its dataset. These are exactly the things that make people different from one another.

Size of Dataset

Let’s think about how much data we need to train the AI model. The Stanford Alpaca fine-tuning dataset has 52 002 questions and answers. The average length of a question is 82.7 characters, and of an answer 270.3 characters. So a single data record contains about 353 characters (roughly 353 bytes, assuming one byte per character).

The paper describing the LLaMA model reports the size of the dataset used to train the network: 4749 GB. Let’s divide this size by the average record size of the Alpaca dataset to get a rough approximation of the data volume.

4749 GB / 353 bytes = 13 453 257 790 records (questions and answers)
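
The division above can be reproduced with a short script. This is only a sketch of the same back-of-the-envelope estimate: it assumes decimal gigabytes (1 GB = 10^9 bytes) and one byte per character, and the constant and function names are made up for this illustration.

```python
# Figures quoted in the article.
ALPACA_AVG_QUESTION_CHARS = 82.7
ALPACA_AVG_ANSWER_CHARS = 270.3
LLAMA_DATASET_BYTES = 4749 * 10**9  # assumption: decimal GB


def estimate_records(dataset_bytes: int, record_bytes: int) -> int:
    """Number of whole records of a given size that fit in the dataset."""
    return dataset_bytes // record_bytes


# Assumption: one character is stored as one byte.
record_bytes = round(ALPACA_AVG_QUESTION_CHARS + ALPACA_AVG_ANSWER_CHARS)
records = estimate_records(LLAMA_DATASET_BYTES, record_bytes)

print(f"record size: {record_bytes} bytes")
print(f"records:     {records:,}")
```

With binary gigabytes (2^30 bytes) the count would be about 7% higher, which does not change the order of magnitude.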

Dataset Preparation

Getting that volume of data from a single person looks almost impossible. For example, the transcript of the interview with Sam Altman (mentioned at the beginning) has ~130 000 characters. The interview lasts 143 minutes, so we can calculate how long an interview would have to be to produce a training dataset as large as the one used in the LLaMA paper.

130 000 bytes / 143 minutes = 909 [bytes / minute]
4749 GB / 909 [bytes / minute] = 5 224 422 442 minutes ≈ 9 940 years (at 525 600 minutes per year)

Roughly 9 940 years of nonstop talking is a bit too long. However, no living person has all human knowledge. Most people are specialists in only a few areas. Everything else is either unknown to us or a matter of guesswork. If I asked one person questions from every field, the most common answer would be “I don’t know”. That means the dataset can be much smaller.
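
The interview-length estimate can be checked the same way. The sketch below assumes decimal gigabytes, a 365-day year, and a person who talks around the clock at the interview's pace. The names are illustrative only. Under these assumptions the result is on the order of ten thousand years.

```python
# Figures quoted in the article.
TRANSCRIPT_BYTES = 130_000          # ~130 000 characters, one byte each
INTERVIEW_MINUTES = 143
LLAMA_DATASET_BYTES = 4749 * 10**9  # assumption: decimal GB

# Assumption: 365-day year, talking 24 hours a day.
MINUTES_PER_YEAR = 60 * 24 * 365


def years_of_talking(dataset_bytes: int, bytes_per_minute: float) -> float:
    """Years of nonstop speech needed to produce dataset_bytes of transcript."""
    minutes = dataset_bytes / bytes_per_minute
    return minutes / MINUTES_PER_YEAR


# Rounded to whole bytes per minute, as in the calculation above.
rate = round(TRANSCRIPT_BYTES / INTERVIEW_MINUTES)
years = years_of_talking(LLAMA_DATASET_BYTES, rate)

print(f"speech rate: {rate} bytes/minute")
print(f"time needed: {years:,.0f} years")
```

Shrinking the target dataset shrinks the time proportionally: a dataset 1 000 times smaller would need about a decade of continuous speech at the same rate.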

There is no doubt that the data preparation process will take years, so it should be designed to be ergonomic and unobtrusive in everyday life. Additionally, we can use what we already have: e-mails, chat conversations, and voice calls. After some adjustments, these sources should be a rich record of our minds.
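
As one illustration of what such an adjustment might look like, the sketch below turns a chat log into question/answer records in the Alpaca field layout (`instruction`, `input`, `output`). The pairing rule (someone else's message followed directly by my reply) is an assumption chosen for simplicity, and the `Message` type, the `to_alpaca_records` helper, and the sample chat are all invented for this example.

```python
import json
from dataclasses import dataclass


@dataclass
class Message:
    sender: str
    text: str


def to_alpaca_records(messages: list[Message], me: str) -> list[dict]:
    """Pair each message from someone else with my immediate reply."""
    records = []
    for prev, curr in zip(messages, messages[1:]):
        if prev.sender != me and curr.sender == me:
            records.append(
                {"instruction": prev.text, "input": "", "output": curr.text}
            )
    return records


# Made-up conversation, for illustration only.
chat = [
    Message("alice", "What do you think about remote work?"),
    Message("me", "I like it, but I miss whiteboard sessions."),
    Message("me", "Especially for design discussions."),
    Message("alice", "Coffee tomorrow?"),
    Message("me", "Sure, 10 am works."),
]

records = to_alpaca_records(chat, me="me")
print(json.dumps(records, indent=2))
```

A real pipeline would also need to merge consecutive messages, remove private details, and transcribe voice calls first. Those steps are left out here.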

What is Inside

What would be inside the neural network? Of course, it wouldn’t be you. But it would be something very similar to you. It would have the same views, the same concerns, the same worries… It would be a copy of you, perhaps an inaccurate one. But this copy could live forever. Future generations could learn the point of view of people from this century. Your point of view.

Maybe we could even bring Einstein back to life?
