Update to the previous post
You can read the previous post here
I received feedback from a few people I sent the post out to. Based on that, I’ve been rethinking where this could go, whether it could go anywhere at all, and simplifying the idea.
Summarised feedback:
- Interacting with an AI is old-school, telling your AI to continuously do your knowledge work is the future.
- Data security is paramount, no one wants to share their content freely anymore.
- Is the data scraping for just feeds of content? What will the output be? How can someone act on it?
- Companies want to train LLMs on their own data, how does this affect the idea?
- What if you need to retrain the LLM?
- Unclear what the moat is. If it’s not the LLM, and it’s not the data sources either, what is the moat?
Some definitions
- Feeds: Any stream of articles, tweets, content, email, documents or images that contains valuable information that you want to draw insight out of.
- Insight: Anything that connects multiple sources of information to draw a conclusion that is novel
Here’s how I’m thinking about it from a systems perspective
- Define a set of feeds that the system must listen to. These can be:
- RSS Feeds
- Twitter profiles
- Docs that you mail to it
- Whatsapp messages you forward to it
- Emails that you receive
- Linkedin profiles that you want to monitor
- Any structured sheet that you might want to process from any source
- Define a set of customised prompts and output requirements based on examples you provide upfront.
- Along the lines of Custom Instructions: https://openai.com/blog/custom-instructions-for-chatgpt
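The feeds + prompts setup above could be sketched as a simple configuration. A minimal sketch in Python, assuming hypothetical `Feed` and `Task` types — none of these names are a real API, they just make the shape of the system concrete:

```python
from dataclasses import dataclass, field

@dataclass
class Feed:
    kind: str    # where the system listens: "rss", "twitter", "email", "whatsapp", ...
    source: str  # a URL, handle, or address for that feed

@dataclass
class Task:
    feeds: list                  # Feed objects this task listens to
    prompt: str                  # the standing instruction, in the spirit of Custom Instructions
    example_outputs: list = field(default_factory=list)  # ideal outputs provided upfront
    delivery: str = "email"      # where results get pushed
    schedule: str = "daily"      # how often the task runs without being asked

# You never "chat" with the system; you register standing tasks like this one:
renewables_digest = Task(
    feeds=[Feed("rss", "https://example.com/economic-times.rss")],  # hypothetical URL
    prompt="Summarise all renewable-energy articles into a single daily email.",
)
```

The point of the sketch is that the user's entire interaction is this upfront definition; everything after it is the system running on a schedule.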
To put it simply, you never interact with the AI itself, you just tell it what it needs to continuously do for you and send to you.
Examples of these could be
- I work in the financial sector and track renewable energy: read all articles published by the Economic Times, the Financial Times and any other feed, and then
- auto summarise the articles into a single email on a daily basis
- send me a separate email with only filtered content on changes in regulation of renewables
- I also manage a portfolio of companies in this sector ( A, B and C ), surface any relevant content from that and point out risks to me
- Auto-download the financial statements of publicly listed companies in this sector and maintain a report in the form of a Google Sheet in this template: upload a CSV as a template and it should auto-understand what the columns should be
- Generate content for an article for me on a weekly basis that is a summary of interesting things and send it to me as options on Whatsapp, after which you must auto-generate a Linkedin post, a Twitter thread and an Instagram story that I can then post on my feed later. Use my style-guide as a template for each ( define the colours, font etc )
- I work in Sales for a SaaS company trying to reach out to small businesses in Europe that currently use a competitor’s product, or engage with a competitor’s business
- I’ve procured a list of clients to send cadence-driven emails to, but it takes a while to customise the content per customer
- Set up a prompt to scan the company’s info and the individual’s info through multiple sources, generate a punchy email that speaks like you do, auto-send it to them, and keep replying to them as well, until an indicator of interest is visible, at which point it escalates to you automatically with a summary of the entire conversation
- Provide the prompt with all the information it needs to answer questions that the customer could have
- Link it to your calendar so it knows when you’re available and can automatically confirm/deny bookings as well
- It also keeps procuring more contacts based on prompts that you setup and runs the cadence mails itself
- I’m the operations head at my company and given how large we are we operate with a lot of what is now called ‘organisational knowledge’
- This creates silos of excellence based on who has done what, which presents a risk when a particular skillset isn’t readily available
- Keep the LLM listening on all ops issues and resolutions
- Auto-float a survey internally any time an issue gets resolved and ask how it got resolved
- It keeps building the knowledge base up and updating it
- I run a data services company and I have to keep tabs on all the MoUs and LoIs signed and published, and any disclosures that certain companies make
- Enter a list of sources
- Enter a list of filters
- Setup prompts to pull out data for each type of source
- Tabulate and store it
- Setup an automated report and mailer that can be sent to your clients
- Along with the ability to edit the data
- Potential use-cases for newsletters, editors as well
- I’m still trying to think of a use-case for a large enterprise, but my sense is it’s a combination of a lot of these things along with the implementation cost, as is more often than not the case.
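The continuous loop behind the first use case — fetch feeds, apply a standing prompt through an LLM, deliver the result — is simple to sketch. Here `fetch_articles`, `call_llm` and `send_email` are hypothetical stand-ins for whichever feed reader, model API and mailer get wired in; they are passed in as functions precisely because the idea is LLM-agnostic:

```python
def run_task(feed_urls, standing_prompt, recipient,
             fetch_articles, call_llm, send_email):
    """One scheduled run: gather new content, apply the standing prompt, deliver."""
    articles = []
    for url in feed_urls:
        articles.extend(fetch_articles(url))  # e.g. only items new since the last run
    if not articles:
        return None  # nothing new: stay silent rather than spam the user
    body = call_llm(standing_prompt, articles)  # summarise / filter per the prompt
    send_email(recipient, subject="Daily digest", body=body)
    return body
```

A scheduler would call `run_task` daily (or weekly, per the task definition); the user only ever sees the delivered output, never the loop itself.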
Coming to the feedback itself,
- Don’t interact with an AI often, just tell it what to do and let it keep doing it. Give it feedback from time to time and refine the prompts, switch between different LLMs and A/B test across LLMs for optimum results as well.
- Data security is very important. When it comes to public sources this is not an issue, but there are 2 types of data that could be misused:
- metadata - what sources you as a user are listening to
- This could be your own personal way of clearing out the noise from the signal
- private sources that you share
- these could be DRM protected docs
- these might be pirated docs
- these might be sensitive docs that open you up to libel if it’s revealed that you have access to them
- these might be docs that someone else was not supposed to share with you
- information that cannot be shared outside, the inverse of the above
I agree with this as a whole. Imo, the model training and prompt setup is something that should be self-hosted but serviced by someone ( which obviously means some infra costs, but in exchange for secured data ), with a licence fee for the tool itself
- 2 parts to this one
- Data scraping: I think in my previous post I used the term feed too loosely. I don’t mean just scraping from somewhere; I meant listening on information that the user wants listened to, as explained under the systems section earlier
- Output: You define what the output needs to be by giving it examples of what an ideal output looks like; this is what LLMs are great at
- Companies wanting to train on their own data, I think, is ideal and should be the way forward; this is linked to #2 above
- Retraining: yes, I fully agree there is a risk of a better model coming along, which is why I think it should be a licence + serviced model
- What’s the moat? It’s not the LLM, it’s not the data, so what do you bring to the table? I’m unsure of what the moat here could be.
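The A/B testing across LLMs mentioned in the first feedback point could start as something very simple: route each run to one of several models and keep a running score from user feedback. A minimal sketch, where `models` is a hypothetical mapping of model names to callables, not any particular provider’s API:

```python
import random
from collections import defaultdict

class LLMRouter:
    """Route each task run to one of several LLMs and track user ratings,
    so the better-performing model can be identified over time."""

    def __init__(self, models, seed=None):
        self.models = models              # e.g. {"model_a": call_a, "model_b": call_b}
        self.scores = defaultdict(list)   # model name -> list of user ratings
        self.rng = random.Random(seed)

    def run(self, prompt):
        # Pick a model at random for this run (uniform A/B split).
        name = self.rng.choice(sorted(self.models))
        return name, self.models[name](prompt)

    def rate(self, name, rating):
        self.scores[name].append(rating)  # e.g. thumbs up = 1, thumbs down = 0

    def best(self):
        # Highest average rating so far; None before any feedback exists.
        rated = {n: sum(r) / len(r) for n, r in self.scores.items() if r}
        return max(rated, key=rated.get) if rated else None
```

Because the prompts and example outputs live in the task definition rather than in any one model, swapping or retraining the underlying LLM becomes a routing decision instead of a rebuild — which is also the answer this post gives to the retraining question.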
This is where I am with this one so far. I’d love more feedback, and any thoughts on what use-cases large enterprises could have for this.
Sainath