Transcript

This transcript was generated automatically and may contain errors.

Welcome. Excited today to kick off the APAC workshop here through R/Pharma. We just had a good, gosh, probably about six to seven hours of talks this morning. We'll be posting those videos on the R/Pharma YouTube channel a couple of weeks from now. Now for the APAC session, we've got an exciting hands-on workshop with Garrett Grolemund. Garrett's going to take you through working with Positron, working with AI, and doing some exciting work with querychat.

And so feel free to ask questions. This is going to be a hands-on workshop with Garrett. He's a colleague of mine at Posit and leads the educational practice. And so we're very excited to have him. It's a unique opportunity to work with Garrett. So Garrett, I will pass it over to you and I'll be backstage. And if you have any questions, you just let me know and I can jump back to the main lobby.

Thank you so much, Phil. And thank you everyone attending and maybe watching later. Let me make things a little visual for us by sharing some slides. As Phil said, my name's Garrett. I've been working alongside Phil for, I guess, over a decade now at a company called Posit. Before that, we were known as RStudio, and we do a lot of work creating open source packages for data scientists. And that's why I'm here today, to talk about some more recent work in the world of AI that we've done.

I have some data just to sort of provide a MacGuffin for us to work with. Now, it's more healthcare related, so it might be a little further from what some of you who are primarily in pharma are doing at this conference, but this talk will mostly be about the tools that we use to touch this data. And those tools, I'm very confident, will be useful for you no matter what it is you're doing with the data.

But what is this data that we're going to use today? It's just a simulated data set of incidence rates for different types of cancer in one of the states in the United States. This is the state of Georgia. And it's the sort of data that you might use to find hot spots of different types of diseases. This is an actual snapshot of the data, but as I said, the data is not going to matter as much as the tools today. We're going to use those tools on this data set to build something that looks like this over the next two hours.

This is an interactive app that includes a map, a table, and a plot, and it can include anything else you want to build with R. But what it uniquely includes on the side here will be a chat bot powered by an AI model, something like Anthropic Claude or something like Amazon Bedrock or ChatGPT, that sort of thing. And researchers will be able to use the apps that you make like this to ask questions of the data. The components of the app will update based on what they're searching for. And since they're talking to an intelligent chat bot, they could ask questions of the data that would be really hard to sort of formulate by moving sliders around or selecting things from drop-down menus.

Now we got an hour, well, 115 minutes to get there. So we'll start simple with R, and then we'll progressively build more and more sophisticated ways of reporting the things that you make with R to non-R users, finishing at what I'm going to call a querychat app, which is what this is.

So for the couple of you who are here, this is Zoom, and there should be a green pencil icon down at the bottom corner. And I want to ask you a few questions that'd be hard to sort of coordinate through the chat. So look for that pencil icon. It's an annotate icon. Click it, and that should give you a pencil that you could actually use to draw on my screen and see if you could use it to circle, like the green text that says circle me. And then I'll know this is working and we can proceed.

I don't see any annotations happening. So I'm going to assume either people aren't interested in interacting right now or maybe something about the Zoom webinar prevents annotations.

Yeah, Garrett, I was looking backstage. I don't know if we have the annotate capability for the audience because I think it has been disabled with Zoom. So I don't know if there's a way for us to add it here for the participants. So I think the main way to communicate with them is through the chat built in to Zoom.

Yeah, that'll work. Okay, thanks. Zoom comes in a variety of flavors, so I never really know what to expect. But this is actually the Zoom conference platform, so it's even different than regular Zoom. So we can't circle anything here, but I do want to know what IDEs you use so I could speak to that. Well, actually, there's not many of us here right now. So if you could take a moment and just type in the Zoom chat, which I know people have access to, the names of the IDEs here that you're used to.

Getting set up in Positron

All right. RStudio and Positron. Positron. Okay, how about that? All right. So it sounds like people are pretty familiar with Positron. I'll try to point out the essential things as we go, but I won't spend too much time explaining this IDE that you may already have used. I put it right here in the middle between RStudio and VS Code because it borrows from both of those. And so if you use any of those three IDEs, what you see today will look familiar to you.

All right. So let's get our own instance of Positron. This is meant to be an interactive workshop, and you can interact with it by going to this URL. I'm going to put it in the chat, so you could just click on it. And while you're doing that, I will go there as well. It's workshop.posit.team. And this is an instance of some hosting products that Posit makes that we'll be able to use to access the data and also to access some AI throughout the workshop. So go there. It's like a sandbox that you could get messy in and not have to worry about the consequences.

When you go to that URL, you'll see a screen like this. These are the three products in Posit team. And we're going to use those today just for simplicity. And we're going to use the Workbench product. So the first thing I want you to do is click Posit Workbench if you're following along. And that will launch Posit Workbench, which is a server product that spins up sessions where you can write code, like we'll be doing today.

Now, when you go to Posit Workbench, it'll ask you to sign in. Click Sign In with OpenID. This is your first time at this server, I anticipate, although maybe you were here earlier in the day. What I recommend you do is go to the bottom of the sign-in form where it says Register, and we'll all just create new free accounts. So click Register and take a minute to fill out this form. It doesn't matter so much what you put in here, but remember what you put in here, because we will use it later to host the apps that we make.

Once you fill out your information, there's a little bit of a dance at the beginning, but hopefully it'll go smoothly for us. You can click Register. And then the most likely thing is you're going to get this misleading page that says, we are sorry, we don't recognize you. That's because the timing's a little off and we're actually trying to get into the product before it finishes making our accounts. There are two different pieces of software involved here. That's okay. If you see the we are sorry page, here's the fix. If you got in, that's great. Don't do anything. Just wait for me. But if you see that page, go back to where it says sign in and then just sign in with the account you just made.

And then that should take you through. If it takes you to this screen with the scary red message, just ignore it and click sign in. And then against all odds, you'll be in here. Getting through the scary warning signs at the beginning is intimidating, but you should be here. If you're trying to get here and you haven't made it, let me know in the chat and I'll try to help you get through.

All right. So I'll keep an eye on the chat as we go forward, but it looks like everyone is either content to watch or having success. And those are both great outcomes. So now that we're inside Posit Workbench, what we want to do is create a session where we can write code. We'll try out all the tools we have in store for us.

And here we go. It's opening up a new session on the server, and this is the Positron code editor. So if you haven't seen Positron before, this is what it looks like. It is a code editor that's free and source available, so you can download it if you google Positron or go to positron.posit.co, and you can use it on your desktop with no charge involved. It's free software. It is a fork of the open source Visual Studio Code editor, and Posit has built in a lot of features for data science that come out of the box, including an integrated Python console or, if you like, an integrated R console.

There are many other features, and we'll touch on some today, but I like to think of it as a combination of Visual Studio Code and the RStudio IDE. Now, the first thing that happens whenever you open a Positron session is it wants you to choose a folder to work from. This is just like in R: whenever you use R, it's working from a working directory on your computer. That's the place where it looks for files, that's the place where it saves files, and that's where it assumes file paths will start from. Well, Positron is exactly the same, and the first thing we should do is choose a folder to work out of.
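
As a rough illustration of the working-directory idea described above, here is a minimal sketch in Python (Positron supports both R and Python). The folder and file names are made up for the example and are not from the workshop repo.

```python
import os
import tempfile
from pathlib import Path

# Create a throwaway project folder with one data file in it.
# The folder and file names are hypothetical, for illustration only.
with tempfile.TemporaryDirectory() as project:
    data_file = Path(project) / "data" / "cases.csv"
    data_file.parent.mkdir()
    data_file.write_text("county,cases\nA,10\n")

    original_dir = os.getcwd()
    try:
        # Choosing a folder in the editor is like changing the working directory.
        os.chdir(project)

        # A relative path is now resolved starting from the project folder.
        relative = Path("data/cases.csv")
        found = relative.exists()
        resolved = relative.resolve()
    finally:
        # Step back out so the temporary folder can be cleaned up.
        os.chdir(original_dir)

print(found)
```

The same relative path would not be found from a different working directory, which is why file paths written for one folder can fail when run from another.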

My folder for you is actually on GitHub, so don't do whatever I just did. Instead, click New Folder from Git. If you haven't clicked anything yet, that'll be one of the options here on the side or in the middle. We want New Folder from Git, and this will ask us for a URL to paste in here. Let me fetch the URL of the Git repo for us.

This is the repo. It has the content we'll use, and I'm going to paste it into the chat so you can take it. I actually recommend visiting the website rather than copying it from Zoom, because I've been noticing a lot of formatting issues that Zoom's inserting. Once you get to the website, you can copy it from there, go back to Positron, and give it to Positron. All Positron needs to know is the URL of this Git repo, and when you paste it in here, don't adjust what you see in here.

It's downloading the class contents, and here we have that Georgia cases data set that I mentioned before. We have an R session loaded now, and it's ready for us to write code down here or to start a document up above.

Now, in the code files that come in this folder, there's something called misc.r (m-i-s-c dot r). It's an R script with some code in it, and if you click on it, it'll open in the source editor part of Positron. If you click on it one time, Positron will open it as an ephemeral file, so you can look at it, but as soon as you click on a different file, it will go away. If you double-click on it, it'll be permanently there, which is what I did. You'll notice it's no longer italicized, and it's there until you close it. When I was new at Positron, that was very frustrating, so I like to point that out for folks.

About Posit and the tools we're using

All right, so Positron is an IDE for data scientists. It's a combination of RStudio and VS Code. The difference between it and the RStudio IDE that Posit also makes is that Positron is multilingual (we can use it for Python, we can use it for R), and it inherits a lot of the newer features that have come about in the 15 years since we made RStudio. So it's a place where we can create AI tools very easily while we work on backporting them to the RStudio IDE.

If you're unfamiliar with Posit, we're a company that likes to make free and open source software. We're a public benefit corporation, and we support the Open Source Pledge. We make free stuff for scientists, and scientists use it for free. But some companies employ scientists, and whether those scientists are using free software or paid software, those companies are spending a lot of money to get them to collaborate and work together efficiently. So we do also make products that help scientists who work in companies using open source products to collaborate more efficiently. Those are professional products, and we take the money we make from those and reinvest it into the open source cycle.

So what does that mean for us? Well, I want everyone to be very clear that the Positron code editor and the packages that we're using today are all on the free and open source side of things. We are going to use AI. AI is not from Posit, and it's also not free; you'll probably have to pay Amazon Bedrock or Anthropic to use it, but I think that will be obvious when we get there. Today we're sort of like a team of scientists trying to all work in the same environment and coordinate, so just out of convenience, I'm hosting all the free stuff we're using on a server that comes from the professional side of things, which we call Posit Workbench.

Workshop overview and exploring the data

All right, wow, that was a lot of setup, but let's get started. We're going to look at coding with R, making reports out of that code, then dashboards, interactive apps, and then those AI apps with querychat like I showed. Along the way, you'll learn how to use AI tools to do data science in that free Positron editor. You'll learn how to publish what you make to Connect or Connect Cloud, and if you publish there, you'll also learn how to configure those platforms so they can support the AI that will appear in your apps.

So, starting with R. This is the R/Pharma conference, so I think we're probably pretty fresh on how R works. Let's take a poll in the chat room: which languages do you use? Oh heck, there's like five of us. If you don't mind, just tell me in the chat what languages you program with. Okay, great. One and two. Okay, that's good. So everything we do today with R, you can do with Python as well, if that happens to be your preference one day.

All right, let's forge ahead. So we have the script misc.r. I would like you to open that, run each line of code in the script, and see if you can tell what it does. We'll recycle that code over and over again today. One of those lines of code will return an error, so when you see that, don't be surprised; you haven't done anything wrong. We'll spend about two minutes doing that, and that's the perfect time to ask questions that you've been saving or other things.

Okay, so let's take a look at what this file does. The first few lines are just loading packages, so let's see if we have those. It's okay if you did this and it said you need to install the packages. That probably means you selected the wrong image when we launched this session, and all you have to do is install those packages. The next line reads in the Georgia cases data set that we were looking at before. Once we read it in, it exists in memory, so we see it in our Variables pane, and if you want to take a closer look at it, you can use the Data Explorer icon next to its name in the Variables pane. That'll open it up over here.

And then, if you're new to Positron, I'll probably use these three buttons at the top very often. They just hide and bring back parts of the IDE. So this makes it a little easier to look at the data if I remove the rest of the IDE. We see that we have different demographic groups in different counties of Georgia, the number of cases for different types of cancer, and then the number of people in those groups, so we could calculate a rate if we want to. And then over here on the side, there's sort of a top-level view of each of the variables in this data set and the distribution of values. So that's the data we'll be using.
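
To make the rate calculation mentioned above concrete, here is a minimal sketch in Python. The rows and column names (`cases`, `population`) are hypothetical stand-ins, and the real workshop data set may be laid out differently.

```python
# Hypothetical rows: the real column names in the workshop data may differ.
rows = [
    {"county": "A", "cases": 12, "population": 48_000},
    {"county": "B", "cases": 3, "population": 9_500},
]


def rate_per_100k(cases: int, population: int) -> float:
    """Cases per 100,000 people; returns 0.0 for an empty group."""
    if population == 0:
        return 0.0
    return cases / population * 100_000


# Compute one rate per county so groups of different sizes are comparable.
rates = {row["county"]: rate_per_100k(row["cases"], row["population"]) for row in rows}

for county, rate in rates.items():
    print(f"{county}: {rate:.1f} per 100,000")
```

Dividing by the group size is what makes counties comparable: county B has fewer cases than county A here, but a higher rate.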

The next line sources helpers.r. helpers.r is a file that defines a couple of functions. One of those functions, create incident map, makes a leaflet plot: the choropleth map we saw in the slides. The next function makes an interactive table of the Georgia cases table. It's kind of a big data set, so it takes a few seconds for the table to load, but this is, you know, sort of a tidy way to let someone interactively explore the raw data. And then the final line of code returns an error, and there's a reason for that. We're calling a function called plot incidents by demographic, and I meant to write that function in helpers.r, but somewhere between documenting it and writing it, something got erased, and so we have an error: that function has not been defined.

Using AI in Positron

This gives us the perfect opportunity to put AI to the test. So let's look at how AI works here in Positron and how you can bring AI into your workflows, and then we'll ask it to write that missing function. Inside of Positron there is an AI assistant called Posit Assistant. Think of it as a portal into your session for a large language model. It doesn't have its own large language model. Posit doesn't make large language models; that's quite a big business, and we're just not into that. We also know that if you're working at an institution or a company or anywhere, you've probably negotiated the ability to use large language models from one model provider or another, based on things that are very important to you, like security arrangements or validation or even just budget negotiations with that model provider. That's a lot of hard work to get to the point where you can use a model, and we don't want to get in the way of any of that.

So Positron allows you to take the models that you want to use and use them inside of Positron. It looks a little like this. When you first try to use it, it will say choose a model, and it takes you to this window. That window might be configured with more models than this; this is just the way I have it set up locally. I can go to the model provider I want to use that day, and when I click on it, I'll provide whatever credentials they ask for, so I can sign into my account with Anthropic or Bedrock or Gemini. From that moment on, those models will be like the batteries that power my coding session when I type something into Posit Assistant.

It will send the query to the model provider I'm logged in with, that model will provide the response, and it'll come back to me in Posit Assistant, where I'll get to read it. But it's not like I'm opening Anthropic in a window beside Posit Assistant. If I were to do that and ask it, you know, what's going wrong in my R session, or can you write some code that works with my table, it would say, yes, I'd love to, but you've got to tell me more about this R session. To models, your R session or your Python session is a black box; the model just doesn't know what's happening in there. But Posit Assistant is a way to bring it into your R session where it can see what's happening, so it can help you be productive inside your R session.

We also upskill the models as we bring them into Positron, because we know you're doing certain things with R and Python and data, and we can teach the models how to do those things, so they will be better data scientists than they would be out of the box, as it were.

All right, so this is what I mean: once you load a model into Posit Assistant, it can see the scripts that you're writing, and it can see your console, your R session, or your Python session if you're using Python. It can see the plots you've made and, most importantly, it can see what exists in the memory of your R session. That's where a lot of bugs happen. And whenever you write code, whether it's code that fixes a bug or code that just has to work with everything else that's going on, you need to be aware of those variable names: what tables does the code touch, and what are the names of the columns in that table? It's not really so helpful to have AI write dummy code that you then need to edit before you can use it. You can go so much faster when it sees your session, and that's the job of Posit Assistant.

So let's take a look at Posit Assistant and how it lets us do that. One of the reasons I'm having us use this server today is that it's pre-configured with AI that we can try out for the workshop. Posit Assistant itself appears in this blue ribbon on the side; it looks like two diamonds overlapping. If you hover over that for a second, it'll say Posit Assistant. If you're zoomed in, which you might be, it might be collapsed inside of these three dots. In that case, click the three dots, see if something says Posit Assistant, and then select Posit Assistant.

And then it's going to say, um, you know, you're using a slightly outdated version of Posit. That's fine, just dismiss this message. Now we can start chatting. I have it set up so you don't have to sign into a model, but if you were signing in, there'd be a bright blue button that would take you right to where you'd go. If you go to the gear icon, Configure LLM Providers, you don't need to do any of this. We're signed into Amazon Bedrock. Do not click Sign Out; that would be very counterproductive. But if you look at some of these other models, you can see, oh, if I wanted to go with Anthropic today, I'd have to provide an API key and click Sign In, and if I go over to GitHub Copilot, I click here and it asks for a GitHub PAT or something. OpenAI, every model has its own way of letting you log in, and that's how you get to those models.

But today we already have a model at our disposal, Amazon Bedrock, so I'm just going to close this. We can start using it, and what we can do is type to it as if it were a chat bot down here. We can select which model we want to use. Amazon Bedrock provides all these different models, so depending on what I'm going for, I might ask Opus to do the work, or I might stick with Sonnet.

Or use Haiku. For everything we're doing today, Claude Sonnet 4.6 is just fine; that's the most tested version that I've done this with. And even though we won't go into all the bells and whistles of Posit Assistant, you can see that you can decide how much effort you want to put into thinking for any of your queries with this light bulb. The paintbrush is very interesting: that's a data cleaning mode. If you were just using Posit Assistant to clean data, click that mode on and ask it to clean your data. It will clean it up, and it will save the work as an R script that you would then put, like, maybe in your data folder next to the raw data, the sort of reproducible script that you want to keep around. And then the last one is a plan mode, which is just a typical plan mode that you see with most models these days. Thank you, Phil, for the heavy lifting you're doing in the background there in the chat. I see that.

All right, so let's put Posit Assistant to the test. I have a prompt for us to use. You don't have to use this prompt, but I'll paste it into the chat so you can copy and paste if you want. The whole idea here is we're going to ask Posit Assistant to write the missing function. So I'm pasting it in there, and then you do that and press the up arrow. It will have a bit of a dialogue; it'll ask you for permission to do things. Today I encourage you to just experiment and always allow it to do things, but see if you can get that function to appear, and then rerun misc.r and see if that function works at the end.

So let's spend a few minutes doing that. I will also check in on the chat to see if everything's going smoothly there.

If you're feeling curious about anything or have any questions, feel free to type them in the chat. I'm happy to answer them if I can.

All right, so we're at the 30 second warning. I'm going to try to catch up to you guys, because I haven't run mine yet.

How Posit Assistant works

All right, so I have my prompt here. The reason this prompt is going to work is that inside of helpers.r, I've actually done a lot of work describing how this function should work. AI is not a mind reader, so if I hadn't written this out here, I should probably add that to my prompt to tell it how I expect the function to work. But between these two things, it should be able to get the job done. So I'll send this over, and we'll get to see a little bit of how Posit Assistant works. It is going to be powered by the models you signed up and logged in for, but we teach those models to behave in a certain way inside of Positron.

So first, it's very deferential. It wants to ask permission to do the things it's going to do. Right now it's telling me that it feels like it needs to read helpers.r to do what I asked, and it definitely does. So I could click Decline here, or I could click Allow. If I click Decline, I could use this pencil to edit it. I'm not going to do that right yet, but I will later. Or if I want to allow it, I could allow it just this time, or I could use the drop-down menu to say you can do this sort of thing for this whole session today, or you can do this sort of thing anytime I'm working in this project, whether it's tomorrow or next week. I'm going to do that just to be permissive today, so we don't see this Allow step so often. Okay, so it's reading that file. If I want to see what happened in there, I can, you know, see what it was doing. Now it says, okay, I've read that file, let me check the structure of Georgia cases, because I told it to try it out on Georgia cases.

So here it's calling something called execute code. What it did is it wrote this line of code and sent it down to Positron on my computer to be run. Then Positron ran it against the data, which lives on my computer, and sent the response back to the chatbot to read. This sort of round trip is the basic mode for Posit Assistant. It never wants to send your data across to the model, because that would burn up tokens very fast and be quite expensive. Also, that doesn't really play to the model's strength. LLMs are really good at writing code, so it asks the model in the background to try to answer your question by writing code, and then it'll send that code into Positron to be run. Your R session is very good at running code. And it will send the results of that code back to the model. Those results are going to be so much more concise token-wise than the actual data set. So we'll see that pattern play out a few times.
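
The round trip described above can be sketched roughly as follows. This is an illustration only, not how Posit Assistant is implemented: the data is made up, `summarize` is a hypothetical stand-in for code a model might write, and character counts are used as a crude proxy for tokens.

```python
import csv
import io
import statistics

# Stand-in for a data set living in the local session (values are made up).
raw_csv = "county,cases\n" + "\n".join(f"county_{i},{i % 40}" for i in range(5000))


def summarize(csv_text: str) -> dict:
    """Stand-in for model-written code: it summarizes instead of returning rows."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    cases = [int(row["cases"]) for row in rows]
    return {
        "n_rows": len(rows),
        "columns": list(rows[0].keys()),
        "mean_cases": statistics.mean(cases),
        "max_cases": max(cases),
    }


# The local session runs the code; only this small result goes back to the model.
result = summarize(raw_csv)
payload = repr(result)

print(result)
print(f"raw data: {len(raw_csv)} characters, result: {len(payload)} characters")
```

The point of the pattern is the size difference in the last line: the summary is a small fraction of the raw data, so far less text needs to travel to the model.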

Now let's look at the data. It's read my helpers file. It feels like it has everything it needs, and it's asking for permission to edit my helpers file. And it's going to show me here what it wants to do. These are the edits it wants to make. And the way to read this is like reading a diff on GitHub. The pink lines are lines it wants to delete. And the green lines are new lines it wants to write into the file. And it's writing the definition of a function. I can't tell if it works just by glancing at it, but this is what I asked it to do. So I'm going to allow it to do that. And once again, I'm going to be permissive today. And here I can see it wrote it into my file. Now the last part of my prompt asks it to try out that function. And I think this is good practice. AI can hallucinate. So if you're asking it to write code, if you can, you should have it immediately run that code, because that will catch a hallucination. If the code produces the result you're after, it works, and you'll know it. If it doesn't, you'll fail fast, and then you can just iterate with AI and say, oh, that didn't work. Please debug it. AI is very good at debugging code.
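
For anyone who has not read a diff before, here is a small sketch using Python's standard `difflib` module. The before and after file contents, including the function name `plot_by_group`, are hypothetical and are not the actual contents of helpers.r. In the text output, lines starting with `-` correspond to the pink (deleted) lines and lines starting with `+` to the green (added) lines.

```python
import difflib

# Hypothetical before/after contents of a helper file.
before = [
    "# plot function goes here\n",
    "# TODO: write it\n",
]
after = [
    "# plot function goes here\n",
    "plot_by_group <- function(data) {\n",
    "  # ...body written by the assistant...\n",
    "}\n",
]

diff = list(
    difflib.unified_diff(
        before, after, fromfile="helpers.r (before)", tofile="helpers.r (after)"
    )
)

# Separate deleted and added lines, skipping the two file header lines.
removed = [line for line in diff if line.startswith("-") and not line.startswith("---")]
added = [line for line in diff if line.startswith("+") and not line.startswith("+++")]

print("".join(diff))
```

Lines with no prefix are unchanged context, shown only so you can see where the edit lands in the file.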

All right. So here I have this plot, and it's testing it out, once again, by running code on my computer. I see it come across into my console. I see the results over here. The results also come back here. It looks like it did its job. And if I go to misc.r now, I can re-source helpers.r to get the updated version with the second function definition. And if I run this code, that should work. It's just going to recreate the same plot I saw before, because I tested it out with exactly this code. But you know, when I did this, I thought the bars in the bar chart would go up and down, so this is a little unexpected. And that's the perfect place for me to just iterate with this conversation. I can keep it going. So I could say, please change the function. If I look over here in the function definition, I want to switch the axes. It's put the bars on the y-axis, so I'll just say, such that the bars are on the x-axis.

And that is completely something AI can do. So let's go make that change. If I thought the plot would do better with a different theme or some annotations or whatever, these are all sorts of things you can ask it to do, and it will be able to figure out how to do that with R and put it into your code very well. OK, so now I have the bar chart I was looking for. I think I'm all set up to go. We've used AI to write some R code. And those principles will apply to any R code that you want to write. Now let's start producing a product with this code.

Introducing Quarto reports

What do I mean by a product? So we have our code. We looked at Posit Assistant, how to access it, how you'll need to sign into some LLM to power it, and how it will use that LLM to do data science for you. If for some reason you didn't get that to work and you couldn't add that function, there is a folder that you could dip into any time during this workshop by going to the file browser. It's the solutions folder. And all the waypoints we're going to visit today are available here. You can just open one up, copy and paste the code out to the actual files we're working with. I will caution you, don't try to run any of these solutions from within the solutions folder, because all the file paths they use are designed to work out here, outside of the solutions folder. They won't work if you run them from inside of solutions.

All right, so Quarto. What is a Quarto report? This is the way I do my work, and I recommend Quarto strongly to you as R users. If you've ever used Sweave or knitr or R Markdown, Quarto is the same type of thing. It's a way to do literate programming, to combine text with your code and to produce reports from it. There is a website called quarto.org that explains how to use Quarto. It's very, very well documented. We can see right here from the screenshot of the title page that you can use Quarto to build websites, dashboards, articles, presentations, books, knowledge repos, and also just simple reports, whether those are in PDFs or HTML or even Microsoft Word. Let's use it to make a report.

If you're an R Markdown user and you're wondering why you'd want to use Quarto: Quarto was written by the same people who made R Markdown, and it is the next version of R Markdown. We just had to change the name because with R Markdown, you use R, but in Quarto, you can actually use Python, Julia, Observable, and R. You're not limited to R. The R Markdown name didn't really fit, so instead of calling it R Markdown 2.0, we called it Quarto. Other than that, the syntax is very, very similar, and you basically already know how to use it if you're an R Markdown user. Let's go over here to this. Let me open up some files here.

Let's go close all the files, make things simple. I'll reopen miscellaneous.R because I'll need that in a second. I'm going to double-click it so it sticks around, and then I'm going to open report.qmd. Report.qmd is a Quarto file, so we can take a look at it and see what Quarto looks like. Make some space to look at it. Quarto files have an optional YAML header that contains some metadata we'll get into in a second, and then they have text formatted in the Markdown syntax, so you might recognize things like a second-level header. I use a lot of those. A blank line, some italicized text because it's surrounded by asterisks. But the most interesting thing about a Quarto document is that in between the text, you can insert code cells, and you could run R code from these cells. So if I'm working on some R script, I could write my code here. I'll make a histogram, and then I could put my cursor there and hit Command-Return or Control-Enter to send that code down to the console and run it, and I've just made a histogram. Or I could click the Run Cell button, and it'll run every line of code that I add in here. I could put as many lines as I want. It'll just run them all for me.
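For reference, here is a rough sketch of the kind of file being described. The title, header text, and histogram code are placeholders chosen for illustration, not the actual contents of the workshop's report.qmd.

````markdown
---
title: "Example report"
format: html
---

## A second-level header

Some *italicized* text, written in Markdown.

```{r}
# A code cell: run one line with Cmd/Ctrl+Enter, or the whole cell with Run Cell
hist(rnorm(100))
```
````

The part between the two `---` lines is the YAML header, and everything after it is Markdown text with code cells in between.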

So this is a perfect development environment. That's how I use it, and if you develop your code in a Quarto document, you'll make the next step of your life even easier, which is reporting. Because Quarto documents are designed so you could click this Preview button at the top, and it will generate a report from your document. This document becomes source code for what it will make. In this case, it's set up to make an HTML file, so let me make some space over here on the side and click Preview, and what will happen is R will create an HTML file. It'll run every code chunk in this document, and it'll insert the results of those code chunks into the document where the code cell lives. But these code cells are all empty. Ideally, they'd be inserting plots here throughout this readable document, and we have the tools to make that happen. What I'd like you to do is go to miscellaneous.R, take all this code, and move it all over to report. Every single line should appear there, but it'd make more sense than not to put different lines in different code cells, and I think you could figure out how you'd want this to look just by reading the text or reading the labels for the code cells. But that's our next assignment. Copy and paste the code in miscellaneous.R to the various code cells within report.qmd, and then preview the report by clicking that Preview button.

It wouldn't be cheating. Actually, it's worth trying it when you practice with the AI. I think that's the sort of low-hanging fruit that I don't think is necessary to use AI. It's interesting. If we do that, people may get different reports. They might, but if you tell it to just stick with the code in miscellaneous.R, it should. It should be pretty consistent.

Did you mention at the beginning the data for the Georgia cases? Is that synthetic or data you found online? It's simulated. It's based on real trends, but they do not show up exactly in there. It's not perfect.

All of this code from line 11 up doesn't produce anything, but nothing below it will work unless this is run. That makes it setup code. I like to put setup code in its own setup chunk at the start of the document. Then each of these things create something, and there's three code cells remaining here, and they each sort of ask for something. The text says we're going to show them a map, so I'm going to put one that creates a map in the cell that goes with that text. Similar for this plot. Next thing it says is create a chart about income, so I'll put plot here. Actually, it says race, ethnicity. That's a poor label. Maybe I should fix that. Then finally, the data table line creates a table, and that would be appropriate for down here. Now that I have all this in here, I know all that code runs because we've already run it as a script. I'll just click preview. I'll be able to run each of those code cells with no error messages, and it will regenerate my HTML file. I'll see how it looks here in a second.

Now instead of just having the text, it's including the results of these code cells. The code cells don't show up because I can tell Quarto not to echo back the code. I could have the code in there if I want, but if I just go give this to a colleague who doesn't write code, they probably don't want to read the code. They just want to see the plot. I'm also telling it not to pass through any warning messages or just regular messages the code might generate. Then this HTML file now lives right beside report.qmd, so if I wanted to host this somewhere, I know where to go to get the file to host. That could be my personal blog, that could be somewhere my institute hosts reports for the world, or that could be something like Posit Connect or Connect Cloud.
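One common way to get this behavior is with execution options in the YAML header. This is a sketch only; whether report.qmd sets these in the header or on each code cell is an assumption, and the title is a placeholder.

```yaml
---
title: "Example report"
format: html
execute:
  echo: false     # do not show the code in the rendered report
  warning: false  # do not pass warnings through to the report
  message: false  # do not pass regular messages through to the report
---
```

Setting `echo: true` instead would put the code back into the rendered report for readers who want to see it.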

But this is a pretty attractive-looking report now, and if I gave it to someone and they said, hey, that's great, but I want this plot a little different, maybe different variables or something, or I want to see something new. Instead of having to remember where I wrote the code that made that plot and how I set up the environment to get that code to work right so I could recreate it and copy and paste it into my Microsoft Word document, it's all right here, self-contained in this Quarto document. All I have to do is open the document, make the change in the code, and then rerun it from the top, and I'll have a new HTML file to replace the old one with. It'll keep my life a lot easier and cleaner as I manage the reports I put out there. So those are Quarto reports. And again, I recommend to you quarto.org is the place to learn more about that.

Building a Quarto dashboard

And one thing you will see at quarto.org is using Quarto lets you make more than just those reports. So we're already halfway to making, believe it or not, a website or a slideshow. And this website will show you everything you need to know about the different things you want to make. Depending on what you want to make, you can click through to it. But what I would like us to do next is make a dashboard. A dashboard is basically a website. It's a very visual way to display data. And we can do it with what we have here.

But one thing I'll point out is dashboards usually don't have that much text in them. So we're going to want to get rid of all the text that's in our Quarto report. They also have a nice layout to them. So let's learn about how to make a layout to turn that Quarto report we just looked at into a dashboard. Now, everything we're doing is going to start building on itself, so it's going to be easier and easier to do what comes next. To make the dashboard, we will change the format from HTML to dashboard. Once you do that, I recommend stripping out all the text in your document and even all the headers in the beginning, leaving just the code cells. Each code cell will become its own card in the dashboard. That means it will just have a self-contained box, keeping everything in that code cell together. And those cards by default will be laid out one on top of another. So each card sort of makes its own row in the dashboard. And that's a fine layout, but not particularly sophisticated.

You can control this layout, and the way you can control it with Quarto is by putting headers back into your document. The text of the headers will not appear anywhere, but the fact that there is a header there will change the layout. Each second-level header you add to your document will start a new row in the dashboard, and everything will be arranged in that row until you tell it to make a new row at this point. So here if I add these two second-level headers in these two places, those four cards would now be arranged like this into two rows. I'm skipping over first-level headers. You can put those in there too. Those make different tabs in your dashboard. It's a website thing, but I don't have enough content to worry about tabs. If you want to break it up even further, you might want to take one of those rows and put columns in it. You can do that with a third-level header.

So within the row defined by the second-level header, I can now make two columns defined by these third-level headers, and everything in a column will be arranged going down in the column until I start a new column. That's how I get this layout. Now, by experimenting with those two things, you can create half of the dashboard layouts that you imagine. The other half will be layouts that begin by splitting your dashboard by columns, not by splitting it by rows. You can get to those by adding this second value up here, orientation columns. It just flips everything. So now the cards are laid out one column at a time. Now second-level headers start a new column. And now third-level headers start a row within the column. So it just exactly flips what happened before. Between those two things, if you spend enough time with the fiddly details, you can now make any sort of arrangement you want.
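As a sketch of the row-first layout just described, assuming three cards and placeholder header names (the header text never appears in the dashboard):

````markdown
---
title: "Example dashboard"
format: dashboard
---

## Row

```{r}
# code for the first card, which fills the top row
```

## Row

### Column

```{r}
# code for the second card, left column of the bottom row
```

### Column

```{r}
# code for the third card, right column of the bottom row
```
````

Nesting `orientation: columns` under `dashboard:` in the header flips the roles, so second-level headers would start columns and third-level headers would start rows within a column.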

The last bit of power that's missing is right now Quarto is going to split everything in equal proportions, but sometimes a good layout isn't split equally. If you want to change the width or the height of one of these elements, you can do that when you set the header. And you can either change the height or the width with this syntax. It's borrowed from CSS. Use braces, no quotation marks, and no spaces. You can set the height to some percentage of the thing it's occupying. So this would make that row only 30% of the total column height that it's in.

Okay, that's a lot in like three or four, five minutes to go through, but it's actually not that much conceptually. And if you sit with it for a while, like 10 minutes, I think you'll find that you can master this pretty quickly and start laying out things into web page dashboards that you can then share with your colleagues. And that's what I'd like you to do with report.qmd. I want us... Sorry, this is what that looks like in situ. I want us to turn the report into a dashboard. It'll look a little like this dashboard, which you'll notice is halfway towards our shiny app. And I want you to do that with these steps. Go to report.qmd as it exists. Change format HTML to these three lines here. Notice that each line is indented two more spaces from the one above it. And then remove all of the text and headers. You can keep them in there and preview it. It'll make some weirdly laid out dashboard with text everywhere. But to get where we want to go, you're going to want to erase all that. And then purposefully go back in there with column and row headers to create the layout you see here. And spend four minutes doing that. In the last minute, I'll swoop in and do it with you. And if there's any questions, let me know.

So these are new packages I haven't used before. What's the Tigris one? Maybe I just haven't come across it. I think it's a new version of something we might have done with census. I think it's getting the shape files to make that map. Shape files. Got it.

I was impressed with how quickly the map popped up. Like in the misc file, it was just create incident map. Is that just Leaflet? It's using Leaflet. I wrote that function so it's easy to call. If you look in helpers, you can see the actual code for it.

You went with Georgia. I thought you'd go with your hometown. I was at a conference in Georgia. Georgia is an interesting state. It was there where I had the Waymo driving at me in an Uber. You can see on the map right there, that white spot at the top, that's basically Atlanta. Georgia is like the suburbs of Atlanta to the ocean. You and I were there back in March. We had a conference there last year. It seems like a long time ago. My hometown is not too far from the Georgia border. I'm practically in Tennessee. I live near Chattanooga, which is right over the state line.

It's a nice big state. It's a long drive to get to Florida. Unfortunately, it is. There's a new truck stop everyone likes to hit on the way down. Buc-ee's. There's one at every truck stop. There's something special. My kids like them. I think Indiana meets grocery store meets restaurant meets theme park. Kids seem happy there too. Kids love it. It's good branding too. I think Indiana is going to start getting them. We don't have them yet.

I made a shiny app for another workshop. How did you get the data for that? I think the workshop was about scraping data from the web. I used to have Starbucks data for that. I loved it, but they stopped maintaining the latitude and longitude for the data set I was pulling. Buc-ee's would work though.

Arpit has a question in the chat. Quarto looks like a great way to organize and update reports and have an audit trail.

We often use MS Word or Excel for sharing results and reports with stakeholders. One benefit is that it allows stakeholders to add comments and provide feedback within the document. Does Quarto reporting capabilities allow for similar functionality? Any other recommendation on how you may have handled something like that?

You know, it's interesting, Arpit, because I remember J.J. Allaire building commenting functionality into Quarto, but I'm not sure that went anywhere. If it didn't, I'm pretty sure the reason is because people do what I do, which is your Quarto documents are just text files, so they're very easy to host on GitHub or a Git repo and look at differences and see things there. It's not like JSON or some weird HTML format or anything where you can't really tell what's going on. It's literally what you see in your Positron or RStudio is what appears on GitHub, and then people leave their comments on GitHub as a pull request. That's what I recommend doing, but I could see how you may be interacting with people who are quite fine leaving a comment on Microsoft Word, but perhaps not on GitHub. I don't think I could help you there.

Then Daniel says, you can always render a Quarto document in PDF format and then use annotations and comments in the PDF. That's true, and then I'm being very short-sighted because not only can you render in PDF, you actually can render a Quarto document to a Microsoft Word format and then use the Microsoft Word commenting. I guess that would take you full circle.

Converting the report to a dashboard

Speaking of full circle, let's turn this into a dashboard. Here it says format HTML. If you wanted to turn this into that PDF we're talking about, it's format PDF, and if you want to turn it into a Microsoft Word document, I think it's format Microsoft Word. You can learn about both of those on the Quarto website. Both of them do require you to have other bits of software installed on your computer, like maybe Typst or LaTeX to make a PDF, and probably Microsoft Word to make Microsoft Word.
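For reference, these are the format values Quarto uses for those outputs. Note that the Word format is spelled `docx`. This is a header fragment only, with the alternatives commented out.

```yaml
# Pick one output format in the YAML header
format: html
# format: pdf     # PDF output rendered through LaTeX
# format: typst   # PDF output rendered through Typst
# format: docx    # Microsoft Word output
```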

Today, we want to turn it into a dashboard, and not only that, I want to add an extra value under dashboard saying orientation columns. I go to the next line, and I'm going to indent two more spaces to show that this is a sub-value of dashboard: orientation, and the orientation will be columns. Then this would work as a dashboard, and I'll preview it so it can run while I'm doing other things. There's not much rhyme or reason to how it comes out. It's just happenstance based on what I wrote for the report.

I'm going to erase all of this, but I'm going to keep every single code cell. Now it's loading those files. Sometimes you can't really tell what's going on until you make this bigger, or there's a pop-out button here that you can use to launch the whole thing in a webpage so you can see better how someone else would see it. It still looks like the report, actually. It doesn't look like much of a dashboard at all. I'm going to change that by working with the layout here.

I still need to erase a couple more lines of text and a couple of headers. Now that that's all gone, if I preview this, it's orientation columns. Every code cell should be its own column minus the setup code cell, which just won't appear because it's not producing anything that could appear. Okay, so we achieved that. Everything is its own column, three columns. It's not really the layout I'm going for, but now I know that it's working.

Now I need to think about this. I want to divide it into two columns, and I can do that with the second-level header. The name I choose for the header won't show up, but if I use column, that'll be a good reminder to me of what I'm doing. The first column, I just want to have this Georgia map, so after the Georgia map, I'll start a new column, a second column. That will combine these two things into one column. There it goes. It's remaking the app.

This app looks pretty good, and if you get to here with this app, that'd be fine. Maybe you want to keep going forward. The only thing that I'd tweak is maybe I want that plot to be a little smaller. In real life, the more I do this, the more I realize I probably don't, but I do want to know that I could do that if I wanted to. To do that, I need to put it in a row so I could add the height thing to that row. It is in a row right now, but I have nowhere to really type height equals 30% or whatever I want.

To have a place to type that, I'm just going to manually, explicitly put it into a row with a third-level header. That third-level header will have that. If I just left it like this, these two would both be scooped up in the same row, and I don't even know what I'm doing. I definitely want a row up there. I better put this in the right place. I'm going to put it down here at the start of this column. I don't want those to end up being back in the same row, so I'll also put this in its own row. I won't bother giving it a height because it'll expand to fill the 70% left over. Let's preview that.
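Put together, the layout built in this walkthrough would look roughly like the sketch below. The header names and the comments standing in for the real code cells are placeholders, and the setup cell is left out.

````markdown
---
format:
  dashboard:
    orientation: columns
---

## Column

```{r}
# code that creates the Georgia map
```

## Column

### Row {height=30%}

```{r}
# code that creates the plot
```

### Row

```{r}
# code that creates the data table
```
````

Only the first row of the second column gets an explicit height. The second row takes whatever space is left in that column.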

For those of you who are getting bored at home, this is exactly the sort of thing you could ask Positron Assistant over here to do and then see how it does. I think it's another task where if you know the syntax for laying this out, it might actually be easier to do it yourself than try to explain this visual idea with English words to the model and see if it gets it. Here I have a nice small plot. I have a large area for my data set. I have my map. This is a nice looking dashboard, and I could host it like any other web page because it's an HTML file.

If I come back to the file browser, it's still the report.html file, since each time I've updated the report it has just overwritten this. I see I have some tag-along files now for this dashboard that I'd probably want to upload with that report if I'm actually putting this on the Internet so it all looks nice on the web.

Those are dashboards. The things to pay attention to there are changing the format at the top and then using the second and third level headers wisely and intentionally. The 02 dashboard file in solutions will give you what we have here. But we're about to jump files from report.qmd to app.r, so it's not that important to be caught up at this point.

Let me just walk back everything I just said. I guess actually you should be caught up here because we're going to try to publish this. Right now we have the last static file available to us. We're going to try to publish that while we're still in a relatively simple space of static files. That way when we get to these interactive apps, which we're going to build next, publishing them won't be such a large leap. The way we're going to publish them is we're going to share them with push-button publishing right from Positron to Posit Connect.

Publishing to Posit Connect

Posit Connect is the sharing platform that we have available to us on the server because we have all three of these on there. It's something that your institution or your company might have for you to share things with non-data scientists and decision makers. It's also available as a freemium service called Connect Cloud, which means if you don't have Connect, you can still share things for free on Connect Cloud. I say freemium because if you want to pay for a lot of compute resources or make it private and stuff, then you could pay for a higher level service there.

I'm going to go back to my app. I have a couple different ways to share this dashboard that I've made. When I say a couple, I mean exactly two. Two big ones to think about. One is if you use version control, specifically Git version control in Positron, we inherit the VS Code Git client. It looks like this branching icon. Right now there'll probably be a big red number in front of it.

This directory came from a Git repo. This number is telling us the number of files that we have changed since we last checked into that repo. 56 files. Most of them CSS for this dashboard. This is what the Git client looks like. You have a place to see the things you've changed. You could stage them in a GUI sort of way so I could add helpers to a stage commit. I could write a commit message. I could even use AI by clicking these circles here to write the commit message for me. It's telling me it added this new function. I could also do all the GitHub stuff you do, like push and pull and fetch and all that.

Down below I see the commit history for this repo. I could dig into any of these and see what the file looked like in that version of the file. It'll open in Positron and it'll show me that Git diff we were talking about earlier. In this case it really is just a Git diff. We'll close that up. Down here at the very bottom I can see which branch I'm on and I can click on this to create new branches. I'm not going to go into Git, but if you're interested in version control, I do think it's the way to go. This Git client here comes from Visual Studio Code, which is made by Microsoft, who owns GitHub.com, which is the largest collection of Git repositories. VS Code probably has the largest collection of Git users using it. I think we'd all feel pretty confident this is as good as it gets for Git clients and it's here available for you to use.

That would be a way to share your code somewhere, so it's kind of like you're publishing your code. But how would you publish that dashboard? Well, for that let's use the second method, Posit Connect. We have push-button publishing to Connect through the Posit Publisher icon at the top of your file. It looks like a circle with an arrow going into it. You can click on that to start the process of publishing. You could do it with me or you could just watch if that's where you're at.

I'm going to click it and I'll show you what it looks like. If you want to follow along, go ahead. If not, it's OK. The first thing it asks is, do you just want to publish these static HTML files as they are or would you prefer to publish the source code? I'm going to select the source code and that's going to have some benefits once it's published. I'll show you what those are. I'm selecting source code and then it asks me for a title for this when it lives on Connect. I'm just going to click enter. That looks like a great title. It borrowed it right from my metadata. Now it says, do I want to publish to Connect Cloud or Connect? Well, on the server we have an instance of Connect.

Source code, Cancer in Georgia, Connect. There we go. It found the Connect instance. It scanned my server and said, oh, look, there's an instance of Connect. Is this the URL of Connect that you want to publish to? Yes, it is. Press enter. Now it says, okay, I just need to make sure you have permission to publish to this instance of Connect. Connect won't let just anyone publish their stuff here. It's very easy to prove that. I'll click token authentication. It says, okay, I'm going to open up Connect. Can you sign in? Here's where you need to remember your sign-in from before. You sign in with the account you made. That tells Connect who you are.

All right, signed in correctly. And then Connect says, hey, someone claiming to be you is trying to publish with this account. Is that actually you? If so, let me know by pressing Connect. I will. It says, okay, that's great. I'll let it through. And then I can go back over here. It says success.

And now we're ready to organize what we publish. So before it sends everything over to Connect, it stops. It says, why don't you collect the files you want to put together to give Connect? And our dashboard here has a couple of files that it uses. So it's not exactly self-contained. It's reading in this Georgia cases file and it's sourcing this helpers.r file. And Connect will get what I send it and try to rebuild this dashboard. And if I want that to work, I better include those two files because they're going to be necessary.

So over here in the sidebar that it opened for me, I could just select Georgia cases and helpers.r. Report.qmd is already selected. And then it's selected some files that it created to tell Connect how to rebuild the packages I'm using in the version of R I'm using, so Connect can run this code for me. Once that's all done, I click Deploy Project. Then it's packaging those things up. It's going to send them over to Connect. Connect is going to run the code to make this dashboard. So it will have its own version of the dashboard to give to the world. And then it's going to host it at a URL that I could give to folks that I want to visit the dashboard. They will not need to have R or Python or anything like that running. They'll just need to have a web browser and a login to Connect if it's a private asset.

Now it tells me the deployment was successful and I could click this to view content, or over here in the sidebar it also is giving me a view content link. So I'll click that. It'll open up in Connect. And now if you're following along, hopefully you'll see what I'm hopefully going to see, which is our dashboard. There it is. This is what it looks like when you're the author. You have some settings up here that you can use to configure this. People who visit it will see just the standalone version, which looks like the app without that stuff up there.

So you can give this URL to someone and they could come here and they could use this to monitor cancer hotspots in Georgia or what have you. There's some interactivity here because Leaflet is an interactive HTML powered map and data tables is an interactive HTML powered data table. ggplot2 is not interactive. So nothing happens with that one. But for all these, the nice thing about dashboards is you can expand it and sort of see it on a bigger scale.

But let's go back to the author view. With the settings, we could do things like decide who specifically has permission to see this or we can make it open to the world. We can also, if we don't like the random component of the URL, we could give it some sort of stub that it will use here instead. And then the cool thing about publishing the source files, we'll discard those changes, is it now has all the code it needs to rebuild this whenever I want it to. And so I could ask it to rebuild this dashboard on a schedule, which makes sense if my code is reading in a data source and it gets updated from time to time.

So for example, I could ask Connect to maybe every week on Monday or maybe Sunday night, I don't know, to rebuild this. And then it will collect the most recent version of that data file when it rebuilds it. And so when people come in Monday morning, all that week, they'll be seeing the most updated data for that week. And the next week it'll update again. But I will never have to do anything with that because I've handed it off to Connect and made it automatic. Connect will monitor that till I tell it to stop. I can even ask it to email me or someone else each time it does it, so I could see the report that it makes or see if it ran into any trouble or something.

But let's come away from Connect. We'll be back here a little later to configure AI stuff, but let's get to the AI stuff.

Introduction to Shiny

Let's talk about Shiny. Shiny is a wonderful package for R. You've probably heard of it before. Maybe you've used it before. If not, I'll tell you what it's all about. In the past, I'd spend half a day to a whole day teaching you how to build your first Shiny app. In the age of AI, that's no longer necessary. Shiny is available for both R and Python, for those of you who use Python. But today, we're just gonna use R.

So like Quarto, Shiny has a wonderfully well-documented webpage at shiny.posit.co. Anything you wanna dig into deeper here or things we don't cover about Shiny today, you can learn here at shiny.posit.co. And what we'll focus on today is understanding the concept of Shiny and being able to interpret and check up on what AI writes when we tell it to write Shiny.

So the way you think of a Shiny app is it is a webpage, a webpage that you're gonna build only by writing R code or Python code. And that webpage will have two classes of things. It'll have what I'm gonna call inputs. And these are widgets your visitor can manipulate with their mouse to provide values to the app. Those are things like dropdown menus, sliders, calendar date pickers, check boxes, the things you've done on the web with your mouse, you could do them in Shiny. The second class of things are outputs. These are things that you can build with R and they will appear for the user. The idea is the outputs will use the inputs that the user provide to give that user a custom experience.

And we're gonna make that make sense. It comes together in actually a pretty elegant way, I think. So if the user changed the dropdown menu to stomach cancer, it would show the hotspots for stomach cancer in the map. But if they change the dropdown menu to other site, which is just a catch-all for other types of cancer, the map would update to show that data. Same map, same place in the app, but it's just changing every time the user changes their input.

Building a Shiny app with AI

How do we do that? Well, we're going to do it by starting with this code. I like to tell people to think of this code as a boilerplate or a template for making a Shiny app. Every Shiny app you write will start with something that looks either exactly like this or very similar to this. And I've made it available to you in the app.r file. Let's go visit the app.r file and run it and see our first Shiny app.

So if you come to Positron, click on the file explorer up here and then find app.r and open it. You can start closing the other files if you like. This is the same code that was on the slide. And I consider this to be like the minimal Shiny app. When you have a Shiny app open, you can run it and preview it in Positron by clicking the play button at the top of the file. So press that play button and let's see what it does.

All right, that's what it does. It makes a blank Shiny app. But notice the one thing it doesn't make: it doesn't make an error message. This works, this is a viable Shiny app. This is what it looks like: a sidebar with nothing in it and a main panel with nothing in it. It's waiting for us to add things to it. But one thing I wanna draw your attention to, there's a red square here. That is a stop button, because there's now an R process on our server or our computer watching this app, waiting for the user to change an input so the computer can then change the outputs for the user. And every Shiny app is built like this. There is a computer watching the app, ready to serve the user by updating the outputs. We call that the server. And oftentimes it is a literal web server, but when you do it locally, it's your own computer.

What I found is working with Shiny apps, sometimes you do this and you forget that's running and you try to run some code and it doesn't run because where you try to run it, it's busy looking at the Shiny app. So it's good practice to always stop this after you've previewed the app. And then you know everything's ready to do new stuff.

So you know how to make a blank Shiny app, but that's nothing to write home about. We just did that together. Let's dissect this and see how to add things to that app. There are two parts of this file where we will be adding stuff or expecting AI to add stuff. The first is here in this UI object. This is where you put the things that will appear in your app. So that's where you'll put your inputs and your outputs. And then down below, this is where you'll tell the server computer how to build those outputs from the inputs.
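
For readers following along without the slides, here is a minimal sketch of what such a template might look like. It assumes the template uses page_sidebar() and sidebar() from the bslib package; the actual app.r from the workshop may differ in its details.

```r
library(shiny)
library(bslib)

# UI: what appears in the app. The sidebar and the main area are both empty.
ui <- page_sidebar(
  sidebar = sidebar()
)

# Server: instructions for building outputs from inputs. Nothing to build yet.
server <- function(input, output, session) {
}

# Combine the two halves into an app object. An app.R file typically ends
# with shinyApp(ui, server); it is assigned here so that sourcing this
# sketch does not launch the app.
app <- shinyApp(ui, server)
```

Running the app (for example with the play button in Positron) would start the R process that watches the app, so remember to stop it afterwards.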

And I like to think of it with a restaurant analogy. Up top, this UI part is like where you create a restaurant for people to come in and order things off a menu that you also put there for them. Then down below, this is like where you write the cookbook to give to the kitchen staff. So when someone orders something off the menu, they can look up how to make it.

So there's your crash course on Shiny. The rest of the course will go better if we have an app. So let's ask AI to build a Shiny app for us. I'm going to copy and paste this prompt into the chat and you can copy and paste it into Positron Assistant, which I will do too. So what I want you to do is find Positron Assistant again and we could keep the same session going and we could just paste this in there and let it start working. And then I think that'll take about three minutes. So I'm going to check in on the chat again.

So I now have a Shiny app that is interactive and I can look at the different types of cancer one type at a time going through this Shiny app. So that's pretty cool. Let's just take a little peek to see how that's wired up so you could understand what the AI did and if it did a good job.

There is a solution in 03app.r; that's where I'm going to draw from for the next few slides. But to be honest, I have no idea what code my AI wrote or your AI wrote. It is different every time, but probably not too different. So let's start with the UI side of things. It put an input in there. How does that happen? Well, you can put inputs in the UI by using a set of functions provided by Shiny that are all named something followed by Input. And for each of these functions, we're going to pass it the name to save a value as. Pay attention to that. We'll look for it in the code.

This is a group of functions provided by Shiny for adding different types of input. And you can see they each add, you know, a different way of interacting with the app. We're going to use selectInput. That's the dropdown menu. And then you might find other R packages that add additional input functions you can use. Each input function follows the same basic syntax until it gets to arguments that are specific to that sort of input. But the most important thing will always be the first argument, because that's the name Shiny is going to save the user's value under. You get to choose that name. You'll have to look it up later. So here I chose site. So when that dropdown menu is set to stomach, the value of site in something called input will be stomach. Shiny will collect all the inputs, as many as you want to add to the app. It'll put all their values in the same list named input. So here input$site is how I look at that value. If I change the dropdown, the value of input$site will change as well.
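
As a small illustration of the naming idea described here, the sketch below creates a dropdown saved under the name site and reads it back as input$site. The label, the choices, and the output name chosen are made up for the example; testServer() is used only to simulate a user changing the dropdown without opening a browser.

```r
library(shiny)

# The first argument, "site", is the name the user's choice is saved under.
site_input <- selectInput(
  inputId = "site",
  label   = "Cancer site",
  choices = c("Stomach", "Colon", "Lung")  # hypothetical choices
)

# On the server side, the current choice is read as input$site.
server <- function(input, output, session) {
  output$chosen <- renderText(paste("You picked:", input$site))
}

# Simulate a user setting the dropdown, then look the value up.
testServer(server, {
  session$setInputs(site = "Stomach")
  print(input$site)
  session$setInputs(site = "Lung")
  print(input$site)
})
```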

We don't have annotation working. So look at this for a second and see if you could spot where the selectInput function appears. I'll show you, it's right there. And what I want you to pay attention to is it's inside of the sidebar function in our template.

For this UI stuff, it's really simple. Where you put the function that calls the input or the output determines where that input or output will live in the app. If you put the function inside the sidebar portion of the template, it will live in the sidebar. If you put it outside of the sidebar portion (that'd be that sidebar function there), it will live outside of the sidebar.

So our dropdown menu in this case will be in the sidebar and it's collecting a value called site. All right. So the next thing that we need in the UI is outputs. And it's almost the same thing, just wash, rinse and repeat. But this time we're using a set of functions that are called output functions. They also go in the UI. In this case, we'll probably put them outside the sidebar function so they're not in the sidebar. And they also take the name of an object. In this case it will be an object we define somewhere else, but that's how they're gonna know what object to put in that plot space they're making.

So over here, this is a list of the standard output functions that come with Shiny. You can learn all of this on the Shiny website if you wanna look any of these up later. And they all have the same syntax again, some sort of output name, the name of the object to place in that position. And then some of them take extra arguments based on what they're doing.

Right, so here are output functions. We have three of them. They're all outside of the sidebar. And you can see there's some layout functions that AI has put in here. I'm not gonna go into that. AI can do that better than a human can. It's very tedious, but straightforward. And you can learn about it on the Shiny website.

Now look, each of the outputs is gonna place something with a different name. AI has unimaginatively named these things map, plot, and table. That's good. That'll let us see exactly what it's talking about. Now let's look at how map, plot, and table get built, because on the server side, we're gonna build these things. So here on the UI side, there's something for leafletOutput and plotOutput and DTOutput to put into the app.
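
To make the placement rule concrete, here is a simplified sketch of a UI with one input inside the sidebar and two outputs outside it. It assumes the bslib page_sidebar() layout and uses only plotOutput() and tableOutput() from Shiny; the app in the talk also has a map and uses leaflet and DT output functions, which are left out here to keep the example dependency-free. The choices are made up.

```r
library(shiny)
library(bslib)

ui <- page_sidebar(
  # Inside sidebar(): this input lives in the sidebar.
  sidebar = sidebar(
    selectInput("site", "Cancer site", choices = c("Stomach", "Colon", "Lung"))
  ),
  # Outside sidebar(): these outputs live in the main area. Each one names
  # the object it will display; those objects get built on the server side.
  plotOutput("plot"),
  tableOutput("table")
)
```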

Server-side instructions

All right, the server instructions. This is what these look like in the app. On the server side, we're building up a list called output. And each of the elements of that list will correspond to something that we're placing into the UI. The UI is gonna look for this output list and those elements to build out the app. What we're putting into those elements won't be objects per se, like you'd have with regular R code. They'll be recipes of code that get stored there for the Shiny server to pull up and run when it needs to run them.

And if I asked you to guess which of these elements here would store the code that makes the output object named map, you could probably make that guess. It will be the output$map one. The one that stores the code that makes plot will be output$plot and so on. So the names that we choose up front we're gonna use throughout the app. They're gonna be what connects everything together.

Now, how we package those code recipes up involves one other set of functions. They're called render functions. We write whatever code we want, but we wrap it in the render function and save that into the list. And the render function we use will depend on the type of thing we're saving. And we actually already saw the render functions. They were on the slide, we just didn't look at them, but there's a render function that pairs with every output function. So if you're gonna place a table, you're gonna render a table first. If you're gonna place a plot, you're gonna render a plot. And then you have weird ones where the names don't match up, like renderPrint and verbatimTextOutput. I can't tell you what that's about, but those things go together.

All right, so this is what our code looks like. We have these render functions. They're stored into output$map and so on, and inside those render functions, it's just the familiar code we've been using all along that makes these things. But notice the dataset they're all calling or using is filtered. And filtered is something that we're also making on the server side.

If you think about what filtered might be or how that app works, you might strike upon it pretty quickly. We're using the dropdown menu to filter the dataset. And then we have three separate things that work with that filtered dataset. So it's a good idea to just make the filtered dataset once and have those three things borrow it. But that filtered dataset, in and of itself, doesn't appear in the UI. It's just something that floats around on the server side so we can make the three things that do appear in the UI. And that's what filtered is here. It's an intermediate object that we're making for the things downstream to use. And this object uses that input value. So here's where I'm looking up the value of the dropdown menu. And then the things downstream are gonna call this object like a function. That's just how Shiny works.

But when they call it as a function, they're gonna end up running the code that's in it, which is gonna call input$site. And it's gonna look up the current value of input$site. And then that will be used to build the new value of map. And that's how everything always stays up to date. When it gets rerun, it's scooping up the current value that the user supplied.

The magic here with Shiny is, as long as the code that you use to make map, plot and table depends upon one of the input values, either directly or through a reactive intermediary, Shiny will notice that and keep track of it. And then it will take care of all the rest. It will make sure it rebuilds the things that depend on an input when that input changes. And if something does not depend on the input that changed, it won't bother rebuilding it; it's very lazy. But every time the user makes a change, everything that depends on it will be updated. And the user will be quite happy with this fast, nimble, reactive experience. And that is how you build an interactive website with only R code, in our case, and in like half an hour max.
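
The server-side pattern described above can be sketched as follows. This is an illustration, not the workshop's solution file: the cases data frame and its column names are invented stand-ins for the simulated data, and the map output is omitted. The testServer() call at the end simulates a user changing the dropdown so you can watch the intermediate object update.

```r
library(shiny)

# Hypothetical stand-in for the workshop's simulated data.
cases <- data.frame(
  site   = c("Stomach", "Stomach", "Colon", "Lung"),
  county = c("Fulton", "Cobb", "Fulton", "Cobb"),
  count  = c(12, 7, 30, 25)
)

server <- function(input, output, session) {
  # Intermediate reactive object: it never appears in the UI itself.
  filtered <- reactive({
    req(input$site)  # wait until the dropdown has a value
    cases[cases$site == input$site, ]
  })

  # Each recipe is wrapped in the render function that pairs with its
  # output function in the UI (plotOutput with renderPlot, and so on).
  output$plot <- renderPlot({
    barplot(filtered()$count, names.arg = filtered()$county)
  })
  output$table <- renderTable({
    filtered()  # called like a function
  })
}

# Simulate the user changing the dropdown and inspect the filtered data.
testServer(server, {
  session$setInputs(site = "Stomach")
  print(nrow(filtered()))
  session$setInputs(site = "Colon")
  print(nrow(filtered()))
})
```

Because both outputs read filtered(), and filtered() reads input$site, changing the dropdown is enough for Shiny to know that both need rebuilding.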

Adding an AI chatbot with querychat

Imagine doing that 10 years ago, it'd be crazy. So we now have this Shiny app, it's interactive. What we wanna do now is the masterclass of adding an AI chatbot to it. And it's going to be easier than you think. It's certainly much easier than learning to build a Shiny app would be.

So we have an app, it has a dropdown menu. Our goal is to take the dropdown menu and replace it with a chatbot. The chatbot can do the filtering better than the dropdown menu and the rest of the app can respond to that. The package that makes this easy, that lets this happen, is called querychat. It's an R package, it also exists as a Python package. It's designed to work with Shiny to make these chatbot-powered apps.

Like everything else that Posit makes, it has its own documentation page. And everything we're gonna do here is actually written up as articles on this documentation page. So it is a great resource to go back to. And I will put these links at the end of the talk so you could take a picture or screenshot of them. You don't have to worry about them right now.

Here's how querychat works. I think it's pretty nice. It's going to talk to an LLM, like Anthropic today or some other LLM some other day. And it's going to tell that LLM that it wants it to just write SQL queries. It'll give the LLM everything your user asks for, but it will also instruct the LLM to just write SQL queries that answer the user's questions. So it doesn't give the LLM your data set. It takes your data set and it puts it into an on-disk database, either made with DuckDB or RSQLite. And so there's a database that's now sitting right next to your Shiny app. When the LLM writes a SQL query, querychat will give that SQL query to the database, which will then run it on the table. The database is going to be just great at running SQL. And the LLM is actually going to be pretty terrific at writing SQL, because that's one of the tasks that LLMs, large language models, have been the most trained on, so they're quite good at it. The database runs that query. If it doesn't work, it tells the LLM to try again.

When it does work, it returns the filtered data set, the data set filtered by that SQL query. And then everything downstream can use that filtered data to do their job of showing a map or a table or whatnot. And this is how you let the LLM drive your app without sending your data back to the LLM because that would consume a lot of tokens.
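
The database half of this loop can be sketched with the DBI and RSQLite packages. This is only an illustration of the idea and not querychat's internal code: the table, its columns, and the helper run_query() are invented, the SQL string is written by hand where an LLM would normally supply it, and an in-memory SQLite database is used purely for convenience.

```r
library(DBI)

# Hypothetical stand-in for the app's data set.
cases <- data.frame(
  site  = c("Stomach", "Colon", "Lung", "Esophagus"),
  count = c(12, 30, 25, 5)
)

# Put the data into a small database that lives next to the app.
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "cases", cases)

# Hypothetical helper: run a query and return either the result or the
# error message, which is the kind of thing that could be sent back to the
# LLM so it can try again.
run_query <- function(con, sql) {
  tryCatch(
    dbGetQuery(con, sql),
    error = function(e) conditionMessage(e)
  )
}

# A query of the kind an LLM might write for "show me digestive tract cancers".
sql <- "SELECT * FROM cases WHERE site IN ('Stomach', 'Colon', 'Esophagus')"
filtered <- run_query(con, sql)
print(filtered)

# A broken query yields an error message instead of a data frame.
retry_message <- run_query(con, "SELECT * FROM no_such_table")
print(retry_message)

dbDisconnect(con)
```

Only the question and the SQL text travel to and from the model in this arrangement; the rows themselves stay in the local database.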

This architecture has some real benefits. One, it's quite reliable. The database is running SQL. The LLM's writing SQL. If anything hallucinates, it just tries again before it messes anything up. It works pretty well. I mean, it's well-tested at this point. Two, it's safe for a number of reasons. For one, querychat won't let the database run destructive queries. But even if it did, that database was spun up at the time the Shiny app was launched. The next time someone launches the Shiny app, a new database will be spun up. Like, there's nothing permanent that could be destroyed here. Three, it's reproducible. Your user will have access to that SQL code. You can even feature it inside your Shiny app. So if they wanted to, they could then collect it and rerun it, use it later.

And then four, it's kind of private in that the LLM doesn't ever really see the raw data, but it can query the data to give the user responses in the chat. And that can certainly reveal things about the data. So it's definitely not, like, secure in and of itself. This is a situation a lot like Positron Assistant itself, where you would want to trust the model that you are using. And what we've discovered working with clients, working with AI for about a year now, is there's no way around trusting the model. If you don't trust your model, don't use it with your data. And it is very easy these days to secure the use of a model that you can trust. The model providers want to make the models trustworthy and secure, because they know no one would use them otherwise. So normally people just arrange with Anthropic, or whoever, to have their data not be used.

Setting up environment variables

So here's what the articles look like, but I'm here to tell you how it works. First, how is querychat going to connect to a large language model? It does this through something called environment variables. You may have used these before, you may not have. They're variables that you set in your R session or your Python session. And your code, in this case, querychat's code, knows to look them up by their specific names. You can share that code on GitHub, and what's in these environment variables is not in the code, so it's safe. If someone takes that code and tries to run it on their computer, it will look up these environment variables on that person's computer. If they've set them with a value, it will find those values and their code will work for them. If they haven't set those values, it'll say I couldn't find that environment variable. But it's a way for you to write code that uses secrets that will never get shared with the code because the secrets kind of stay with your computer when you share the code.
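
In base R, the lookup described here is done with Sys.getenv(). The short sketch below uses made-up variable names (DEMO_API_KEY and DEMO_UNSET_VARIABLE) to show both cases: a variable that has been set on this computer and one that has not.

```r
# Set a variable for the current R session only (hypothetical name and value).
Sys.setenv(DEMO_API_KEY = "not-a-real-key")

# Code looks the value up by name; the value itself is not in the code
# that reads it.
key <- Sys.getenv("DEMO_API_KEY")

# A variable that was never set comes back as an empty string.
missing_value <- Sys.getenv("DEMO_UNSET_VARIABLE")

# nzchar() is a handy way to check whether a variable has a value
# without printing the secret itself.
nzchar(key)
nzchar(missing_value)
```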

So the first environment variable we'll look for is called QUERYCHAT_CLIENT. And that's going to be set to a model provider/model pair, and that tells querychat which model provider and which model it's going to use. So that's pretty straightforward. And then the second one will be the credential for that model provider, which proves that you actually have an account with the model provider and that the queries will be sent to that account.

I have a key I'll share with you in a second. We'll use Anthropic today because I find it to be one of the more straightforward ones. All of this is piggybacking on another R package called ellmer, which is more low-level tools for using chatbots and LLMs inside of R or Python. And to look up your options and your details here you'll need to go to the ellmer package; just Google it and go to its documentation page. And then it has many functions named models_ followed by some model provider. Just look through the reference for the model provider you're interested in using and then read the details about how to work with that model provider. And it will tell you whether it's an ANTHROPIC_API_KEY or a GITHUB_PAT variable, or whatever variable you have to set to work with it.

So today we're going to set this up. We're in the final 10 minutes. So instead of setting up once for our local session and once on Connect, we'll look at how querychat works and then we'll just set it up on Connect. If you want to set these environment variables up in your R session, I think the best way to do it is to create a .Renviron file. That's a hidden file. And if you have one on your computer under your username, every time you start a new R session, this file will be read. It's a great way to set up all the environment variables you depend on. And when you're working with Shiny apps, every time you launch a new Shiny app, it gets started in a fresh session, but this .Renviron file will be read at the start of that session too. So it gives you a way to insert some variable definitions in there before the Shiny app really starts going. It's a hidden file, so it's kind of a pain to work with, but luckily there's a package called usethis, which comes with this lifesaver function. It's called edit_r_environ(). If you run that code at the R command line, it will find and open your .Renviron file. If you don't have one yet, it'll just make one and save it in the right place. And that's how you could set your variables.
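
A sketch of that workflow is below. The variable names follow what is said in the talk (QUERYCHAT_CLIENT and an Anthropic API key); the values are placeholders to be filled in from the querychat and ellmer documentation, and the edit_r_environ() call is left commented out so that sourcing this sketch does not open an editor.

```r
# Step 1: open (or create) the user-level .Renviron file.
# Requires the usethis package; uncomment to run interactively.
# usethis::edit_r_environ()

# Step 2: add lines like these to the file (placeholders, not real values),
# one NAME=value pair per line, then save and restart R:
#
#   QUERYCHAT_CLIENT=anthropic/<model-name>
#   ANTHROPIC_API_KEY=<your-key>

# Step 3: in the fresh session, confirm the variables are visible
# without printing the secret itself.
has_client <- nzchar(Sys.getenv("QUERYCHAT_CLIENT"))
has_key    <- nzchar(Sys.getenv("ANTHROPIC_API_KEY"))
print(c(client = has_client, key = has_key))
```

Keeping the key in .Renviron instead of in app code means the code can be shared without sharing the secret.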

Integrating querychat into the app

This is what we'll set. So I'll come back to that in a minute, but you can see I have an Anthropic API key, which will work for at least today. So we'll be good. By the way, don't share your Anthropic API keys. That's a bad way to lose money. Also, I found out the hard way, if you put them on GitHub, Anthropic's actively scanning GitHub and it immediately disables it as soon as it sees it there. So you can't share them with students over GitHub.

Before we get there, let's talk about what querychat's going to do. So you could set up your apps to use it. There's five lines of code you have to pay attention to, just five. One side is our app, the way it may look right now. And the other side is the solution file of querychat. And we're going to play spot the differences because we'll be able to spot the five differences.

So if you scan through it, the first difference you'll probably see is in line five. We're going to use functions from the querychat library now. So we're going to need to load that package with library(querychat). So first difference, load querychat as you load your app. The second difference, it's on line 10. We're going to set up querychat with this querychat function. We're going to save the output to something to call later. And this is what querychat does. We pass it the table to put in the database to start filtering. And we can also set up a greeting that's just waiting there for the user when they first log in. As a chatbot, you know, it should say something to them, and that's what the greeting would say. Now that's going to return a list of elements that we're going to use throughout the app. So we want to save that list. And we do all this before we even set up the UI and the rest of the app. This is sort of like sourcing a file. And so we do it in the same place.

What's inside that list are three main things. qc$sidebar is a function that sets up the chatbot sidebar. qc$ui sets up an isolated chatbot that you can insert elsewhere in the UI if you like. We're not going to use that today. And qc$server creates a new list of things to use over on the server side. So we'll come back to that. Let's use sidebar right now. So here, all the stuff, you know, I actually do have my tools, so I'm going to use this. All the stuff over here in the sidebar, we don't need that anymore. querychat's going to take care of it. So we're just going to replace all of that by just setting the sidebar equal to qc$sidebar. That's how you get the chatbot into your UI. Bam, taken care of.

Now let's look at the server side. So this is the bottom half of the app, the server side. We're going to do two things over here. First, we're going to call that qc$server, which will give us a list of new values to use on the server side. And I have to call it inside the server function because this is something that's going to run each time we launch the app, over here in the server function. What's inside of qc_vals are three more things. One, there's the filtered data frame. So this is how I could call it from within the app: qc_vals, or whatever I save this output as, dollar sign df. And then there's the actual SQL query if I want to display that within the app. It will get displayed in the chat side anyway, so I'm not going to use it today. And then title is a shot that the AI takes at writing a title that describes whatever filter is in place right now. So you could maybe set the title of your plot or your table to that, and it will kind of intelligently give it a title that seems to explain the data it's showing. I'm not going to use that today either, but we will use qc_vals$df. And the way this app is set up, it's so simple. Everything is just pulling this filtered data set. So up here we have filtered and these things just call filtered. So I'm going to let them keep calling filtered. I'm just going to change how I make that filtered data set. I'm just going to make it by calling the filtered data set that querychat gives us back. Once you make those changes, that app will work.
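
Putting the changes described above together, the app might look roughly like the sketch below. This is reconstructed from the spoken description and is not the workshop's solution file: the names qc, qc_vals and build_chat_app are illustrative, the call forms qc$sidebar(), qc$server() and qc_vals$df() are assumptions about the querychat interface, and the greeting text is invented. Check the querychat documentation for the exact signatures in your installed version. Everything is wrapped in a function so that nothing is created, and no model is contacted, until you build the app yourself.

```r
library(shiny)
library(bslib)

# Hypothetical wrapper: pass in the data frame you want to chat with.
build_chat_app <- function(cases) {
  # Set up querychat before the UI (assumed interface, as described in the
  # talk). library(querychat) would normally be loaded at the top of app.R;
  # the namespace prefix is used here so this sketch can be sourced without
  # the package attached.
  qc <- querychat::querychat(
    cases,
    greeting = "Ask me to filter the cancer data."
  )

  ui <- page_sidebar(
    # The chatbot sidebar replaces the dropdown sidebar.
    sidebar = qc$sidebar(),
    tableOutput("table")
  )

  server <- function(input, output, session) {
    # Start querychat's server side inside the server function.
    qc_vals <- qc$server()

    # filtered now comes from querychat instead of from input$site.
    # Downstream code keeps calling filtered() exactly as before.
    filtered <- reactive(qc_vals$df())

    output$table <- renderTable(filtered())
  }

  shinyApp(ui, server)
}
```

To try it, you would call build_chat_app() on your own data frame in a session where the environment variables for the model provider are set.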

Deploying to Connect

All right. So our job is just to take that code that's already been written in querychat 4, paste it over what's in our app and run it. And let's do that together, and then we'll put it on Connect. All right, so stopping this, which I forgot to do earlier, closing that. I have app.r, about to paste over all that. Finding solutions, and the fourth one, the querychat app. I'll just go select all, copy, find app again, select all, paste. So now it's set up to use querychat. And then if I were to run it right now, I don't think it would work because I haven't set up the model. I'm going to immediately publish this to Connect. I'm going to click Connect, or Publish. I'm going to go through the process. This time it remembers who I am. Sorry, it's signed into Connect, so I don't have any trouble. But I do need to remember that it uses helpers and it uses Georgia Cases. I'll deploy my project. And if you're following along at home, I mean, feel free to go through this if you like. But at this point, I'd say sit back and let's watch it together.

So it's running everything, sending the bundle over. It's taking maybe 30 more seconds. There, it's successful. If I view it over here on Connect, we have a nice-looking app with a nice-looking sidebar chatbot. But if I were to do something like show me cancers of the digestive tract, it's just going to run forever because I haven't set up a model for it. And this is what it means to set up environment variables. It's being run in this Connect environment. It can be run in a local environment. Whatever environment it's running in, querychat is trying to ask the environment, hey, what value do you have on this server for QUERYCHAT_CLIENT? Right now, Connect's saying, I don't have any value for QUERYCHAT_CLIENT. But if I go to settings here, in Advanced, there's a section where I can add environment variables. This is how I'll do it. I'll add QUERYCHAT_CLIENT.

Now I need to find the value for it, which is conveniently in the slides a little ways back. So I'll tell it to use Sonnet 4.5 this time. And I'll paste that in there. So I add that. Once you add a value to Connect, and I suppose anywhere else that you're adding values that's worth its salt, no one can read it. The code can read it, but it's just not accessible anymore to humans, even my collaborators. They could come back and edit it by overwriting it, but it's truly a secret. And that is good because I'm about to put something that bills to my employer's account in here. I'll add that, and then I'll save these two variables.

So now it's relaunching the app, but this time when querychat in the background is trying to look up these values, there's a value that it will find. And if my IT friend gave me a real live API key, this should work. So I'll say, show me only cancers related to the digestive tract. Okay, it's working. I'm holding my breath. And this is what querychat looks like. So it's showing me, you know, this is the SQL query that it wrote. So it decided, like, if it's related to the colon, rectum, esophagus, stomach, all this stuff, that's related to the digestive tract. That would have been hard to do with a dropdown menu. And it filtered the data, and everything updated to show that. And then it's even saying, hey, based on what you're looking at, maybe you want to, like, look at the difference between males and females. It's giving me suggestions. I don't have to take them, but I can go pretty far by taking them.

So I'm going to take those suggestions. And now we see the next thing querychat can do. In addition to updating the data, show me the total number of cancers by digestive tract. It can compute simple things about the data and show me those in the chat here. So here it's running a SQL query to make this little table and shows me this. Okay, so there's a lot of fun there. And if you have this setup, you could play with yours over the break or whatnot. I know we just barely got to the cool thing. We're not going to have a chance to play with it. But we got to it. And you saw what was involved in all of this.

And what I want to make sure I give you is this slide here. So these are all the resources we went through today. And if you want to learn more about them, go to those websites or take a screenshot of this. And this is where you'd go. And then I should put this on the same screen. But if you want to go even past where we went today, this is definitely the next step. The low-level tools for R about how to work with AI. Now, I was told to stop at 55 past the hour. And look at that. It's 55 past the hour. So thank you so much, everyone, for your attention and going through this with me. I hope there's something in here that sparked an interest that you can build on afterwards. And yeah, so I'll pass it back over to Daniel. I think he'll lead us into the break.