Peace, mercy, and blessings of God be upon you. How are you all doing? I hope you're well. I'm Engineer Wafaa, a student of Control Systems and Systems at the Faculty of Engineering, Mansoura University. Today, God willing, we'll continue with Chapter 2 of the Hands-On Machine Learning book. This is part 2 of the project. In the name of God, here's the code.

We now have our data, and we want to do some separating. Why? It makes no sense to give the model the complete table with both the data and the result (the house price). If you then ask it for the price of a specific house, it will just look at the table, cheat, and give you the output. It won't learn. So we want to separate the median house value from our table. The first line of code is `housing = strat_train_set.drop("median_house_value", axis=1)`. I'm dropping `median_house_value`, and `axis=1` tells it that what I'm dropping is a column. This won't delete anything from the original table. If it were going to, I would have written `inplace=True` here, but we didn't write it. Instead, `drop` returns a new table. So `housing`, for me, is now all the data except the median house price. The second line, `housing_labels`, is where we put `median_house_value`, copied from the original dataset. So we've separated them. The labels, which are the house price (the result), are now in a separate table, and `housing` contains the rest of the data. That keeps things organized while we work on the algorithm.

Next, we want to find the part of the table that has no data. Earlier, when we looked at the `total_bedrooms` column, we noticed that its count is lower than the others, so naturally some of its cells are empty.
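As a small sketch of the separation described above, here is the same pair of calls on a toy table. The three rows are made-up values, and the name `strat_train_set` is assumed to match the training set from the earlier part of the project.

```python
import pandas as pd

# Toy stand-in for the training set; the values are made up for illustration.
strat_train_set = pd.DataFrame({
    "total_rooms": [880.0, 7099.0, 1467.0],
    "total_bedrooms": [129.0, None, 190.0],
    "median_house_value": [452600.0, 358500.0, 352100.0],
})

# drop() returns a new DataFrame; axis=1 means "drop a column".
housing = strat_train_set.drop("median_house_value", axis=1)

# copy() gives the labels their own independent Series.
housing_labels = strat_train_set["median_house_value"].copy()

print(list(housing.columns))
# The original table still has the label column, because inplace=True was not used.
print("median_house_value" in strat_train_set.columns)
```

Because `drop` was called without `inplace=True`, the original table keeps all three columns while `housing` has only the two feature columns.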
So, in this code, I say `sample_incomplete_rows = housing[housing.isnull().any(axis=1)].head()`. The `.isnull()` part checks every cell to see whether it's null, meaning empty. If it's empty, it gives me `True`; if not, `False`. Then `.any(axis=1)` works horizontally, across the columns of each row, not down a whole column: it asks whether the row has a null in at least one cell. Then `.head()` shows the first five such rows so that we can see our output. We notice here that `total_bedrooms` has the NaN.

Why are we doing this step? Because when we inspected the data earlier, we discovered there were NaN values, most likely in the `total_bedrooms` column. So we created this sample to pinpoint exactly which cells they are in. There are several options to solve the NaN problem.

Option 1 is to remove the rows containing NaN. This code uses `dropna`, which is a pandas method, with `subset=["total_bedrooms"]`. That tells it to remove the rows where the `total_bedrooms` column contains NaN, not just any NaN it encounters anywhere. They must be in this column. So this gives us the data without the rows that have NaN in `total_bedrooms`. We only resort to option 1 if the percentage of missing data is very small.

So what is option 2? Option 2 means I get rid of the entire column, which is `total_bedrooms`. I tell it that this column contains missing data, so delete it completely.
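The null check and the first two options can be sketched on a toy table like this. The rows are made up, and `option1` and `option2` are hypothetical names used only for this illustration.

```python
import numpy as np
import pandas as pd

# Toy data with two missing total_bedrooms cells (made-up values).
housing = pd.DataFrame({
    "total_rooms": [880.0, 7099.0, 1467.0, 1274.0],
    "total_bedrooms": [129.0, np.nan, 190.0, np.nan],
    "households": [126.0, 1138.0, 177.0, 219.0],
})

# isnull() marks each empty cell True; any(axis=1) asks, row by row,
# whether at least one column in that row is True.
sample_incomplete_rows = housing[housing.isnull().any(axis=1)].head()
print(sample_incomplete_rows)

# Option 1: drop only the rows whose total_bedrooms is missing.
option1 = housing.dropna(subset=["total_bedrooms"])

# Option 2: drop the whole total_bedrooms column.
option2 = housing.drop("total_bedrooms", axis=1)

print(len(option1), list(option2.columns))
```

Neither call uses `inplace=True`, so `housing` itself still has all four rows and all three columns afterwards.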
Is this preferable for us? No, of course not in our case, because the percentage of missing data is very small, so dropping the column would throw away good data and affect our outputs. However, we might resort to this option if, for example, 70% of the data is missing, or if the column is generally unimportant to me. So we won't work with this option; we're just familiarizing ourselves with the available options in case we need them in a later project.

Now we'll move on to option 3, which is to fill in the missing values, the NaN cells, with some value. So what will I do? I'll use the median of the `total_bedrooms` column. Why the median? Why didn't we choose the average? If most values are normal, for example 20, 25, 26, 30, but one value is very high, for example 100, the average gets pulled toward that value and no longer represents the typical values. Here the average is 40.2, which is higher than four of the five values. The median is not sensitive to outliers: it's the middle value, 26 here. So the first line of code calculates the median of this column. Then, in the second line, I call `fillna`, which puts the median in place of every empty cell, and we set `inplace=True`. As we agreed previously, writing that means we are modifying our original dataset.

So option 3 is the most suitable option for us right now. Option 1, as we said, is only for when the missing data is very small, and it's not ideal either. Option 2, deleting the entire column, is also not ideal because the column is a very important feature for us here. Here, we will use the Scikit-Learn library, specifically its `impute` module.
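Here is a minimal sketch of the median-versus-average point and of option 3, using the five example numbers from the explanation and a made-up `total_bedrooms` column. It assigns the result of `fillna` back to the column instead of using `inplace=True` on the column; that is a swapped-in variant with the same effect on the table, chosen because in-place filling on a selected column can trigger warnings in recent pandas versions.

```python
import numpy as np
import pandas as pd

# The example values from the explanation: four normal values and one outlier.
values = pd.Series([20, 25, 26, 30, 100])
print(values.mean())    # 40.2, pulled up by the outlier
print(values.median())  # 26.0, the middle value

# Toy column with two gaps (made-up values).
housing = pd.DataFrame({"total_bedrooms": [129.0, np.nan, 190.0, 235.0, np.nan]})

# median() skips NaN, so this is the median of 129, 190 and 235.
median = housing["total_bedrooms"].median()

# Option 3: put the median in place of every empty cell.
housing["total_bedrooms"] = housing["total_bedrooms"].fillna(median)
print(housing["total_bedrooms"].tolist())
```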
From this module, we take `SimpleImputer`. What does it do? Notice here that I have `strategy="median"`. This tells the imputer to fill any empty cell in the table with the median of that cell's column. It calculates the median and then fills accordingly. Why use `SimpleImputer` rather than the `fillna` from the previous code? `fillna` does the same filling, but the imputer is distinguished by the fact that it saves the median of every column, not just one. So if an empty cell shows up later in new data, in any column, it already knows how to fill it. That's more useful in real datasets: if more missing values appear, the problem is already solved, God willing. Another advantage is that the imputer fits into pipelines. We can do the data cleaning and the rest of our preparation steps in a single pipeline. `fillna`, however, is a manual step; we can't use it as a pipeline stage.

This code only creates the imputer; the fitting is done by `fit`, which we'll see shortly. Before that, look at the `drop` line. It drops the `ocean_proximity` column, because the imputer should receive only the numerical part of `housing`. We saw that `ocean_proximity` is text, and the median can only be computed on numbers. Another way written here is `housing.select_dtypes`. This selects every column of a given type, for example numbers, and leaves everything else. Of course, I use this second method if I have a very large dataset with many text columns and I don't know them all.
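To illustrate why the imputer is more reusable than a one-off `fillna`, here is a rough sketch on toy data. The table values and the `new_data` frame are hypothetical; the point is only that the imputer remembers a median for every numeric column and can fill a gap that shows up later in a different column.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Toy table with one text column and one gap (made-up values).
housing = pd.DataFrame({
    "total_rooms": [880.0, 7099.0, 1467.0],
    "total_bedrooms": [129.0, np.nan, 190.0],
    "ocean_proximity": ["NEAR BAY", "INLAND", "NEAR BAY"],
})

# Keep only the numeric columns; the median is undefined for text.
housing_num = housing.select_dtypes(include=[np.number])

imputer = SimpleImputer(strategy="median")
imputer.fit(housing_num)  # learns one median per numeric column

# Hypothetical new data arriving later, with a gap in a different column.
new_data = pd.DataFrame({"total_rooms": [np.nan], "total_bedrooms": [300.0]})
filled = imputer.transform(new_data)
print(filled)  # the total_rooms gap is filled with the learned median, 1467.0
```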
That way, I only select the numerical data that I can work with later. Then we wrote `imputer.fit(housing_num)`. This is what actually runs the imputer we created above. It goes through all my columns and calculates the median of each one. As for `imputer.statistics_`, it displays an array containing the medians of all the columns, so I can see the values in front of me and know what they represent. In the code under this, we took `housing_num` and called `.median().values` on it to calculate the medians manually. You'll notice that what the imputer got is the same as what we got manually. You'll find the same values in both outputs: 118.51, 34.26, and so on. This means the imputer didn't make a mistake. Everything is clear, and the values are exactly the same.

Then we say `X = imputer.transform(housing_num)`. This puts the medians the imputer memorized into the empty cells, so the result has no gaps at all. We'll notice that this `X` is a NumPy array, which contains only the numbers. It has no column names and not the familiar table format. Because we prefer to work with a well-organized table, in this code we create `housing_tr`, where we store the data after converting it back to a table. `pd.DataFrame` handles this: `columns` is set to the column names of `housing_num`, which restores the names that were already there, and we also preserve the row labels, the index, so things don't look messed up.

Next, we use `loc`. You'll notice here that `housing_tr`, the completed table, contains only numbers and has no NaN or anything.
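The fit, check, transform, and rebuild steps above can be sketched as follows. The table is toy data with arbitrary row labels, and passing `housing_num.index` to the `DataFrame` constructor is an assumption about how the index is preserved.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Toy numeric table with non-sequential row labels (made-up values).
housing_num = pd.DataFrame(
    {"total_rooms": [880.0, 7099.0, 1467.0, 1274.0],
     "total_bedrooms": [129.0, np.nan, 190.0, 235.0]},
    index=[17606, 18632, 14650, 3230],
)

imputer = SimpleImputer(strategy="median")
imputer.fit(housing_num)

# The learned medians should match the ones computed by hand with pandas.
print(imputer.statistics_)
print(housing_num.median().values)

# transform() returns a plain NumPy array: no column names, no index.
X = imputer.transform(housing_num)

# Rebuild a DataFrame, restoring the column names and the original row labels.
housing_tr = pd.DataFrame(X, columns=housing_num.columns, index=housing_num.index)

# The row that had a NaN now holds the median instead.
print(housing_tr.loc[[18632]])
print(imputer.strategy)
```

Keeping the original index is what makes it possible to look up, with `loc`, the same rows that were incomplete before the transform.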
It looks up the rows by the sample's index. So if we look at a row that had a NaN in the earlier table, like this one, we'll notice that the NaN is gone and the cell now holds a number. The table is also organized and looks good. Everything is fine, and there won't be any problems when we work with it later.

After that, we find `imputer.strategy`. If you later forget which strategy the imputer is based on, running this shows what you set when you created it, whether median, mean, or something else. We were working with the median, so that's what came out. After that, I have `housing_tr` set equal to a `pd.DataFrame`. We agreed that `DataFrame` converts the values, which are a NumPy array, into a table, adding the columns and the index. Then `housing_tr.head()` shows the first rows of `housing_tr`. We found that all of my values are numbers, so that's fine.

We'll also remember that we took `ocean_proximity` and put it in a column on its own because it contains objects, meaning text. So now the code creates `housing_cat` for it and displays its first 10 rows so we can see its data. It really is an object column, as we see now.

Now we move on to converting it, and there are two methods. The first method uses `OrdinalEncoder`. I work on the part we created, `housing_cat`. I take my values, `<1H OCEAN`, `INLAND`, `ISLAND`, `NEAR BAY`, and `NEAR OCEAN`, and start numbering them. For example, `<1H OCEAN` gets zero in its place, and so on. So my values become 0, 1, 2, 3, and 4, with a specific number for each word. This is one way I can work with the column. Now I've transformed the object column into numbers.
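As a small sketch of the ordinal encoding just described, here is `OrdinalEncoder` applied to a toy `ocean_proximity` column. The six rows are made up; the five category names are the ones mentioned in the walkthrough.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Toy text column; double brackets are not needed here because we build
# the one-column DataFrame directly.
housing_cat = pd.DataFrame({"ocean_proximity": [
    "<1H OCEAN", "INLAND", "ISLAND", "NEAR BAY", "NEAR OCEAN", "INLAND"]})

ordinal_encoder = OrdinalEncoder()
housing_cat_encoded = ordinal_encoder.fit_transform(housing_cat)

# Each word is replaced by the position of its category in sorted order.
print(housing_cat_encoded.ravel())

# categories_ shows which name sits behind each number.
print(ordinal_encoder.categories_)
```

The encoder sorts the category names, so `<1H OCEAN` becomes 0 and `NEAR OCEAN` becomes 4, and the repeated `INLAND` row gets the same number, 1, both times.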
Next is the `categories_` attribute of the `OrdinalEncoder`. It shows me the array of names behind the numbers 0, 1, 2, 3, and 4. It tells me which number was put in place of each name: `<1H OCEAN` is zero, `INLAND` is one, `ISLAND` is two, `NEAR BAY` is three, and `NEAR OCEAN` is four. These are the values it stored in its array and replaced with 0 through 4. Based on this, it puts a number in place of each object, and everything becomes numbers. What objects do we have now? None; it's all numbers.

The second method is `OneHotEncoder`. Here each category gets its own column, and in every row only one of those columns is 1 while all the rest are 0. So our arrays contain only 1s and 0s, and this is sometimes preferable because no category gets a number greater than another. With the ordinal numbers, a model could read `INLAND`, which is far from the coast, as being more similar to `<1H OCEAN` than to `NEAR OCEAN`, just because 1 is nearer to 0 than to 4. With one-hot encoding, there is no longer one value greater than another.

Here, we notice `housing_cat_1hot`. This is where we stored the result of `fit_transform` from the `OneHotEncoder` on `housing_cat`. If we display it and call `toarray()`, it converts it to a regular array. What we get by default is something called a sparse matrix. A sparse matrix stores only the positions of the nonzero values instead of every cell, and the encoder can also be told, through its sparse setting, to return the complete array directly. We can't read a sparse matrix with our eyes the way we read a regular array, but we use it in large projects where we can't provide a large amount of RAM, so it's the better choice there.
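A minimal sketch of the one-hot step on toy data follows. The four rows are made up, and `dense` is a hypothetical name for the converted array. It relies on the default behavior of returning a sparse matrix and uses `toarray()` to view it, which avoids depending on the name of the encoder's sparse setting, since that name differs between Scikit-Learn versions.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Toy text column with three distinct categories (made-up rows).
housing_cat = pd.DataFrame({"ocean_proximity": [
    "<1H OCEAN", "INLAND", "NEAR BAY", "INLAND"]})

cat_encoder = OneHotEncoder()

# By default the result is a SciPy sparse matrix that stores only the
# positions of the nonzero entries.
housing_cat_1hot = cat_encoder.fit_transform(housing_cat)

# toarray() expands it into a regular dense NumPy array for inspection.
dense = housing_cat_1hot.toarray()
print(dense)

# categories_ gives the column order: one column per category.
print(cat_encoder.categories_)
```

Each row of `dense` has exactly one 1, in the column of that row's category, so the two `INLAND` rows are identical.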
Then the `categories_` line shows me the order of the columns that my encoder follows. So, for example, I have `<1H OCEAN`, `INLAND`, `ISLAND`, `NEAR BAY`, and `NEAR OCEAN`. For each row, only one of these columns is 1 and the others are 0, depending on which category the row has. That is what came out in the array: a 1 or a 0 under each word.
ch2 in hands-on machine learning (project code)—part 2