Peace, mercy, and blessings of God be upon you. How are you all doing? I'm Engineer Wafaa from the Computer and Control Systems Department, Faculty of Engineering, Mansoura University. God willing, we'll continue the Chapter 2 code; this is part 3.
The first line of the code we have now imports from sklearn.base. We have BaseEstimator, which is considered the main base class in scikit-learn. Its main purpose is to expose the hyperparameters automatically: it gives the class get_params and set_params, so in the optimization stage, tools like GridSearchCV can read the class's hyperparameters, such as add_bedrooms_per_room, change their values automatically, and test our model with each setting. These tools are very important for our work; without inheriting from BaseEstimator, external tools would have no standard way to inspect or modify the settings of our class.
Then we have TransformerMixin. This is the class that provides an extra method for free, and it's based on the principle of "don't repeat yourself", DRY for short. How does it work? When we need to perform two consecutive operations, fitting on the data first and then transforming it, instead of writing a third function that manually combines fit and transform, the TransformerMixin automatically implements fit_transform for us. One of the most important aspects of the TransformerMixin is that it keeps the class following the standard scikit-learn API, which allows very smooth integration inside a pipeline and makes the whole process much simpler.
So now we have created a new class that inherits from both. Inheriting from BaseEstimator lets me use get_params and set_params, which are very important because they let tools like grid search adjust my hyperparameters automatically, while the TransformerMixin gives me fit_transform, which fits and then transforms the data in one call. So once I've defined fit and transform, I've prepared the data and performed the transform together.
The next line, rooms_ix, bedrooms_ix, population_ix, households_ix = 3, 4, 5, 6, selects my columns by position in the NumPy array. The transformer here deals with an ndarray, not a DataFrame, so it uses column numbers instead of column names; I only mention the column's position. Then the line def __init__(self, add_bedrooms_per_room=True): what does this mean? It means I'm defining the constructor with only one parameter, and its purpose is to control whether I add the new bedrooms_per_room feature. As we know, this is called a hyperparameter: I use it while preparing the data, and it helps us later to test whether this feature improves the model's performance.
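Putting those pieces together, here is a minimal sketch of the custom transformer being described, written along the lines of the book's CombinedAttributesAdder; the class and variable names follow the book's convention, and the hard-coded indices 3, 4, 5, 6 correspond to total_rooms, total_bedrooms, population, and households in the housing array.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin

# Column positions in the housing NumPy array (hard-coded here):
# total_rooms, total_bedrooms, population, households
rooms_ix, bedrooms_ix, population_ix, households_ix = 3, 4, 5, 6

class CombinedAttributesAdder(BaseEstimator, TransformerMixin):
    def __init__(self, add_bedrooms_per_room=True):
        # Hyperparameter that switches the extra feature on or off
        self.add_bedrooms_per_room = add_bedrooms_per_room

    def fit(self, X, y=None):
        return self  # nothing to learn; exists to satisfy the scikit-learn API

    def transform(self, X):
        rooms_per_household = X[:, rooms_ix] / X[:, households_ix]
        population_per_household = X[:, population_ix] / X[:, households_ix]
        if self.add_bedrooms_per_room:
            bedrooms_per_room = X[:, bedrooms_ix] / X[:, rooms_ix]
            # np.c_ concatenates the new columns onto the original array
            return np.c_[X, rooms_per_household, population_per_household,
                         bedrooms_per_room]
        return np.c_[X, rooms_per_household, population_per_household]
```

Because add_bedrooms_per_room is exposed in __init__, a grid search can later toggle it on and off just like any other hyperparameter.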
Here we also find the line def fit(self, X, y=None): return self. What does this do for me? This transformer performs no calculations during fit; it doesn't need to learn any statistics from the data, like a mean or a median. The function does nothing but return the object itself, and it exists only to satisfy the scikit-learn API. We'll also find that the transform function calculates new ratios by dividing columns: for example, the number of rooms per household, the population per household, and, optionally based on the boolean flag we set, the ratio of bedrooms to the total number of rooms. We notice that in the code, np.c_ performs the concatenation, merging the original X values with the newly calculated columns into one array.
In the instantiation stage, we created the object while disabling the bedrooms-per-room feature, then passed housing.values, the underlying NumPy array of my original data, to the transform function so it could produce the new array containing the added features.
After that, we enter the code that adds some safety to this part. Its aim is to avoid errors that result from manually hard-coding column positions. Here we define the names of the columns, the features we want to compute with: total_rooms, total_bedrooms, population, and households, the four features we're working with. So instead of dealing with numbers, we deal with strings, the names of the actual columns in our DataFrame. Next, we calculate each index using housing.columns.get_loc(c). What does this do? It gives me the index of the column from its name. So, for example, if total_rooms is the third column in the table, it will return the number 2, because the index starts from zero.
We'll notice that this code is better than last time; it avoids hard-coding. What I did last time was use 3, 4, 5, and 6, manually selecting the positions. Of course, when you add a new column at the beginning of the data or change the order of an existing column, the code would silently compute the features from the wrong columns, since you entered the positions manually; for instance, it might calculate rooms per household based on the longitude column, and things would no longer be correct. This version is much better in that regard, and the flexibility is also higher: we used get_loc, which resolves the column's position at runtime based on its name, so even if the column order changes, our code will continue to work correctly.
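As a sketch of what that looks like in practice (assuming housing is the DataFrame loaded earlier in the chapter, and CombinedAttributesAdder is the class defined above), the indices can be looked up by name and the transformer applied like this:

```python
# Resolve column positions by name at runtime instead of hard-coding 3, 4, 5, 6
col_names = "total_rooms", "total_bedrooms", "population", "households"
rooms_ix, bedrooms_ix, population_ix, households_ix = [
    housing.columns.get_loc(c) for c in col_names
]

# Instantiate with the bedrooms_per_room feature disabled, then transform
attr_adder = CombinedAttributesAdder(add_bedrooms_per_room=False)
housing_extra_attribs = attr_adder.transform(housing.values)
```

If a column is later inserted or the order changes, get_loc still finds the right position, whereas the hard-coded 3, 4, 5, 6 would silently point at the wrong columns.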
We can find the correct column here simply by using its name and extracting its position. Of course, inside the pipeline, as we know, the CombinedAttributesAdder works on NumPy arrays, so this will be very helpful for us; this line is the link between the pandas world (names) and the NumPy world (integer positions). The output of the CombinedAttributesAdder is a NumPy array, and that array is raw data: it has no column names, no metadata, no row index, and so on. That's why we use pd.DataFrame, which lets us restructure the output so it's more organized, turning it from a plain array back into a table. Here we define the new table's columns: the list of the original housing columns plus the new features that the transformer added. We also need to make sure the number of names in this list matches the actual number of columns in the output array, otherwise we'll get an error. In the code, I also kept the original row labels by passing housing.index as the index, and then displayed the result. So we start to notice here that we have the new table, and what's shown is the head, meaning the first five rows.
The next code in front of us designs the production line, our pipeline, for processing the numerical data. It's a design pattern that sequences the operations so that the output of each stage becomes the input of the next stage, and so on. You'll find the pipeline has, as we agreed before, three stages: the SimpleImputer, which handles my missing data; the CombinedAttributesAdder, which creates the engineered features, meaning the extra features; and the StandardScaler, which performs my feature scaling. Then we called fit_transform on it: it fits then transforms at the first station, passes its output to the next station, which fits then transforms, and so on until it reaches the final stage.
Why did we use this design, or what are its strengths? First, modularity: I can apply this entire pipeline to any new data, with the same settings, in a single call. Second, maintainability: if I want to change something, for example the strategy for filling missing values, say from the median to the mean, I change a single line inside the pipeline without it affecting the rest of the code. Third, leak prevention: I now ensure that all transformations are learned only from the training data and then applied to my test data, and this is the basic safety standard in machine learning projects. We then called fit_transform, the method we mentioned earlier, on the numerical data, and when we run it, it gives me back a NumPy array, exactly as we see. Of course, you'll notice that the numbers here have changed from the previous table. Why did they change from my original table?
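A minimal sketch of those two steps, assuming housing_extra_attribs is the array produced by the transformer above and housing_num is the numeric-only DataFrame from earlier in the chapter (housing with the ocean_proximity text column dropped); the stage names follow the book's notebook:

```python
import pandas as pd
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler

# Wrap the raw NumPy output back into a labelled table for inspection
housing_extra_attribs = pd.DataFrame(
    housing_extra_attribs,
    columns=list(housing.columns) + ["rooms_per_household",
                                     "population_per_household"],
    index=housing.index)
housing_extra_attribs.head()  # first five rows of the new table

# The numerical pipeline: impute -> add engineered features -> scale
num_pipeline = Pipeline([
    ("imputer", SimpleImputer(strategy="median")),
    ("attribs_adder", CombinedAttributesAdder()),
    ("std_scaler", StandardScaler()),
])

housing_num_tr = num_pipeline.fit_transform(housing_num)  # returns a NumPy array
```

The column list must contain exactly as many names as the array has columns; since add_bedrooms_per_room was disabled above, only the two ratio columns are appended there.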
For example, here I have the z-score, which is the distance between the original value and the mean divided by the standard deviation. That's why we start to notice that the values now sit around zero, some negative and some positive. This means my data has been successfully scaled, and all features now have the same scale and importance for our algorithm. We've also reached completeness: there are no missing values and no NaNs, because the imputer filled them all in before passing the data to the next steps.
Now we'll notice that we have scikit-learn's ColumnTransformer. I can write, for example, num_attribs = list(housing_num), taking the columns containing numbers, and a separate list for the columns containing text, separating them. Then full_pipeline = ColumnTransformer(...): we've divided the columns into two parts, and what we have now is a process that will handle each part separately. So here we have the full pipeline: I've already split the columns into a numeric section and a text section, and instead of running one sequential process over everything, it will process each part on its own. Then we called housing_prepared = full_pipeline.fit_transform(housing). This will be my output array; it will be comprehensive and will include the processed numbers and the converted text. Then we displayed the result, and you'll see everything is converted and working fine. After that, we checked housing_prepared.shape, which shows (16512, 16). We'll notice the number of columns has grown, which is normal because we added a set of engineered features on top of everything.
The next code is about the custom selector class, and this belongs to the older versions: old scikit-learn didn't have a built-in way to automatically select specific columns of a DataFrame. So we had to build an entire class to act as the gate that passes specific columns to each production line, the way ColumnTransformer does for us now. This is where the same inheritance pattern comes in: it makes the class compatible with scikit-learn, just like we did with BaseEstimator and TransformerMixin, so we get fit_transform and so on. The first method is __init__. Its job, in general, is to receive the column names we want to extract, whether numeric or textual, and store them on self. Then we go to fit(self, X, y=None): it's a no-op; since this selector doesn't need to learn anything from the data, it simply returns self to keep the interface intact. After that, we go to the transform method. It performs the actual extraction: it takes the entire DataFrame, indexes it with the list of names we defined earlier, and converts the selected part from a pandas DataFrame into a NumPy array. This step is very important for the subsequent stages in the pipelines, such as imputing or scaling, which expect NumPy arrays as input later on. Also, why are we studying this part? This class, the old DataFrameSelector, used to be the only way to route the different column types to different pipelines. However, we're currently using the ColumnTransformer, which we explained earlier.
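A sketch of the modern approach just described, assuming housing_num is the numeric-only DataFrame and num_pipeline is the numerical pipeline built above; the list names follow the book's notebook:

```python
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

num_attribs = list(housing_num)       # names of the numeric columns
cat_attribs = ["ocean_proximity"]     # the single text column

full_pipeline = ColumnTransformer([
    ("num", num_pipeline, num_attribs),      # numeric columns -> numeric pipeline
    ("cat", OneHotEncoder(), cat_attribs),   # text column -> one-hot encoding
])

housing_prepared = full_pipeline.fit_transform(housing)
housing_prepared.shape   # (16512, 16) in the lecture's run, after adding features
```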
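And here is a minimal sketch of the legacy selector class the paragraph describes, along the lines of the book's DataFrameSelector; it just stores column names and hands back those columns as a NumPy array:

```python
from sklearn.base import BaseEstimator, TransformerMixin

class DataFrameSelector(BaseEstimator, TransformerMixin):
    def __init__(self, attribute_names):
        self.attribute_names = attribute_names   # column names to keep

    def fit(self, X, y=None):
        return self   # nothing to learn

    def transform(self, X):
        # Index the DataFrame by name and return a plain NumPy array
        return X[self.attribute_names].values
```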
The ColumnTransformer performs the selection and merging tasks in one step, and it's much faster. We still study this older code so that if we encounter legacy systems at work, we understand the architecture they were built on, where the columns were separated manually before processing, and we can work with them. Of course, in this classic method we also build two pipelines: one for the numbers and one for the text. So, for example, we've created a list of the numeric attribute names and another list for the categorical attribute. We started with the old numerical pipeline, which contains a selector, an imputer, the attributes adder, and a standard scaler, combined in order, and then we created the old categorical pipeline, which includes the selector and the category encoder. Between the two, we've covered both the numeric columns and the object (text) column.
After that, we move on to FeatureUnion, which is also part of scikit-learn. FeatureUnion's job is to combine my operations: it doesn't execute one pipeline after the other sequentially, but rather runs the numerical pipeline and the categorical pipeline in parallel. When each production line finishes and produces its output, FeatureUnion combines them by horizontal concatenation, placing the results side by side. Here, I tell it from sklearn.pipeline import FeatureUnion, and then I say the old full pipeline is the union of the two, with a transformer list containing the numerical pipeline and the categorical pipeline; so we combined them both. After that, we said old_housing_prepared equals the old full pipeline's fit_transform of housing, put everything into that variable, and then displayed it. The output appeared as shown. Then, in the next part, we compared the two results, the old approach and the ColumnTransformer approach, and found that they are the same: the same elements appear in both.
So we finished the preparation stage and start moving on to our training stage. Here we will notice the first thing, Select and Train a Model. In this stage we start trying different algorithms to see which is most suitable for the nature of the data. Our first model is the baseline model, which uses the prepared housing matrix we produced earlier, together with the housing labels, to train and then make predictions. Then I can calculate my errors using the RMSE, and this is its code. We'll notice that this error metric, the RMSE, gives us the typical deviation of the model's predictions from the actual prices in dollars. We'll find that the error is large, and this is called underfitting; since the error value is large, the model is underfitting for us.
After that, we enter the validation, or cross-validation, stage. This is the part where cross-validation makes my evaluation better. Here I use k-fold cross-validation: we divide the training data into ten folds, train on nine and validate on the remaining one, and repeat the process 10 times to get an average performance for our model that takes any potential luck into account. Next, the code moves on to fine-tuning our model.
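A sketch of that legacy architecture, under the assumption that DataFrameSelector, CombinedAttributesAdder, num_attribs, and cat_attribs are defined as above; the names mirror the book's "old" pipelines:

```python
import numpy as np
from sklearn.pipeline import Pipeline, FeatureUnion
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler, OneHotEncoder

# Numeric branch: select numeric columns, impute, add features, scale
old_num_pipeline = Pipeline([
    ("selector", DataFrameSelector(num_attribs)),
    ("imputer", SimpleImputer(strategy="median")),
    ("attribs_adder", CombinedAttributesAdder()),
    ("std_scaler", StandardScaler()),
])

# Categorical branch: select the text column, one-hot encode it
# (sparse=False for a dense array; newer scikit-learn uses sparse_output=False)
old_cat_pipeline = Pipeline([
    ("selector", DataFrameSelector(cat_attribs)),
    ("cat_encoder", OneHotEncoder(sparse=False)),
])

# FeatureUnion runs both branches and concatenates their outputs side by side
old_full_pipeline = FeatureUnion(transformer_list=[
    ("num_pipeline", old_num_pipeline),
    ("cat_pipeline", old_cat_pipeline),
])

old_housing_prepared = old_full_pipeline.fit_transform(housing)
np.allclose(housing_prepared, old_housing_prepared)  # compare with the new approach
```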
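For the training and evaluation steps, here is a minimal sketch along the book's lines, assuming housing_prepared and housing_labels are the prepared features and target labels of the training set, and using LinearRegression as the baseline model:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import cross_val_score

# Baseline model
lin_reg = LinearRegression()
lin_reg.fit(housing_prepared, housing_labels)

# RMSE on the training set: typical prediction error in dollars
housing_predictions = lin_reg.predict(housing_prepared)
lin_rmse = np.sqrt(mean_squared_error(housing_labels, housing_predictions))

# 10-fold cross-validation: train on 9 folds, validate on the 10th, 10 times
scores = cross_val_score(lin_reg, housing_prepared, housing_labels,
                         scoring="neg_mean_squared_error", cv=10)
rmse_scores = np.sqrt(-scores)   # scores are negative MSE, so negate before the sqrt
print(rmse_scores.mean(), rmse_scores.std())
```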
After selecting the best model (likely the RandomForestRegressor), we need to fine-tune its internal settings for maximum accuracy. This is the code for the grid search. The concept here is hyperparameter optimization: I have Python automatically try different combinations of hyperparameter settings, and in this optimization process it finds the configuration with the lowest error.
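A sketch of that grid search, following the book's example; the specific parameter-grid values below are the book's choices and simply illustrate the idea:

```python
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

param_grid = [
    # Try 3 x 4 combinations of hyperparameters...
    {"n_estimators": [3, 10, 30], "max_features": [2, 4, 6, 8]},
    # ...then 2 x 3 combinations with bootstrapping turned off
    {"bootstrap": [False], "n_estimators": [3, 10], "max_features": [2, 3, 4]},
]

forest_reg = RandomForestRegressor()
# Each combination is evaluated with 5-fold cross-validation on (negative) MSE
grid_search = GridSearchCV(forest_reg, param_grid, cv=5,
                           scoring="neg_mean_squared_error",
                           return_train_score=True)
grid_search.fit(housing_prepared, housing_labels)

print(grid_search.best_params_)      # the winning combination
print(grid_search.best_estimator_)   # the model refit with those settings
```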
Chapter 2 of Hands-On Machine Learning (project code), Part 3