In this chapter we briefly walk through the steps for preprocessing raw data with pandas and converting it into the tensor format.
- Read dataset
- Handling missing data
- Conversion to the tensor format
- Deletion, Imputation, Conversion to Tensor
1. Read dataset
Before practicing reading a .csv file, we first create a small artificial dataset.
The data we read back is a pandas DataFrame, not a tensor.
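The step above can be sketched as follows. This is only a minimal example: the file name `house_tiny.csv`, the column names other than Alley, and all the values are assumptions made for illustration, and the file is written to a temporary directory so the snippet does not touch your working folder.

```python
import os
import tempfile

import pandas as pd

# Hypothetical toy dataset: file name, columns, and values are illustrative only.
data_dir = tempfile.mkdtemp()
data_file = os.path.join(data_dir, "house_tiny.csv")
with open(data_file, "w") as f:
    f.write("NumRooms,Alley,Price\n")  # column names
    f.write("NA,Pave,127500\n")        # each following row is one data example
    f.write("2,NA,106000\n")
    f.write("4,NA,178100\n")
    f.write("NA,NA,140000\n")

# read_csv treats the string "NA" as a missing value (NaN) by default.
data = pd.read_csv(data_file)
print(type(data))
print(data)
```

Printing the type confirms that `read_csv` returns a DataFrame, so a separate conversion step is still needed before the data can be used as a tensor.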
2. Handling missing data
NaN marks a missing value. We first split the data into inputs and outputs, then fill the missing entries of the inputs with substitute values.
For a numerical column, we fill each missing entry with the mean of that column.
For the Alley column, which is categorical, we treat NaN as a category of its own. pandas can then convert this column into two indicator columns, Alley_Pave and Alley_nan.
Replacing missing values with substituted ones in this way is called imputation.
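A minimal sketch of this imputation, assuming the same hypothetical toy dataset as in step 1 (loaded here from a string so the snippet runs on its own). Using `pd.get_dummies` with `dummy_na=True` is one way to obtain the Alley_Pave and Alley_nan columns; it is not necessarily the only one.

```python
import io

import pandas as pd

# Hypothetical toy dataset; the values are illustrative only.
csv_text = """NumRooms,Alley,Price
NA,Pave,127500
2,NA,106000
4,NA,178100
NA,NA,140000
"""
data = pd.read_csv(io.StringIO(csv_text))

# Split into inputs (first two columns) and outputs (last column).
inputs, outputs = data.iloc[:, 0:2], data.iloc[:, 2]

# Numerical column: fill NaN with the column mean.
# The categorical Alley column has no mean, so it is left untouched here.
inputs = inputs.fillna(inputs.mean(numeric_only=True))

# Categorical column: dummy_na=True adds an indicator column for NaN,
# so NaN is treated as its own category.
inputs = pd.get_dummies(inputs, dummy_na=True, dtype=float)
print(inputs)
```

After this step every entry of `inputs` is numerical: NumRooms holds the original values or the mean, and each row has exactly one of the two Alley indicators set to 1.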
3. Conversion to the tensor format
Now that all entries of the data are numerical, they can be converted to the tensor format, where they can be manipulated further.
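A short sketch of the conversion, assuming PyTorch as the tensor library (the post does not say which framework is used) and starting from already-numerical inputs like those produced by the imputation step. The values are hypothetical.

```python
import pandas as pd
import torch

# Hypothetical, already-imputed inputs and outputs (all numerical).
inputs = pd.DataFrame({
    "NumRooms": [3.0, 2.0, 4.0, 3.0],
    "Alley_Pave": [1.0, 0.0, 0.0, 0.0],
    "Alley_nan": [0.0, 1.0, 1.0, 1.0],
})
outputs = pd.Series([127500, 106000, 178100, 140000], name="Price")

# Go through NumPy with an explicit float dtype, then wrap as tensors.
X = torch.tensor(inputs.to_numpy(dtype=float))
y = torch.tensor(outputs.to_numpy(dtype=float))
print(X)
print(y)
```

Passing `dtype=float` to `to_numpy` makes the conversion fail loudly if a non-numerical column is still present, instead of silently producing an object array.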
4. Deletion, Imputation, Conversion to Tensor
Create a dataset and delete the column with the most NaN entries.
Replace all remaining NaN entries.
Convert the result to the tensor format.
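The three steps can be sketched together as below. This is one possible solution under stated assumptions: the toy dataset is hypothetical, the remaining NaN entries are replaced with column means, and PyTorch is assumed as the tensor library.

```python
import io

import pandas as pd
import torch

# Hypothetical toy dataset; the values are illustrative only.
csv_text = """NumRooms,Alley,Price
NA,Pave,127500
2,NA,106000
4,NA,178100
NA,NA,140000
"""
data = pd.read_csv(io.StringIO(csv_text))

# 1. Deletion: count NaN per column and drop the column with the most.
nan_counts = data.isna().sum()
data = data.drop(columns=nan_counts.idxmax())

# 2. Imputation: fill the remaining NaN entries with the column means.
data = data.fillna(data.mean(numeric_only=True))

# 3. Conversion: this works because only numerical columns are left;
#    a remaining categorical column would need to be encoded first.
tensor = torch.tensor(data.to_numpy(dtype=float))
print(tensor)
```

With this data the Alley column has the most missing entries, so it is the one dropped, and the result is a tensor with one row per example and the two remaining columns.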