
Deep Learning - 1.2 Data Preprocessing


In this chapter we briefly walk through the steps for preprocessing raw data with pandas and converting it into the tensor format.

  • Read dataset
  • Handling missing data
  • Conversion to the tensor format
  • Deletion, Imputation, Conversion to Tensor

1. Read dataset

Before practicing reading a .csv file, we first create an artificial dataset and save it to disk.

Note that the data loaded by pandas is a DataFrame, not a tensor.
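The step above can be sketched roughly as follows. The file name, the NumRooms and Price columns, and all values are assumptions made up for illustration; only the Alley column is named in this post.

```python
import os
import tempfile

import pandas as pd

# Hypothetical file name and values, chosen only for illustration.
data_file = os.path.join(tempfile.mkdtemp(), "house_tiny.csv")
with open(data_file, "w") as f:
    f.write("NumRooms,Alley,Price\n")  # column names
    f.write("NA,Pave,127500\n")        # each row is one data example
    f.write("2,NA,106000\n")
    f.write("4,NA,178100\n")
    f.write("NA,NA,140000\n")

# read_csv parses the string "NA" as a missing value (NaN) by default.
data = pd.read_csv(data_file)
print(type(data))
print(data)
```

Printing the type confirms that the result is a pandas DataFrame rather than a tensor.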

 

2. Handling missing data

NaN marks a missing value. Let's first split the data into inputs and outputs, then fill the missing entries of the inputs with other values.

For numerical columns, we fill the missing entries with the column mean.

In the case of the Alley column, we treat NaN as a category of its own. When the column is one-hot encoded, pandas automatically converts it into two columns, Alley_Pave and Alley_nan.

This is called imputation: replacing missing values with substituted ones.
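A minimal sketch of the imputation described above, assuming the same made-up dataset (NumRooms and Price are hypothetical columns). It uses `fillna` for the numerical column and `pd.get_dummies` with `dummy_na=True` for the categorical one, which is one common way to do this and not necessarily the exact code the post originally showed.

```python
import numpy as np
import pandas as pd

# Hypothetical data mirroring the artificial dataset.
data = pd.DataFrame({
    "NumRooms": [np.nan, 2.0, 4.0, np.nan],
    "Alley": ["Pave", np.nan, np.nan, np.nan],
    "Price": [127500, 106000, 178100, 140000],
})

# Split into inputs (first two columns) and outputs (last column).
inputs, targets = data.iloc[:, 0:2], data.iloc[:, 2]

# Numerical column: replace NaN with the column mean.
inputs = inputs.fillna(inputs.mean(numeric_only=True))

# Categorical column: one-hot encode, treating NaN as its own category.
inputs = pd.get_dummies(inputs, dummy_na=True, dtype=float)
print(inputs)
```

After these two steps the inputs contain no missing values, and every column is numerical.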

3. Conversion to the tensor format

Now all entries of the data are numerical, so they can be converted to the tensor format. Once in tensor format, they can be further manipulated with tensor operations.
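The conversion can be sketched as below, assuming PyTorch as the tensor library (the post does not name one) and using hypothetical, already-imputed values for the inputs and outputs.

```python
import pandas as pd
import torch

# Hypothetical, already-imputed inputs and outputs.
inputs = pd.DataFrame({
    "NumRooms": [3.0, 2.0, 4.0, 3.0],
    "Alley_Pave": [1.0, 0.0, 0.0, 0.0],
    "Alley_nan": [0.0, 1.0, 1.0, 1.0],
})
targets = pd.Series([127500, 106000, 178100, 140000], name="Price")

# Go through NumPy first, then wrap the arrays as tensors.
X = torch.tensor(inputs.to_numpy(dtype=float))
y = torch.tensor(targets.to_numpy(dtype=float))
print(X)
print(y)
```

Because the NumPy arrays are float64, the resulting tensors are float64 as well; call `.float()` on them if float32 is preferred.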

4. Deletion, Imputation, Conversion to Tensor

Create a dataset and delete the column with the most NaN entries.

Replace all remaining NaN entries.

Convert the result to the tensor format.
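The three steps above could look like the following sketch. The dataset is again hypothetical, mean imputation is an assumed choice for replacing the remaining NaN entries, and PyTorch is assumed as the tensor library.

```python
import numpy as np
import pandas as pd
import torch

# Hypothetical dataset with missing entries.
data = pd.DataFrame({
    "NumRooms": [np.nan, 2.0, 4.0, np.nan],
    "Alley": ["Pave", np.nan, np.nan, np.nan],
    "Price": [127500, 106000, 178100, 140000],
})

# 1) Deletion: drop the column with the most NaN entries.
nan_counts = data.isna().sum()
data = data.drop(columns=nan_counts.idxmax())

# 2) Imputation: fill the remaining NaN entries with column means.
data = data.fillna(data.mean())

# 3) Conversion: turn the fully numerical DataFrame into a tensor.
tensor = torch.tensor(data.to_numpy(dtype=float))
print(tensor)
```

Here the Alley column has the most missing entries, so it is the one that gets dropped before imputation.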

 
