- Use cases
-
1. Preprocessing
- SageMaker Object Detection preprocessing
- Rekognition Object Detection preprocessing
- SageMaker Kmeans preprocessing
- Autopilot preprocessing
- DeepAR preprocessing
- Personalize preprocessing
- Select, drop or extract Columns
- Split dataset to Train and Test
- Upload to s3
- Forecast preprocessing
- Rekognition Classification preprocessing
- SageMaker Image Classification preprocessing
- Xgboost preprocessing
- Blazingtext preprocessing
- Comprehend custom preprocessing
-
2. Training
- SageMaker Object Detection training
- Rekognition Object Detection training
- Forecast training
- Personalize training
- BlazingText training
- DeepAR training
- SageMaker Kmeans training
- Comprehend custom training
- Autopilot Training
- Xgboost Training
- Autogluon training
- Rekognition Classification training
- SageMaker Image Classification training
-
3. Inference
- SageMaker Object Detection inference
- Forecast inference
- Rekognition Object Detection inference
- Comprehend custom inference
- Personalize inference
- Autopilot Inference
- BlazingText Inference
- Custom SageMaker model Inference
- DeepAR Inference
- Rekognition Classification inference
- SageMaker Image Classification inference
- SageMaker Kmeans inference
- Xgboost Inference
- Contribute a use case or contact us for help.
- Frequently Asked Questions
Select, drop or extract Columns
I am using a SQL database
Use SQL select before dumping out files for your ML project
SELECT column1, column2 ,column3 INTO OUTFILE '/tmp/out.csv'
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n'
FROM my_table;
Or, use Athena as shown here
Files are in a folder, and I like linux commands
The easiest way to do this is using the linux cut command. Suppose you have a file called in.csv
or a directory of files that look like
Folder
├── in1.csv
├── in2.csv
├── .
├── .
└── in2000.csv
… and assuming the delimiter used is a comma (,), and you want to select the first three columns, do
cut -d "," -f1-3 Folder/in*.csv > outfile.csv
If you need a specific list of columns, do:
cut -d "," -f1-10,20-25,30-33 Folder/in*.csv > outfile.csv
You can also use awk
on linux and MacOS to select and reorder columns:
awk -F "|" '{ print $1","$3","$5 }' Folder/in*csv > outfile.csv
Here, we read from a pipe-delimited file, select and reorder columns, and then write a comma separated file.
To remove additional characters, like a double-quote, add a {gsub(/\"/,"")};
before the print keyword.
How about in Python?
import pandas as pd
df = pd.read_csv('in.csv')
#Select columns named column1 and column2
df[['column1', 'column2']].to_csv('out.csv')
#Or select using column number
df.iloc[:, 0:2].to_csv('out.csv')