Discussion: [Python] Thread for Python folks

jiangcristian · May 30, 2023

Zayt__ said:
It does get called; try curl and you'll see. My guess is that it's this script.

Posting the script errored out, so here's a screenshot instead.
View attachment 1865424

Let me try that, thanks.

thedino · May 30, 2023

jiangcristian said:
I'm writing a file download function and ran into this URL that I don't know how to handle. Normally, calling response = requests.get(url) returns the file just like downloading it in a browser; when that gets blocked, adding a few params to the get call usually fixes it. But with this URL:

https://capaiankinerja.presidenri.go.id/arsip/lima-tahun-maju-bersama

the browser still downloads it normally, while requests gets back an HTML page. The status code is still 200, but there is no real content. Could those of you with more experience tell me why?

Try running it in Postman and see whether that works.

jiangcristian · May 30, 2023

thedino said:
Try running it in Postman and see whether that works.

With Postman it returns a page like this. I also tried reading response.content, and the result is the same HTML.
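
A status code of 200 alone does not say whether the body is the file or an HTML page. The following is a minimal sketch of a check that could run before saving anything; the helper name `looks_like_html` and the 512-byte sniffing window are assumptions made for illustration, not something from the thread.

```python
def looks_like_html(content_type, body):
    """Heuristic: True when a response looks like an HTML page, not a file."""
    # The Content-Type header is the first hint (e.g. "text/html; charset=UTF-8").
    if "html" in content_type.lower():
        return True
    # Some servers mislabel responses, so also sniff the start of the body.
    head = body[:512].lstrip().lower()
    return head.startswith((b"<!doctype html", b"<html"))
```

With requests, this could be called as `looks_like_html(response.headers.get("Content-Type", ""), response.content)`.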

thedino · May 30, 2023

jiangcristian said:
With Postman it returns a page like this. I also tried reading response.content, and the result is the same HTML.
View attachment 1866050

So the site is using something to block direct requests. I'll take a look tonight when I get home.

thedino · May 31, 2023

jiangcristian said:
With Postman it returns a page like this. I also tried reading response.content, and the result is the same HTML.
View attachment 1866050

Why can't I open the link?

jiangcristian · May 31, 2023

thedino said:
Why can't I open the link?

That link is a file download link; clicking it downloads the file automatically.

Lucianz · May 31, 2023

peppermint said:
In Python, when should I use a class? I normally write only functions and that works fine. Is that a bad habit?

Once you've had to deal with datetime, for one, and rounding, for two, you'll get it right away.

Lucianz · May 31, 2023

jiangcristian said:
I'm writing a file download function and ran into this URL that I don't know how to handle. Normally, calling response = requests.get(url) returns the file just like downloading it in a browser; when that gets blocked, adding a few params to the get call usually fixes it. But with this URL:

https://capaiankinerja.presidenri.go.id/arsip/lima-tahun-maju-bersama

the browser still downloads it normally, while requests gets back an HTML page. The status code is still 200, but there is no real content. Could those of you with more experience tell me why?

There's probably some JS handling this on the server side. Since requests gets back a 404 page like that, it's likely a deliberate redirect. Just use Selenium, my friend. You can run it headless to cut the processing time.

thedino · May 31, 2023

jiangcristian said:
That link is a file download link; clicking it downloads the file automatically.

Without the site I can't check it, but judging by the screenshot, the download link is being redirected.

jiangcristian · May 31, 2023

thedino said:
Without the site I can't check it, but judging by the screenshot, the download link is being redirected.

Here's the site:

https://capaiankinerja.presidenri.go.id/arsip

Miracle-- · May 31, 2023

Why are there so few Python intern openings these days? I searched a few job sites and found only one; everything else hires from fresher level up.
I applied there but don't know when I'll hear back.
Does anyone know of places hiring Python interns?

Violet_7 · May 31, 2023

Miracle-- said:
Why are there so few Python intern openings these days? I searched a few job sites and found only one; everything else hires from fresher level up.
I applied there but don't know when I'll hear back.
Does anyone know of places hiring Python interns?

These days you have to ask each company whether they take interns. Often they don't post openings but still hire. :look_down:

thedino · Jun 1, 2023

jiangcristian said:
Here's the site:

https://capaiankinerja.presidenri.go.id/arsip

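from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager

# Run Chrome in headless mode (options.headless is deprecated in newer Selenium 4 releases)
options = Options()
options.add_argument('--headless=new')

# Initialize Chrome WebDriver; Selenium 4 takes the driver path through a Service object
service = Service(ChromeDriverManager().install())
driver = webdriver.Chrome(service=service, options=options)

url = 'https://capaiankinerja.presidenri.go.id/arsip/lima-tahun-maju-bersama'

# Allow downloads in headless mode and set the target folder
params = {'behavior': 'allow', 'downloadPath': '[folder path]'}
driver.execute_cdp_cmd('Page.setDownloadBehavior', params)

# Opening the URL triggers the download
driver.get(url)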

The site redirects to a URL that also includes the PDF file name. For a quick fix, find a way to extract that file name and then run a command like the one below.

wget https://capaiankinerja.presidenri.g...019_Laporan-5-Tahun-Jokowi-JK_small-1--1-.pdf
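
If the redirect happens at the HTTP level, requests follows it by default and `response.url` holds the final address, so the file name could be read from there. A minimal sketch of that last step, using only the standard library; the helper name `filename_from_url` and the example.com URL in the usage note are hypothetical, and this does not help if the redirect is done by JavaScript in the page.

```python
import posixpath
from urllib.parse import unquote, urlsplit


def filename_from_url(url):
    """Return the last path segment of a URL, percent-decoded."""
    # urlsplit drops the query string and fragment from the path.
    path = urlsplit(url).path
    # basename gives an empty string when the path ends with a slash.
    return unquote(posixpath.basename(path))
```

For example, `filename_from_url("https://example.com/files/report%20final.pdf?x=1")` gives `"report final.pdf"`.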

BanhXe0_ · Jun 1, 2023

jiangcristian said:
I'm writing a file download function and ran into this URL that I don't know how to handle. Normally, calling response = requests.get(url) returns the file just like downloading it in a browser; when that gets blocked, adding a few params to the get call usually fixes it. But with this URL:

https://capaiankinerja.presidenri.go.id/arsip/lima-tahun-maju-bersama

the browser still downloads it normally, while requests gets back an HTML page. The status code is still 200, but there is no real content. Could those of you with more experience tell me why?

wget or curl still downloads it fine. Check whether the browser's request carries any special params. I don't think the site is blocking Python's requests as such.

BanhXe0_ · Jun 1, 2023

Lucianz said:
Once you've had to deal with datetime, for one, and rounding, for two, you'll get it right away.

I don't quite get it either. Could you give me an example? Genuine question.

jiangcristian · Jun 1, 2023

thedino said:
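from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager

# Run Chrome in headless mode (options.headless is deprecated in newer Selenium 4 releases)
options = Options()
options.add_argument('--headless=new')

# Initialize Chrome WebDriver; Selenium 4 takes the driver path through a Service object
service = Service(ChromeDriverManager().install())
driver = webdriver.Chrome(service=service, options=options)

url = 'https://capaiankinerja.presidenri.go.id/arsip/lima-tahun-maju-bersama'

# Allow downloads in headless mode and set the target folder
params = {'behavior': 'allow', 'downloadPath': '[folder path]'}
driver.execute_cdp_cmd('Page.setDownloadBehavior', params)

# Opening the URL triggers the download
driver.get(url)

The site redirects to a URL that also includes the PDF file name. For a quick fix, find a way to extract that file name and then run a command like the one below.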

wget https://capaiankinerja.presidenri.g...019_Laporan-5-Tahun-Jokowi-JK_small-1--1-.pdf

Thanks for putting in the effort, but my task is actually to move from Selenium to requests, because Selenium in my project often fails to initialize.
I've also found a solution already. :sweet_kiss:

ColdBailey · Jun 1, 2023

jiangcristian said:
I'm writing a file download function and ran into this URL that I don't know how to handle. Normally, calling response = requests.get(url) returns the file just like downloading it in a browser; when that gets blocked, adding a few params to the get call usually fixes it. But with this URL:

https://capaiankinerja.presidenri.go.id/arsip/lima-tahun-maju-bersama

the browser still downloads it normally, while requests gets back an HTML page. The status code is still 200, but there is no real content. Could those of you with more experience tell me why?

Just add a User-Agent and you're done.

Python code:

import requests

url = "https://capaiankinerja.presidenri.go.id/arsip/lima-tahun-maju-bersama"
# Browser-like headers; the cookie values come from one browser session and may expire
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/113.0.0.0 Safari/537.36',
    'Cookie': 'ci_session=75ad162f8132f8e4b3dcbdd312be52a80cdb6bbf; presiden_c_key=c167b9b366c797791426052b201a9746',
}
# timeout keeps the call from hanging indefinitely
response = requests.get(url, headers=headers, timeout=30)
# Prints the body as text; for a binary file, use response.content instead
print(response.text)
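
For an actual download function, a sketch along these lines might be closer to what the original question needs: it sends a browser-like User-Agent, streams the body to disk, and refuses to save an HTML page. The function name `download_file`, the `session` parameter, and the 30-second timeout are assumptions for illustration; whether the site also needs the cookies above is not verified here.

```python
# Browser-like User-Agent string taken from the post above.
BROWSER_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/113.0.0.0 Safari/537.36"
)


def download_file(url, dest_path, session=None, chunk_size=8192):
    """Stream url to dest_path and return the number of bytes written."""
    if session is None:
        # Imported lazily so a stub session can be passed in for testing.
        import requests
        session = requests.Session()
    response = session.get(
        url, headers={"User-Agent": BROWSER_UA}, stream=True, timeout=30
    )
    response.raise_for_status()
    # Refuse to save an HTML page in place of the expected file.
    content_type = response.headers.get("Content-Type", "")
    if "text/html" in content_type.lower():
        raise ValueError(f"expected a file, got {content_type!r}")
    written = 0
    with open(dest_path, "wb") as fh:
        # stream=True plus iter_content avoids loading the whole file in memory.
        for chunk in response.iter_content(chunk_size=chunk_size):
            fh.write(chunk)
            written += len(chunk)
    return written
```

Passing in a `requests.Session` also lets cookies set by an earlier page visit carry over to the download request, which may matter on sites that check a session cookie.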

caothuphu2013 · Jun 3, 2023

SENIOR PYTHON - Min $2000
Location: Viettel Building, District 10

You have 4+ years of experience in backend development using Python and have worked confidently within teams delivering data driven systems.
You have knowledge of scaling practices such as in-memory databases, load balancing, caching, etc.
You’re comfortable with both NoSQL and SQL databases.
You have 3+ years experience in deploying systems on GCP or AWS and knowledge of containerization (e.g. Docker)
You can give and take in discussions and make tactical decisions to reduce risk when delivering functionality, which is the primary measure of progress.
You’re a team player who is solutions oriented.
Experience in leading a team or mentoring younger developers will be highly regarded
You have deep exposure to ETL and ELT.
Having wide exposure to multiple technology stacks is an advantage, we love full-stack engineers.
You’re proficient in code review, code refactoring, Unit Testing.
Backend security knowledge is a plus
Good communication in English is a plus.

dhuyan · Jun 3, 2023

Can anyone here recommend the best books to read on Python? My Python level so far only goes as far as reading and writing data to files.

BanhXe0_ · Jun 6, 2023

caothuphu2013 said:
SENIOR PYTHON - Min $2000
Location: Viettel Building, District 10

You have 4+ years of experience in backend development using Python and have worked confidently within teams delivering data driven systems.

You have knowledge of scaling practices such as in-memory databases, load balancing, caching, etc.

You’re comfortable with both NoSQL and SQL databases.

You have 3+ years experience in deploying systems on GCP or AWS and knowledge of containerization (e.g. Docker)

You can give and take in discussions and make tactical decisions to reduce risk when delivering functionality, which is the primary measure of progress.

You’re a team player who is solutions oriented.

Experience in leading a team or mentoring younger developers will be highly regarded

You have deep exposure to ETL and ELT.

Having wide exposure to multiple technology stacks is an advantage, we love full-stack engineers.

You’re proficient in code review, code refactoring, Unit Testing.

Backend security knowledge is a plus

Good communication in English is a plus.

ETL, ELT? So it's PySpark work?
