Write data from Excel to SQL Server – no need to create the table first

I have moved many tables from Excel to SQL Server. Most of the time, though, it was done with INSERT statements, and only after the table had already been created in SQL Server. I did not know that there was an easier way.

On the other hand, I have moved data from our Hadoop platform to SQL Server many times without creating a table first. One day last week, when I needed to move data from an Excel file again, I had two choices: 1. use Hadoop and PySpark to read the Excel file and then write to SQL Server; 2. create a table in SQL Server and insert the data. Then I asked myself, "could you find a way to do this simply with Python?" I tried and, as always, with the help of my best friend GOOGLE, I found the way.

import pandas as pd

df = pd.read_excel('path/filename.xlsx', sheet_name='Sheet1')


from sqlalchemy import create_engine

server = r'server\name'  # raw string so the backslash is not treated as an escape

database = 'database_nm'

sqlcon = create_engine('mssql+pyodbc://@' + server + '/' + database + '?trusted_connection=yes&driver=ODBC+Driver+13+for+SQL+Server')

df.to_sql('table_name_in_SQL_server', con=sqlcon, schema='schema_name', if_exists='append')

Done. That is it.
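The nice part is that to_sql works against any SQLAlchemy engine, so the pattern can be tried end to end without a SQL Server instance. A minimal sketch using an in-memory SQLite database (the DataFrame contents and table name are made up for the demo; for SQL Server you would swap in the mssql+pyodbc engine above):

```python
import pandas as pd
from sqlalchemy import create_engine

# Made-up data standing in for the Excel sheet.
df = pd.DataFrame({'id': [1, 2, 3], 'name': ['a', 'b', 'c']})

# In-memory SQLite engine; replace with the mssql+pyodbc engine for SQL Server.
engine = create_engine('sqlite://')

# No CREATE TABLE needed: to_sql creates the table on first write.
df.to_sql('demo_table', con=engine, if_exists='append', index=False)

# Read it back to confirm the table exists and holds the rows.
round_trip = pd.read_sql('SELECT * FROM demo_table', con=engine)
print(round_trip.shape)  # (3, 2)
```

Passing index=False keeps pandas from writing its row index as an extra column, which is usually what you want when loading a plain Excel sheet.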

Since I mentioned PySpark, it is convenient to include the PySpark version here too.

from pyspark import SparkConf

from pyspark.sql import SparkSession

conf = SparkConf()

conf.set('spark.yarn.queue', '***') \
    .set('spark.executor.memory', '10G') \
    .set('spark.driver.memory', '10G') \
    .set('spark.yarn.executor.memoryOverhead', '10240') \
    .set('spark.debug.maxToStringFields', '120') \
    .set('spark.port.maxRetries', '50') \
    .set('spark.network.timeout', '800') \
    .set('spark.sql.parquet.writeLegacyFormat', 'true') \
    .set('spark.driver.extraJavaOptions', '-XX:MaxHeapSize=21474836480 -XX:ConcGCThreads=90')

spark = SparkSession.builder.master('yarn').appName('name_that_I_like').config(conf=conf).enableHiveSupport().getOrCreate()

msSqlServerdriver = 'com.microsoft.sqlserver.jdbc.SQLServerDriver'

msSqlServer_prod_url = 'jdbc:sqlserver://server_name;databaseName=database_name;user=UserName;password=password'

sql_command = ''' '''  # query text omitted; a query passed as dbtable must be wrapped as '(SELECT ...) AS alias'

# read from SQL Server

dataframe = spark.read.format('jdbc').option('url', msSqlServer_prod_url).option('dbtable', sql_command).option('driver', msSqlServerdriver).load().cache()

##many lines to get to result_df omitted here

# write to SQL Server

result_df.write.format('jdbc').option('url', msSqlServer_prod_url).option('dbtable', 'schema_name.table_name').option('driver', msSqlServerdriver).save()
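One thing I would tighten in the JDBC URL above is the plain-text password sitting in the script. A small sketch of pulling the credentials from environment variables instead (the helper function and variable names here are my own invention, not part of any API):

```python
import os

def build_mssql_jdbc_url(server, database,
                         user_env='MSSQL_USER', pwd_env='MSSQL_PWD'):
    """Assemble a SQL Server JDBC URL, reading credentials from the environment."""
    user = os.environ.get(user_env, '')
    password = os.environ.get(pwd_env, '')
    return (f'jdbc:sqlserver://{server};databaseName={database};'
            f'user={user};password={password}')

# The variables would normally be set outside the script (shell profile, scheduler).
os.environ['MSSQL_USER'] = 'UserName'
os.environ['MSSQL_PWD'] = 'secret'
url = build_mssql_jdbc_url('server_name', 'database_name')
print(url)
```

The resulting string can then be passed to .option('url', ...) exactly as before, with nothing sensitive committed alongside the code.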

I know that I could use special blocks for code, but I can't recall the details now. I will read my own old blog posts to get that piece back. It is so good to write down what I have done.

I also want to look at GitHub posts here for more efficient code formatting.

