Python again

(The notebook's `print(df.columns)` output: `Index(['addresses', 'account'], dtype='object')`.)



import pandas as pd

df = pd.read_excel('C:/Users/jingh/Documents/resultFile.xlsx', sheet_name='Sheet1')


from sqlalchemy import create_engine

server = 'LAPTOP-S1V0PRKL'

database = 'pythontest'

sqlcon = create_engine('mssql+pyodbc://@' + server + '/' + database + '?trusted_connection=yes&driver=ODBC+Driver+17+for+SQL+Server')

df.to_sql('table_test20200207', con=sqlcon, schema='schema_test', if_exists='append')

Now let's see what I got.

Write data from Excel to SQL Server – no need to create the table

I have moved many tables from Excel to SQL Server. But most of the time, it was done with INSERT statements, after the table had been created in SQL Server. I did not know that there is an easier way.

On the other hand, I have moved data from our Hadoop platform to SQL Server many times without creating a table. One day last week, when I needed to move data from an Excel file again, I had two choices: 1. use Hadoop and PySpark to read the Excel file and then write to SQL Server; 2. create the table in SQL Server and insert the data. Then I asked myself, "could you find a way to do this simply with Python?" I tried and, as always, with the help of my best friend GOOGLE, I found the way.

import pandas as pd

df = pd.read_excel("path/filename.xlsx", sheet_name="Sheet1")


from sqlalchemy import create_engine

server = r'server\name'  # raw string so the backslash in server\instance survives

database = 'database_nm'

sqlcon = create_engine('mssql+pyodbc://@' + server + '/' + database + '?trusted_connection=yes&driver=ODBC+Driver+13+for+SQL+Server')

df.to_sql('table_name_in_SQL_server', con=sqlcon, schema='schema_name', if_exists='append')

Done. That is it.
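As a quick sanity check of the same `df.to_sql` pattern, here is a minimal round trip that runs anywhere: it uses an in-memory SQLite database as a stand-in (my choice, so the sketch does not need pyodbc or a SQL Server instance), and the table name and sample data are made up.

```python
import sqlite3

import pandas as pd

# In-memory SQLite stands in for the SQL Server target so this runs anywhere.
con = sqlite3.connect(':memory:')

# A tiny made-up frame; a real one would come from pd.read_excel as above.
df = pd.DataFrame({'addresses': ['123 Main St', '456 Oak Ave'], 'account': [1, 2]})

# Same call as above; against SQL Server you would pass the sqlalchemy
# engine plus a schema= argument instead of a raw sqlite3 connection.
df.to_sql('table_test', con=con, if_exists='append', index=False)

print(len(pd.read_sql('SELECT * FROM table_test', con)))  # 2
```

Because `if_exists='append'` creates the table when it does not exist, running the same call twice simply doubles the row count, which is exactly the behavior the blog post relies on.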

Since I mentioned PySpark, it is convenient to include the PySpark version here too.

from pyspark import SparkConf

from pyspark.sql import SparkSession

conf = SparkConf()

conf.set('spark.yarn.queue', '***').set('spark.yarn.executor-memory', '10G').set('spark.executor.memory', '10G').\
    set('spark.yarn.driver-memory', '10G').set('spark.driver.memory', '10G').set('spark.debug.maxToStringFields', '120').\
    set('spark.yarn.executor.memoryOverhead', '10240').set('spark.port.maxRetries', '50').set('', '800').\
    set('spark.sql.parquet.writeLegacyFormat', 'true').set('spark.driver.extraJavaOptions', '-XX:MaxHeapSize=21474836480 -XX:ConcGCThreads=90')

spark = SparkSession.builder.master('yarn').appName('name_that_I_like').config(conf=conf).enableHiveSupport().getOrCreate()

msSqlServerdriver = ''

msSqlServer_prod_url = 'jdbc:sqlserver://server_name;databaseName=database_name;user=UserName;password=password'

sql_command = ''' '''

# read from sql server

dataframe = spark.read.format('jdbc').option('url', msSqlServer_prod_url).option('dbtable', sql_command).option('driver', msSqlServerdriver).load().cache()

## many lines to get to result_df omitted here

# write to sql server

result_df.write.format('jdbc').option('url', msSqlServer_prod_url).option('dbtable', 'schema_name.table_name').option('driver', msSqlServerdriver).save()
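One detail worth noting about the read above: the JDBC `dbtable` option expects a table name or a parenthesized subquery with an alias, not a bare SELECT statement. A small helper (my own hypothetical function, not part of Spark) can do the wrapping:

```python
def wrap_for_dbtable(query, alias='t'):
    """Wrap a bare SELECT so the JDBC 'dbtable' option accepts it.

    JDBC sources treat dbtable as something valid in a FROM clause,
    so an arbitrary query must be passed as '(subquery) AS alias'.
    """
    return '({}) AS {}'.format(query.strip().rstrip(';'), alias)

# The result would be passed to .option('dbtable', ...) in the read above.
print(wrap_for_dbtable('SELECT account FROM schema_name.table_name;'))
# (SELECT account FROM schema_name.table_name) AS t
```

Newer Spark versions also offer a separate `query` option that takes the SELECT statement directly, which avoids the wrapping altogether.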

I know that I could use special blocks for code, but I can't recall the details now. I will read my own old blog posts to get that piece back. It is so good to have written down what I have done.

I want to try embedding GitHub gists here for more efficient code formatting.

Pick up blogging again

Time really flies when I am having fun. I can't believe that it has been more than a year since my last post. I had a lot of fun during this period: implementing Python and PySpark in my daily work, picking up data science code, learning graphs, giving a presentation at the local SQL Server user group meeting... As a result, I haven't gotten anything done with this blog.

Now, I have many things I need to write here. I want to organize the useful Python code I use in SQL Server ETL. Also, I want to record my Azure learning, as I am trying to get an Azure certificate soon. I also want to learn and blog more about the data science services in SQL Server. Let the fun begin!

Trick SQL Server to use the best Execution Plan

What are the two most important words that one should look for in a SQL Server execution plan? Scan! Seek!

SCANNING a big table is a disaster. The rescue is: try your best to let SQL Server SEEK a big table.

(William Wolf taught me this many years ago at a SQL Saturday meeting. This knowledge is so powerful, and it made one of my queries work so well, that I want to share it with you.)

Continue reading “Trick SQL Server to use the best Execution Plan”

Look at me! I made the fire!

In the movie Cast Away, the character (played by Tom Hanks) danced and jumped around a bonfire on the beach and shouted many times: "Look at me! I made the fire!", even though he knew that nobody could see him on that isolated, faraway island. He simply could not stop celebrating his exciting achievement. That is exactly the feeling I had on a late Friday afternoon, after I figured out how to solve two of my problems at the same time. Those two problems were: first, let SSRS pass a parameter to a query in DB2; and second, allow that parameter to be one or several account_IDs.

Continue reading “Look at me! I made the fire!”

Scheduling SQL Job with SSRS

Imagine this: you are at a technical interview for a SQL developer position. The interviewer asks: tell me some way to schedule a SQL Server job without using SQL Server Agent. If I were in that situation, I would be surprised by the question. I would probably miss the question and focus only on finding out the reason not to use SQL Server Agent.

The fun thing is: I am going to share our unconventional ways to schedule SQL jobs (so that you won't be surprised when you are asked about it).

Continue reading “Scheduling SQL Job with SSRS”

Dynamic! Dynamic!

Dynamic is one of my favorite search words. It is the opposite of "static". A lot of the time, SQL statements require static input, such as the column names and table names in a SELECT. However, dynamic SQL gives us a way to overcome this requirement. In this post I will share two other usages of "dynamic".

Continue reading “Dynamic! Dynamic!”